Messaging systems (mobile box, voicemail systems) are well established on the market. Depending on the system structure, speech messages, picture messages and/or video messages are often stored together with sender information (e.g. sender identification such as CLI (Calling Line Identification), HLR (Home Location Register), sender address, and similar), either as attachments to text messages (email attachments) or as complete video messages in video mailboxes based on video telephony, in a manner analogous to voice mailboxes or digital answering machines, which have existed for a long time.
It can also be assumed as generally known prior art that computer-animated applications make use of avatars equipped with functionalities that enable texts to be output by speech synthesis, with corresponding lip movements being derived and represented. By means of face detection and determination of the facial regions occupied by particular facial elements (e.g. lips), lip movements of a speaker that have been recorded by a camera can be inserted into the lip region of a still photo, which can then assume the role of an avatar.
EP1648151A1 describes storing messages for subsequent processing by a system for machine translation of further messages. There, however, the emphasis is on the evaluation of textual and phonetic information contained in the stored messages. This also applies to picture messages, which may contain text in the picture or in meta-information. Modification of the actual picture contents is not described.
WO 2006/047347 describes a method in which a user can individually add an avatar to a message.
Pictures in picture messages or video messages are represented two-dimensionally, since the recording methods used in telecommunications terminal devices rely on these devices being equipped with a single camera. Additional equipment, with more than two cameras in combination with transmission methods that require a higher bandwidth, is an obvious extension, but is not advisable in more compact devices, because the small base width between the cameras results in low 3D quality. A 3D representation of the speaker of a picture and/or video message is therefore possible only with greatly impaired quality.
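The effect of a small base width can be illustrated with the standard geometry of two parallel, rectified pinhole cameras, in which the disparity of a point is d = f * B / Z. The following is a minimal sketch under that assumption; the function names and the numeric values are hypothetical and are not taken from any of the cited documents.

```python
def disparity_px(focal_px, baseline_m, depth_m):
    """Disparity in pixels for a point at depth_m, assuming two parallel,
    rectified pinhole cameras (an assumption of this sketch)."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


def depth_step_m(focal_px, baseline_m, depth_m, disparity_step_px=1.0):
    """First-order estimate of the depth change that corresponds to a
    disparity change of disparity_step_px: dZ = Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_step_px


# Illustrative values only: 1000 px focal length, speaker at 2 m.
narrow = depth_step_m(1000.0, 0.010, 2.0)  # 1 cm base width (compact device)
wide = depth_step_m(1000.0, 0.065, 2.0)    # 6.5 cm base width (roughly eye distance)
```

With these illustrative numbers, one pixel of disparity corresponds to roughly 0.4 m of depth for the narrow base width and roughly 0.06 m for the wide one, which is one way to see why depth is resolved more coarsely when the cameras are close together.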
Consequently, 2D avatars, which can be generated from photos as described above, can likewise be extended to 3D animations by means of two simultaneously recorded pictures only to a limited extent and with impaired quality. At the same time, methods for three-dimensional representation of pictures and/or videos are making ever greater progress, so that a requirement also arises for 3D representation of video/picture messages.
Likewise known is the 2D representation of virtually generated, animated avatars, which exist as a complete 3D model within a system and which, for example, reproduce texts on a system consisting of a display and loudspeaker, without a right and a left picture being generated simultaneously in this case (for example, as described in DE102004014189A1).
A true 3D representation of objects or pictures is based on the (quasi-) simultaneous output of two 2D pictures, one picture being intended for the left eye and the second picture being intended for the right eye.
Recording by means of laterally offset cameras for the purpose of generating 3D pictures is generally known and is not described further here. Methods also exist for generating artificial 3D pictures from a single 2D picture on the basis of existing models; these rely on a complete 3D model of manually annotated/analysed 2D objects, which is represented by the spatial coordinates of a large number of points on the surface of the objects. Examples of this are EP1412917B1 and EP0991023B1.
For the purpose of viewing 3D representations, there are various methods using aids, e.g. by means of lens rasters, anaglyph methods, polarisation spectacles, etc.
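As an illustration of one of these aids, the common red-cyan anaglyph takes the red channel from the left-eye picture and the green and blue channels from the right-eye picture. The following is a minimal sketch, assuming pictures are given as nested lists of (R, G, B) tuples; the function name `anaglyph` is hypothetical.

```python
def anaglyph(left, right):
    """Combine a left and a right picture into a red-cyan anaglyph.

    Pictures are lists of rows; each row is a list of (R, G, B) tuples.
    Red comes from the left picture, green and blue from the right one.
    """
    if len(left) != len(right) or any(
        len(lrow) != len(rrow) for lrow, rrow in zip(left, right)
    ):
        raise ValueError("left and right pictures must have the same size")
    return [
        [(lp[0], rp[1], rp[2]) for lp, rp in zip(lrow, rrow)]
        for lrow, rrow in zip(left, right)
    ]
```

Viewed through spectacles with a red filter on the left and a cyan filter on the right, each eye then receives mainly the picture intended for it.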
Also, two 2D pictures provided for the right and the left eye can be viewed without aids, through cross-viewing or parallel viewing.
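The two aid-free viewing modes differ only in the order in which the two pictures are placed next to each other: for parallel viewing the left-eye picture is on the left, for cross-viewing it is on the right. The following is a minimal sketch, assuming pictures are lists of pixel rows; the function name and the `cross_view` flag are hypothetical.

```python
def side_by_side(left, right, cross_view=False):
    """Place two equally sized pictures next to each other.

    Parallel viewing: left-eye picture on the left.
    Cross-viewing: the pictures are swapped, so the right-eye
    picture is on the left.
    """
    if len(left) != len(right):
        raise ValueError("pictures must have the same number of rows")
    first, second = (right, left) if cross_view else (left, right)
    # Concatenate corresponding rows into one wider row.
    return [list(a) + list(b) for a, b in zip(first, second)]
```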
U.S. Pat. No. 4,925,294A1 attempts to describe a general method for generating a 3D picture from a single 2D picture by separating various foreground and background picture elements and, from knowledge of their 3D characteristics, generating a right picture and a left picture from the 2D picture. In the following description, by contrast, the generating process always requires two 2D pictures, from which one 2D right picture and one 2D left picture are then generated; the knowledge of the properties of individual objects or of object models mentioned above is therefore not required in the method described below.
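The general idea of deriving a right and a left picture from a single 2D picture with known depth layers can be sketched as a horizontal shift per pixel. This is a simplified illustration of layer-based view synthesis, not the procedure of the cited patent and not the method described below; it assumes a single scanline, integer per-view shifts, and the hypothetical helper name `shift_view`.

```python
def shift_view(row, disparity, direction, fill=None):
    """Forward-warp one scanline by direction * disparity[x] pixels.

    row        -- list of pixel values for one scanline
    disparity  -- per-pixel shift in pixels (larger means nearer)
    direction  -- +1 for the left-eye view, -1 for the right-eye view
    fill       -- value left in positions that receive no pixel (holes)
    """
    out = [fill] * len(row)
    # Paint far pixels (small disparity) first so nearer pixels overwrite them.
    order = sorted(range(len(row)), key=lambda x: disparity[x])
    for x in order:
        target = x + direction * disparity[x]
        if 0 <= target < len(row):
            out[target] = row[x]
    return out


# One foreground pixel "F" in front of a background "b".
row = ["b", "b", "F", "b", "b"]
disp = [0, 0, 1, 0, 0]
left_view = shift_view(row, disp, +1)
right_view = shift_view(row, disp, -1)
```

In this sketch the foreground pixel moves right in the left-eye view and left in the right-eye view, and the position it vacates remains a hole (`fill`), since the single 2D picture carries no information about what lies behind the foreground element.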