1. Field of the Invention
The present invention relates to an image displaying apparatus which displays a main image accompanied by an auxiliary image.
2. Description of the Related Art
A technology is known which combines, as auxiliary information with a main image, an animated image such as a cartoon of lips or sign language that communicates what someone said. For example, one known technique links images of lips or sign language prepared in advance with each other in accordance with input text data.
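The text-driven linking of prepared lip images described above can be illustrated with a minimal sketch. It is not taken from any cited technique: the `VISEMES` table, its image names, and the function `lip_sequence` are hypothetical and exist only to show how input text could select a sequence of images prepared in advance.

```python
# Hypothetical table: maps a letter to the name of a lip image prepared in advance.
# The letters and image names are illustrative assumptions, not from the source.
VISEMES = {
    "a": "open", "i": "wide", "u": "round", "e": "half", "o": "oval",
    "m": "closed", "b": "closed", "p": "closed",
}

def lip_sequence(text, default="rest"):
    """Return one lip-image name per alphabetic character of the input text."""
    return [VISEMES.get(ch, default) for ch in text.lower() if ch.isalpha()]

# Link the prepared images in the order dictated by the input text.
sequence = lip_sequence("Mama!")
```

Playing the images named in `sequence` one after another would yield the animated auxiliary image. A practical system would presumably work from phonemes rather than letters.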
Meanwhile, with respect to a video conference system equipped with a translation function, another technique is known which partially replaces an actual image with an animated image synchronized with the voice of a translation. By partially replacing images in this way, it is possible to provide natural images of people which match the translated voice. The image replacement proceeds as follows. (1) Prepare in advance a three-dimensional model of a human head portion and a texture of the mouth area of each conference attendee. (2) Analyze actually shot images of the conference and detect the direction of the head portion of a speaker. (3) Correct the coordinate system of the three-dimensional model in accordance with movements of the head portion of the speaker. (4) Paste the texture of the speaker onto the three-dimensional model and create animated images. (5) Replace the mouth area of the speaker in the actual images with the animated images.
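Steps (3) to (5) above can be sketched in simplified form as follows. This is only an illustration under stated assumptions, not the known technique itself. The head direction from step (2) is assumed to be already available as a single yaw angle. The model is reduced to a few mouth vertices, the projection is orthographic, and the texture is reduced to one value per vertex. All function names are hypothetical.

```python
import math

def rotate_yaw(points, yaw_rad):
    """Step (3): rotate 3-D model vertices about the vertical (y) axis
    so that the model follows the detected head direction."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]

def project(points, scale, cx, cy):
    """Step (4), simplified: orthographic projection of the vertices
    to integer pixel coordinates (column, row) around centre (cx, cy)."""
    return [(round(cx + scale * x), round(cy - scale * y)) for x, y, _ in points]

def replace_mouth(frame, pixels, values):
    """Step (5): return a copy of the frame in which the given pixels
    are overwritten with the animated values; the input is not modified."""
    out = [row[:] for row in frame]
    for (col, row), value in zip(pixels, values):
        if 0 <= row < len(out) and 0 <= col < len(out[0]):
            out[row][col] = value
    return out

# Assumed inputs: a 5x5 grey frame and three mouth vertices of the model.
frame = [[0] * 5 for _ in range(5)]
mouth_model = [(-1, 0, 0), (0, 0, 0), (1, 0, 0)]

pixels = project(rotate_yaw(mouth_model, 0.0), scale=1, cx=2, cy=3)
result = replace_mouth(frame, pixels, [9, 9, 9])
```

With a yaw of zero the three vertices land on row 3, columns 1 to 3, and only those pixels of the frame are replaced. A real system would use a full head pose and proper texture mapping.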
Animated images of moving lips used as conventional auxiliary information are not based on the appearance of the people who appear in the main images, and therefore do not look very realistic. One technique for providing more realistic auxiliary information is to synthesize an actually shot close-up of a person's mouth area. However, this technique requires either shooting the close-up with a camera separate from the main image camera, or executing image processing that extracts and enlarges a part of the main image. The former makes shooting a large-scale operation and imposes the restriction that the auxiliary information must be created at the same time as the main images. The latter has the problem that, when a person does not stay at a constant position in the main images or more than one person appears in the main images, it is difficult to reliably extract, by automatic image processing, the particular part showing the speaker.
On the other hand, where images showing a plurality of people are displayed, as in a video conference or a conversation scene, the faces of all the people do not necessarily appear at a sufficient size in the images at all times. For instance, when a meeting attended by a large number of people is broadcast, some attendees may at times have their backs to the camera. Further, even when only one person is shot, as in a solo scene in a drama, if the shot shows this person from behind or from a distance, the person's face may not appear in the image, or may look small even if it does appear. Hence, a conventional apparatus which partially replaces a human body in images with animations has the problem that people watching the images cannot visually recognize who in the images is saying what.