(1) Field of the Invention
The present invention relates to virtual television phone communication using a communication terminal apparatus with a display device, which is intended for a user to enjoy voice conversation in a visual environment through a virtual three-dimensional CG (computer graphics) character.
(2) Description of the Related Art
Conventionally, a so-called television phone apparatus is a telephone device equipped with a camera and a display device, with which a user has a conversation with a partner while seeing the face image of the partner shot by the camera. In order to reduce the amount of data transmitted, the face image data is generally compressed, multiplexed with the voice data and sent to a receiver. At the receiver's end, the multiplexed data is separated into the voice data and the compressed image data, the image data is decompressed, and then the voice is outputted and the image is displayed in synchronization with each other. Recently, a cell phone called a Videophone for next-generation mobile communication (IMT-2000) has been developed based on the MPEG-4 (Moving Picture Experts Group Phase 4) image compression standard (See “NIKKEI ELECTRONICS” 1999. 11. 1 (No. 756), pp 99-117).
In order to send the multiplexed image data as mentioned above, a communication standard for a wide band beyond the framework of the conventional voice communication and an infrastructure for realizing such wide band communication are required. Therefore, there is an invention which is designed to artificially realize a function similar to a television phone via voice data communication only, not by an image compression method as above (See Japanese Laid-Open Patent Application No. S62-274962). According to this invention, the telephone holds in advance a static image of a partner's face which is processed into a face without a mouth, as well as static images of mouths which are processed into the shapes for pronouncing vowel sounds such as “a”, “i” and “u” in Japanese, for instance. The vowels included in the voice data sent from the partner are analyzed using a voice recognition technology, and the mouth shape data based on the analysis result is merged into the face image and displayed whenever necessary, so as to display the appearance of the partner who is talking. The advantage of this invention is that it can realize artificial television phone communication within the framework of ordinary voice communication. However, it is doubtful whether the user can accept, without feeling anything unnatural, an image in which nothing but the mouth moves, or can feel as if he were talking with the partner himself.
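The mouth-shape selection described above can be illustrated by the following minimal sketch. It is not taken from the cited application: the image names, the closed-mouth default for non-vowel sounds, and the function `frames_for_vowels` are all hypothetical, and the voice recognition step is assumed to have already produced a sequence of recognized vowels.

```python
# Hypothetical sketch: image names and the closed-mouth default are
# assumptions for illustration, not details of the cited application.
MOUTH_IMAGES = {
    "a": "mouth_a",
    "i": "mouth_i",
    "u": "mouth_u",
    "e": "mouth_e",
    "o": "mouth_o",
}
CLOSED_MOUTH = "mouth_closed"  # assumed fallback when no vowel is recognized


def frames_for_vowels(recognized, face="face_without_mouth"):
    """Pair the stored mouthless face with one mouth image per recognized sound."""
    frames = []
    for sound in recognized:
        # Unrecognized sounds (or silence, given as None) use the closed mouth.
        mouth = MOUTH_IMAGES.get(sound, CLOSED_MOUTH)
        frames.append((face, mouth))
    return frames


if __name__ == "__main__":
    for face, mouth in frames_for_vowels(["a", None, "u"]):
        print(face, "+", mouth)
```

Each pair stands for one displayed frame in which the selected mouth image is merged into the static face. Since only the mouth image changes from frame to frame, the sketch also makes the limitation noted above visible: nothing else in the picture moves.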
Beyond the framework of the voice communication, there is another invention which adopts an image recognition technology in order to reduce the data amount rather than sending the image itself (See Japanese Laid-Open Patent Application No. H05-153581). According to this invention, facial expressions and mouth shapes are recognized using the image recognition technology, transformed into parameters and sent together with the voice data. The receiver, which holds the partner's three-dimensional model in advance, transforms the three-dimensional model based on the received parameters and displays it during the output of the voice.
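As a rough illustration of the receiver side described above, the following sketch deforms a stored model using received parameters. The cited application does not specify how the model is transformed; a linear blend of per-vertex offsets is assumed here purely for illustration, and the names `apply_parameters`, `base`, `deltas` and `mouth_open` are hypothetical.

```python
# Hypothetical sketch: a linear blend of vertex offsets is an assumption,
# not the transformation method of the cited application.

def apply_parameters(base, deltas, params):
    """Return new vertices: base + sum over parameters of (weight * offset).

    base   : list of (x, y, z) vertices of the partner's stored model
    deltas : dict mapping a parameter name to per-vertex (dx, dy, dz) offsets
    params : dict mapping a parameter name to a received weight
    """
    result = [list(vertex) for vertex in base]
    for name, weight in params.items():
        for index, offset in enumerate(deltas[name]):
            for axis in range(3):
                result[index][axis] += weight * offset[axis]
    return result


if __name__ == "__main__":
    base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    deltas = {"mouth_open": [(0.0, -1.0, 0.0), (0.0, 0.0, 0.0)]}
    # A received parameter set saying the mouth is half open.
    print(apply_parameters(base, deltas, {"mouth_open": 0.5}))
```

Only the small parameter set travels with the voice data, while the model itself stays at the receiver, which is how this approach reduces the data amount compared with sending the image.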
The above-mentioned three inventions are all intended for having a conversation with a partner while seeing his face, not for enjoying the conversation itself more.
These inventions relate to a so-called telephone technology. The popularization of the Internet enables us to have a conversation via a personal computer, though it is mainly a text-based conversation. Under the circumstances, there is an invention in which a user has a CG character who represents himself participate in a common virtual space to enjoy a conversation with a character who represents another participant in that space (See U.S. Pat. No. 5,880,731). The object of this invention is to have a conversation with a partner anonymously, and since the user participates in the conversation independently of his real self, he is able to enjoy imaginary conversations which include fictional characters. The CG character which represents the user is called an avatar because it acts for the participant who selects the character. The participant himself selects this avatar, and his conversation partner cannot change the character of the avatar. Also, since this avatar merely serves for the other participants to identify the partner, it does not need to be changed. To implement this invention, a server computer is required for managing the common virtual space for the participants and controlling their states, in addition to the terminal computers of the participants (client computers).
A technology for having a conversation with a virtual CG character has been made public by Extempo Systems Inc. on its Web page on the Internet, for instance. This relates to a text-based conversation with expert characters on the Internet, not a voice conversation.
From a technical standpoint, this technology is designed to establish a conversation between a CG character and a person by creating in advance a conversation dictionary classified by keywords, analyzing how well the partner's conversation contents match the classified keywords, and displaying the best-matching conversation sentence. The conversation holds together even with an ambiguous sentence because of the high human ability to understand conversation, but the same sentences are displayed more and more often as the conversation goes on because the number of registered conversation sentences is limited. This technology provides new entertainment of having a conversation with a virtual CG character, but such a conversation is quite different from a conversation with a real human in terms of flexibility, diversity, appropriateness and individuality. The challenge for this technology is how to get close to real human conversation ability.
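The keyword matching described above can be sketched as follows. This is a minimal illustration only and not the actual system of Extempo Systems Inc.: the dictionary entries, the fallback sentence, the scoring by number of shared keywords, and the function `best_reply` are all assumptions.

```python
import re

# Hypothetical conversation dictionary: each entry pairs a keyword set
# with one registered sentence. All entries are illustrative assumptions.
CONVERSATION_DICTIONARY = [
    ({"weather", "rain", "sunny"}, "I hear the weather will change tomorrow."),
    ({"music", "song", "band"}, "What kind of music do you like?"),
    ({"hello", "hi"}, "Hello! What shall we talk about?"),
]
FALLBACK = "Please tell me more."  # assumed reply when no keyword matches


def best_reply(user_text):
    """Return the registered sentence whose keywords best match the input."""
    words = set(re.findall(r"[a-z']+", user_text.lower()))
    best_score, best_sentence = 0, FALLBACK
    for keywords, sentence in CONVERSATION_DICTIONARY:
        score = len(keywords & words)  # number of keywords found in the input
        if score > best_score:
            best_score, best_sentence = score, sentence
    return best_sentence


if __name__ == "__main__":
    print(best_reply("Is it sunny or will it rain?"))
    print(best_reply("Tell me something."))
```

Because every input that matches the same keyword set yields the same registered sentence, the sketch also shows why repetition increases as a conversation continues with a limited dictionary.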
The characteristics of the above conventional related arts are as follows. The first three were devised in response to a demand for having a conversation while seeing the partner's face, and the object thereof is to have a conversation while confirming the partner's expression and appearance. Therefore, they are not designed to make the conversation more enjoyable by applying some processing to the displayed image and the voice through some action of the receiver's own, and no technology for that purpose is disclosed.
The fourth prior art is designed to have a CG character selected by a user participate in a virtual community space anonymously, so that the user can enjoy a direct and frank conversation or an imaginary and fictitious conversation by virtue of this anonymity. Therefore, the CG character of the conversation partner merely serves to identify the partner, and is not intended to make the conversation more entertaining by making the CG character and its voice perform some kind of action. The fifth prior art allows the user to enjoy a conversation with a virtual CG character having an artificially intelligent conversation function, but such a conversation is quite different from a conversation with a real human in flexibility, appropriateness and individuality.