1. Field of the Invention
The present invention relates to an image and voice communication system and a videophone transfer method and, in particular, to an image and voice communication system and a videophone transfer method that transmit an image, such as the speaker's face or an alternate image, to a communication mate when the speaker's voice is transmitted to the communication mate, so that each party can talk while confirming the mate's face and the like.
2. Description of the Related Art
Heretofore, various types of so-called videophone systems have been known that transmit an image, such as a speaker's face, to a communication mate together with the speaker's voice, so that the parties can talk while confirming each other's faces and the like. A majority of these videophone systems use existing telephone lines and hence transmit the speaker's voice data signals simultaneously with image data signals, such as face images, in a pseudo-bidirectional manner.
However, because the amount of information in moving-picture face image data is large relative to the capacity of an existing telephone line, it is difficult to transmit the face image data as a moving picture.
Thus, videophone systems which transmit still pictures piecemeal have been adopted so as to reduce the amount of information transmitted per unit time and thereby be accommodated by a transmission path having a small transmission capacity, such as a telephone line.
However, it is difficult to accurately transmit a moving picture in real time with this type of videophone system. As a result, an image of a natural countenance cannot be transmitted to the communication mate, and the transmitted countenance appears awkward.
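The capacity mismatch described above can be illustrated with a short calculation. The following is a minimal sketch only: the frame size, pixel depth, frame rate, and line rate are assumed values chosen for illustration and do not appear in any of the systems discussed here.

```python
# Illustrative arithmetic only. Every constant below is an assumption
# made for this sketch, not a figure from any described system.

def raw_video_bitrate(width, height, bits_per_pixel, frames_per_second):
    """Bits per second needed to send uncompressed frames."""
    return width * height * bits_per_pixel * frames_per_second


def seconds_per_still(width, height, bits_per_pixel, line_bits_per_second):
    """Time needed to send one uncompressed still picture over the line."""
    return width * height * bits_per_pixel / line_bits_per_second


WIDTH, HEIGHT = 160, 120      # assumed small face image, in pixels
BITS_PER_PIXEL = 8            # assumed pixel depth
FRAMES_PER_SECOND = 10        # assumed modest moving-picture rate
LINE_BPS = 28_800             # assumed telephone-line modem rate

required_bps = raw_video_bitrate(WIDTH, HEIGHT, BITS_PER_PIXEL, FRAMES_PER_SECOND)
still_seconds = seconds_per_still(WIDTH, HEIGHT, BITS_PER_PIXEL, LINE_BPS)

print(f"moving picture needs {required_bps} bit/s, line offers {LINE_BPS} bit/s")
print(f"one still picture takes about {still_seconds:.1f} s to send")
```

Under these assumptions the moving picture needs roughly fifty times the capacity of the line, while a single still picture occupies the line for several seconds, which is consistent with the awkwardness of a piecemeal still-picture display.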
In an attempt to solve this problem, a teleconference system using computer graphics (CG) technology was recently proposed, which is discussed in Japanese Unexamined Patent Publication No. 7-38873. The teleconference system proposed therein will be summarized below.
First, shape information, such as the surface relief (concavities and convexities), and color information of the faces of the conference attendees are acquired using a laser scanner or the like. Alternatively, the face image information may be acquired with a digital camera or the like. A wire-frame model of each attendee is then created by transforming the above-mentioned shape information into 3D polygon data.
In addition, when the conference is held, one or more markers are attached to the face of each attendee, and sensors detecting motion of the head, arms, and body are attached to the respective portions of each attendee. The system detects motion of the face by detecting the markers with a camera mounted in the attendee's vicinity, such as on headgear worn by the attendee, and follows the motion of the head, arms, and body with the sensors attached to the respective body portions.
Next, on the basis of the motion data of the respective body portions, the system deforms the previously created wire-frame model in real time. Further, the system completes the graphic image of the attendee corresponding to this wire-frame model by filling in the color information acquired beforehand.
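The model-update step summarized above can be sketched as follows. This is a hypothetical illustration, not the method of the cited publication: the vertex list, the face colors, the per-vertex marker offsets, and the single head-yaw angle are all assumed stand-ins for the stored wire-frame model and the detected motion data.

```python
import math

# Hypothetical stored model: three vertices and one colored polygon.
# In the system summarized above these would come from the prior scan.
VERTICES = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
FACES = [((0, 1, 2), (224, 172, 150))]  # (vertex indices, stored RGB color)


def rotate_yaw(vertex, angle_rad):
    """Rotate one vertex about the vertical (y) axis."""
    x, y, z = vertex
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x + s * z, y, -s * x + c * z)


def deform(vertices, head_yaw_rad, marker_offsets):
    """Apply assumed motion data to the stored wire-frame vertices.

    marker_offsets maps a vertex index to a (dx, dy, dz) displacement,
    standing in for facial motion detected from a marker.
    """
    moved = []
    for index, (x, y, z) in enumerate(vertices):
        dx, dy, dz = marker_offsets.get(index, (0.0, 0.0, 0.0))
        moved.append(rotate_yaw((x + dx, y + dy, z + dz), head_yaw_rad))
    return moved


def fill_colors(vertices, faces):
    """Pair each polygon's current vertex positions with its stored color."""
    return [(tuple(vertices[i] for i in indices), color) for indices, color in faces]


# One frame: head turned a quarter turn, vertex 1 raised slightly.
frame_vertices = deform(VERTICES, math.pi / 2, {1: (0.0, 0.1, 0.0)})
frame_polygons = fill_colors(frame_vertices, FACES)
print(frame_polygons)
```

Only the yaw angle and the marker offsets change from frame to frame in this sketch; the vertices and colors are fixed once they have been acquired.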
Thus, the system displays the completed graphic images of the attendees on a screen in real time, concurrently with the attendees' motion. Consequently, by viewing this screen display, each attendee can take part in the discussion while recognizing the countenances of the other attendees.
With this method, the volume of data that varies in real time is small, since the image data requiring a large data volume has already been taken in by the system. Hence, it becomes possible to transmit the speaker's moving picture in real time with a video system using a transmission path having a small transmission capacity, such as an existing telephone line or the like.
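To make the data-volume argument concrete, the sketch below packs one frame of motion data and estimates the resulting bit rate. The number of motion parameters, their 16-bit encoding, the frame rate, and the line rate are assumptions introduced for illustration only; the cited publication is not known to specify any of them.

```python
import struct

# All constants are assumptions made for this sketch.
MOTION_PARAMS_PER_FRAME = 20   # assumed count of marker and sensor values
FRAMES_PER_SECOND = 10         # assumed update rate
LINE_BITS_PER_SECOND = 28_800  # assumed telephone-line modem rate


def pack_motion_frame(params):
    """Pack motion parameters as little-endian signed 16-bit integers."""
    return struct.pack(f"<{len(params)}h", *params)


def stream_bitrate(payload_bytes, frames_per_second):
    """Bits per second needed to send one payload per frame."""
    return payload_bytes * 8 * frames_per_second


frame_payload = pack_motion_frame([0] * MOTION_PARAMS_PER_FRAME)
motion_bps = stream_bitrate(len(frame_payload), FRAMES_PER_SECOND)
fits_on_line = motion_bps <= LINE_BITS_PER_SECOND

print(f"{len(frame_payload)} bytes per frame, {motion_bps} bit/s, fits: {fits_on_line}")
```

Under these assumptions the motion stream occupies only a fraction of the line, because the shape and color data are not retransmitted with each frame.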
However, the teleconference system proposed in the above-mentioned Japanese Unexamined Patent Publication No. 7-38873 has the drawback of requiring a significant amount of time and effort for attaching markers to the attendees' faces and attaching sensors to their heads, arms, and bodies before the start of the conference. This drawback renders the system inadequate for use outside a business setting, such as in ordinary homes.
The videophone system in the teleconference system requires that measurements and various data of the face images of users, i.e., speakers, be input beforehand with a scanner or the like. It is very difficult to perform this type of large-scale measurement in ordinary homes because of cost and the like. In addition, although this system requires that markers be attached to the speaker's face before a telephone conversation, it is not practical to attach markers to the face for every telephone call in an ordinary home, particularly when the speaker is the receiver of the call.
Another problem common to known videophone systems such as the one discussed above is that such systems impose a significant restriction on the mobility of the user during operation of the system by requiring that the user be positioned before a screen when talking.