A video telephone combines an image capture device, an image display device, and a codec for coding and decoding the image.
Several different types of devices are available for the display of a video image. In U.S. Pat. No. 5,347,400 Hunter discloses a helmet-mounted display system for use in virtual reality applications. In U.S. Pat. No. 5,396,269 Gotoh discloses a display similar to that of a desktop PC. Gotoh combines the display with an image capture device which sits in a stationary position on a desktop surface.
The image capture device is usually combined with a signal generator within a video camera. The video camera should be capable of capturing a facial image during the movement and gesturing of normal conversation. In particular, facial expressions should be captured during movement of the body. In U.S. Pat. No. 5,414,444, Britz discloses a communicator which incorporates a system of motors to orient the video imaging element. In U.S. Pat. No. 5,414,474 Kamada discloses an apparatus which tracks a moving body.
An additional feature of a video telephone should be the ability to make effective use of its limited communications bandwidth. In U.S. Pat. No. 5,371,534 Dagdeviren discloses a method of communicating audio and video signals using high speed digital ISDN telephone lines. ISDN is a mode of communication for the current invention, and U.S. Pat. No. 5,371,534 is hereby incorporated by reference. Even at the 128 kbps typical of ISDN circuits, and with MPEG image compression, the bandwidth typically limits resolution to below standard display resolutions, and frame rates are typically reduced to 15 frames per second or less.
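The bandwidth constraint can be illustrated with a short calculation. The following is a minimal sketch, not part of any cited disclosure. The 352 by 288 pixel frame size and the 12 bits per pixel are assumptions chosen only for illustration, audio and protocol overhead are ignored, and the function names are hypothetical.

```python
# Illustrative bandwidth budget for video over a 128 kbps ISDN circuit.
# Assumptions (not from the source): 352x288 frame, 12 bits per pixel,
# no bandwidth reserved for audio or protocol overhead.

def bits_per_frame(channel_bps, fps):
    """Average number of bits the channel can carry for each frame."""
    return channel_bps / fps

def required_compression_ratio(width, height, bits_per_pixel, fps, channel_bps):
    """Ratio of the uncompressed video bit rate to the channel bit rate."""
    raw_bps = width * height * bits_per_pixel * fps
    return raw_bps / channel_bps

ISDN_BPS = 128_000   # 128 kbps, as stated in the text
FPS = 15             # 15 frames per second, as stated in the text

budget = bits_per_frame(ISDN_BPS, FPS)
ratio = required_compression_ratio(352, 288, 12, FPS, ISDN_BPS)

print(f"bits available per frame: {budget:.0f}")
print(f"required compression ratio: {ratio:.2f}")
```

Under these assumptions each frame must fit in roughly 8.5 kilobits, which requires a compression ratio of more than one hundred to one. Lowering the resolution or the frame rate is the usual way to relax that ratio.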
Furthermore, ISDN circuits are not yet universally available, so the goal of ubiquitous video telephony cannot yet be realized through ISDN. According to Metcalfe's Law, the value of a network increases with the square of the number of users. By this measure the value of the network of current video telephones is far below its potential value.
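The square-law relationship can be made concrete with a brief sketch. The user counts below are hypothetical and serve only to show the scaling. The function name is likewise an assumption introduced for illustration.

```python
# Illustrative use of the square law stated in the text: network value
# is taken to be proportional to the square of the number of users.
# The user counts are hypothetical.

def relative_network_value(current_users, potential_users):
    """Fraction of the potential network value realized by current users."""
    if potential_users <= 0:
        raise ValueError("potential_users must be positive")
    return (current_users / potential_users) ** 2

# If only 1 in 100 potential users has a video telephone, the network
# realizes about 1 part in 10,000 of its potential value.
fraction = relative_network_value(1_000, 100_000)
print(f"fraction of potential value: {fraction:.6f}")
```

Because the value scales with the square of the user count, a small shortfall in availability produces a much larger shortfall in network value.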
The design of mobile devices and of devices using the more generally available plain-old-telephone-service (POTS) is even further constrained by limitations on communications bandwidth. Mobile devices have additional design constraints which limit their size, weight, and complexity.
Most current systems do not track movement of the user's face. Instead, the video camera has an oversized field of view to ensure that a shifting face remains within the image area. This is wasteful of the resolution of the video camera and of the communications bandwidth.
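The cost of an oversized field of view can be estimated by comparing the area occupied by the face with the area of the whole frame. The following is a minimal sketch. The frame and face dimensions are hypothetical, and the function name is an assumption introduced for illustration.

```python
# Illustrative estimate of how much of a frame is spent on the face
# when the field of view is oversized. All dimensions are hypothetical.

def face_pixel_fraction(face_w, face_h, frame_w, frame_h):
    """Fraction of the frame's pixels that fall on the face region."""
    if face_w > frame_w or face_h > frame_h:
        raise ValueError("face region must fit inside the frame")
    return (face_w * face_h) / (frame_w * frame_h)

# A face spanning one third of the frame in each dimension uses only
# one ninth of the pixels; the rest carry background.
used = face_pixel_fraction(120, 80, 360, 240)
print(f"fraction of pixels on the face: {used:.3f}")
print(f"fraction spent on background: {1 - used:.3f}")
```

In this example nearly nine tenths of the captured pixels, and of the bits used to transmit them, carry no facial information.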
Even a complex system which can continually zoom, pan, and focus, and which can successfully track the user's movements, has limitations. For example, by turning, a user can easily direct his face away from the video camera so that his facial expressions cannot be captured by the video camera.
In the M.I.T. Media Laboratory Perceptual Computing Section Technical Report No. 317, entitled "An Automatic System for Model-Based Coding of Faces," a compact representation of the face is described. In this system a parametric image model of the face is abstracted by recognizing features from a video image of the face. This parametric image model requires much less bandwidth than the original video image. However, the authors report that this parametric image model can be extracted only when head tilts with respect to the video camera are limited to less than 15 degrees.
Current video telephones are further limited by a difficulty in establishing eye-to-eye contact. In most video telephones the camera is to the side or top of the display. Thus, the user can look directly at the camera or at the display, but not at both simultaneously.
The known devices do not satisfy all of the current requirements for a video telephone. There is a need for a video telephone with a video camera which can maintain an orientation and focus on a moving user. There is also a need for a video telephone which can make effective use of the available bandwidth while remaining simple and compact.