Video teleconferencing systems enable two or more parties to participate in a remote conversation accompanied by near real-time video. For an understanding of a conventional video teleconferencing system, please refer now to FIG. 1. As shown in FIG. 1, in a conventional video teleconferencing system, a local participant communicates by means of a teleconferencing station 10. For processing the video portion of the conversation, each teleconferencing station includes a display device 12, i.e., a monitor, for displaying the other parties to the conversation, and a video camera 14 for transmitting video to those parties. The video camera 14 is typically mounted just above the display 12.
One of the problems associated with video teleconferencing is the lack of eye contact between participants. Participants typically direct their attention to the display rather than to the camera that is positioned to capture a video image of the local participant. Users interact with the images of the remote participants on the display by talking to them and gazing at them there.
Since the video camera cannot physically be positioned exactly at the local participant's point of interest (i.e., the center of the display 12), the remote participant does not see a face-on view of the local participant. To the remote participant, the local user appears to be avoiding eye contact by gazing off in another direction. Moreover, the same problem exists at the local user's display, since the local user also views a video stream of the remote participant that is not face-on.
One approach that has been explored to solve the lack of eye contact is to display a model of a participant's facial features in a static synthetic environment. In this way, a face-on model of the participant is created. The model is created by substituting stored versions of the participant's facial features for the actual, current state of each feature. The substituted facial features can be created from sample images of the participant. For example, a mouth can exhibit various emotional states, e.g., a smile or pursed lips. As the participant smiles and frowns in real time, the model is updated to reflect the smiles and frowns and is presented to the other participants.
However, the limitations of this approach become evident when the participant's visual features are misinterpreted and the errors are displayed. For example, if a participant's mouth is actually smiling but pursed lips are detected, the model of the participant would display pursed lips. Although the accompanying audio stream and the other facial features may not correspond to the erroneous display of pursed lips, the model would not appear to be in error, since all of the facial features are reconstructed from actual images. A viewer of the model would therefore be unable to determine whether an error had occurred.
Moreover, because of current limitations in computing resources, only a few features are reconstructed to convey emotion, so that the model can be displayed in real time during a video teleconference. As a result, the model may not appear realistic to the viewing participant. Also, since the model is reconstructed from sample images, the actual image of the participant is never transmitted to the viewing participant. Consequently, subtle emotions expressed by the facial features cannot be represented by the model if the sample images do not capture those emotions, or if the reconstructed facial features do not include the facial feature of interest.
Accordingly, what is needed is a video teleconferencing apparatus that addresses the shortcomings of the existing technology. The apparatus should be simple, cost-effective, and easily adaptable to existing technology. The present invention addresses these needs.