Various video conferencing techniques have been utilized in the past to provide full frame video of the participants in a video conference, most usually with each participant represented in a window on the terminal or display device utilized. What all of the prior video conference systems have in common is that they must provide full frame video, which is bandwidth intensive and, inter alia, does not permit the participants to alter the image of themselves that is displayed to other participants. One such video conferencing system, described by K. Hiratsuka and H. Kakihara in "Video Conferencing System," Japan Telecommunications Review, Vol. 18, No. 3, pp. 145-151, July 1976, utilizes full frame video and accommodates large numbers of conferees by dividing each scene into segments, with each segment filmed by a different camera. The purpose of dividing the scene into segments is to portray the conferees large enough to be recognized, noting that if all the conferees were pictured in a single frame, the faces would be too small to be recognized. A similar system is illustrated in U.S. Pat. No. 3,601,530 issued to Robert C. Edson et al., which describes a video conferencing system using voice switched cameras, with the cameras dividing up the scene. Note that a video conferencing system corresponding to the aforementioned Japanese system is also described in the Proceedings of the IEEE, Vol. 73, No. 4, April 1985, in an article authored by Hakar Sabri and Birendra Prasada.
As illustrated in U.S. Pat. No. 4,004,084 issued to Earl Franklin Brown et al., a video conferencing system utilizes spatial reduction and temporal resolution to minimize bandwidth requirements for full frame video, using speech to select which picture is automatically transmitted. Here only one picture is displayed at a time.
British Patent 1,173,918 illustrates the utilization of a TV screen to present full frame video corresponding to the various participants in separate parts of the TV screen.
As illustrated in U.S. Pat. No. 5,157,491, modular screen approaches have been utilized in video teleconferencing, again to provide a full likeness of the individual at various modules. Additionally, U.S. Pat. No. 4,965,819 to Dino Kannes describes a video conferencing system for courtroom applications in which full frame video likenesses of participants are presented at various portions of the screen.
Of particular interest is U.S. Pat. No. 4,400,724 issued to Craig L. Fields, which illustrates a virtual space teleconferencing system in which video cameras are networked and images of conferees are displayed at corresponding positions. Positions of the conferees are detected by an overhead video camera, and the image signals are combined for display on screens on the tabletops at each station. Images from a video storage device are also combined in the composite signal. The display devices and cameras at each station are oriented with respect to the virtual conference space such that the images from each station appear on the screen oriented the same as the conferee at that station.
In order to limit bandwidth and direct attention to a participant who is speaking, one system, described in U.S. Pat. No. 5,206,721, isolates and films the speaking participant through the use of a swiveled camera pointed at that participant. A similarly directed patent is U.S. Pat. No. 4,996,592, in which the field of view of the video framing is electronically adjusted to isolate the speaking participant.
Other U.S. patents relating to video conferencing include U.S. Pat. Nos. 5,195,086; 5,187,571; 5,061,992; 5,027,198; 5,003,532; 4,995,071; 4,935,953; 4,931,872; 4,890,314; 4,882,743; 4,710,917; 4,650,929; 4,574,374; 4,529,840; 4,516,156; 4,054,906; 3,775,563; and 3,725,587. Also related is U.K. Patent Application No. 2,238,681 dated May 6, 1991.
What will be appreciated from the above is that video conferencing has relied on full frame video to represent the participants.
The problem with transmitting video, especially in network applications, is the large bandwidth required. Moreover, the information contained in the full frame video is not necessarily that which each participant wishes to have on view to other participants. Additionally, full frame video includes information extraneous to a meeting or game playing situation, which may be totally irrelevant to the purpose of the meeting or game. In fact, full frame video representations of an individual may be distracting and therefore unwanted. Disinterest, or facial or hand gestures which an individual would not want represented to other participants, are nonetheless transmitted without any control on the part of the participant. Thus it is not possible for the participant to control his persona, i.e., his presentation of himself to other participants on the network. Moreover, there are no means for augmenting one's persona in any given direction should a participant want to be more emphatic than a straight filming of his actions would indicate. Likewise, if a participant is bored but does not want to show it, there is no ready means for him to control that which is presented to the other participants.
As for the participant viewing his terminal, heretofore the person viewing the terminal, and thus the conference, was not included in the scene. Thus he could not view himself as he participated in the conference so that he could analyze his gestures or demeanor and adjust them if desired. Seeing oneself as portrayed to others is of course important if one has the opportunity to control what other participants see.
Also, in prior teleconferencing systems, there has been no ability for the individual to maneuver around the space presented to him on the terminal. Nor has a stereo spatial localization technique been utilized to steer the apparent direction of the audio to the location of the speaking participant. Although attempts have been made to zoom in or focus on a speaking individual by detecting a microphone output, audio spatial localization has not been utilized in video conferencing techniques.
However, the most important shortcoming of all prior video conferencing techniques is the supposed requirement to send a complete likeness of the individual participant to other conferees or participants. Providing real time representations of the conference in this way requires exceedingly high bandwidth, exceeding twisted pair capability and in some instances exceeding even fiber optic capability.