With the recent proliferation of inexpensive, powerful computer technology, methods of communication have progressed significantly. The ordinary voice telephone call, itself an efficient communication technique, is now accompanied by widely used alternatives such as electronic mail and on-line chat rooms, which allow participants to convey text, images and other data to each other over computer networks.
Video conferencing is another communication technique, one that allows participants to convey both sound and video to each other in real time over computer networks. In the past, video conferencing has been cost-prohibitive for individuals and corporations to put into widespread use. Recently, however, technology has progressed such that video conferencing is available, at a reasonable cost, for implementation at terminals such as desktop or portable computers or hand-held communications devices.
Video conferencing terminals are typically equipped with a video camera and a microphone for capturing, in real time, video images and sound, respectively, from participants of the video conference. The terminals also typically include a display and a speaker for playing the video images and sound in real time to the participants. When a video conference has two participants, it is called a point-to-point conference. Typically, in this arrangement, each terminal captures video and sound from the participant stationed at the terminal and transmits the captured video and audio streams to the other terminal. Each terminal also plays the video and audio streams received from the other terminal on its display and speaker, respectively.
When a video conference has more than two participants, it is called a multi-point video conference. Typically, in this arrangement, each terminal captures video and sound from the participant stationed at the terminal. The captured video and audio streams are then transmitted, either directly or indirectly, to the other terminals. Each terminal then displays one or more video streams and plays the audio streams from the other participants.
There are several problems to confront in multi-point video conferencing. The first is how to allocate the limited area of a terminal's display screen to each of several video streams. There are different ways of doing this. One way is to allocate a fixed area on the display screen for video and divide this area between the video streams from two or more conference participants. This technique of dividing a fixed area, also called "mixing" of video streams, unfortunately results in reduced resolution of the displayed images within each video stream. This problem is particularly acute when a terminal has only a small display area to begin with, such as when the terminal is a hand-held communications device.
Another way to allocate area on the display screen is to allocate a fixed-size viewing area to the video stream from each participant. Using this technique, in a video conference involving four participants, the display of each terminal would include three fixed-size areas, each devoted to one of the other participants. The problem with multiple fixed-size viewing areas, however, is that the area required for a particular number of participants may exceed that which is available on the display screen.
The above problems may be characterized as display screen "real-estate" problems. Still another technique for solving the display screen "real-estate" problem involves providing a participant with the ability to manually turn off certain video streams. This technique has the disadvantage of requiring manual intervention by the conference participant.
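The two automatic allocation strategies described above can be compared with a little arithmetic. The following is only an illustrative sketch: the function names, the near-square grid layout, and all pixel dimensions are assumptions chosen for the example and do not come from the text.

```python
import math


def mixed_tile_size(area_w, area_h, n_streams):
    """Size of each tile when one fixed area is divided among n_streams.

    Assumes a near-square grid layout (hypothetical; other layouts are possible).
    """
    cols = math.ceil(math.sqrt(n_streams))
    rows = math.ceil(n_streams / cols)
    return area_w // cols, area_h // rows


def fixed_tiles_fit(screen_w, screen_h, tile_w, tile_h, n_streams):
    """True if n_streams fixed-size tiles fit on the screen in rows and columns."""
    per_row = screen_w // tile_w
    rows_available = screen_h // tile_h
    return per_row * rows_available >= n_streams


# Hypothetical numbers: a 640x480 video area shared by three remote streams.
print(mixed_tile_size(640, 480, 3))            # (320, 240)

# Hypothetical hand-held screen of 320x240 with 176x144 tiles per participant.
print(fixed_tiles_fit(320, 240, 176, 144, 3))  # False
```

With these assumed numbers, dividing the fixed area cuts each stream to a quarter of the original pixel count, while three fixed-size tiles simply do not fit on the small screen.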
Additional problems to confront in multi-point video conferencing concern the large volume of video and audio data which must be processed and transmitted between the terminals. Terminals are typically coupled together over packet-switched networks, such as a local area network (LAN), a wide area network (WAN) or the Internet. Packet-switched networks have limited amounts of bandwidth available. The available bandwidth may quickly be exceeded by the video and audio stream data produced by participants in a multi-point video conference.
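To illustrate how quickly the data volume grows, the sketch below estimates the load when every terminal sends its streams directly to every other terminal. The function name and the bit rates are hypothetical values chosen for the example, and indirect transmission (for instance through an intermediate device) would give different figures.

```python
def mesh_load_kbps(n_participants, video_kbps, audio_kbps):
    """Estimate bandwidth when each terminal sends directly to all others.

    All rates are in kilobits per second and are assumed to be constant.
    """
    per_stream = video_kbps + audio_kbps
    others = n_participants - 1
    return {
        "upload_per_terminal": per_stream * others,
        "download_per_terminal": per_stream * others,
        "network_total": per_stream * n_participants * others,
    }


# Hypothetical rates: 384 kbps video plus 64 kbps audio per participant.
for n in (2, 4, 8):
    print(n, mesh_load_kbps(n, 384, 64))
```

Under these assumptions the total network load grows with n * (n - 1), so going from a point-to-point conference to four participants multiplies the total traffic by six.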
Moreover, once the video and audio streams arrive at a terminal, the terminal must process the data prior to playing it on the display and speaker. Processing multiple video streams by "mixing" the streams or by allocating a fixed area to each video stream is demanding of the terminal's processing capability. The processing capability of a terminal may quickly be exceeded by having to process more than one video stream for display. In this event, the video and audio streams may become distorted or cease to be played by the terminal.
There is a need for an automatic mechanism to control the transmission and display of video conferencing data. The automatic mechanism should select meaningful video streams for transmission to, and display at, the other terminals. By the same token, the automatic mechanism should throttle back video streams that do not contain meaningful content so that these video streams need not be transmitted and processed.
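The text does not say how a "meaningful" stream would be identified. Purely as an illustration of the select-and-throttle idea, the sketch below assumes that each stream already has a hypothetical activity score supplied by the caller; the function name, the score, the threshold, and the stream limit are all assumptions and not part of the description above.

```python
def select_streams(activity, max_streams, threshold):
    """Split streams into those to transmit and those to throttle back.

    activity: dict mapping a stream id to a hypothetical activity score.
    Streams scoring below threshold, or beyond max_streams, are throttled.
    """
    ranked = sorted(activity, key=activity.get, reverse=True)
    selected = [s for s in ranked if activity[s] >= threshold][:max_streams]
    throttled = [s for s in ranked if s not in selected]
    return selected, throttled


# Hypothetical scores for four participants.
scores = {"A": 0.9, "B": 0.1, "C": 0.6, "D": 0.0}
selected, throttled = select_streams(scores, max_streams=2, threshold=0.5)
print(selected)   # ['A', 'C']
print(throttled)  # ['B', 'D']
```

No manual intervention is needed in this sketch: the set of displayed streams changes whenever the scores change.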