Videoconferencing technology comprises a set of interactive telecommunication technologies that allow two or more parties at remote locations to interact simultaneously through two-way video and audio transmissions. Videoconferencing is used not only to provide audio and video transmission of meeting activities and people, but also to share documents, computer-displayed information, demonstrations, performances, and the like. To reduce the amount of data transmitted in video systems, the data is often compressed through a coding scheme. For IP (Internet Protocol) based videoconferencing systems, the key components include the codecs (coder/decoder) that enable the digital compression of audio and video data streams in real time.
A video codec is a device or firmware/software program that compresses and/or decompresses the digital video data transmitted between a transmitter and receiver. For videoconferencing and similar applications, specially developed hardware- or software-based codecs (coders/decoders) have provided compression ratios of up to 1:500. The compression schemes for codecs usually employ lossy data compression, which produces a much smaller compressed file than a lossless compression method can.
In lossy transform codecs, samples of picture or sound are taken, chopped into small segments, transformed into a new basis space, and quantized. The resulting quantized values are then entropy coded. In lossy predictive codecs, previous and/or subsequent decoded data is used to predict the current sound sample or image frame. The error between the predicted data and the real data, together with any extra information needed to reproduce the prediction, is then quantized and coded. Lossy methods are most often used for compressing sound, images, or video, but lossless compression is typically required for text. Lossless compression is used when it is important that the original and the decompressed data be identical, or when no assumption can be made as to whether a given deviation is acceptable.
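The transform-codec pipeline described above can be illustrated with a minimal sketch. The example below uses a hand-rolled one-dimensional DCT-II on a single 8-sample segment with uniform quantization; a real codec would use two-dimensional block transforms, perceptually tuned quantization tables, and entropy coding of the quantized values, none of which are shown here.

```python
import math

def dct(samples):
    # DCT-II: project the segment onto a cosine (frequency) basis.
    n = len(samples)
    return [sum(samples[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n))
            for k in range(n)]

def idct(coeffs):
    # Inverse of the unnormalized DCT-II above, reconstructing the samples.
    n = len(coeffs)
    return [(coeffs[0] / 2
             + sum(coeffs[k] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                   for k in range(1, n))) * 2 / n
            for i in range(n)]

def quantize(coeffs, step):
    # Quantization is the lossy step: each coefficient is rounded to an
    # integer multiple of `step`; many small coefficients collapse to zero.
    return [round(c / step) for c in coeffs]

def dequantize(qcoeffs, step):
    return [q * step for q in qcoeffs]

segment = [52, 55, 61, 66, 70, 61, 64, 73]        # one 8-sample segment
q = quantize(dct(segment), step=16)               # compact integer representation
rec = idct(dequantize(q, step=16))                # close to, but not exactly, the input
print([round(x) for x in rec])
```

The quantized integers `q` are what an entropy coder would then compress; increasing `step` shrinks the coded size at the cost of larger reconstruction error, which is the lossy trade-off the paragraph describes.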
A videoconferencing environment is generally characterized by a relatively static background scene (e.g., a conference room) with a specific focused area of activity or object of interest, such as a particular person or a demonstration. Many other video environments exhibit similar characteristics, such as video surveillance systems, video security/monitoring, webcam setups, and the like, in which a specific event or event type is to be detected, or a specific object is to be focused on. Such systems are also typified by the use of equipment that may have varying levels of quality, such as cameras, modems, routers, playback devices, application software, and so on. The transmission link between the sites may also be quite varied, from high-speed network links (e.g., T1, ISDN) to low-bandwidth transmission links (e.g., analog telephone or POTS). Because of these variables, the compression of the audio and video data must be optimized to ensure the highest quality of data transmission possible.
Videoconferencing codecs thus have several difficult requirements that must be satisfied all at once. They must exhibit very low latency (high speed) and a very low bitrate, so that the stream fits within limited bandwidth and meets the tight timing constraints of the transmission system. Because of these prime constraints, the quality of the video is usually quite low when there is fast motion, even though fast motion may occur relatively rarely in a typical videoconference setting.
In a typical videoconference session, the key area of focus is a person's face. Present videoconferencing systems, or similar systems used for other applications, generally do not optimize the compression method for particular areas of focus, especially subjects' faces. For example, most videoconferencing systems work relatively well as long as there is relatively little movement within a scene. As soon as a person or other object moves, however, the images often become quite blocky (pixelation effects) or exhibit other compression-related deficiencies, such as ringing. In this case, focus may be lost in certain crucial areas, since all areas are treated the same by the compression algorithm. Such systems do not adequately isolate particular areas of interest within a scene in a manner that maintains a high-quality transmission for those areas.
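One way to see why treating all areas the same loses focus is to contrast it with a region-weighted scheme. The sketch below is purely illustrative and not drawn from any particular codec: blocks flagged as overlapping an area of focus (e.g., a detected face) are quantized with a finer step than background blocks, so more coefficient detail survives where it matters. The function names, mask representation, and step sizes are all hypothetical.

```python
def quantize_block(coeffs, step):
    # Round each transform coefficient to a multiple of `step`.
    return [round(c / step) for c in coeffs]

def encode_frame(blocks, roi_mask, roi_step=4, bg_step=24):
    # roi_mask[i] is True when block i overlaps the area of focus;
    # those blocks get the fine step, the rest get the coarse one.
    return [quantize_block(b, roi_step if in_roi else bg_step)
            for b, in_roi in zip(blocks, roi_mask)]

frame = [[10.0, -3.2, 1.1], [40.0, 12.5, -6.0]]   # two blocks of transform coefficients
mask = [False, True]                               # second block covers the face
print(encode_frame(frame, mask))
```

With a uniform coarse step, the face block's smaller coefficients would round to zero just like the background's; weighting the quantization by region preserves them at the cost of a few extra bits.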
What is desired, therefore, is a videoconferencing or similar application system that optimizes compression for faces or other focused regions of interest within a greater scene.