Video images have become an increasingly important part of communications in general. An ability to nearly instantaneously transmit still images, and particularly, live moving images, has greatly enhanced global communications.
In particular, videoconferencing systems have become an increasingly important business communication tool. These systems facilitate meetings between persons or groups of persons situated remotely from each other, thus eliminating or substantially reducing the need for expensive and time-consuming business travel. Since videoconference participants are able to see facial expressions and gestures of remote participants, richer and more natural communication is engendered. In addition, videoconferencing allows sharing of visual information, such as photographs, charts, and figures, and may be integrated with personal computer applications to produce sophisticated multimedia presentations.
To provide cost-effective video communication, the bandwidth required to convey video must be limited. A typical bandwidth used for videoconferencing lies in the range of 128 to 1920 kilobits per second (Kbps). Available videoconferencing systems cope with bandwidth limitations at a cost: slow frame rates that produce a non-lifelike picture with erratic, jerky motion; small video frames or limited spatial resolution of a transmitted video frame; and a reduced signal-to-noise ratio in individual video frames. Conventionally, if one or more of these effects is undesirable, then higher bandwidths are required.
At 768 Kbps, digital videoconferencing, using state-of-the-art video encoding methods, produces a picture that may be likened to a scene from analog television. Typically, for most viewers, twenty-four frames per second (fps) are required to make video frames look fluid and give the impression that motion is continuous. As the frame rate is reduced below twenty-four fps, an erratic motion results. In addition, there is always a tradeoff between a video frame size required and available network capacity. Therefore, lower bandwidth requires a lower frame rate and/or reduced video frame size.
A standard video format used in videoconferencing, defined by resolution, is Common Intermediate Format (CIF). The primary CIF format is also known as Full CIF or FCIF. The International Telecommunication Union (ITU), based in Geneva, Switzerland (www.itu.ch), has established this communications standard. Additional standards with resolutions higher and lower than CIF have also been established. Resolution and bit rate requirements for various formats are shown in Table 1. The bit rates shown (in megabits per second, Mbps) are for uncompressed color frames at 30 frames per second, where 12 bits per pixel is assumed.
Video compression encodes digital video so that it takes up less storage space and requires less transmission bandwidth. Compression/decompression (CODEC) schemes are frequently used to compress video frames and thereby reduce the required transmission bit rate. CODEC hardware or software compresses digital video into a smaller binary format than the original (i.e., uncompressed) digital video format requires. As Table 1 shows, the number of bits involved is extraordinarily large (e.g., nearly 584 million bits each second in the 16CIF format), and a correspondingly large amount of processing is needed for effective video processing and motion estimation. Consequently, CODEC processing is an ever-increasing application for microprocessors.
TABLE 1. Resolution and bit rates for various CIF formats

| CIF Format | Resolution (in pixels) | Bit Rate at 30 fps (Mbps) |
|---|---|---|
| SQCIF (Sub Quarter CIF) | 128 × 96 | 4.4 |
| QCIF (Quarter CIF) | 176 × 144 | 9.1 |
| CIF (Full CIF, FCIF) | 352 × 288 | 36.5 |
| 4CIF (4 × CIF) | 704 × 576 | 146.0 |
| 16CIF (16 × CIF) | 1408 × 1152 | 583.9 |
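The bit rates in Table 1 follow from the stated assumptions (12 bits per pixel, 30 fps, no compression) as width × height × bits per pixel × frame rate. The following is a minimal sketch of that arithmetic; the function name `uncompressed_mbps` and the dictionary `CIF_FORMATS` are illustrative names introduced here, and 1 Mbps is assumed to mean 10^6 bits per second.

```python
# Resolutions (width, height) in pixels, taken from Table 1.
CIF_FORMATS = {
    "SQCIF": (128, 96),
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "16CIF": (1408, 1152),
}


def uncompressed_mbps(width, height, bits_per_pixel=12, fps=30):
    """Return the raw (uncompressed) bit rate in Mbps.

    Assumes 1 Mbps = 1,000,000 bits per second; the defaults mirror the
    12 bits per pixel and 30 fps assumed for Table 1.
    """
    return width * height * bits_per_pixel * fps / 1_000_000


if __name__ == "__main__":
    for name, (w, h) in CIF_FORMATS.items():
        print(f"{name:6s} {w} x {h}: {uncompressed_mbps(w, h):.1f} Mbps")
```

For example, the CIF row works out to 352 × 288 × 12 × 30 = 36,495,360 bits per second, which rounds to the 36.5 Mbps listed in the table.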
Motion estimation algorithms are a significant part of CODEC processing. A sum-of-absolute-differences (SAD) operation is the cornerstone of most motion estimation algorithms. Because it must be applied to every frame in a video image, the SAD operation is extremely computationally intensive. Therefore, the operation must be performed as quickly and efficiently as possible.
The governing equation of the SAD operation is:

$$\mathrm{SAD}(U, V) = \sum_{x=0}^{15} \sum_{y=0}^{15} \left| U(x, y) - V(x, y) \right|$$

where U and V are image frames and x and y are two-dimensional spatial coordinates, here summed over a 16 × 16 block of pixels. Despite its apparently innocuous and simple form, the SAD governing equation in its current formulation, when coupled with the tremendously high number of bits requiring processing, is extremely computationally intensive and thus limits the speed of motion estimation algorithms. Adding hardware (e.g., multiple processors, additional memory, additional registers) increases the speed of the computation, but at the expense of physical space and implementation cost.
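To make the cost of the SAD governing equation concrete, the sketch below evaluates it directly for one 16 × 16 block and then uses it in an exhaustive block search. The exhaustive search is an assumption chosen for illustration, since the text does not specify a search strategy. The names `sad_16x16` and `best_match`, the `[y][x]` frame layout, the block-offset parameters, and the search radius are all hypothetical and not taken from the source.

```python
BLOCK = 16  # block size implied by the summation limits 0..15


def sad_16x16(u, v, ux=0, uy=0, vx=0, vy=0):
    """Sum of absolute differences between two 16x16 blocks.

    u, v     : frames as 2-D lists of pixel values, indexed [y][x] (assumed layout)
    (ux, uy) : top-left corner of the block in u (hypothetical parameter)
    (vx, vy) : top-left corner of the block in v (hypothetical parameter)
    """
    total = 0
    for y in range(BLOCK):
        for x in range(BLOCK):
            total += abs(u[uy + y][ux + x] - v[vy + y][vx + x])
    return total


def best_match(cur, ref, bx, by, search=2):
    """Exhaustive search (illustrative assumption, not the source's method).

    Compares the block of `cur` at (bx, by) with every block of `ref`
    displaced by up to +/- `search` pixels and returns (dx, dy, sad)
    for the displacement with the lowest SAD, or None if no candidate fits.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidate blocks that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + BLOCK > width or ry + BLOCK > height:
                continue
            cost = sad_16x16(cur, ref, bx, by, rx, ry)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


if __name__ == "__main__":
    # Synthetic 20x20 reference frame with a distinct value at every pixel.
    ref = [[x + 10 * y for x in range(20)] for y in range(20)]
    # Current frame: the reference content shifted by 2 pixels in x and 1 in y.
    cur = [[ref[y + 1][x + 2] if y + 1 < 20 and x + 2 < 20 else 0
            for x in range(20)] for y in range(20)]
    print(best_match(cur, ref, bx=2, by=2, search=2))
```

Each call to `sad_16x16` performs 256 subtractions, 256 absolute values, and 256 accumulations, and a search over even a small ±2 pixel window repeats that work up to 25 times per block, which illustrates why the operation dominates the processing load.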
Therefore, although the SAD operation and other compression methods have proven somewhat effective, there remains a need to improve video quality over low-bandwidth transmission channels without significantly increasing either the space occupied by the hardware performing the calculations or the cost of that hardware.