1. Field of the Invention
This invention relates to the field of video image processing and data communications and in particular to the field of video image encoding.
2. Description of Related Art
Video image encoding techniques are well known in the art. Encoding standards such as CCITT H.261, CCITT H.263, and MPEG provide methods and techniques for efficiently encoding sequences of video images. These standards exploit the temporal correlation of frames in a video sequence by using a motion-compensated prediction, and exploit the spatial correlation of the frames by using a frequency transformation, such as a Discrete Cosine Transformation (DCT). When an image is transformed using a frequency transformation, the resultant frequency component coefficients, which are the measures of energy at each frequency, are typically non-uniformly distributed about the frequency spectrum. According to the existing standards, the non-uniformly distributed coefficients are quantized, typically producing some non-zero quantized coefficients among many zero-valued quantized coefficients. The occurrence of many zero-valued coefficients, and of similarly valued non-zero quantized coefficients, allows for an efficient encoding using an entropy-based encoding, such as a Huffman/run-length encoding.
The aforementioned quantizing process introduces some loss of quality, or precision, in the encoding. Consider, for example, the transformation of a very minor image detail that results in a very small frequency component in the transformation of the image. If the magnitude of that frequency component, or coefficient, is below the size of the quantization step size, the quantized coefficient corresponding to that very small transformation coefficient will be zero. When the corresponding encoded image is subsequently decoded, it will not contain the original very minor image detail, because the frequency component corresponding to this detail has been eliminated by the quantization step. In like manner, each frequency coefficient is “rounded” to the value corresponding to the quantization step that includes the coefficient.
As is evident to one of ordinary skill in the art, the quantization step size determines the degree of loss of quality in the encoding process. A small quantization step size introduces less round-off error, or loss of precision, than a large quantization step size.
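The effect of the quantization step size on precision can be illustrated with a minimal sketch. The quantizer below is an assumption made for illustration only: it truncates toward zero, so that any coefficient whose magnitude is below the step size becomes zero, as in the example above. The function names and the sample coefficient values are hypothetical and are not taken from any of the cited standards.

```python
def quantize(coeff, step):
    """Return the quantization level of a coefficient.

    Assumed quantizer: truncation toward zero, so any coefficient
    with magnitude below `step` maps to level 0.
    """
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / step)


def dequantize(level, step):
    """Reconstruct the coefficient value represented by a level."""
    return level * step


def max_roundoff_error(coeffs, step):
    """Largest absolute difference between original and reconstruction."""
    return max(abs(c - dequantize(quantize(c, step), step)) for c in coeffs)


# Hypothetical transform coefficients: a few large, several small.
coeffs = [120.0, -37.0, 9.0, 3.0, -2.0, 0.5]

for step in (4, 16):
    levels = [quantize(c, step) for c in coeffs]
    print(step, levels, max_roundoff_error(coeffs, step))
```

With these sample values, the step size of 4 zeroes the three smallest coefficients and yields a maximum error of 3.0, whereas the step size of 16 zeroes four coefficients and yields a maximum error of 9.0.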
As is also evident to one of ordinary skill in the art, the quantization step size determines the resultant size of the entropy based encoding. A small quantization step size, for example, rounds fewer coefficients to a zero level than a large quantization step size, and therefore there will be fewer long runs of zero values that can be efficiently encoded.
A small quantization step size provides for a high quality reproduction of the original image, but at the cost of a larger sized encoding. A large quantization step size provides for a smaller sized encoding, but with a resultant loss of quality in the reproduction of the original image.
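The tradeoff between quantization step size and encoding size can be sketched as follows. This is a simplified illustration under stated assumptions, not the entropy coder of any standard: the coefficients are quantized by truncation toward zero and then represented as hypothetical (zero run, level) pairs followed by an end-of-block marker, and the number of symbols stands in for the encoded size.

```python
def quantize(coeff, step):
    """Assumed quantizer: truncate toward zero."""
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / step)


def run_length_pairs(levels):
    """Represent levels as (zero_run, nonzero_level) pairs plus "EOB".

    Trailing zeros are absorbed by the end-of-block marker.
    """
    pairs = []
    run = 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")
    return pairs


# Hypothetical coefficients of one block, in scan order.
coeffs = [52.0, -18.0, 0.0, 7.0, 3.0, 0.0, -5.0, 2.0]

fine = run_length_pairs([quantize(c, 4) for c in coeffs])
coarse = run_length_pairs([quantize(c, 16) for c in coeffs])
print(len(fine), fine)
print(len(coarse), coarse)
```

In this example the step size of 4 produces five symbols, while the step size of 16 produces three, because more coefficients fall to zero and are absorbed into runs or into the end-of-block marker.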
The variable sized encodings of an image are often communicated over a fixed bandwidth communications channel, such as, for example, a telephone line used for video teleconferencing, or a link to a web site containing video information. In such systems, the variable length encoded images are communicated to a buffer at the receiving site, decoded, and presented to the receiving display at a fixed image frame rate. That is, for example, in a video teleconferencing call, the sequence of images may be encoded at a rate of ten video frames per second. Because the encodings of each frame are of variable length, some frames may have an encoded length that requires more than a tenth of a second to be communicated over the fixed bandwidth communications channel, while others require less than a tenth of a second. For optimal bandwidth utilization, the aggregate encoded frame transmission rate should equal the video frame rate. The receiving buffer size determines the degree of variability about this aggregate rate that can be tolerated without underflowing or overflowing the buffer. That is, if the receiving buffer underflows, a frame will not be available for display when the next period of the video frame rate occurs; if the receiving buffer overflows, the received encoding is lost, and the frame will not be displayable when the next period of the video frame rate occurs. In a conventional encoding system, the quantization step size is continually adjusted to assure that neither an overflow nor an underflow of the receiving buffer occurs. Because the receive buffer is of limited size, the quality of the encoding can become unacceptably poor, particularly when communicating via a low bandwidth communications path.
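The buffer behavior described above can be sketched with a simplified model. The model is an assumption made for illustration: in each frame period the channel delivers a fixed number of bits to the receiving buffer, and the decoder then removes one complete encoded frame. The function name, the channel rate of 64,000 bits per second, the buffer capacity, and the frame sizes are all hypothetical.

```python
def first_buffer_violation(frame_bits, channel_bps, frame_rate,
                           capacity_bits, initial_bits):
    """Return (frame_index, "overflow" | "underflow") or None.

    Simplified model: per frame period, channel_bps / frame_rate bits
    arrive, then the decoder removes the next frame in full.
    """
    bits_per_period = channel_bps / frame_rate
    fullness = initial_bits
    for index, size in enumerate(frame_bits):
        fullness += bits_per_period
        if fullness > capacity_bits:
            return (index, "overflow")   # arriving bits do not fit
        if fullness < size:
            return (index, "underflow")  # frame not fully received
        fullness -= size
    return None


# Hypothetical channel: 64 kbit/s at ten frames per second,
# i.e. 6400 bits arrive per frame period.
args = dict(channel_bps=64000, frame_rate=10,
            capacity_bits=20000, initial_bits=6400)

print(first_buffer_violation([6400, 6400, 6400], **args))
print(first_buffer_violation([6400, 9000, 12000], **args))
print(first_buffer_violation([2000, 2000, 2000, 2000], **args))
```

Frames that match the channel rate cause no violation; a run of oversized frames causes an underflow at the third frame; and a run of undersized frames causes an overflow at the third frame. An encoder that adjusts the quantization step size is, in effect, steering the frame sizes so that neither condition arises.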
Techniques have been developed or proposed to allocate varying degrees of quality to different areas of an image, by providing different quantization step sizes at different regions of the image. That is, to optimize the use of available bandwidth, more bandwidth is allocated to areas of interest than to areas of less interest, by allocating a higher image quality potential to the areas of interest. U.S. Pat. No. 4,972,260, dated Nov. 20, 1990, incorporated by reference herein, provides a method of encoding that varies the quantization step size of each block in an image frame based on the location of the block in the frame; blocks in the center of the frame are assigned a smaller quantization step size, and therefore higher quality, than the blocks on the perimeter of the frame. Such a technique is based upon an assumption that the information of interest to the user will normally be centrally located on each frame. Although this assumption is commonly true, there are many situations wherein the location of an object in the scene is independent of the interest in the object. For example, videoconference scenes may include a table about which multiple participants are seated; the focus of interest will typically switch to whoever is speaking, regardless of where the speaker is located about the table.
Techniques have also been developed or proposed that analyze the image for particular features, such as areas of flesh tones, and apply a smaller quantization step size to these areas. U.S. Pat. No. 5,729,295, dated Mar. 17, 1998, incorporated by reference herein, enhances this technique by providing an encoding of an entire image, and thereafter selectively updating only the specifically identified areas and those other areas of the scene that exceed a particular motion threshold. As in the prior art, the specific areas, such as a facial area, are encoded using a smaller quantization step size than the motion areas; background areas that have slight or no movements are not encoded, thereby avoiding the encoding of “noise”, such as moving leaves in the distance. Such a technique is based upon an assumption that areas of interest in the image have a distinguishable characteristic that can be used to identify the areas that are to be updated. Identifying the distinguishable characteristic in each block of each frame of a sequence of video images adds a substantial computational overhead to the encoding process. Additionally, the lack of updating of background blocks having only slight motion produces a stale and unrealistic looking background, and may result in visual anomalies, ignoring, for example, a slow but continual movement of an object across the scene.
It is an object of this invention to provide a method and apparatus for video encoding that provides an allocation of image quality that efficiently utilizes the available bandwidth of a communications channel. It is a further object of this invention to allocate the image quality without introducing visually disturbing effects or anomalies.
These objects and others are achieved by allocating image quality in dependence upon the relative speed of motion of objects in the image. Fast moving objects are allocated less quality, or precision, than slower moving or stationary objects. In a preferred embodiment of this invention, the quantization step size is dependent upon the magnitude of the motion vector associated with each block in each frame of a video sequence. In a further embodiment of this invention, the quantization step size is also dependent upon the location of each block in each frame, providing more precision to a central area of each frame. To reduce computational complexity, a motion activity map is created to identify areas of higher precision based upon the location and motion associated with each block. To further reduce computational complexity in a preferred embodiment, the sets of parameters for effecting the desired quality levels are predefined, and include, for example, an initial value and bounds for the quantizing factor that is used for encoding independent and predictive frames of the sequence of images. In a further preferred embodiment, the sets of parameters for effecting the desired quality levels are adjustable based upon a user's preferences.
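One possible way of forming a motion activity map and associating it with predefined parameter sets can be sketched as follows. This is an illustrative sketch only and is not the claimed implementation: the three quality levels, the threshold on motion vector magnitude, the definition of the central area, and every numeric value in the parameter sets are hypothetical assumptions.

```python
import math

# Hypothetical predefined parameter sets: an initial quantizing
# factor and its bounds for each quality level (values are assumed).
PARAMS = {
    "high":   {"q_init": 6,  "q_min": 2,  "q_max": 12},
    "medium": {"q_init": 12, "q_min": 6,  "q_max": 20},
    "low":    {"q_init": 20, "q_min": 12, "q_max": 31},
}


def is_central(row, col, rows, cols):
    """Assumed central area: the middle half of the frame per dimension."""
    return (rows / 4 <= row + 0.5 <= 3 * rows / 4
            and cols / 4 <= col + 0.5 <= 3 * cols / 4)


def motion_activity_map(motion_vectors, slow_threshold=2.0):
    """Assign a quality level to each block.

    Slow central blocks get "high", slow peripheral blocks "medium",
    and fast blocks "low", regardless of location.
    """
    rows, cols = len(motion_vectors), len(motion_vectors[0])
    levels = []
    for r, row in enumerate(motion_vectors):
        out = []
        for c, (dx, dy) in enumerate(row):
            slow = math.hypot(dx, dy) <= slow_threshold
            if slow and is_central(r, c, rows, cols):
                out.append("high")
            elif slow:
                out.append("medium")
            else:
                out.append("low")
        levels.append(out)
    return levels


# Hypothetical 4x4 grid of block motion vectors (dx, dy).
mvs = [[(0, 0)] * 4 for _ in range(4)]
mvs[1][1] = (3, 4)   # fast block in the central area
mvs[0][3] = (6, 0)   # fast block on the perimeter

activity = motion_activity_map(mvs)
for row in activity:
    print([PARAMS[level]["q_init"] for level in row])
```

Because the map is computed once per frame from motion vectors that the encoder already produces, the per-block work reduces to a table lookup of the parameter set for the assigned level.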