1. Field of the Invention
The present invention relates to a low bit-rate communication system for multimedia applications, such as a video teleconferencing system, and more particularly, to a method of, and system for, filtering video images prior to video coding.
2. Description of the Related Art
The storage and transmission of full-color, full-motion images is increasingly in demand. These images are used not only for entertainment, as in motion picture or television productions, but also for analytical and diagnostic tasks such as engineering analysis and medical imaging.
There are several advantages to providing these images in digital form. For example, digital images are more readily enhanced and manipulated. Also, digital video images can be regenerated accurately over several generations with only minimal signal degradation.
On the other hand, digital video requires significant memory capacity for storage and, equivalently, a high-bandwidth channel for transmission. For example, a single 512 by 512 pixel gray-scale image with 256 gray levels requires more than 256,000 bytes of storage, and a full-color image requires nearly 800,000 bytes. Natural-looking motion requires that images be updated at least 30 times per second. A transmission channel for natural-looking, full-color moving images must therefore accommodate approximately 190 million bits per second. However, modern digital communication applications, including videophones, set-top boxes for video-on-demand, and video teleconferencing systems, have transmission channels with bandwidth limitations, so that the number of bits available for transmitting video image information is far less than 190 million bits per second.
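The figures cited above follow from straightforward arithmetic, sketched below. The byte-per-pixel assumptions (1 byte for 256 gray levels, 3 bytes for full color) are conventional and are implied, not stated, by the text.

```python
# Storage and bandwidth arithmetic for a 512 x 512 image, as cited above.
# Assumes 1 byte/pixel for 256 gray levels and 3 bytes/pixel for full color.

WIDTH, HEIGHT = 512, 512
FRAMES_PER_SECOND = 30  # minimum rate for natural-looking motion

gray_bytes = WIDTH * HEIGHT * 1   # gray-scale: 1 byte per pixel
color_bytes = WIDTH * HEIGHT * 3  # full color: 3 bytes per pixel

# Uncompressed full-color video at 30 frames per second:
bits_per_second = color_bytes * 8 * FRAMES_PER_SECOND

print(gray_bytes)       # 262144    (more than 256,000 bytes)
print(color_bytes)      # 786432    (nearly 800,000 bytes)
print(bits_per_second)  # 188743680 (approximately 190 million bits/s)
```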
As a result, a number of image compression techniques, such as the discrete cosine transform (DCT), have been used to reduce the information capacity (bytes) required for the storage and transmission of digital video signals. These techniques generally take advantage of the considerable redundancy present in any natural image to reduce the amount of information used to transmit, record, and reproduce digital video images. For example, when an object in a video sequence has many identical pixels, the DCT represents the redundant image content with zero-valued data components, which need not be transmitted; image content that varies from pixel to pixel is represented with non-zero data components. Thus, if the video image to be transmitted is an image of the sky on a clear day, the DCT image data has many zero data components, since there is little or no variation in the objects depicted in such an image. As a result, the image of the sky on a clear day is compressed by transmitting only the small number of non-zero data components.
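The behavior described above can be illustrated with a small, self-contained 1-D DCT-II (real codecs apply a 2-D DCT to 8x8 blocks; the one-dimensional, unnormalized form below is a simplification for illustration). For a block of identical pixels, every coefficient except the DC term vanishes, so only one data component would need to be transmitted.

```python
import math

def dct_1d(x):
    """Unnormalized 1-D DCT-II (illustrative; codecs use a 2-D form)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n))
            for k in range(n)]

# A "clear sky" block: every pixel identical.
flat = [128.0] * 8
coeffs = dct_1d(flat)

# Only the DC term (k = 0) is non-zero; all AC terms vanish.
nonzero = sum(1 for c in coeffs if abs(c) > 1e-9)
print(nonzero)  # 1
```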
One problem associated with image compression techniques such as the DCT is that, for low bit-rate applications, these techniques tend to produce decoded images disturbed by errors. One commonly occurring type of error is referred to as "mosquito noise", since its appearance in a decoded video segment gives the illusion of "mosquitoes" closely surrounding an object. Mosquito noise is typically present at the edges and contours of objects in a video sequence. Such noise occurs when there is more image information to be coded in the video sequence than there are bits available for transmission. For example, the DCT data for the edges and contours of objects depicted in a video sequence has many non-zero data components, since such areas contain few redundancies across the image. If the number of non-zero data components for the edges and contours of an object exceeds the number of bits available for transmission, as, for example, in low bit-rate telephony systems, some of the data components for the object are not coded accurately. When the transmitted data components are decoded, the inaccurately coded components are translated into images containing mosquito noise.
Typically, video image errors such as mosquito noise are reduced with a prefilter, which filters the video signal prior to image compression. The prefilter is characterized by a defined range of frequencies. When a video signal is input to the prefilter, only those frequencies of the video signal that fall within the prefilter's defined range are passed to the video coder; frequencies outside that range are suppressed. Thus, prefiltering essentially eliminates some of the image information from the video sequence, allowing the sequence to be coded using fewer bits and thereby reducing coding errors such as mosquito noise.
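A minimal sketch of the low-pass behavior described above: a 3-tap smoothing kernel (the `(0.25, 0.5, 0.25)` weights and clamped border handling are illustrative assumptions, not taken from the text) passes flat, low-frequency content unchanged while flattening a rapidly alternating, high-frequency signal, which is exactly the image information a prefilter removes before coding.

```python
def smooth(signal, kernel=(0.25, 0.5, 0.25)):
    """Illustrative 3-tap low-pass filter; borders handled by clamping."""
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - 1, 0), n - 1)  # clamp at the borders
            acc += w * signal[idx]
        out.append(acc)
    return out

low = [100.0] * 8        # low-frequency (flat) content
high = [100.0, 0.0] * 4  # high-frequency alternation

print(smooth(low))   # flat content passes through unchanged
print(smooth(high))  # alternation is flattened toward its mean
```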
Unfortunately, since the prefilter filters the video signal using a single, fixed range of frequencies, image information is uniformly eliminated both from areas deemed important to the content of the video sequence and from areas deemed unimportant. For example, when a video sequence containing facial areas is prefiltered using a single, fixed range of frequencies, those frequencies of the facial areas that fall outside the prefilter's defined range are suppressed. As a result, facial areas are often depicted with overly smoothed features, giving the faces an artificial quality, since fine features such as wrinkles that are present on faces in the original video sequence tend to be filtered out. Image areas of a video sequence that are typically deemed important include edges of objects, skin areas, facial regions, and areas surrounding a moving object; areas deemed less important include the background of the image. Although prefiltering of the video signal reduces the errors attributable to coding, the use of a fixed range of filtering frequencies defines a single-strength filter, which indiscriminately eliminates image information from some important areas of the video sequence, resulting in an overall loss of picture quality when the original image is compared to a decoded version of the same image.
Accordingly, prefiltering arrangements that reduce the number of bits to be coded using image compression techniques continue to be sought.
The present invention is directed to an object-oriented filter for filtering video images prior to video coding and, in an illustrative application, is used in conjunction with the video coder of video encoding/decoding (Codec) equipment. The object-oriented filter initially analyzes an input video signal to map the location of picture elements (pixels) that are associated with one or more image parameters contained in the video sequences. The term image parameter as used herein refers to a parameter that is associated with a certain aspect of the video sequence. Examples of image parameters include the edges of objects, skin areas, eyes-nose-mouth (ENM) regions, and the areas surrounding moving objects.
Once the pixels associated with one or more of the image parameters are mapped, the object-oriented filter selects a filtering factor for each identified image parameter of the video sequence. The filtering factor adjusts the strength of the filter that subsequently filters the pixels associated with each identified image parameter. Thus, in a single frame of a video sequence, the pixels associated with one or more image parameters are first mapped and then filtered by the object-oriented filter of the present invention using several differing filter strengths, in contrast to the single filter strength that prior-art filters apply to all pixels of a video sequence.
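The two-stage operation described above (map pixels to image parameters, then assign each parameter its own filter strength) can be sketched as follows. The region labels, the tiny 2x4 "frame", and the strength assignments are hypothetical illustrations, not values from the patent.

```python
# Hypothetical sketch: (1) a parameter map labels each pixel with the
# image parameter it belongs to; (2) each parameter is assigned its own
# filter strength, weak for important areas, strong for unimportant ones.

# Stage 1: per-pixel labels for a tiny 2x4 frame (assumed extractor output).
parameter_map = [
    ["background", "edge", "skin", "skin"],
    ["background", "edge", "enm",  "enm"],
]

# Stage 2: one filter strength per image parameter (illustrative values).
strength_for = {
    "enm": "weak",          # eyes-nose-mouth: most important
    "edge": "medium",
    "skin": "medium",
    "background": "strong", # least important: filter heavily
}

# The resulting plan applies several differing strengths within one frame.
filter_plan = [[strength_for[p] for p in row] for row in parameter_map]
print(filter_plan)
```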
The filtering factors are selected by ranking the identified image parameters in order of their importance to the overall video sequence. For example, if a video sequence depicts two persons walking across a field, the two persons walking would be deemed more important to the content of the video sequence than the field, since a viewer will tend to focus his or her attention toward a specific object contained in the video sequence (i.e., the two persons walking) instead of toward background scenery (i.e., the field).
The mapping of the location of pixels associated with one or more image parameters contained in a video sequence and the selection of a filtering factor that adjusts the strength of the filter which subsequently filters such pixels is advantageous. This is because the pixels are filtered according to their relative importance in the video sequence.
In the present illustrative example, the object-oriented filter is integrated with, but functions independently of, the other component parts of the video coding/decoding (Codec) equipment which includes an encoder, a decoder, and a coding controller. In one embodiment, the object-oriented filter is inserted between the input video signal and the encoder, to prefilter the input video signal prior to the encoding of the video images.
In one example of the present invention, the object-oriented filter includes an image parameter extractor, a filter selector and a filter. The image parameter extractor analyzes the input video sequences and maps the locations of pixels associated with at least one image parameter. The image parameter extractor is advantageously programmed to identify the previously named image parameters that are likely to be deemed important to the content of the video sequence.
Once the image parameter extractor has mapped the locations of pixels associated with the selected image parameters, the filter selector determines the relative importance of each image parameter to the overall video sequence. The filter selector identifies the relative importance of each image parameter based on a predetermined association, such as a hierarchy. For example, if eyes-nose-mouth (ENM) regions, edge areas, and skin regions are mapped by the image parameter extractor, then the filter selector is programmable to rank ENM regions as more important than edge areas, and edge areas as more important than skin regions.
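A hierarchy of this kind reduces to a simple ordered lookup, sketched below with the example ranking given in the text (ENM regions above edge areas above skin regions); the list-based representation is an assumption for illustration.

```python
# Predetermined hierarchy, per the example above: earlier = more important.
HIERARCHY = ["enm", "edge", "skin"]

def rank(parameter):
    """Lower number = more important to the overall video sequence."""
    return HIERARCHY.index(parameter)

mapped = ["skin", "enm", "edge"]  # parameters found by the extractor
print(sorted(mapped, key=rank))   # ['enm', 'edge', 'skin']
```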
After the identified image parameters are ranked according to their relative importance, the filter selector selects a filtering factor for adjusting the strength of the filter used to filter the pixels associated with each image parameter. The filter selector advantageously selects a filtering factor that applies the weakest filter strength to pixels associated with image parameters deemed most important to the content of the video sequence. The filter strength, as used herein, refers to the range of frequencies passed by the filter. A weak filter strength implies that a wide range of frequencies passes through the filter, so that very little image information (bytes) is eliminated from the filtered pixels. In contrast, a strong filter strength implies that a narrow range of frequencies passes through the filter. Consequently, more image information (bytes) passes through a weak filter than through a strong filter.
Once a filtering factor is selected, the pixels associated with the identified image parameters are filtered by the filter. The filter is advantageously a separable filter and has a form which is a function of the filtering factor, such as, for example,

[separable filter kernel formula omitted in source]

where N is the filtering factor. For such a filter, a stronger filter strength corresponds to a smaller value of N.
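The kernel formula itself is not reproduced in this text. One common separable smoothing kernel of this family, [1, N, 1] / (N + 2), is assumed below purely for illustration; it matches the stated behavior, since a small N yields a strong (heavily smoothing) filter while a large N approaches the identity (weakest) filter.

```python
# Assumed illustrative kernel [1, N, 1] / (N + 2); NOT necessarily the
# patent's kernel, but consistent with "stronger filter <=> smaller N".

def kernel(n):
    s = n + 2.0
    return [1.0 / s, n / s, 1.0 / s]  # weights sum to 1

def filter_row(row, n):
    """Apply the 1-D kernel horizontally (a separable filter would then
    apply the same kernel vertically); borders handled by clamping."""
    k = kernel(n)
    out = []
    for i in range(len(row)):
        acc = 0.0
        for j, w in enumerate(k):
            idx = min(max(i + j - 1, 0), len(row) - 1)
            acc += w * row[idx]
        out.append(acc)
    return out

edge = [0.0, 0.0, 100.0, 100.0]  # a sharp edge in one image row
print(filter_row(edge, 1))   # small N: strong filter, edge heavily smoothed
print(filter_row(edge, 10))  # large N: weak filter, edge nearly preserved
```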
The object-oriented filter of the present invention thereafter analyzes subsequent frames of the video sequence to identify image parameters and select the filtering factors useful for filtering such video frames. The object-oriented filter optionally uses the filtering factors selected for one frame of a video sequence to filter subsequent frames of the video sequence.
Other objects and features of the present invention will become apparent from the following detailed description considered in conjunction with the accompanying drawings. It is to be understood, however, that the drawings are designed solely for purposes of illustration and not as a definition of the limits of the invention, for which reference should be made to the appended claims.