Video conferencing systems 10a,b, such as shown in FIG. 1, are used, for example, to connect corporate meeting rooms or other sites by a real (or quasi real) time video link 14, allowing conference participants in two or more sites to in effect meet and see one another without the necessity of traveling. The link 14 between the sites may be accomplished through many known media, such as through phone lines or other wiring, or by a wireless link, and may additionally contain audio information indicative of the conversations among the conference participants. The data transmitted over link 14 generally comprises digitized information indicative of the images (e.g., 18) present at the sites, and is displayable on a monitor 16 associated with at least the receiving videoconference system 10b. (Only a one-way video communication link is illustrated in FIG. 1 for simplicity, although one skilled in the art will understand that link 14 is typically a two-way link, and that each videoconference system 10a,b is typically identical in its makeup to allow for two-way communication.)
There is much to be gained in the efficient initial processing, or encoding, of the images at a site prior to their transfer to the decoders at other sites. For example, an image 18 being broadcast to another site will generally have moving (e.g., people) and non-moving (e.g., tables, wall coverings, etc.) components. To reduce the burden of transferring the image, such non-moving portions of the image are preferably not recoded by the encoder 20 of the sending system. In other words, for those portions of the image that are deemed to be non-moving or not changing by the encoder and have been previously encoded and sent to the decoder with sufficient quality, the encoder will simply inform the receiving decoder 22 to reconstruct the same portion of the image from the previously decoded image, rather than resending image data for that portion.
The digitization of the image formed by the optics of camera 24 causes distortion in the digitized image data, referred to as image noise. Because this distortion is temporal in nature, the image data can change from one captured image to the next even when there is no change in the image 18. Even with the use of state-of-the-art noise reduction filters, some noise can remain in the image. Accordingly, it can be difficult for the encoder 20 to determine digitally whether portions of the image 18 are moving and/or changing, or whether the perceived change or motion is merely due to noise. The encoder 20 can address this problem by comparing the magnitude of change in a portion of the image against a threshold. Thus, if the magnitude of change for a portion of the image is lower than some acceptable coding threshold (T1), the encoder 20 concludes that the image portion is non-moving and will transmit such a conclusion to the receiving decoder(s) without recoding that portion of the image. Conversely, if the degree of change is greater than T1, the encoder concludes that the image portion is moving or changing, and more detailed processing and recoding of that image portion is performed prior to transfer.
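The threshold comparison described above can be illustrated with a minimal sketch. The change metric (mean absolute pixel difference), the function names, and the treatment of a change exactly equal to T1 as "recode" are assumptions made for illustration only; the text above does not specify how the magnitude of change is measured.

```python
def mean_abs_diff(prev_block, curr_block):
    """Mean absolute per-pixel difference between two equally sized blocks.

    This metric is an assumption; any measure of change could be substituted.
    """
    total = 0
    count = 0
    for prev_row, curr_row in zip(prev_block, curr_block):
        for p, c in zip(prev_row, curr_row):
            total += abs(c - p)
            count += 1
    return total / count


def classify_block(prev_block, curr_block, t1):
    """Return "skip" if the change is below T1, otherwise "recode".

    A change exactly equal to T1 is treated as "recode" (an assumption).
    """
    if mean_abs_diff(prev_block, curr_block) < t1:
        return "skip"      # decoder reuses the previously decoded portion
    return "recode"        # portion is processed and recoded before transfer


previous = [[100, 100], [100, 100]]
noisy    = [[101,  99], [100, 102]]   # small fluctuation, scene unchanged
moved    = [[140, 150], [100, 100]]   # large change in part of the block

print(classify_block(previous, noisy, t1=3))   # skip
print(classify_block(previous, moved, t1=3))   # recode
```

In this sketch the noisy block has a mean change of 1.0 and falls below T1, while the changed block has a mean change of 22.5 and exceeds it.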
Choosing the appropriate coding threshold T1 can be a difficult task, and can lead to false conclusions. For example, if T1 is set too low, the encoder 20 can draw the erroneous conclusion that a static image portion is moving and/or changing, which can cause the image portion to be needlessly recoded and rebroadcast. This can lead to an image which appears on the monitor 16 at the receiving end to move or “swim,” which is visually annoying and not desirable. Moreover, the erroneous conclusion causes the unnecessary recoding and transfer of image data, which consumes the limited bandwidth of link 14. By contrast, if T1 is set too high, the encoder may erroneously conclude that the image portion is not moving or not changing, when in fact it is moving or changing, leading to a similarly undesirable visual effect at the receiving end because moving or changing areas will erroneously appear to be static.
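The trade-off in choosing T1 can be seen in a small hypothetical example. The pixel values, the candidate thresholds, and the mean-absolute-difference metric below are all assumptions chosen for illustration, not values taken from any actual encoder.

```python
def change_magnitude(prev, curr):
    """Mean absolute difference between two flattened blocks (assumed metric)."""
    return sum(abs(c - p) for p, c in zip(prev, curr)) / len(prev)


def decide(prev, curr, t1):
    """Return "skip" if the change is below T1, otherwise "recode"."""
    return "skip" if change_magnitude(prev, curr) < t1 else "recode"


reference   = [100, 100, 100, 100]
static_curr = [102,  98, 101,  99]   # noise only: magnitude 1.5
moving_curr = [106, 105, 107, 106]   # genuine change: magnitude 6.0

# Sweep three hypothetical thresholds and report both decisions.
results = {}
for t1 in (1, 3, 10):
    results[t1] = (decide(reference, static_curr, t1),
                   decide(reference, moving_curr, t1))
    print(t1, results[t1])
```

With T1 = 1 the static block is needlessly recoded, with T1 = 10 the genuinely changing block is skipped and would appear frozen, and only the intermediate value classifies both blocks correctly.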
In addition to applying a threshold, the encoder could improve the noise detection algorithm by checking for noise only if the encoder has determined that the particular image portion will be coded using a [0,0] motion vector (i.e., if the encoder concludes that the portion is not moving), an approach that has been used in the prior art. However, this scheme can incorrectly decide that a portion is non-moving when in fact it is changing to some degree, and therefore not code that portion at all based on the threshold. If this occurs, actually changing portions are not updated by the encoder, and hence appear static at the display 16 on the receiving end.
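The motion-vector gating described above, together with its failure mode, might be sketched as follows. The function name, the tuple representation of the motion vector, and the numeric values are hypothetical, and the change magnitude is assumed to have been computed elsewhere.

```python
def decide_block(motion_vector, change_magnitude, t1):
    """Apply the noise threshold only to blocks with a [0,0] motion vector.

    Returns "code" when the block is to be coded and "skip" when the
    decoder is told to reuse the previously decoded portion.
    """
    if motion_vector != (0, 0):
        return "code"          # block judged to be moving: no noise check
    if change_magnitude < t1:
        return "skip"          # change attributed to noise
    return "code"


T1 = 3  # hypothetical coding threshold

# A moving block is always coded, regardless of the threshold.
print(decide_block((2, -1), 1.0, T1))   # code

# A stationary block with only noise is skipped, as intended.
print(decide_block((0, 0), 1.0, T1))    # skip

# Failure mode: a stationary block that is genuinely but slightly
# changing (e.g., a gradual lighting change) also falls below T1 and
# is skipped, so it appears static at the receiving end.
print(decide_block((0, 0), 2.5, T1))    # skip
```

The third call shows why the gating alone does not resolve the problem: a small genuine change is indistinguishable from noise under a single threshold.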
Accordingly, the videoconference art would benefit from a more efficient scheme for encoding image data prior to its transfer, and would particularly benefit from an improved scheme for distinguishing between portions of an image which are moving and/or changing and portions of the image which are static or stationary.