An image sequence, such as a video image sequence, typically includes a sequence of image frames or pictures. Reproducing video that contains moving objects typically requires a frame rate of thirty image frames per second, and each frame may contain in excess of a megabyte of information. Consequently, transmitting or storing such image sequences requires a large amount of transmission bandwidth or storage capacity. To reduce the necessary bandwidth or storage capacity, the frame sequence undergoes image processing, e.g., compression, so that redundant information within the sequence is not stored or transmitted. Television, video conferencing and CD-ROM archiving are examples of applications that can benefit from efficient video sequence encoding.
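As a rough illustration of the scale involved, take the lower bound of one megabyte per frame and assume one megabyte means 10^6 bytes. The uncompressed data rate is then at least:

$$
R \;\geq\; 30\,\tfrac{\text{frames}}{\text{s}} \times 10^{6}\,\tfrac{\text{bytes}}{\text{frame}} \times 8\,\tfrac{\text{bits}}{\text{byte}} \;=\; 2.4 \times 10^{8}\,\tfrac{\text{bits}}{\text{s}} \;\approx\; 240\ \text{Mbit/s}
$$

and one minute of such video occupies at least 30 × 60 × 10^6 = 1.8 × 10^9 bytes (about 1.8 GB) before compression.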
Additionally, in an image processing environment where processing resources are limited or constrained by the requirements of a particular application, the available resources must be allocated carefully. Namely, although many powerful image processing methods are available, some applications lack the necessary processing resources or impose stringent low-latency requirements, so that the more powerful methods are impractical or must be applied sparingly and selectively to meet application requirements.
For example, in real-time applications such as videophone or video conferencing, the talking person's face is typically one of the most important parts of an image sequence. The ability to detect and exploit such regions of importance can greatly enhance an encoding system.
In particular, the encoding system in a low bitrate application (e.g., a real-time application) must efficiently allocate a limited number of bits among competing demands, e.g., coding motion information, texture information, shape information, header information and so on. At times, the available bits must be allocated such that one parameter benefits at the expense of another, e.g., spending more bits to provide accurate motion information while spending fewer bits on texture information. Without information as to which regions in a current frame are particularly important, i.e., deserving of more bits from the limited bit pool, the encoder may not allocate the available bits in the most efficient manner.
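The passage above does not prescribe how bits should be divided once regions of importance are known. As one illustrative possibility only, the following sketch assumes that each region carries a numeric importance weight and splits a frame's bit budget in proportion to those weights, after first guaranteeing every region a minimum floor. The function name `allocate_bits`, the `floor` parameter, and the region labels are hypothetical and are not taken from the source.

```python
from typing import Dict


def allocate_bits(budget: int, importance: Dict[str, float], floor: int = 0) -> Dict[str, int]:
    """Hypothetical importance-weighted bit allocation (illustrative sketch).

    Each region first receives `floor` bits; the remaining bits are shared in
    proportion to the importance weights. Largest-remainder rounding keeps the
    integer allocations summing exactly to `budget`.
    """
    if budget < floor * len(importance):
        raise ValueError("budget too small for the per-region floor")
    total_weight = sum(importance.values())
    if total_weight <= 0:
        raise ValueError("importance weights must sum to a positive value")

    spare = budget - floor * len(importance)
    # Ideal (fractional) share of the spare bits for each region.
    ideal = {region: spare * w / total_weight for region, w in importance.items()}
    # Start from the floor plus the truncated ideal share.
    alloc = {region: floor + int(share) for region, share in ideal.items()}

    # Hand out the bits lost to truncation, largest fractional part first.
    leftover = budget - sum(alloc.values())
    by_remainder = sorted(ideal, key=lambda r: ideal[r] - int(ideal[r]), reverse=True)
    for region in by_remainder[:leftover]:
        alloc[region] += 1
    return alloc


if __name__ == "__main__":
    # A face region weighted three times as heavily as the background.
    print(allocate_bits(1000, {"face": 3.0, "background": 1.0}, floor=100))
    # prints {'face': 700, 'background': 300}
```

Here the face region receives 700 of the 1,000 bits. Proportional splitting with a floor is simply an assumed policy chosen for clarity; a practical encoder might instead translate such weights into per-region quantizer adjustments rather than raw bit counts.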
Furthermore, even when the encoder has additional resources to dedicate to identified regions of importance, it is often unable to improve these regions beyond the quality of the input image sequence itself. That is, changing the encoding parameters cannot raise the quality of the regions of importance above what is presented to the encoder.
Therefore, there is a need in the art for an apparatus and a concomitant method for classifying regions of interest in an image based on the relative “importance” of the various areas, and for adaptively using the importance information to allocate processing resources and to control manipulation of the input image sequence prior to encoding.