The present invention provides methods and systems for compression of digital images (still or motion sequences) wherein predetermined criteria may be used to identify a plurality of areas of interest in the image, and each area of interest is encoded with a corresponding quality level (Q-factor). In particular, the predetermined criteria may be derived from measurements of where a viewing audience is focusing their gaze (area of interest). Portions of the image outside of the areas of interest are encoded at a lower quality factor and bit rate. The result is higher compression ratios without adversely affecting a viewer's perception of the overall quality of the image.
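The following minimal Python sketch illustrates how several areas of interest can each be encoded at their own quality level. It is not the method of the invention. It assumes that each area of interest is an axis-aligned rectangle in pixel coordinates with its own Q factor, that Q is assigned per 8×8 block, and that a block overlapping several areas takes the highest requested Q. The names `AreaOfInterest` and `build_q_map` are hypothetical.

```python
from dataclasses import dataclass

BLOCK = 8  # assumed block size, matching the 8x8 DCT blocks of JPEG/MPEG


@dataclass
class AreaOfInterest:
    """Hypothetical rectangular area: x0, y0 inclusive; x1, y1 exclusive (pixels)."""
    x0: int
    y0: int
    x1: int
    y1: int
    q: int  # quality factor requested for this area


def build_q_map(width, height, areas, background_q):
    """Return a rows x cols grid holding one Q factor per 8x8 block."""
    cols = (width + BLOCK - 1) // BLOCK
    rows = (height + BLOCK - 1) // BLOCK
    q_map = [[background_q] * cols for _ in range(rows)]
    for a in areas:
        # Visit every block that overlaps the rectangle, even partially.
        for r in range(a.y0 // BLOCK, min(rows, (a.y1 + BLOCK - 1) // BLOCK)):
            for c in range(a.x0 // BLOCK, min(cols, (a.x1 + BLOCK - 1) // BLOCK)):
                # Where areas overlap, keep the highest requested quality.
                q_map[r][c] = max(q_map[r][c], a.q)
    return q_map


areas = [AreaOfInterest(0, 0, 8, 8, q=90), AreaOfInterest(16, 8, 32, 16, q=75)]
print(build_q_map(32, 16, areas, background_q=30))
# [[90, 30, 30, 30], [30, 30, 75, 75]]
```

For a 32×16 image, the map holds one Q value per block. An encoder could then use each entry to select the quantization applied to that block, so that the blocks outside both areas are encoded at the lower background quality.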
The invention is an improvement to the common practice of encoding, compressing, and transmitting digital image data files. Due to the large size of the data files required to produce a high quality representation of a digitally sampled image, it is common practice to apply various forms of compression to the data file in an attempt to reduce the size of the data file without adversely affecting the perceived image quality.
Various well-known techniques and standards have evolved to address this need. Representative of these techniques is the JPEG standard for image encoding. The MPEG standard is similar to JPEG but adds inter-frame encoding to take advantage of the similarity of consecutive frames in a motion sequence. Other standards and proprietary systems have been developed based on wavelet transforms.
These prior art techniques all transform the image samples into the frequency domain and then quantize and/or truncate the number of bits used to represent the higher frequency components. This step is typically followed by entropy encoding of the frequency coefficients. MPEG and JPEG use a discrete cosine transform (DCT) on 8×8 pixel blocks to transform the image samples into the frequency domain, while wavelet techniques apply more sophisticated transforms to larger areas of pixels.
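The transform and quantization steps can be sketched in Python as follows. This is an illustrative, unoptimized implementation of the orthonormal 2-D DCT-II applied to one 8×8 block, followed by uniform quantization against a table of step sizes. Entropy encoding is omitted. The step table `EXAMPLE_TABLE`, whose steps grow with frequency, is a hypothetical choice and not the default table of JPEG or MPEG.

```python
import math

N = 8  # block size used by JPEG and MPEG


def _c(k):
    # Orthonormal scaling factors for the DCT-II basis
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = _c(u) * _c(v) * s
    return out


def quantize(coeffs, table):
    """Divide each coefficient by its step size and round: the lossy step."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(N)] for u in range(N)]


# Hypothetical step table: coarser steps for higher frequencies (larger u + v)
EXAMPLE_TABLE = [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]

flat = [[100.0] * N for _ in range(N)]
q = quantize(dct_2d(flat), EXAMPLE_TABLE)
print(q[0][0])  # 100: the DC coefficient 800 divided by step 8
```

A uniform block has all of its energy in the DC coefficient, so every higher frequency coefficient quantizes to zero. Zero runs of this kind are what the subsequent entropy encoding compresses well.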
The loss of information is introduced at the quantization or truncation step. All of the other steps are reversible without loss of information. The degree of quantization and truncation is controlled by the encoding system to produce the desired data compression ratio. Although the method of controlling the quantization and truncation varies from system to system, the concept is generalized by those working in the field to that of a quality, or “Q” factor. The Q factor is representative of the resulting fidelity or quality of the image that remains after this step.
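One widely used way to turn a single Q factor into quantization step sizes is the quality scaling convention of the Independent JPEG Group's libjpeg encoder. The Python sketch below re-implements that convention for illustration only; it is not the library's API. The base steps in the example are the first row of the example luminance table in Annex K of the JPEG standard. The last function shows why quantization is the lossy step: dividing by a step, rounding, and multiplying back does not in general restore the original value.

```python
def quality_scale(quality):
    """Map a 1..100 Q factor to a percentage scale (libjpeg convention)."""
    quality = max(1, min(100, quality))
    return 5000 // quality if quality < 50 else 200 - 2 * quality


def scaled_step(base_step, quality):
    """Scale one base quantization step, clamped to the baseline range 1..255."""
    step = (base_step * quality_scale(quality) + 50) // 100
    return max(1, min(255, step))


def quantize_dequantize(value, step):
    """Quantize, then reconstruct; any difference from value is lost information."""
    return round(value / step) * step


BASE_ROW = [16, 11, 10, 16, 24, 40, 51, 61]  # first row of the Annex K luminance example
print([scaled_step(b, 50) for b in BASE_ROW])  # Q=50 reproduces the base row
print([scaled_step(b, 75) for b in BASE_ROW])  # [8, 6, 5, 8, 12, 20, 26, 31]
print(quantize_dequantize(37, 10))             # 40, not 37
```

A higher Q factor gives smaller steps and therefore less reconstruction error. At Q=100, every step collapses to 1.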
In the JPEG standard, the Q factor is set almost directly by the user at the time of encoding. In most encoders, it is global to the entire image. An image encoded using a standard JPEG encoder will therefore exhibit degradation that is uniform over the entire image. Regardless of the importance of a particular part of an image to a viewer, the JPEG encoder simply truncates the higher frequency coefficients to produce a smaller file size at the expense of image fidelity. Prior art JPEG image compression makes no provision for including high level cognitive information in the compression process.
In the MPEG standard, the Q factor is controlled indirectly by the bit-rate control mechanism of the encoder. The user (or system requirements such as the bandwidth of a DVD player or satellite channel) typically sets the maximum bit rate. Because of the complex interaction of the inter-frame encoding and the hard-to-predict relationship between the Q factor used during compression and the resulting data file size, bit rate control is typically implemented as a feedback mechanism. As the bit budget for a sequence of frames starts to run low, a global Q factor is decreased; conversely, if the bit rate is under budget, the Q factor is increased.
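The feedback loop described above can be sketched in Python as a simple step controller. This is a toy model under stated assumptions and not the MPEG reference rate control. `frame_bits_at_q` is a hypothetical stand-in for the encoder's actual output size, Q moves by a fixed step after each frame, and the budget is a fixed number of bits per frame.

```python
def update_q(q, bits_used, bits_budget, step=5, q_min=1, q_max=100):
    """One feedback step: lower Q when over budget, raise it when under."""
    if bits_used > bits_budget:
        q -= step
    elif bits_used < bits_budget:
        q += step
    return max(q_min, min(q_max, q))


def run_rate_control(frame_bits_at_q, n_frames, target_bits_per_frame, q_start=50):
    """Simulate the loop. frame_bits_at_q(q) is a hypothetical size model;
    a real encoder only learns the size after encoding each frame."""
    q = q_start
    used = 0
    history = []
    for i in range(1, n_frames + 1):
        used += frame_bits_at_q(q)  # bits actually spent on this frame
        history.append(q)
        q = update_q(q, used, i * target_bits_per_frame)  # compare to budget so far
    return history


print(run_rate_control(lambda q: 1000 * q, 6, 40000))
# [50, 45, 40, 35, 30, 30]
```

With a model in which each frame costs 1000 bits per unit of Q and a target of 40000 bits per frame, the global Q starts above the sustainable level. It is driven down until the cumulative spend catches up with the budget, after which it oscillates around it.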
The MPEG standard also makes provisions for block-by-block Q factor control. Typically this level of control is accomplished by a measurement of the “activity” level contained in the block. Blocks with more “activity” are encoded with higher Q factors. The activity level is usually a simple weighted average of some important frequency coefficients, or based on the difference (motion) from the previous frame in that portion of the image.
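Block-by-block control based on motion activity can be sketched as follows. The activity measure here is the mean absolute difference between co-located blocks of the current and previous frames, which is the second of the two options mentioned above. The linear mapping from activity to Q (`base_q`, `gain`) is a hypothetical choice that simply follows the rule stated above: blocks with more activity receive a higher Q factor. The sketch assumes that frame dimensions are multiples of the block size.

```python
def block_activity(cur, prev, r, c, size=8):
    """Mean absolute difference between co-located blocks of two frames."""
    total = 0
    for y in range(r * size, (r + 1) * size):
        for x in range(c * size, (c + 1) * size):
            total += abs(cur[y][x] - prev[y][x])
    return total / (size * size)


def activity_q_map(cur, prev, base_q=50, gain=2.0, q_max=100, size=8):
    """Per-block Q factors that rise with motion activity (hypothetical linear map)."""
    rows, cols = len(cur) // size, len(cur[0]) // size
    return [[min(q_max, round(base_q + gain * block_activity(cur, prev, r, c, size)))
             for c in range(cols)]
            for r in range(rows)]


prev = [[0] * 16 for _ in range(8)]
cur = [[0] * 8 + [10] * 8 for _ in range(8)]  # right-hand block changed by 10
print(activity_q_map(cur, prev))  # [[50, 70]]
```

Note that the measure is purely low level. A block that changes a great deal receives a higher Q whether or not a viewer cares about it.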
Wavelet system standards are just starting to emerge. Some of these standards make provisions for varying Q factors over the area of the image.
These prior art systems attempt to preserve the image data content according to those portions most important to the human visual system (or a simplified model of it). Such prior art systems typically have no ability to make higher level decisions based on image content such as recognizable objects and features.
Some research in higher level image content recognition has been undertaken. Systems have been demonstrated that are able to identify specific objects in a scene and, for example, recognize faces. The prior art in these areas, however, does not describe using this information to control compression.
Certain prior art systems provide for a viewer determined area of interest. For example, Lewis U.S. Pat. No. 4,028,725 provides a vision system where the resolution of the display is increased in the viewer's line of sight. Hori U.S. Pat. No. 5,909,240 describes block compression of a video image performed during recording of the image based on the camera operator's viewpoint, which is determined using an eye tracking device associated with the recording device. Weiman et al. U.S. Pat. No. 5,103,306 discloses a system of image encoding with variable resolution centered around a point responsive to a single viewer's eye gaze.
In all such prior art, the area of interest is limited to one area designated by one viewer. This works well for the one viewer actually viewing the image, but other viewers, or even the same viewer re-watching the recorded scene, may not always direct their gaze to the same single location.
In general, the prior art does not describe or suggest a system of image compression based on the ability to predict or determine multiple areas of interest and encode the areas of interest at a higher Q-factor. It would be advantageous to provide a system whereby encoding is based on area of interest classification using predetermined criteria such that higher Q-factors are assigned to the areas of interest. It would be further advantageous to provide a system whereby the predetermined criteria may be based on measurements of a viewing audience's eye gaze.
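The following is a minimal sketch of how multiple areas of interest might be derived from a viewing audience's gaze. It assumes that gaze samples from all viewers are available as pixel coordinates. The samples are pooled across viewers and counted per 8×8 block, and every block that receives at least a `min_share` fraction of all samples is treated as an area of interest. The pooling rule, the threshold, and the names `gaze_areas_of_interest` and `q_map_from_gaze` are illustrative assumptions, not necessarily the criteria used by the invention.

```python
from collections import Counter


def gaze_areas_of_interest(gaze_points, width, height, block=8, min_share=0.1):
    """Return the set of (row, col) blocks receiving at least min_share of
    all gaze samples pooled across viewers."""
    counts = Counter()
    for x, y in gaze_points:
        if 0 <= x < width and 0 <= y < height:  # ignore samples off the image
            counts[(int(y) // block, int(x) // block)] += 1
    total = sum(counts.values())
    if total == 0:
        return set()
    return {cell for cell, n in counts.items() if n / total >= min_share}


def q_map_from_gaze(gaze_points, width, height, high_q=90, low_q=30,
                    block=8, min_share=0.1):
    """Assign high_q to every gaze-derived area of interest, low_q elsewhere."""
    cols = (width + block - 1) // block
    rows = (height + block - 1) // block
    aoi = gaze_areas_of_interest(gaze_points, width, height, block, min_share)
    return [[high_q if (r, c) in aoi else low_q for c in range(cols)]
            for r in range(rows)]


viewer_a = [(2, 3)] * 4   # one viewer fixates near the left edge
viewer_b = [(26, 5)] * 5  # another fixates near the right edge
stray = [(12, 4)]         # an isolated glance
print(q_map_from_gaze(viewer_a + viewer_b + stray, 32, 8, min_share=0.15))
# [[90, 30, 30, 90]]
```

Unlike a single-viewer eye tracker, the pooled measurement yields two separate high-Q regions, while the isolated glance falls below the threshold and stays at the lower quality.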
To include high quality image content that anticipates the variety of viewpoints different viewers may choose, it is important to be able to determine multiple areas of interest and to encode and compress those areas at high quality while still improving the overall compression ratio. Corresponding methods and systems are provided.