The present invention pertains to the field of data compression. In particular, the present invention pertains to a constant bitrate method for lossy image compression.
Image data (including video data) are acquired when a picture (or movie) is taken with a conventional camera and scanned, or when a picture (or movie) is captured with a digital camera. Image data are also acquired through the use of a three-dimensional rendering program (e.g., a computer graphics program) executed on a computer system.
Image data can comprise a significant amount of data. A single frame in a quality image may include an array of up to 4000 by 2000 pixels, each pixel described by several color values (for example, by Red, Green and Blue, or by one Luminance and two Chroma). Thus, a video running at, for example, 30 frames per second normally requires a tremendous amount of image data to be stored, retrieved from memory and processed. Obviously, this consumes a large portion of a computer system's processing resources; specifically, it can take up a lot of costly hard disk space.
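As a rough illustration of the data volume described above, the following sketch computes the uncompressed data rate for the example frame size. It assumes one byte per color value, which the passage above does not state; the function name is hypothetical.

```python
def raw_bytes_per_second(width, height, components, bytes_per_component, fps):
    """Uncompressed data rate in bytes per second."""
    frame_bytes = width * height * components * bytes_per_component
    return frame_bytes * fps

# 4000 x 2000 pixels, three color values per pixel, one byte each (assumed),
# 30 frames per second.
print(raw_bytes_per_second(4000, 2000, 3, 1, 30))  # 720000000
```

Under these assumptions, one second of uncompressed video occupies about 720 megabytes.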
To address these problems, image data can be compressed to reduce the amount of data without significantly affecting the fidelity of the image. Various image compression schemes known in the art exist to accomplish this, such as the JPEG (Joint Photographic Experts Group) compression scheme or the MPEG (Moving Picture Experts Group) compression scheme. These compression schemes work well to reduce the amount of image data. Even though these compression methods are lossy, usually the loss is not recognizable by the human visual system. Lossy image compression takes advantage of the inherent spatial redundancy of image data. Because the amount of redundancy differs from image to image, two different images compressed with the same quantization settings can result in different bitstream sizes. Compression ratios up to 10:1 usually do not reveal noticeable artifacts for the human observer.
In video compression, which is the art of compressing a sequence of image frames, the expression "bitrate" (or simply "rate") is commonly used instead of the compression ratio. A bitrate has units of bits per time (usually bits per second). A bitrate implies a certain number of image frames per second ("frame rate"), from which the uncompressed size of all the frames within one second can be calculated. A compressed bitstream of one second (a file that contains all the frames of a one-second interval in compressed form) has a certain size, which is expressed as the bitrate. An algorithm that controls the size of this bitstream is called a rate control algorithm. A special case of a rate control algorithm is to compress only one frame. Therefore, compression ratio and rate can be used interchangeably.
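The relationship between bitrate and compression ratio described above can be sketched as follows. The function names and the 640 by 480 frame with 24 bits per pixel are hypothetical values chosen only for illustration.

```python
def compression_ratio_from_bitrate(bitrate_bps, frame_bits, fps):
    """Ratio of uncompressed bits per second to compressed bits per second."""
    return (frame_bits * fps) / bitrate_bps

def bitrate_from_compression_ratio(ratio, frame_bits, fps):
    """Compressed bits per second implied by a compression ratio."""
    return (frame_bits * fps) / ratio

frame_bits = 640 * 480 * 24          # hypothetical frame: 7,372,800 bits
print(bitrate_from_compression_ratio(10, frame_bits, 30))          # 22118400.0
print(compression_ratio_from_bitrate(22118400, frame_bits, 30))    # 10.0
```

For a single frame, the same conversion applies with a frame rate of one frame per interval, which is why the two terms can be used interchangeably.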
Prior Art FIG. 1 shows some of the steps used in one embodiment of a compression scheme for compressing image data (e.g., in a codec) using a discrete cosine transform (DCT) based encoder such as MPEG or JPEG. A codec can be implemented either in software or hardware or a combination of hardware and software.
In step 10, uncompressed image data are retrieved from computer system memory or a data storage device. In step 20, pre-processing stages known in the art such as down-sampling, color-space conversion, and digitizing are performed.
In step 30, a DCT is performed to convert the image data into a two-dimensional frequency space. Typically, most images contain little high frequency information, and so most of the transformed image data are concentrated into the low frequency components (referred to as DCT coefficients). A DCT is typically applied to eight-by-eight blocks of pixels (8×8 blocks), thus resulting in 64 DCT coefficients per image component that are arranged in an 8×8 array. Usually, several neighboring 8×8 blocks of pixel data are grouped together as a macroblock. The DCT transformation does not reduce the amount of image data.
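The transform in step 30 can be illustrated with a direct, unoptimized two-dimensional DCT-II on one 8×8 block. This is a minimal sketch using the orthonormal scaling; practical codecs use fast factorizations, and the function name is hypothetical.

```python
import math

N = 8

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of lists."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

# A perfectly flat block has no high frequency content: all of its energy
# lands in the single lowest frequency coefficient at position [0][0].
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))  # 800.0
```

The input holds 64 values and the output holds 64 coefficients, consistent with the statement that the transform by itself does not reduce the amount of data.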
In step 40, the quantization step, some of the frequency information is in essence discarded, so that fewer bits can be used to describe the image. Consider, for example, that there may be 256 possible levels of coloration (e.g., from lightest to darkest) for a pixel. Therefore, prior to quantization, each level would be identified by a unique combination of eight bits. However, using quantization, the 256 possible levels can be quantized into 16 steps of 16 levels each, each step identified by a unique combination of only four bits.
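The 256-level example above can be sketched as follows. Reconstructing each step at its midpoint is an assumption made for this illustration and is not stated in the passage; the function names are hypothetical.

```python
def quantize(level, step_size=16):
    """Map a level in 0..255 to a step index in 0..15 (fits in four bits)."""
    return level // step_size

def dequantize(index, step_size=16):
    """Reconstruct a representative level (assumed: midpoint of the step)."""
    return index * step_size + step_size // 2

print(quantize(200))               # 12
print(dequantize(quantize(200)))   # 200
print(dequantize(quantize(207)))   # 200
```

Levels 192 through 207 all map to the same four-bit index, which is the information discarded by quantization.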
Using DCT, information in the lower frequency coefficients can be quantized more finely using a relatively large number of bits, while the higher frequency coefficients can be quantized on a cruder basis using a relatively small number of bits. Thus, lower frequency coefficients might be quantized into 16 steps, each represented using four bits as described above, while higher frequency coefficients are quantized into one or two steps, each represented by one bit or by a value of zero.
As mentioned above, for the JPEG and MPEG codecs, an image is typically transformed into 8×8 blocks of DCT coefficients for each component. Similarly, the sizes of the quantization steps to be applied to the DCT coefficients are arranged in an 8×8 array referred to as a quantization table, such that an entry in the quantization table corresponds to a location in the array of DCT coefficients.
The quantization table drives the amount of compression (the compression ratio) because it specifies the size of the quantization steps. The larger the quantization steps, the greater the compression ratio, but there will be a commensurate reduction in image quality. Conversely, smaller quantization steps mean that the uncompressed data are more closely represented, thereby maintaining image quality but reducing the compression ratio. Typically, a user specifies the desired level of image quality by specifying a quantization parameter, and a quantization table corresponding to that quantization parameter is selected and implemented. For example, in JPEG the quantization parameter is usually specified by selecting a number between zero and 100, with 100 corresponding to the highest level of image quality. The quantization parameter may be a factor that scales a given quantization table.
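The effect of scaling a quantization table can be sketched as follows. The base table and the coefficient values are hypothetical and are not taken from any standard; the scale factor plays the role of a quantization parameter that multiplies every step size.

```python
def scale_table(base_table, factor):
    """Scale every step size by `factor`, keeping each step at least 1."""
    return [[max(1, round(q * factor)) for q in row] for row in base_table]

def quantize_block(coeffs, table):
    """Divide each coefficient by its step size and round to an integer."""
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, table)]

def count_nonzero(block):
    return sum(1 for row in block for v in row if v != 0)

# Hypothetical table: step sizes grow with frequency (u + v).
base_table = [[2 + 2 * (u + v) for v in range(8)] for u in range(8)]
# Hypothetical coefficients: magnitudes fall off with frequency.
coeffs = [[400.0 / (1 + u + v) ** 2 for v in range(8)] for u in range(8)]

fine = quantize_block(coeffs, scale_table(base_table, 1))
coarse = quantize_block(coeffs, scale_table(base_table, 4))
print(count_nonzero(fine), count_nonzero(coarse))  # 28 10
```

Larger steps leave fewer nonzero coefficients to encode, which raises the compression ratio at the cost of image quality.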
Continuing with Prior Art FIG. 1, in step 50, variable length coding (entropy coding) is performed using, for example, Huffman encoding. In this step, strings of often-repeated characters are replaced by variable-length codes, with the most common strings getting the shortest codes. In step 60, the compressed image data can be stored in memory for subsequent use. The concatenation of all the variable length codes is called the bitstream. The size of the bitstream (measured in bits or bytes) varies as a function of the amount of quantization as well as a function of the image data.
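The variable length coding of step 50 can be illustrated with a small Huffman sketch that assigns code lengths to individual symbols and totals the resulting bitstream size. It is a simplified illustration with hypothetical function names, not the symbol alphabet or table format of any particular codec.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (count, unique tie-breaker, {symbol: depth so far}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

def bitstream_bits(symbols):
    """Total size in bits when each symbol is replaced by its code."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)

data = [0] * 12 + [1] * 3 + [2] * 1
print(huffman_code_lengths(data)[0])  # 1
print(bitstream_bits(data))           # 20
```

A fixed two-bit code for the same 16 symbols would need 32 bits; the saving depends on how skewed the symbol frequencies are, which is why the bitstream size depends on both the quantization and the image data.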
A desirable feature of a codec is control of the compression ratio ("rate control"). Rate control means that a target compression ratio is specified; when the image data are compressed according to the target compression ratio, the length of the resultant bitstream is equal to or less than the target size. The length of the bitstream is usually measured in bits or bytes. With proper rate control, it is possible to efficiently allocate file space for the compressed data, since the required amount of space is roughly known. Otherwise, if too much file space is allocated, the compressed data will not fill the allocated file space and computer system memory is wasted. On the other hand, if too little file space is allocated, then the compressed data will not fit into the allocated file space, causing an error in the computation. In this situation, either the data must be further compressed or the size of the file must be increased.
Rate control is also desirable for videos comprising multiple image frames because it allows a constant file size to be specified for the compressed data for each frame. Ideally, the amount of compressed data will be relatively constant from frame to frame, and thus the target file size will be constant from frame to frame. This is desirable because, in addition to the reasons above, when the compressed data are subsequently retrieved from a file for processing, it may be necessary to specify in advance the size of the file or the range of memory addresses where the compressed data are located. Thus, a relatively constant file size makes this task easier.
However, a problem with the prior art is that it is not possible to quickly and easily implement rate control. As described above, the desired level of image quality is specified, either by a user or in an algorithm, by choosing a quantization parameter or a quantization table. The set of image data is compressed using the selected quantization table, and after all of the data are compressed, the resultant compression ratio can be determined. However, it may turn out that the resultant compression ratio is unsatisfactory (that is, the compressed data may not properly fit the allocated file space). Consequently, the user must specify a new quantization parameter, and the set of image data must again be compressed to determine the new compression ratio. This process may need to be repeated until, through iteration or trial-and-error, the user eventually arrives at a quantization parameter that achieves the desired compression ratio so that the compressed data properly fit into the allocated file space.
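The cost of this trial-and-error procedure can be sketched with a toy stand-in for a codec. Everything here is hypothetical: the "codec" merely divides byte values by a step size and applies zlib, and the pseudo-random test data stand in for an image. The point is only that every trial recompresses the full data set.

```python
import zlib

def toy_image(n=4096, seed=1):
    """Deterministic pseudo-random bytes standing in for image data."""
    state, out = seed, bytearray()
    for _ in range(n):
        state = (state * 1103515245 + 12345) % 2**31
        out.append((state >> 16) & 0xFF)
    return bytes(out)

def compressed_size(data, step):
    """Toy codec: quantize each byte by `step`, then entropy-code with zlib."""
    return len(zlib.compress(bytes(v // step for v in data)))

def trial_and_error(data, target_size, max_step=256):
    """Recompress the FULL data set with coarser steps until it fits."""
    passes = 0
    for step in range(1, max_step + 1):
        passes += 1
        size = compressed_size(data, step)
        if size <= target_size:
            return step, size, passes
    raise ValueError("target size not reachable")

step, size, passes = trial_and_error(toy_image(), target_size=2048)
print(step, size, passes)
```

Each value of `passes` corresponds to one complete compression of the data, which is the repeated work the passage above identifies as the problem.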
For an arbitrary image, it is not possible to predict the compressed size accurately without analyzing the image. This analysis may require considerable computational resources. Thus, the prior art is problematic because the user must perform multiple compressions of an image frame in order to arrive at the target file size. These computations can take an unacceptable amount of time to complete while monopolizing the computational resources available.
Accordingly, what is needed is a method for compressing image data (including video data) that allows rate control in less computation time than it takes to compress the image. The present invention provides a novel solution to this need. These and other objects and advantages of the present invention will become obvious to those of ordinary skill in the art after having read the following detailed description of the preferred embodiments which are illustrated in the various drawing figures.
The present invention pertains to a method of quickly compressing image data to achieve a desired compression ratio corresponding to a desired bitstream size (e.g., rate control). The present invention provides a method for compressing image data (including video data) that allows rate control to be efficiently practiced, and reduces the need for unnecessary iterations in order to arrive at a compression ratio that properly fits the compressed data into the target file space. The present invention also provides a method that can be implemented using JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group) codecs. The present invention allows rate control in less computation time than it takes to compress the image; that is, estimating the target quantization parameter takes less computation time than compressing the full image.
In the present embodiment of the present invention, a target compression ratio (the ratio of uncompressed image data to compressed image data) is specified based on a target file size for the compressed data. A subset of the image data is selected. A first quantization parameter is selected, the subset of the image data is compressed using that quantization parameter, and the resulting compression ratio is calculated. A second quantization parameter is then selected, the subset of the image data is compressed using the second quantization parameter, and the resulting compression ratio is calculated. A target quantization parameter corresponding to the target compression ratio is calculated by interpolating between the first quantization parameter and the second quantization parameter and the corresponding compression ratios. The target quantization parameter is applied to compress the entire set of image data at approximately the target compression ratio. As a result, the size of the file containing the compressed data is approximately the same as the target file size.
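The interpolation step can be sketched as follows. Linear interpolation is assumed here for simplicity; the passage above does not specify the form of the interpolation. The toy zlib-based size function, the pseudo-random data and all names are hypothetical stand-ins for a real codec and a real image.

```python
import zlib

def compressed_size(data, q):
    """Toy stand-in for a codec: quantize bytes by step q, then zlib."""
    return len(zlib.compress(bytes(v // q for v in data)))

def estimate_target_q(subset, q1, q2, target_ratio, size_fn=compressed_size):
    """Linearly interpolate the parameter expected to give target_ratio."""
    r1 = len(subset) / size_fn(subset, q1)   # ratio achieved with q1
    r2 = len(subset) / size_fn(subset, q2)   # ratio achieved with q2
    if r1 == r2:
        raise ValueError("the two trial parameters gave the same ratio")
    return q1 + (target_ratio - r1) * (q2 - q1) / (r2 - r1)

# Hypothetical data: deterministic pseudo-random bytes.
state, image = 1, bytearray()
for _ in range(4096):
    state = (state * 1103515245 + 12345) % 2**31
    image.append((state >> 16) & 0xFF)
image = bytes(image)

subset = image[::8]                      # two trial passes touch 1/8 of the data
q_target = estimate_target_q(subset, q1=4, q2=64, target_ratio=2.0)
q_final = max(1, round(q_target))
final_ratio = len(image) / compressed_size(image, q_final)
print(q_final, round(final_ratio, 2))    # one full pass, ratio near the target
```

Only the last call compresses the full data set; the two trial compressions operate on the subset, which is where the time saving comes from.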
If necessary, additional quantization parameters can be selected and applied to the subset of image data in order to accumulate additional results that can be used for the interpolation process. Also, methods other than interpolation can be used to determine the target quantization parameter.
In one embodiment, the image is encoded as a plurality of macroblocks, and a subset of the macroblocks is selected and used for the interpolation process described above. In this embodiment, every n-th macroblock (e.g., every fifth, every eighth, etc.) is selected and included in the subset. Alternatively, macroblocks can be selected at random to form the subset, or a lookup table can be used to form the subset.
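The three ways of forming the subset mentioned above can be sketched as follows. Macroblocks are represented simply as list items, and the function names and the example lookup table are hypothetical.

```python
import random

def every_nth(macroblocks, n, offset=0):
    """Select every n-th macroblock, starting at `offset`."""
    return macroblocks[offset::n]

def random_subset(macroblocks, count, seed=None):
    """Select `count` macroblocks at random without repetition."""
    rng = random.Random(seed)
    return rng.sample(macroblocks, count)

def lookup_subset(macroblocks, indices):
    """Select the macroblocks named by a precomputed table of indices."""
    return [macroblocks[i] for i in indices]

blocks = list(range(40))                   # stand-ins for 40 macroblocks
print(every_nth(blocks, 8))                # [0, 8, 16, 24, 32]
print(lookup_subset(blocks, [3, 17, 29]))  # [3, 17, 29]
print(len(random_subset(blocks, 5, seed=1)))  # 5
```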
In accordance with the present invention, the image data can be compressed using a JPEG compression scheme or an MPEG compression scheme.
By working on a subset of the image data to determine the target quantization parameter, the amount of data that needs to be processed is significantly reduced, and so the target quantization parameter can be quickly determined. In accordance with the present invention, the time to estimate the target quantization parameter is less than the computation time needed to compress the full image.