The present invention relates to processing digital data.
Digital data can represent digital images, sound tracks or any other digitized content. In digital images, digital data represent graphics objects, such as lines, shapes, photographs, paintings or images of text. The graphics objects can be defined by vector graphics or by a raster array of closely spaced pixels that are basic picture elements. Each pixel specifies local graphical attributes, such as color or opacity of a corresponding local portion of the image. The number of pixels in an image is referred to as a resolution of the image. Pixel arrays are used in scanning, displaying and image capturing devices, and in computer applications, such as presentation, animation, painting or design applications.
A typical image includes correlations or repetitions within the pixel array. Pixel values may be correlated within a shape or along lines, or patterns or symbols may be repeated at multiple locations within the same image. For example, the same character is typically repeated at multiple locations in a scanned image of text. Compression techniques take advantage of the correlations and repetitions to generate compressed representations that are more compact than a bitmap in which each pixel is represented by a predetermined number of bits. The compressed representations can be lossy or lossless depending on whether or not information from the original image is lost due to the compression. Examples of compressed representations include Joint Photographic Experts Group (“JPEG”) format, Graphics Interchange Format (“GIF”), Tagged Image File Format (“TIFF”) and Joint Bi-level Image Experts Group (“JBIG”) format.
A compressed representation typically includes one or more content channels for the digital data, where each content channel corresponds to a particular type of content. For example, a respective content channel can be defined for each color component in a standard color space, such as a red-green-blue (RGB) color space or a YUV color space. In the YUV color space, luminance is represented by Y, and chrominance is represented by U and V. Accordingly, the content channels can include separate Y, U and V channels. In each channel, the content is represented by data values for multiple representation components that are separately addressable components in the compressed representation.
In a JPEG representation, a respective pixel array is defined for each channel. In the array, the pixels are grouped into sub-arrays of eight-by-eight pixels. A discrete cosine transformation (“DCT”) is applied to each sub-array to generate amplitude values for sixty-four different discrete frequencies (combinations of eight horizontal and eight vertical frequencies) that can be separately addressed. Thus each DCT frequency is a representation component in the JPEG representation.
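The transformation described above can be illustrated with a minimal sketch, assuming an orthonormal two-dimensional DCT-II on a single eight-by-eight sub-array. The function name dct_8x8 and the sample block are hypothetical and chosen only for illustration; the level shift and other steps of an actual JPEG encoder are omitted.

```python
import math

N = 8  # sub-array size: eight-by-eight pixels

def dct_8x8(block):
    """Return an 8x8 array of amplitudes, one per (vertical, horizontal) frequency."""
    def scale(k):
        # Orthonormal scaling: the zero frequency gets a smaller factor.
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for v in range(N):          # vertical frequency index
        for u in range(N):      # horizontal frequency index
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[v][u] = scale(u) * scale(v) * total
    return out

# A uniform block has no variation, so only the zero-frequency
# amplitude is non-zero (hypothetical sample data).
flat = [[100] * N for _ in range(N)]
coeffs = dct_8x8(flat)
```

Each entry coeffs[v][u] is one separately addressable amplitude, which is the sense in which each DCT frequency acts as a representation component.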
In lossy representations, the information loss is typically controlled by compression parameters, such as downsampling parameters or quantizors. Each downsampling parameter defines a resolution reduction for a corresponding pixel array, and each quantizor defines a corresponding quantization of data values. For example, a horizontal or vertical downsampling parameter of two reduces by half the number of pixels in each row or each column, respectively. Similarly, a quantizor of two defines a representation in which data can have only discrete values that are multiples of two.
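The two kinds of compression parameters can be illustrated with a minimal sketch. It assumes that downsampling keeps every second pixel and that quantization rounds to the nearest multiple of the quantizor; actual encoders may average pixels or round differently. The function names and sample values are hypothetical.

```python
import math

def downsample(row, factor):
    """Reduce the number of pixels in a row by keeping every `factor`-th pixel."""
    return row[::factor]

def quantize(value, quantizor):
    """Return the index of the nearest multiple of `quantizor` (halves round up)."""
    return math.floor(value / quantizor + 0.5)

def dequantize(index, quantizor):
    """Recover the discrete value represented by a quantization index."""
    return index * quantizor

# Downsampling parameter of two: eight pixels become four.
row = [10, 12, 14, 16, 18, 20, 22, 24]
half = downsample(row, 2)

# Quantizor of two: restored values are always multiples of two,
# so the odd inputs cannot be recovered exactly.
values = [7, 10, 13]
restored = [dequantize(quantize(v, 2), 2) for v in values]
```

The difference between values and restored is the information lost to quantization, and it grows with the quantizor.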
Typically, separate compression parameters are used for different content channels and different representation components. In JPEG, the chrominance channels are typically downsampled both horizontally and vertically, while the resolution is not reduced in the luminance channel. Furthermore, each DCT frequency component has a corresponding quantizor. Typically, the higher the frequency component, the larger the corresponding quantizor.
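The effect of downsampling only the chrominance channels can be shown with a short worked count of data values per channel. The sketch assumes a hypothetical 640-by-480 image and a downsampling parameter of two in both directions for U and V; the function name sample_count is likewise an assumption.

```python
import math

def sample_count(width, height, h_factor=1, v_factor=1):
    """Number of data values in a channel after horizontal and vertical downsampling."""
    return math.ceil(width / h_factor) * math.ceil(height / v_factor)

width, height = 640, 480                     # hypothetical image size

y_count = sample_count(width, height)        # luminance: resolution not reduced
u_count = sample_count(width, height, 2, 2)  # chrominance: halved in each direction
v_count = sample_count(width, height, 2, 2)

total = y_count + u_count + v_count
full = 3 * sample_count(width, height)       # all three channels at full resolution
```

Under these assumptions the three channels hold half as many data values as they would at full resolution, before any quantization or encoding is applied.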
In general, digital data values can be represented according to fixed length or variable length encodings. Fixed length encodings use a fixed number of bits to represent different data values. For example, in a bitmap, each pixel value is represented by a predetermined number of bits. Compressed representations, however, often use variable length encoding. In variable length encoding, different code values are represented by different numbers of bits. The number of bits that are used to represent a code value is referred to as the length of the code value. The code values represent data values according to an encoding. The encoding can specify a predetermined code value for each represented data value. Alternatively, the code value can depend upon the frequency of occurrence of the represented data value. To decrease storage sizes, JPEG representations typically use Huffman encoding in which the most frequent data values are represented by code values having the smallest lengths.
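The relationship between frequency of occurrence and code length can be illustrated with a minimal sketch of the textbook Huffman construction. It computes only the length of each code value, not the bit patterns, and it does not reproduce the table construction rules of the JPEG format. The function name and sample data are hypothetical.

```python
import heapq
from collections import Counter

def huffman_code_lengths(data):
    """Return {data value: code length in bits} for a Huffman encoding of `data`."""
    freq = Counter(data)
    if len(freq) == 1:
        # A single distinct value still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {value: depth in the code tree}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every value in them gets one bit longer.
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

data = "aaaaaaaabbbbccd"                # hypothetical data: a x8, b x4, c x2, d x1
lengths = huffman_code_lengths(data)
variable_bits = sum(lengths[s] for s in data)
fixed_bits = 2 * len(data)              # four distinct values need two bits each
```

For this sample, the most frequent value receives a one-bit code and the total is 25 bits, compared with 30 bits for a fixed length encoding. Because the total depends on the frequency of each data value, the storage size is not known until the values have been counted.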
Before compressing a digital image into a JPEG representation or other representation using complex variable length encoding, traditional software applications do not try to predict a storage size for the compressed representation of the image. Instead, the compressed representation is actually generated, and the storage size of the generated representation is measured. If requested by a user, the measured storage size is traditionally presented in numerical form in a user interface.
To compress an image, some computer applications allow the user to select one or more quality parameters that specify compression parameters. The compressed representation is generated with the specified compression parameters, and the storage size of the generated representation is presented to the user. If the storage size is too large or too small, the user can set new quality parameters. To verify whether the new parameters define an acceptable storage size, the user has to wait until a corresponding new compressed representation is generated. To find the quality parameters that provide the desired storage size, the user may have to guess several times, and wait each time for the corresponding representation to be generated. When compressing a large amount of data, such as a large number of digital images, the user may have to wait several seconds or even a few minutes between subsequent guesses.
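The traditional generate-and-measure workflow can be sketched as a loop in which every guess requires a full compression pass. The sketch is an assumption-laden stand-in: it uses the zlib module from the Python standard library in place of a JPEG encoder, a single quantizor as the only quality parameter, and a hypothetical sample signal and target size.

```python
import math
import zlib

def compress_with_quantizor(samples, quantizor):
    """Quantize the samples, then actually generate the compressed representation."""
    quantized = bytes(s // quantizor for s in samples)
    return zlib.compress(quantized)

def find_quantizor(samples, target_size, max_quantizor=256):
    """Guess quantizors until the measured storage size is small enough."""
    quantizor = 1
    attempts = 0
    while True:
        attempts += 1
        # The size is only known after the representation has been generated.
        size = len(compress_with_quantizor(samples, quantizor))
        if size <= target_size or quantizor >= max_quantizor:
            return quantizor, size, attempts
        quantizor *= 2  # next guess: coarser quantization

# Hypothetical 8-bit sample data (values 0..255).
samples = [int(127.5 + 127.5 * math.sin(i / 10)) for i in range(4096)]
quantizor, size, attempts = find_quantizor(samples, target_size=200)
```

The value of attempts counts how many complete compression passes were needed, which corresponds to the number of times the user would have had to wait.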