1. Field of the Invention
This invention relates to the compression and decompression of digital data and, more particularly, to the reduction in the amount of digital data necessary to store and transmit images.
2. Background of the Invention
Image compression systems are commonly used in computers to reduce the storage space and transmittal times associated with storing, transferring and retrieving images. Due to increased use of images in computer applications, and the increase in the transfer of images, a variety of image compression techniques have attempted to solve the problems associated with the large amounts of storage space (e.g., hard disks, tapes or other devices) needed to store images.
Conventional devices store an image as a two-dimensional array of picture elements, or pixels. The number of pixels determines the resolution of an image. Typically the resolution is measured by stating the number of horizontal and vertical pixels contained in the two dimensional image array. For example, a 640 by 480 image has 640 pixels across and 480 from top to bottom to total 307,200 pixels.
While the number of pixels represents the image resolution, the number of bits assigned to each pixel represents the number of available intensity levels of each pixel. For example, if a pixel is only assigned one bit, the pixel can represent a maximum of two values. Thus the range of colors which can be assigned to that pixel is limited to two (typically black and white). In color images, the bits assigned to each pixel represent the intensity values of the three primary colors of red, green and blue. In present “true color” applications, each pixel is normally represented by 24 bits where 8 bits are assigned to each primary color allowing the encoding of 16.8 million (2^8 × 2^8 × 2^8) different colors.
Consequently, color images require large amounts of storage capacity. For example, a typical color (24 bits per pixel) image with a resolution of 640 by 480 requires approximately 922,000 bytes of storage. A larger 24-bit color image with a 2000 by 2000 pixel resolution requires approximately twelve million bytes of storage. As a result, image-based applications such as interactive shopping, multimedia products, electronic games and other image-based presentations require large amounts of storage space to display high quality color images.
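The storage figures above follow directly from the pixel count and the bit depth. The following is a minimal sketch of that arithmetic; the function names are illustrative and do not appear in the specification.

```python
def raw_image_bytes(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Bytes needed for an uncompressed image, rounded up to whole bytes."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8


def color_count(bits_per_pixel: int) -> int:
    """Number of distinct values a pixel of the given bit depth can take."""
    return 2 ** bits_per_pixel


print(raw_image_bytes(640, 480))    # 921600, i.e. approximately 922,000 bytes
print(raw_image_bytes(2000, 2000))  # 12000000, i.e. twelve million bytes
print(color_count(24))              # 16777216, i.e. about 16.8 million colors
```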
In order to reduce storage requirements, an image is compressed (encoded) and stored as a smaller file which requires less storage space. In order to retrieve and view the compressed image, the compressed image file is expanded (decoded) to its original size. The decoded (or “reconstructed”) image is usually an imperfect or “lossy” representation of the original image because some information may be lost in the compression process. Normally, the greater the amount of compression the greater the divergence between the original image and the reconstructed image. The amount of compression is often referred to as the compression ratio. The compression ratio is the amount of storage space needed to store the original (uncompressed) digitized image file divided by the amount of storage space needed to store the corresponding compressed image file.
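The compression ratio defined above can be written as a one-line calculation. In the sketch below, the compressed size of 46,080 bytes is a hypothetical value chosen only for illustration; it is not a result reported for any encoder.

```python
def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Original file size divided by compressed file size."""
    if compressed_bytes <= 0:
        raise ValueError("compressed size must be positive")
    return original_bytes / compressed_bytes


# A 640 by 480, 24-bit image (921,600 bytes) and a hypothetical compressed file.
print(compression_ratio(921_600, 46_080))  # 20.0
```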
By reducing the amount of storage space needed to store an image, compression is also used to reduce the time needed to transfer and communicate images to other locations. In order to transfer an image, the data bits that represent the image are sent via a data channel to another location. The sequence of transmitted bytes is called the data stream. Generally, the image data is encoded and the compressed image data stream is sent over a data channel and when received, the compressed image data is decoded to recreate the original image. Thus, compression speeds the transmission of image files by reducing their size.
Several processes have been developed for compressing the data required to represent an image. Generally, the processes rely on two methods: 1) spatial or time domain compression, and 2) frequency domain compression. In frequency domain compression, the binary data representing each pixel in the space or time domain are mapped into a new coordinate system in the frequency domain.
In general, the mathematical transforms, such as the discrete cosine transform (DCT), are chosen so that the signal energy of the original image is preserved, but the energy is concentrated in a relatively few transform coefficients. Once transformed, the data is compressed by quantization and encoding of the transform coefficients.
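The energy-preserving and energy-concentrating behavior described above can be checked numerically. The following sketch applies a textbook orthonormal one-dimensional DCT-II to a hypothetical row of eight smoothly varying pixel values; it is a generic illustration and not the optimized transform of any particular encoder.

```python
import math


def dct_ii(x):
    """Orthonormal 1-D DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def energy(values):
    """Sum of squared values."""
    return sum(v * v for v in values)


row = [100, 102, 104, 107, 110, 112, 115, 117]  # hypothetical smooth pixel row
coeffs = dct_ii(row)

# The orthonormal transform preserves total signal energy.
print(energy(row), energy(coeffs))

# Fraction of the energy held by the two largest coefficients (close to 1
# for smooth data such as this row).
top2 = sum(sorted((c * c for c in coeffs), reverse=True)[:2])
print(top2 / energy(coeffs))
```

Because nearly all of the energy sits in a few coefficients, the remaining coefficients can be quantized coarsely with little visible effect, which is where the compression is obtained.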
Optimization of the process of compressing an image includes increasing the compression ratio while maintaining the quality of the original image, reducing the time to encode an image, and reducing the time to decode a compressed image. In general, a process that increases the compression ratio or decreases the time to compress an image results in a loss of image quality. A process that increases the compression ratio and maintains a high quality image often results in longer encoding and decoding times. Accordingly, it would be advantageous to increase the compression ratio and reduce the time needed to encode and decode an image while maintaining a high quality image.
It is well known that image encoders can be optimized for specific image types. For example, different types of images may include graphical, photographic, or typographic information or combinations thereof. As discussed in more detail below, the encoding of an image can be viewed as a multi-step process that uses a variety of compression methods which include filters, mathematical transformations, quantization techniques, etc. In general each compression method will compress different image types with varying comparative efficiency. These compression methods can be selectively applied to optimize an encoder with respect to a certain type of image. In addition to selectively applying various compression methods, it is also possible to optimize an encoder by varying the parameters (e.g., quantization tables) of a particular compression method.
Broadly speaking, however, the prior art does not provide an adaptive encoder that automatically decomposes a source image, classifies its parts, and selects the optimal compression methods and the optimal parameters of the selected compression methods resulting in an optimized encoder that increases relative compression rates.
Once an image is optimally compressed with an encoder, the set of compressed data are stored in a file. The structure of the compressed file is referred to as the file format. The file format can be fairly simple and common, or the format can be quite complex and include a particular sequence of compressed data or various types of control instructions and codes.
The file format (the structure of the data in the file) is especially important when compressed data in the file will be read and processed sequentially and when the user desires to view or transmit only part of a compressed image file. Accordingly, it would be advantageous to provide a file format that “layers” the compressed image components, arranging those of greatest visual importance first, those of secondary visual importance second, and so on. Layering the compressed file format in such a way allows the first segment of the compressed image file to be decoded prior to the remainder of the file being received or read by the decoder. The decoder can display the first segment (layer) as a miniature version of the entire image or can enlarge the miniature to display a coarse or “splash” quality rendition of the original image. As each successive file segment or layer is received, the decoder enhances the quality of the displayed picture by selectively adding detail and correcting pixel values.
Like the encoding process, the decoding of an image can be viewed as a multi-step process that uses a variety of decoding methods which include inverse mathematical transformations, inverse quantization techniques, etc. Conventional decoders are designed to have an inverse function relative to the encoding system. These inverse decoding methods must match the encoding process used to encode the image. In addition, where an encoder makes content-sensitive adaptations to the compression algorithm, the decoder must apply a matching content-sensitive decoding process.
Generally, a decoder is designed to match a specific encoding process. Prior art compression systems exist that allow the decoder to adjust particular parameters, but the prior art encoders must also transmit accompanying tables and other information. In addition, many conventional decoders are limited to specific decoding methods that do not accommodate content-sensitive adaptations.
The problems outlined above are solved by the method and apparatus of the present invention. That is, the computer-based image compression system of the present invention includes a unique encoder which compresses images and a unique decoder which decompresses images. The unique compression system obtains high compression ratios at all image quality levels while achieving relatively quick encoding and decoding times.
A high compression ratio enables faster image transmission and reduces the amount of storage space required to store an image. When compared with conventional compression techniques, such as the Joint Photographic Experts Group (JPEG), the present invention significantly increases the compression ratio for color images which, when decompressed, are of comparable quality to the JPEG images. The exact improvement over JPEG will depend on image content, resolution, and other factors.
Smaller image files translate into direct storage and transmission time savings. In addition, the present invention reduces the number of operations to encode and decode an image when compared to JPEG and other compression methods of a similar nature. Reducing the number of operations reduces the amount of time and computing resources needed to encode and decode an image, and thus improves computer system response times.
Furthermore, the image compression system of the present invention optimizes the encoding process to accommodate different image types. As explained below, the present invention uses fuzzy logic techniques to automatically analyze and decompose a source image, classify its components, select the optimal compression method for each component, and determine the optimal content-sensitive parameters of the selected compression methods. The encoder does not need prior information regarding the type of image or information regarding which compression methods to apply. Thus, a user does not need to provide compression system customization or need to set the parameters of the compression methods.
The present invention is designed with the goal of providing an image compression system that reliably compresses any type of image with the highest achievable efficiency, while maintaining a consistent range of viewing qualities. Automating the system's adaptivity to varied image types allows for a minimum of human intervention in the encoding process and results in a system where the compression and decompression process are virtually transparent to the users.
The encoder and decoder of the present invention contain a library of encoding methods that are treated as a “toolbox.” The toolbox allows the encoder to selectively apply particular encoding methods or tools that optimize the compression ratio for a particular image component. The toolbox approach allows the encoder to support many different encoding methods in one program, and accommodates the invention of new encoding methods without invalidating existing decoders. The toolbox approach thus allows upgradeability for future improvements in compression methods and adaptation to new technologies.
A further feature of the present invention is that the encoder creates a file format that segments or “layers” the compressed image. The layering of the compressed image allows the decoder to display image file segments, beginning with the data at the front of the file, in a coherent sequence which begins with the decoding and display of the information that constitutes the core of the image as defined by human perception. This core information can appear as a good quality miniature of the image and/or as a full sized “splash” or coarse quality version of the image. Both the miniature and splash image enable the user to view the essence of an image from a relatively small amount of encoded data. In applications where the image file is being transmitted over a data channel, such as a telephone line or limited bandwidth wireless channel, display of the miniature and/or splash image occurs as soon as the first segment or layer of the file is received. This allows users to view the image quickly and to see detail being added to the image as subsequent layers are received, decoded, and added to the core image.
The decoder decompresses the miniature and the full sized splash quality image from the same information. User specified preferences and the application determine whether the miniature and/or the full sized splash quality image are displayed for any given image.
Whether the first layer is displayed as a miniature or a splash quality full size image, the receipt of each successive layer allows the decoder to add additional image detail and sharpness. Information from the previous layer is supplemented, not discarded, so that the image is built layer by layer. Thus a single compressed file with a layered file format can store both a thumbnail and a full size version of the image and can store the full size version at various quality levels without storing any redundant information.
The layered approach of the present invention allows the transmission or decoding of only the part of the compressed file which is necessary to display a desired image quality. Thus, a single compressed file can generate a thumbnail and different quality full size images without the need to recompress the file to a smaller size and lesser quality, or store multiple files compressed to different file sizes and quality levels.
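The principle that each layer supplements, rather than replaces, the layers before it can be illustrated with a simple additive scheme. The sketch below is an assumption made for illustration only: it splits non-negative integer pixel values into successively finer residual layers using hypothetical step sizes of 16, 4 and 1, and is not the layering method of the present invention.

```python
def encode_layers(pixels, steps=(16, 4, 1)):
    """Split non-negative integer pixel values into additive layers, coarsest first."""
    layers = []
    residual = list(pixels)
    for step in steps:
        layer = [(r // step) * step for r in residual]
        residual = [r - v for r, v in zip(residual, layer)]
        layers.append(layer)
    return layers


def decode_layers(layers):
    """Reconstruct pixel values by summing whichever layers have arrived."""
    return [sum(values) for values in zip(*layers)]


pixels = [100, 102, 117, 255]
layers = encode_layers(pixels)

print(decode_layers(layers[:1]))  # [96, 96, 112, 240]   coarse rendition
print(decode_layers(layers[:2]))  # [100, 100, 116, 252] more detail
print(decode_layers(layers))      # [100, 102, 117, 255] full quality
```

A decoder that stops after any prefix of the layers still obtains a usable image, so one stored file serves every quality level without redundant data.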
This feature is particularly advantageous for on line service applications, such as shopping or other applications where the user or the application developer may want several thumbnail images downloaded and presented before the user chooses to receive the entire full size, high quality image. In addition to conserving the time and transmission costs associated with viewing a variety of high quality images that may not be of interest, the user need only subsequently download the remainder of each image file to view the higher detail versions of the image.
The layered format also allows the storage of different layers of the compressed data file separate from one another. Thus, the core image data (miniature) can be stored locally (e.g., in fast RAM memory for fast access), and the higher quality “enhancement” layers can be stored remotely in lower cost bulk storage.
A further feature of the layered file format of the present invention allows the addition of other compressed data information. The layered and segmented file format is extendable so that new layers of compressed information such as sound, text and video can be added to the compressed image data file. The extendable file format allows the compression system to adapt to new image types and to combine compressed image data with sound, text and video.
Like the encoder, the decoder of the present invention includes a toolbox of decoding methods. The decoding process can begin with the decoder first determining the encoding methods used to encode each data segment. The decoder determines the encoding methods from instructions the encoder inserts into the compressed data file.
Adding decoder instructions to the compressed image data provides several advantages. A decoder that recognizes the instructions can decode files from a variety of different encoders, accommodate content-sensitive encoding methods, and adjust to user specific needs. The decoder of the present invention also skips parts of the data stream that contain data that are unnecessary for a given rendition of the image, and ignores parts of the data stream that are in an unknown format. The ability to ignore unknown formats allows future file layers to be added while maintaining compatibility with older decoders.
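One common way to let a decoder skip data it does not understand is to prefix every segment with a type tag and a length. The sketch below assumes such a layout; the four-byte tags (CORE, ENH1, SND0), the big-endian length field and the function names are all hypothetical and are not taken from the file format of the present invention.

```python
import struct

KNOWN_TAGS = {b"CORE", b"ENH1"}  # hypothetical tags this decoder understands


def write_segment(tag: bytes, payload: bytes) -> bytes:
    """Serialize one segment as a 4-byte tag, 4-byte big-endian length, then payload."""
    return tag + struct.pack(">I", len(payload)) + payload


def read_known_segments(stream: bytes, known=KNOWN_TAGS):
    """Return (tag, payload) pairs for recognized segments, skipping the rest."""
    found = []
    pos = 0
    while pos + 8 <= len(stream):
        tag = stream[pos:pos + 4]
        (length,) = struct.unpack(">I", stream[pos + 4:pos + 8])
        payload = stream[pos + 8:pos + 8 + length]
        if len(payload) < length:
            break  # truncated segment: keep what was decoded so far
        if tag in known:
            found.append((tag, payload))
        pos += 8 + length  # the length field lets unknown segments be stepped over
    return found


stream = (
    write_segment(b"CORE", b"abc")
    + write_segment(b"SND0", b"xxxxx")  # a later-format layer unknown to this decoder
    + write_segment(b"ENH1", b"de")
)
print(read_known_segments(stream))
```

Because the decoder relies only on the length field to advance, a newer encoder can add segment types without breaking this older decoder.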
In a preferred embodiment of the present invention, the encoder compresses an image using a first Reed Spline Filter, an image classifier, a discrete cosine transform, a second and third Reed Spline Filter, a differential pulse code modulator, an enhancement analyzer, and an adaptive vector quantizer to generate a plurality of data segments that contain the compressed image. The plurality of data segments are further compressed with a channel encoder.
The Reed Spline Filter includes a color space conversion transform, a decimation step and a least mean squared error (LMSE) spline fitting step. The output of the first Reed Spline Filter is then analyzed to determine an image type for optimal compression. The first Reed Spline Filter outputs three components which are analyzed by the image classifier. The image classifier uses fuzzy logic techniques to classify the image type. Once the image type is determined, the first component is separated from the second and third components and further compressed with an optimized discrete cosine transform and an adaptive vector quantizer. The second and third components are further compressed with a second and third Reed Spline Filter, the adaptive vector quantizer, and a differential pulse code modulator.
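The specification does not state which predictor or quantizer the differential pulse code modulator uses. As a generic illustration of the technique only, the following sketch predicts each sample from the previously reconstructed sample and quantizes the difference with a hypothetical step size of 4.

```python
def dpcm_encode(samples, step=4):
    """Encode each sample as a quantized difference from the running reconstruction."""
    codes = []
    prediction = 0
    for s in samples:
        code = round((s - prediction) / step)
        codes.append(code)
        # Track the decoder's reconstruction so quantization error does not accumulate.
        prediction += code * step
    return codes


def dpcm_decode(codes, step=4):
    """Rebuild samples by accumulating the dequantized differences."""
    out = []
    prediction = 0
    for code in codes:
        prediction += code * step
        out.append(prediction)
    return out


samples = [100, 104, 109, 109, 120]
codes = dpcm_encode(samples)
decoded = dpcm_decode(codes)
print(codes)
print(decoded)
```

After the first sample the codes are small integers, which a subsequent entropy or channel coder can represent compactly, and each reconstructed sample differs from the original by at most half the step size.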
The enhancement analyzer enhances areas of an image determined to be the most visually important, such as text or edges. The enhancement analyzer determines the visual priority of pixel blocks. The pixel block dimensions typically correspond to 16×16 pixel blocks in the source image. In addition, the enhancement analyzer prioritizes each pixel block so that the most important enhancement information is placed in the earliest enhancement layers so that it can be decoded first. The output of the enhancement analyzer is compressed with the adaptive vector quantizer.
A user may set the encoder to compute a color palette optimized to the color image. The color palette is combined with the output of the discrete cosine transform, the adaptive vector quantizer, the differential pulse code modulator, and the enhancement analyzer to create a plurality of data segments. The channel encoder then interleaves and compresses the plurality of data segments.