1. Field of the Invention
Modern computers and modern computer networks enable the transfer of a significant amount of information between computers and between a computer and a storage device. When computers access local storage devices, such as a local hard drive or local floppy drive, significant amounts of information can be quickly accessed. However, when seeking to access data from a remote storage location such as over a wide area network (WAN), the internet, or a wireless communication channel (a cellular phone network, etc.), data transfer rates are significantly slower. Transferring large files, therefore, takes significant amounts of time. Additionally, storage of large files utilizes valuable and limited storage space. Photographic images and similar graphical images typically are considered to be large files, since an image conventionally requires information on each picture element or pixel in the image. Photographs and similar graphical images, therefore, typically require over one megabyte of storage space, and therefore require significant transmission times over slow network communications. In recent years, therefore, numerous protocols and standards have been developed for compressing photographic images to reduce the amount of storage space required to store photographic images, and to reduce transfer and rendering times. The compression methods essentially create mathematical or statistical approximations of the original image.
Compression methods can broadly be divided into two categories. Lossy compression methods are methods wherein there is a certain loss of fidelity of the image; in other words, close inspection of the reproduced image would show a loss of fidelity. Lossless compression methods are ones where the original image is reproduced exactly after decoding. The present invention is directed to an efficient image compression method and apparatus wherein a part, or parts, of an image can be compressed with a higher level of fidelity in the reproduced image than other parts of the image, based on a selection of regions of interest by the user or the system which initially encodes or compresses the image, or by the user or the system which receives and decodes the image data, through interaction with the encoding side.
2. Description of the Related Art
A currently popular standard for compressing images is called the JPEG or “J-peg” standard. This standard was developed by a committee called The Joint Photographic Experts Group, and is popularly used to compress still images for storage or network transmission. Recent papers by Said and Pearlman discuss new image coding and decoding methods based upon set partitioning in hierarchical trees (SPIHT). See Said and Pearlman, Image Codec Based on Set Partitioning in Hierarchical Trees, IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, no. 3, June 1996, and Said and Pearlman, Image Multi-Resolution Representation, IEEE Transactions on Image Processing, vol. 5, no. 9, September 1996. The contents of these papers are hereby incorporated by reference. These references disclose computer software which, when loaded and running on a general purpose computer, performs a method and creates an apparatus which utilizes integer wavelet transforms which provide lossy compression by bit accuracy and lossless compression within a same embedded bit stream, or an apparatus which utilizes non-integer wavelet transforms which provide lossy compression by bit accuracy within a single embedded bit stream. For an image which is initially stored as a two dimensional array representing a plurality of individual pixels, bits are prioritized according to the transform coefficients for progressive image transmission.
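As an illustration of how an integer transform can permit exact reconstruction, the following is a minimal sketch of one reversible integer averaging and differencing step. It is not the software disclosed in the cited papers; the function names `s_forward` and `s_inverse` and the sample pixel values are hypothetical, and a practical codec would apply such steps recursively over several decomposition levels.

```python
def s_forward(x):
    """One level of a reversible integer transform on an even-length list of ints."""
    half = len(x) // 2
    # Low band: floor of the pairwise average (integer result).
    low = [(x[2 * k] + x[2 * k + 1]) >> 1 for k in range(half)]
    # High band: pairwise difference (integer result).
    high = [x[2 * k] - x[2 * k + 1] for k in range(half)]
    return low, high


def s_inverse(low, high):
    """Exactly undo s_forward using only integer arithmetic."""
    x = []
    for l, h in zip(low, high):
        a = l + ((h + 1) >> 1)  # recover the first sample of the pair
        b = a - h               # recover the second sample of the pair
        x.extend([a, b])
    return x


pixels = [52, 55, 61, 59, 79, 61, 76, 61]  # hypothetical pixel values
low, high = s_forward(pixels)
restored = s_inverse(low, high)
```

Because every intermediate value is an integer, decoding all bits of `low` and `high` restores `pixels` exactly, whereas decoding only their most significant bits would yield a lossy approximation.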
The most important information is selected by determining significant or insignificant elements with respect to a given threshold utilizing subset partitioning. The progressive transmission scheme disclosed by Said and Pearlman selects the most important information to be transmitted first based upon the magnitude of each transform coefficient; if the transform is unitary, the larger the magnitude, the more information the coefficient conveys in the mean squared error (MSE, $D_{mse}(\cdot)$) sense:

$$D_{mse}(p-\hat{p}) = \frac{\|p-\hat{p}\|^{2}}{N} = \frac{1}{N}\sum_{i}\sum_{j}\left(p_{i,j}-\hat{p}_{i,j}\right)^{2}$$

where (i,j) is the pixel coordinate, with $p_{i,j}$ therefore representing a pixel value of the original image and $\hat{p}_{i,j}$ the corresponding pixel value of the reconstructed image. The two dimensional array c is coded according to $c=\Omega(p)$, with $\Omega(\cdot)$ being used to represent a unitary hierarchical subband transformation. Said and Pearlman make the assumption that each pixel coordinate and value is represented according to a fixed-point binary format with a relatively small number of bits, which enables the element to be treated as an integer for the purposes of coding. The reconstructed image $\hat{p}$ is obtained by initially setting a reconstruction vector $\hat{c}$ to 0, updating its components as coded data is received, and calculating the image as:

$$\hat{p}=\Omega^{-1}(\hat{c})$$
N is the number of image pixels, and the above calculation for mean squared-error distortion can therefore be made. Because the transform is unitary, the distortion is the same whether it is measured on the pixel values or on the transform coefficients, so that when the exact value of a transform coefficient $c_{i,j}$ is sent to the decoder, the mean squared-error distortion measure decreases by $|c_{i,j}|^{2}/N$. This fact enables coefficient values to be ranked according to their binary representation, with the most significant bits (MSBs) being transmitted first, and also enables coefficients with larger magnitude to be transmitted first because of their larger content of information. An algorithm is utilized by the encoder to send a value representing the maximum pixel value for a particular pixel coordinate, to sort pixel coordinates by wavelet transform coefficient values, and then to output a most significant bit of the various coefficients, using a number of sorting passes and refinement passes, to provide high quality reconstructed images utilizing a small fraction of the transmitted pixel coordinates. A user can set a desired rate or distortion by setting the number of bits to be spent in sorting passes and refinement passes.
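The relationship between coefficient magnitude and distortion can be checked numerically. The sketch below is illustrative only: a one-level orthonormal Haar transform is assumed as a stand-in for the unitary transformation, the sample values are hypothetical, and whole coefficients are sent in order of decreasing magnitude rather than bit by bit as SPIHT does.

```python
import math


def haar_forward(x):
    """One-level orthonormal (unitary) Haar transform of an even-length list."""
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    low = [(x[2 * k] + x[2 * k + 1]) * s for k in range(half)]
    high = [(x[2 * k] - x[2 * k + 1]) * s for k in range(half)]
    return low + high


def haar_inverse(c):
    """Inverse of haar_forward."""
    s = 1.0 / math.sqrt(2.0)
    half = len(c) // 2
    x = []
    for k in range(half):
        x.append((c[k] + c[half + k]) * s)
        x.append((c[k] - c[half + k]) * s)
    return x


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


p = [52.0, 55.0, 61.0, 59.0, 79.0, 61.0, 76.0, 61.0]  # hypothetical samples
N = len(p)
c = haar_forward(p)

c_hat = [0.0] * N  # the decoder starts from an all-zero reconstruction vector
order = sorted(range(N), key=lambda i: abs(c[i]), reverse=True)

drops = []  # pairs of (measured MSE decrease, predicted |c|^2 / N)
for i in order:
    before = mse(p, haar_inverse(c_hat))
    c_hat[i] = c[i]  # "transmit" the exact value of one coefficient
    after = mse(p, haar_inverse(c_hat))
    drops.append((before - after, c[i] ** 2 / N))

final_error = mse(p, haar_inverse(c_hat))
```

Each measured decrease in `drops` agrees with the predicted value to within floating-point rounding, and the decreases are non-increasing, which is why sending larger coefficients first reduces distortion fastest.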
The invention is a method and apparatus for encoding images for transmission or storage where a region of interest (ROI) or certain regions of the image are to be emphasized, and for decoding the encoded image after transmission or retrieval from storage. The encoding method includes selecting a region or regions of interest in digital image data, and assigning a priority to each region. A wavelet transform of the pixel values of the entire image is performed in order to obtain the wavelet transform coefficients, and the transform coefficients corresponding to each region of interest are identified. The transform coefficients for each region of interest are emphasized by scaling up these transform coefficients in such a way that more bits are allocated to these transform coefficients or the encoding ordering of these coefficients is advanced. After scaling up the transform coefficients for each region of interest, quantization is performed on the transform coefficients for the entire image in order to obtain the quantization indices. In the alternative, quantization is first performed for the entire image, and the quantization indices of the quantized transform coefficients corresponding to each region of interest are then scaled up according to the priority assigned to each region of interest. The quantization indices of the transform coefficients are entropy encoded based upon an encoding strategy, such as encoding ordering or bit allocation, determined by the scaling up for each region of interest, in order to form a data bit stream. A bit stream header is formed, and the data bit stream is appended to the bit stream header. The entropy coding is performed on each bit field of the binary representation of the quantization indices of the transform coefficients. Either a bit plane coding technique, such as binary arithmetic coding, or a zero-tree coding technique, such as SPIHT coding, is used.
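The effect of scaling up region-of-interest coefficients before quantization can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the priority is assumed to be expressed as a power-of-two scale factor (`shift`), the quantizer is assumed to be a simple uniform quantizer with a hypothetical step size, and the coefficient values and ROI mask are made up. The wavelet transform and entropy coder are omitted.

```python
def quantize(c, step):
    """Uniform quantizer: map a coefficient to a signed integer index."""
    q = int(abs(c) / step)
    return q if c >= 0 else -q


def encode_roi(coeffs, roi_mask, shift, step):
    """Scale up ROI coefficients by 2**shift, then quantize every coefficient."""
    scaled = [c * (1 << shift) if in_roi else c
              for c, in_roi in zip(coeffs, roi_mask)]
    return [quantize(c, step) for c in scaled]


def bit_planes(indices):
    """Yield (plane, bits) from the most significant magnitude bit plane down to 0."""
    top = max(abs(q) for q in indices).bit_length()
    for plane in range(top - 1, -1, -1):
        yield plane, [(abs(q) >> plane) & 1 for q in indices]


coeffs = [40.0, -3.5, 12.0, 7.25, -20.0, 1.0]            # hypothetical coefficients
roi_mask = [False, True, False, True, False, False]      # hypothetical ROI membership
indices = encode_roi(coeffs, roi_mask, shift=4, step=1.0)
planes = list(bit_planes(indices))
```

In this example the small ROI coefficient 7.25 becomes the index 116 and is the only coefficient with a bit set in the most significant plane, so a coder that proceeds plane by plane reaches it before the larger background coefficient 40.0.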
The decoding method includes separating the bit stream header from the data bit stream, and decoding from the bit stream header a description such as the coordinates of the region or regions of interest, the priority assigned to each region, the size of the image, and the number of wavelet decomposition levels. The wavelet transform coefficients corresponding to the region or regions of interest specified by this description are identified, and the data bit stream is entropy decoded by following the decoding ordering determined by the identified transform coefficients corresponding to each region of interest and the priority assigned to each region of interest. This forms a set of subbands containing the quantization indices of the transform coefficients. Either the de-quantized transform coefficients or the quantization indices of the transform coefficients corresponding to each region of interest are scaled down. If scaling up and then quantization are performed at the encoder, the decoder performs de-quantization of the quantization indices for the entire image and then scaling down of the de-quantized transform coefficients for each region of interest; if quantization and then scaling up are performed at the encoder, the decoder performs scaling down of the quantization indices for each region of interest and then de-quantization of the quantization indices for the entire image. In either case, de-quantization is performed on the quantization indices in order to obtain the de-quantized transform coefficients. The inverse wavelet transform is performed on the de-quantized transform coefficients in order to form the pixel values of the entire image. The digital image in this invention can be not only two dimensional digital data but also one dimensional digital data such as voice data, electrocardiogram data, or seismic wave data.
When the data is one dimensional, the steps and means based on the wavelet transform, subbands, ROI coefficient identification, or the inverse wavelet transform, which are applied along each dimension of two dimensional data, are applied only along the single dimension of the data.
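The pairing between the encoder ordering and the decoder ordering can be sketched for a single coefficient. The sketch assumes a power-of-two scale factor, a uniform quantizer, and reconstruction at the index value times the step size; all function names and numeric values are hypothetical, and entropy coding and the wavelet transform are omitted.

```python
def quantize(c, step):
    """Uniform quantizer: map a coefficient to a signed integer index."""
    q = int(abs(c) / step)
    return q if c >= 0 else -q


def dequantize(q, step):
    """Assumed reconstruction rule: index times step size."""
    return q * step


def shift_index(q, shift):
    """Shift the magnitude of a signed index left (shift > 0) or right (shift < 0)."""
    mag = abs(q) << shift if shift >= 0 else abs(q) >> -shift
    return mag if q >= 0 else -mag


# Ordering A: encoder scales up, then quantizes;
# decoder de-quantizes, then scales down.
def encode_a(c, in_roi, shift, step):
    return quantize(c * (1 << shift) if in_roi else c, step)


def decode_a(q, in_roi, shift, step):
    value = dequantize(q, step)
    return value / (1 << shift) if in_roi else value


# Ordering B: encoder quantizes, then scales up the index;
# decoder scales down the index, then de-quantizes.
def encode_b(c, in_roi, shift, step):
    q = quantize(c, step)
    return shift_index(q, shift) if in_roi else q


def decode_b(q, in_roi, shift, step):
    return dequantize(shift_index(q, -shift) if in_roi else q, step)


c, shift, step = -7.3, 3, 2.0  # hypothetical ROI coefficient and parameters
rec_a = decode_a(encode_a(c, True, shift, step), True, shift, step)
rec_b = decode_b(encode_b(c, True, shift, step), True, shift, step)
```

Under these assumptions, ordering A reconstructs the ROI coefficient as -7.25 because scaling before quantization effectively refines the step size inside the region, while ordering B reconstructs it as -6.0, the same value an unscaled coefficient would receive, and affects only where its bits fall in the coding order.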
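The reuse of the same per-dimension steps for one dimensional and two dimensional data can be sketched as below. A one-level Haar transform is assumed purely for illustration, and the ROI identification rule (a coefficient belongs to the ROI if either of the two samples it depends on does) holds only for that two-tap filter; all names and sample values are hypothetical.

```python
import math


def haar_1d(x):
    """One-level orthonormal Haar transform along a single dimension."""
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    low = [(x[2 * k] + x[2 * k + 1]) * s for k in range(half)]
    high = [(x[2 * k] - x[2 * k + 1]) * s for k in range(half)]
    return low + high


def roi_mask_1d(mask):
    """Mark coefficients (low band, then high band) that depend on an ROI sample."""
    half = len(mask) // 2
    band = [mask[2 * k] or mask[2 * k + 1] for k in range(half)]
    return band + band


def along_both_dimensions(step, data):
    """Apply a one dimensional step to every row, then to every column."""
    rows = [step(list(r)) for r in data]
    cols = [step(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


# One dimensional data: each step is applied once, along the only dimension.
signal = [4.0, 4.0, 2.0, 2.0]
signal_roi = [False, False, True, False]
coeffs_1d = haar_1d(signal)
mask_1d = roi_mask_1d(signal_roi)

# Two dimensional data: the same steps are applied along rows and then columns.
image = [[1.0, 1.0], [1.0, 1.0]]
image_roi = [[True, False], [False, False]]
coeffs_2d = along_both_dimensions(haar_1d, image)
mask_2d = along_both_dimensions(roi_mask_1d, image_roi)
```

The helper `along_both_dimensions` is the only part specific to two dimensional data; a voice or electrocardiogram signal would call `haar_1d` and `roi_mask_1d` directly.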