1. Field of the Invention
Modern computers and modern computer networks enable the transfer of a significant amount of information between computers and between a computer and a storage device. When computers access local storage devices such as a local hard drive or local floppy drive, significant amounts of information can be quickly accessed. However, when seeking to access data from a remote storage location such as over a wide area network (WAN) or the internet, data transfer rates are significantly slower. Transferring large files, therefore, takes significant amounts of time. Additionally, storage of large files utilizes valuable and limited storage space. Photographic images and similar graphical images typically are considered to be large files, since an image conventionally requires information on each picture element or pixel in the image. Photographs and similar graphical images, therefore, typically require over one megabyte of storage space, and therefore require significant transmission times over slow network communications. In recent years, therefore, numerous protocols and standards have been developed for compressing photographic images to reduce the amount of storage space required to store photographic images, and to reduce transfer and rendering times. The compression methods essentially create mathematical or statistical approximations of the original image.
Compression methods fall broadly into two categories. Lossy compression methods are methods wherein there is a certain loss of fidelity of the image; in other words, close inspection of the reproduced image would reveal differences from the original. Lossless compression methods are ones where the original image is reproduced exactly after decoding. The present invention is directed to an efficient image compression method and apparatus wherein part of an image can be compressed with a higher level of fidelity in the reproduced image than other parts of the image, based on a selection of a region-of-interest by the user who is initially encoding or compressing the image, or by the user who receives and decodes the image data through interaction with the encoding side.
2. Description of the Related Art
A currently popular standard for compressing images is called the JPEG or “J-peg” standard. This standard was developed by a committee called the Joint Photographic Experts Group, and is popularly used to compress still images for storage or network transmission. Recent papers by Said and Pearlman discuss new image coding and decoding methods based upon set partitioning in hierarchical trees (SPIHT). See Said and Pearlman, Image Codec Based on Set Partitioning in Hierarchical Trees, IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, no. 3, June 1996, and Said and Pearlman, Image Multi-Resolution Representation, IEEE Transactions on Image Processing, vol. 5, no. 9, September 1996. The contents of these papers are hereby incorporated by reference. These references disclose computer software which, when loaded and running on a general purpose computer, performs a method and creates an apparatus which utilizes integer wavelet transforms to provide lossy compression by bit accuracy and lossless compression within the same embedded bit stream, or an apparatus which utilizes non-integer wavelet transforms to provide lossy compression by bit accuracy within a single embedded bit stream. An image is initially stored as a two-dimensional array representing a plurality of individual pixels, and its bits are prioritized according to the transform coefficients for progressive image transmission. The most important information is selected by determining significant or insignificant elements with respect to a given threshold utilizing subset partitioning. The progressive transmission scheme disclosed by Said and Pearlman selects the most important information to be transmitted first based upon the magnitude of each transform coefficient; if the transform is unitary, the larger the magnitude, the more information the coefficient conveys in the mean squared error (MSE, Dmse( )) sense:
$$D_{mse}(p - \hat{p}) = \frac{\|p - \hat{p}\|^2}{N} = \frac{1}{N} \sum_{i} \sum_{j} \left(p_{i,j} - \hat{p}_{i,j}\right)^2$$
where (i,j) is the pixel coordinate, with $p_{i,j}$ therefore representing a pixel value. The two-dimensional array c is coded according to $c = \Omega(p)$, with $\Omega(\cdot)$ being used to represent a unitary hierarchical subband transformation. Said and Pearlman make the assumption that each pixel coordinate and value is represented according to a fixed-point binary format with a relatively small number of bits, which enables the element to be treated as an integer for the purposes of coding. The reconstructed image $\hat{p}$ is obtained by initially setting a reconstruction vector $\hat{c}$ to 0, and calculating the image as:
$$\hat{p} = \Omega^{-1}(\hat{c})$$
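The distortion measure above can be illustrated with a short sketch. This is only a minimal example, assuming the image and its reconstruction are available as equally sized nested lists; the function name `mse_distortion` and the sample pixel values are hypothetical and do not come from the cited papers.

```python
def mse_distortion(p, p_hat):
    """Mean squared error between two equally sized 2-D pixel arrays."""
    rows = len(p)
    cols = len(p[0])
    n = rows * cols  # N, the number of image pixels
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            diff = p[i][j] - p_hat[i][j]
            total += diff * diff  # squared error at pixel coordinate (i, j)
    return total / n


# Hypothetical 2x2 image and a slightly different reconstruction.
original = [[10, 12], [14, 16]]
reconstructed = [[10, 11], [15, 16]]
print(mse_distortion(original, reconstructed))  # (0 + 1 + 1 + 0) / 4 = 0.5
```

A reconstruction identical to the original yields a distortion of zero, which corresponds to the lossless case.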
N is the number of image pixels, and the above calculation for mean squared-error distortion can therefore be made. Because the transformation Ω is unitary, the distortion measured on the pixels equals the distortion measured on the transform coefficients, so that when the exact value of a coefficient $c_{i,j}$ is sent to the decoder, the mean squared-error distortion measure decreases by $|c_{i,j}|^2/N$. This fact enables pixel values to be ranked according to their binary representation, with the most significant bits (MSBs) being transmitted first, and also enables pixel coefficients with larger magnitude to be transmitted first because of a larger content of information. An algorithm is utilized by the encoder to send a value representing the maximum pixel value for a particular pixel coordinate, sorting pixel coordinates by wavelet transform coefficient values, then outputting a most significant bit of the various coefficients, using a number of sorting passes and refinement passes, to provide high quality reconstructed images utilizing a small fraction of the transmitted pixel coordinates. A user can set a desired rate or distortion by setting the number of bits to be spent in sorting passes and refinement passes. Utilizing a spatial orientation tree, as shown in FIG. 1, pixel information is separated into a List of Insignificant Sets (LIS), a List of Insignificant Pixels (LIP), and a List of Significant Pixels (LSP). FIG. 1 illustrates image 100, with a plurality of pixel sets 101, 102, . . . , 10x therein. The spatial orientation tree is developed as known in the art, by decomposition of integer-valued or non-integer-valued wavelet transform (WT) coefficients. Coefficients in the LH subband of each decomposition level form the spatial orientation tree. In this example, parent node 101 has a series of roots and offspring nodes 102-107. The LIP is a list of coordinates of insignificant pixels or WT coefficients, the LIS is a list of coordinates of tree roots with insignificant descendant sets, with multiple types of entries on the list (Type A and Type B), and the LSP is a list of coordinates of significant pixels.
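The relationship between coefficient magnitude and distortion can be checked numerically. The following is a sketch only: a single-level orthonormal Haar transform on a one-dimensional signal is assumed here as a stand-in for the unitary transformation Ω, and the function names and sample values are hypothetical. It starts from a reconstruction vector of zeros, sends the largest-magnitude coefficient exactly, and compares the drop in distortion with the squared coefficient magnitude divided by N.

```python
import math


def haar_forward(p):
    """Single-level orthonormal Haar transform (assumed stand-in for Omega)."""
    s = 1 / math.sqrt(2)
    low = [(p[k] + p[k + 1]) * s for k in range(0, len(p), 2)]
    high = [(p[k] - p[k + 1]) * s for k in range(0, len(p), 2)]
    return low + high


def haar_inverse(c):
    """Inverse of haar_forward."""
    s = 1 / math.sqrt(2)
    half = len(c) // 2
    p = []
    for k in range(half):
        p.append((c[k] + c[half + k]) * s)
        p.append((c[k] - c[half + k]) * s)
    return p


def mse(p, q):
    """Mean squared error between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


p = [10.0, 12.0, 14.0, 20.0]          # hypothetical pixel values
c = haar_forward(p)
c_hat = [0.0] * len(c)                # reconstruction vector starts at 0
before = mse(p, haar_inverse(c_hat))

# Send the coefficient with the largest magnitude first.
k = max(range(len(c)), key=lambda idx: abs(c[idx]))
c_hat[k] = c[k]
after = mse(p, haar_inverse(c_hat))

drop = before - after
print(math.isclose(drop, c[k] ** 2 / len(p)))
```

Choosing any other coefficient for `k` gives a smaller drop, which is why larger-magnitude coefficients are transmitted first.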
Sorting and partitioning of the list contents is performed as illustrated in FIG. 2. The significance determination which is made in the flow chart of FIG. 2 is based upon a given significance threshold. Entries from the LIP which are determined to be significant at 202 are moved to the LSP at 203, and entries which are determined not to be significant at 202 are returned to the LIP for testing during subsequent passes. If it is determined at 204 that all LIP entries have been tested, then LIS entries begin to be tested. If not all LIP entries have been tested, a next LIP entry is tested for significance at 202. Once all LIP entries are tested, LIS entries at 205 are examined at 206 to determine whether each entry is type A, representing the set of coordinates of all descendants of a node, or type B, representing the difference between the coordinates of the descendants and the offspring. If an entry is determined to be type A, significance is tested at 207. If significant, the set is partitioned at 208 into offspring and descendants of offspring, with each offspring being tested for significance at 209. If significant, the coordinate is placed on the LSP. If insignificant, the tested offspring is moved to the end of the LIP. If the initial type A entry is determined to be insignificant at 207, the entry is returned to the LIS. Type B LIS entries are tested for significance at 210, and moved to the LIP if significant or returned to the LIS if insignificant. After each test for significance, a one is output if the entry is determined to be significant, and a zero is output if the entry is determined to be insignificant. The ones and zeros are used to indicate when a specified number of bits have been output for termination purposes. Decoding occurs in the same, but reversed, fashion.
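The significance test and the LIP portion of the sorting pass can be sketched as follows. This is a simplified illustration rather than the procedure of FIG. 2 itself: the threshold is assumed to be a power of two (2**n), sign bits and the LIS partitioning steps are omitted, and the names `is_significant` and `sort_lip` are hypothetical.

```python
def is_significant(coeffs, coords, n):
    """Return 1 if any coefficient at the given coordinates has
    magnitude >= 2**n (assumed threshold form), otherwise 0."""
    threshold = 1 << n
    return int(any(abs(coeffs[i][j]) >= threshold for (i, j) in coords))


def sort_lip(coeffs, lip, lsp, n):
    """Test each LIP entry once against the threshold 2**n.

    Significant entries are appended to the LSP; the others are kept
    for later passes. Returns (remaining LIP, emitted bits).
    Sign bits are deliberately left out of this sketch.
    """
    remaining = []
    bits = []
    for coord in lip:
        bit = is_significant(coeffs, [coord], n)
        bits.append(bit)           # one output bit per significance test
        if bit:
            lsp.append(coord)      # now on the List of Significant Pixels
        else:
            remaining.append(coord)
    return remaining, bits


# Hypothetical 2x2 array of transform coefficients.
coeffs = [[34, -20], [5, 9]]
lip = [(0, 0), (0, 1), (1, 0), (1, 1)]
lsp = []
lip, bits = sort_lip(coeffs, lip, lsp, n=4)   # threshold 16
print(bits)  # [1, 1, 0, 0]
print(lsp)   # [(0, 0), (0, 1)]
print(lip)   # [(1, 0), (1, 1)]
```

The same `is_significant` function applies to a set of coordinates, so an entire insignificant set costs a single zero bit in the output.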
Entries of each list are identified by the pixel coordinates, with the LIP and LSP representing individual pixels, and the LIS representing sets of coordinates, with the sets of coordinates being grouped according to their status as either the coordinates of all descendants of a node of the spatial orientation tree (type A) or the difference between the coordinates of the descendants and the offspring of the node (type B).
Using the encoding algorithm mentioned above, sorting passes are performed until reaching the selected termination point, with an increase in sorting passes providing a decrease in distortion due to further refinement provided by more accurate significance classification. Increased sorting passes, however, require additional time. The decoder duplicates the encoder's execution path in reverse to sort the significant coefficients, with “outputs” being changed to “inputs” for decoding, to recover appropriate ordering information. The coding method of the prior art, therefore, attempts to mathematically determine an area of the image which should have a higher fidelity or lower loss than other areas of the image based upon the significance determinations. FIG. 3 illustrates an important aspect of the SPIHT coding, which is repetitive sorting passes and refinement passes for a given threshold; sorting and refinement are repeated until encoding is complete. (Refer to the above-referenced articles for a more complete discussion of SPIHT coding.)
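The alternation of sorting passes and refinement passes, and the way the decoder follows the same path with outputs changed to inputs, can be sketched in simplified form. The example below is an assumption-laden illustration and not SPIHT itself: it omits the spatial orientation tree and the LIS entirely, treats every coefficient as an individual LIP entry, uses power-of-two thresholds, and reconstructs each coefficient at the lower bound of its known range. The function names are hypothetical.

```python
def encode_bitplanes(coeffs, num_passes):
    """Simplified bit-plane coder: sorting pass, then refinement pass,
    for thresholds 2**n with n decreasing. Returns (n_max, bits)."""
    n_max = max(abs(c) for c in coeffs).bit_length() - 1
    lip = list(range(len(coeffs)))
    lsp = []
    bits = []
    n = n_max
    for _ in range(num_passes):
        if n < 0:
            break
        previous = list(lsp)  # entries found significant in earlier passes
        still = []
        for k in lip:  # sorting pass
            if abs(coeffs[k]) >= (1 << n):
                bits.append(1)
                bits.append(0 if coeffs[k] >= 0 else 1)  # sign bit
                lsp.append(k)
            else:
                bits.append(0)
                still.append(k)
        lip = still
        for k in previous:  # refinement pass: n-th magnitude bit
            bits.append((abs(coeffs[k]) >> n) & 1)
        n -= 1
    return n_max, bits


def decode_bitplanes(n_max, bits, count, num_passes):
    """Follows the encoder's path, reading bits instead of writing them."""
    stream = iter(bits)
    mags = [0] * count
    signs = [1] * count
    lip = list(range(count))
    lsp = []
    n = n_max
    for _ in range(num_passes):
        if n < 0:
            break
        previous = list(lsp)
        still = []
        for k in lip:
            if next(stream):
                signs[k] = -1 if next(stream) else 1
                mags[k] = 1 << n
                lsp.append(k)
            else:
                still.append(k)
        lip = still
        for k in previous:
            if next(stream):
                mags[k] |= 1 << n
        n -= 1
    return [s * m for s, m in zip(signs, mags)]


coeffs = [34, -20, 5, 9]  # hypothetical transform coefficients
n_max, coarse_bits = encode_bitplanes(coeffs, num_passes=2)
print(decode_bitplanes(n_max, coarse_bits, len(coeffs), 2))  # [32, -16, 0, 0]
n_max, full_bits = encode_bitplanes(coeffs, num_passes=6)
print(decode_bitplanes(n_max, full_bits, len(coeffs), 6))    # [34, -20, 5, 9]
```

Stopping after two passes gives a coarse approximation from a short bit stream, while running through every bit plane recovers the integer coefficients exactly, which mirrors the trade-off between the number of passes and distortion described above.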