The present invention is related to the invention disclosed in U.S. patent application Ser. No. 09/429,467 of E. Ammicht et al., filed Oct. 28, 1999 and entitled xe2x80x9cImproved Coefficient Computation in Image Compression Using Set Partitioning in Hierarchical Trees,xe2x80x9d which is incorporated by reference herein.
The invention relates generally to encoding of image signals, and more particularly to memory-efficient encoding techniques for use with such signals.
Transformation of an image comprising a two-dimensional array of picture elements (pixels) by use of wavelet basis functions has been found to provide a very effective means for achieving a significant reduction in the amount of information required to represent the image, with minimal loss in perceivable resolution. Several wavelet-based image coding standards have been proposed recently for use in image compression applications.
The wavelet transform, which is the first stage in an image compression process, typically involves applying a multi-stage iterative filter-and-decimate process to the image to yield an array of coefficients equivalent in dimension to the original image. A two-dimensional wavelet transform can be implemented as the convolution of two one-dimensional filters. In such an approach, the image rows are filtered individually to yield an array equal in size to the image, and then the columns of this resultant array of coefficients are filtered to again yield the same size array. The same processing is subsequently applied to a sub-array of the resultant array as the second stage of the image compression process, etc. Typically four to six or more such filter-and-decimate stages of processing are applied.
FIG. 1 shows the functional data flow in the first stage of a conventional wavelet transform. Pixel input by row at a sample rate of fs is filtered in a horizontal filter comprising a low pass filter 10 and a high pass filter 12 to generate low pass and high pass portions, respectively, of a coefficient array 15. The low pass filter 10 and high pass filter 12 each operate at sample rates of fs/2. The coefficient array 15 is then filtered in a vertical filter including high pass and low pass filter pairs 16, 17 to generate a coefficient array 18. Each of the coefficient arrays 15 and 18 is the same size as the original image.
Because the image size measured in pixel count is both large and variable, e.g., 2 million pixels for a high definition television (HDTV) image up to more than 10 million pixels for high-resolution satellite images, it is generally not practical to implement an integrated circuit wavelet processor capable of both performing the above-described transform and storing the entire image and/or intermediate results. It is also prohibitively expensive to provide the capability to stream the original image at real-time rates through the chip for horizontal filtering with results written to memory, then re-access the resultant array column-by-column, i.e., in transposed form, to perform the vertical filtering, and to repeat this process for multiple stages. For example, in the case of HDTV, pixels arrive at the rate of 60 million/second and include both light intensity (luma) and color (chroma) values. Requiring the memory read/write cycle time to be several times as fast as the pixel period, i.e., a cycle time of one quarter or less of the 16 nanosecond pixel period, would generally require expensive memory devices and complex supporting technologies such as, e.g., address generation.
A need therefore exists for improved wavelet transform techniques which are suitable for use in image compression and other applications, and which avoid the excessive memory and processing costs associated with conventional wavelet transforms.
The invention provides improved wavelet transform techniques which implement a xe2x80x9ctranspose on the flyxe2x80x9d concept that results in substantially reduced memory and computational requirements. In accordance with the invention, a multi-stage wavelet transform of an image signal is implementing using a first processing element to perform computations for a first stage of the transform, and a second processing element operating in a time-multiplexed manner to perform computations for subsequent stages of the transform. The first processing element includes first and second adder trees for implementing horizontal and vertical filtering operations, respectively, and a set of row buffers configured such that the total number of row buffers is only one more than the number of pixels required to generate a given vertically-filtered output. In a four-stage illustrative embodiment in which the first stage processing element processes image pixel data at a sample rate of fs, the multi-stage processing element receives inputs from the first stage processing element at a sample rate of fs/4, and generates coefficients for the second, third and fourth stages using sample rates of fs/16, fs/64 and fs/256, respectively, in accordance with a time-multiplexed processing schedule. The particular number of stages may be considered a design parameter, and other embodiments may have more or less than four stages. For example, a six-stage embodiment may be configured in which the multi-stage processing element generates coefficients for the second through sixth stages.
In accordance with another aspect of the invention, the time-multiplexed schedule for processing operations may be generated as a binary tree representation in the form of a synchronous data flow (SDF) graph. The operations performed by the first stage processing element correspond to the lowest-level nodes of the binary tree representation of the processing schedule, and the operations performed by the multi-stage processing element correspond to the remaining nodes of the binary tree representation.
Advantageously, the invention provides a wavelet transform structure that is capable of performing recursive, multi-stage processing on variable-sized images, and without the increased memory speed requirements associated with streaming of intermediate results in and out of image memory. In accordance with the invention, the latency or in-process delay is minimized by starting successive stages of processing as soon as the requisite inputs become available. Moreover, the invention implements a processing structure that can be time-multiplexed among several stages of wavelet transform processing simultaneously, by exploiting similarity in successive wavelet transform processing stages.