The field of digital data compression, and in particular digital image compression, has attracted great interest for some time. Most recently, compression schemes based on a Discrete Wavelet Transform (DWT) have become increasingly popular because the DWT offers a non-redundant hierarchical decomposition of an image, and the resultant compression of the image provides favourable rate-distortion statistics.
Typically, the discrete wavelet transform (DWT) of an image is performed using a series of one-dimensional DWTs. A one-dimensional DWT of a signal (i.e. an image row) is performed by lowpass and highpass filtering the signal, and decimating each filtered signal by 2. Decimation by 2 means that only every second output sample of each filtering process is calculated and retained. When performing a convolution (filtering), the filter is moved along by two samples at a time, instead of the usual one sample, to effect the decimation by 2. In this way, for a signal of N samples there are N DWT samples: N/2 lowpass samples and N/2 highpass samples. Strictly speaking, this is a single level one-dimensional DWT. However, since only single level one-dimensional DWTs are used in this description, they are referred to simply as a one-dimensional DWT (1D DWT).
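By way of illustration only, the filtering and decimation described above can be sketched as follows. The sketch assumes a two-tap average and half-difference (unnormalised Haar) filter pair, which is chosen here purely for brevity and is not a filter specified in this description; the function name `dwt_1d` is likewise hypothetical.

```python
def dwt_1d(signal):
    """Single level 1D DWT of an even-length signal.

    Assumption: lowpass filter (1/2, 1/2) and highpass filter (1/2, -1/2),
    i.e. an unnormalised Haar pair used only for illustration.
    """
    if len(signal) % 2 != 0:
        raise ValueError("this sketch assumes an even number of samples")
    low, high = [], []
    # Step the filter two samples at a time: this is the decimation by 2,
    # so only every second filter output is ever computed.
    for i in range(0, len(signal), 2):
        a, b = signal[i], signal[i + 1]
        low.append((a + b) / 2)    # lowpass output sample
        high.append((a - b) / 2)   # highpass output sample
    return low, high


low, high = dwt_1d([4, 6, 10, 12, 8, 8, 2, 0])
print(low)   # 4 lowpass samples
print(high)  # 4 highpass samples
```

A signal of N = 8 samples thus yields N/2 = 4 lowpass and N/2 = 4 highpass samples, N samples in total.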
FIG. 8 illustrates a typical process for performing a single level two-dimensional DWT of an input image. Each column of the image is analysed with a one-dimensional DWT, giving output columns whose first half consists of lowpass samples and whose second half consists of highpass samples. A single level one-dimensional DWT of a column is referred to as a column DWT. The analysis of each column results in two sub-images labelled Lc and Hc. The columns of Lc consist of the lowpass filtered (and decimated) columns of the input image, and the columns of Hc consist of the highpass filtered (and decimated) columns of the input image. The rows of the resulting output image are then analysed with a one-dimensional DWT, or row DWT, as illustrated in FIG. 8. This results in four sub-images or subbands, labelled LL, HL, LH and HH, where L and H refer to lowpass and highpass respectively, and in the two letter label the first letter corresponds to the row filter and the second letter to the column filter. That is, the HL subband is the result of lowpass filtering the columns (and decimating by 2) and highpass filtering the resulting rows (and decimating by 2). The LL subband is also called the DC or low frequency subband, while the HL, LH and HH subbands are called AC or high frequency subbands. Depending on the context, a single level two-dimensional DWT of an image is referred to as a single level DWT (or even simply DWT) of an image.
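The column-then-row procedure and the subband labelling convention described above can be sketched as follows. This is a minimal illustration only: it assumes an image held as a list of rows with even dimensions, and an average and half-difference (unnormalised Haar) filter pair that is not prescribed by this description. The names `analyse`, `transpose` and `dwt_2d` are hypothetical.

```python
def analyse(x):
    """1D DWT with an assumed (1/2, 1/2) lowpass and (1/2, -1/2) highpass filter."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high


def transpose(m):
    return [list(r) for r in zip(*m)]


def dwt_2d(image):
    """Single level 2D DWT: column DWT first, then row DWT."""
    # Column DWT: each column yields a lowpass half (Lc) and a highpass half (Hc).
    lc_cols, hc_cols = [], []
    for col in transpose(image):
        low, high = analyse(col)
        lc_cols.append(low)
        hc_cols.append(high)
    lc, hc = transpose(lc_cols), transpose(hc_cols)

    # Row DWT of each sub-image. The first letter of a label is the row
    # filter and the second letter is the column filter.
    ll, hl, lh, hh = [], [], [], []
    for row in lc:
        low, high = analyse(row)
        ll.append(low)   # row lowpass, column lowpass
        hl.append(high)  # row highpass, column lowpass
    for row in hc:
        low, high = analyse(row)
        lh.append(low)   # row lowpass, column highpass
        hh.append(high)  # row highpass, column highpass
    return ll, hl, lh, hh


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
ll, hl, lh, hh = dwt_2d(image)
print(ll)  # [[3.5, 5.5], [11.5, 13.5]]
```

Each of the four subbands has half the height and half the width of the input image.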
Each one-dimensional DWT can be inverted. That is, having analysed a one-dimensional signal of N samples into N/2 lowpass and N/2 highpass subband samples, these subband samples, of which there are N in total, can be synthesised with a one-dimensional inverse DWT into the N samples of the original one-dimensional signal. Thus the original image can be reconstructed by synthesising the rows and then the columns of a single level DWT of an image. This is also illustrated in FIG. 8. Thus a single level (two-dimensional) DWT is invertible. The process of inverting a DWT is often referred to as synthesis, or as applying an inverse DWT, or simply iDWT.
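The invertibility of a one-dimensional DWT can be illustrated with the following sketch. It assumes an average and half-difference (unnormalised Haar) filter pair, for which synthesis reduces to a sum and a difference; other filter pairs require their own matching synthesis filters. The names `dwt_1d` and `idwt_1d` are hypothetical.

```python
def dwt_1d(signal):
    """Analysis: N samples -> N/2 lowpass and N/2 highpass samples (assumed Haar-like pair)."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def idwt_1d(low, high):
    """Synthesis: interleave reconstructed even and odd samples."""
    signal = []
    for l, h in zip(low, high):
        signal.append(l + h)  # even-indexed sample: (a+b)/2 + (a-b)/2 = a
        signal.append(l - h)  # odd-indexed sample:  (a+b)/2 - (a-b)/2 = b
    return signal


original = [4, 6, 10, 12, 8, 8, 2, 0]
low, high = dwt_1d(original)
reconstructed = idwt_1d(low, high)
print(reconstructed == original)  # True
```

Applying the same synthesis to the rows and then to the columns of a single level two-dimensional DWT would recover the image in the same manner.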
For a two level DWT the LL subband is further analysed with a single level DWT into four subbands, just as the original image was analysed into four subbands. For a three level DWT the resulting LL subband is again analysed, and so on for an arbitrary number of levels of the DWT. Thus a multi-level DWT or simply DWT of an image can be performed by iterating a single level DWT some finite number of times on subsequent LL subbands, where the first LL subband is the original image. A multi-level DWT can be inverted by simply inverting each single level DWT.
At each level of a multi-level DWT there are three high frequency subbands, the HL, LH and HH subbands. Therefore, for a more precise notation a level number is included in the labelling of the subbands. Thus the four subbands illustrated in FIG. 8 are more precisely denoted LL1, HL1, LH1 and HH1. Similarly the three high frequency subbands at level 2, resulting from the single level DWT of the LL1 subband, are denoted HL2, LH2 and HH2. Using this subband notation the original image can be labelled as the LL0 subband.
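The iteration over successive LL subbands, together with the level-numbered labels, can be sketched as follows. This is an illustrative sketch only: it assumes image dimensions divisible by 2 raised to the number of levels, and an average and half-difference (unnormalised Haar) filter pair. The names `analyse`, `transpose`, `dwt_2d` and `multilevel_dwt` are hypothetical.

```python
def analyse(x):
    """1D DWT with an assumed (1/2, 1/2) lowpass and (1/2, -1/2) highpass filter."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high


def transpose(m):
    return [list(r) for r in zip(*m)]


def dwt_2d(image):
    """Single level 2D DWT: column DWT first, then row DWT."""
    col_results = [analyse(col) for col in transpose(image)]
    lc = transpose([low for low, _ in col_results])
    hc = transpose([high for _, high in col_results])
    ll, hl = map(list, zip(*[analyse(row) for row in lc]))
    lh, hh = map(list, zip(*[analyse(row) for row in hc]))
    return ll, hl, lh, hh


def multilevel_dwt(image, levels):
    """Iterate the single level DWT on successive LL subbands."""
    subbands = {}
    ll = image  # the original image plays the role of LL0
    for level in range(1, levels + 1):
        ll, hl, lh, hh = dwt_2d(ll)
        subbands[f"HL{level}"] = hl
        subbands[f"LH{level}"] = lh
        subbands[f"HH{level}"] = hh
    subbands[f"LL{levels}"] = ll  # only the final LL subband is retained
    return subbands


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
subbands = multilevel_dwt(image, 2)
print(sorted(subbands))
```

For a two level DWT the result holds seven subbands: HL1, LH1, HH1, HL2, LH2, HH2 and LL2.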
Image compression is typically executed on general purpose computers and also on application-specific devices (ASICs). Both of these systems employ an architecture where the main processing unit has access to different memory units. These memories are differentiated by size and speed of access (and cost). For the purposes of this discussion it suffices to consider two memories: a fast memory, which is typically on the same chip as the processor and is relatively small, and a slower memory, which is typically on a different chip from the processor and is relatively large. The fast, smaller memory is referred to as local or internal memory, and the slower, larger memory as external memory.
The frequency of, and hence the amount of time taken by, reading and/or writing data from and to memory (typically into or out of registers) is referred to as bandwidth. Internal bandwidth refers to read and write accesses from and to internal memory, while external bandwidth refers to read and write accesses from and to external memory.
In some image compression methods the subbands resulting from a DWT are tiled into blocks of samples. For example, each block consists of, say, H rows by H columns. A row of blocks in a subband consists of H lines of the subband. Each block is quantised and entropy coded substantially independently. Thus each block can be entropy decoded (and dequantised) independently. A block essentially becomes a minimum coded unit. The blocks are not necessarily strictly independently encoded. Some small amount of information, such as the most significant bit plane in each block, may be coded together for all blocks in a subband. However, if the time or effort required to encode or decode such information for one block is trivial when compared to encoding or decoding a whole block, then for the present purposes the blocks are considered to be coded independently.
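The tiling of a subband into blocks of H rows by H columns can be sketched as follows. The sketch assumes a subband held as a list of lines, and assumes that blocks at the right and bottom edges are simply truncated when the subband dimensions are not multiples of H; the description above does not specify edge handling. The name `tile_subband` is hypothetical, and quantisation and entropy coding of each block are not shown.

```python
def tile_subband(subband, block_size):
    """Yield ((block_row, block_col), block) for each tile of the subband."""
    height = len(subband)
    width = len(subband[0]) if height else 0
    # A row of blocks spans block_size consecutive lines of the subband.
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            block = [line[left:left + block_size]
                     for line in subband[top:top + block_size]]
            yield (top // block_size, left // block_size), block


# A 4-line by 6-column subband of dummy sample values, tiled with H = 2.
subband = [[r * 6 + c for c in range(6)] for r in range(4)]
blocks = dict(tile_subband(subband, 2))
print(len(blocks))     # 6 blocks: 2 rows of blocks, 3 blocks per row
print(blocks[(0, 1)])  # [[2, 3], [8, 9]]
```

Each yielded block could then be passed to a quantiser and entropy coder without reference to any other block.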
A single level DWT of an image is typically performed by buffering the entire image in memory and performing the DWT on the buffered image. Unfortunately, this approach requires a large amount of memory, particularly for images of 2000 pixels×2000 pixels or more. Thus a relatively large amount of (slower) external memory is required. Further, several write and read accesses to and from this slower external memory are required to perform the DWT. These accesses then restrict the speed at which the whole DWT can be performed.
A line-based implementation of a DWT can be employed to reduce the amount of memory required to perform the DWT and iDWT of an image. A line-based DWT typically performs the 2D DWT operation in vertical segments. That is, for example, some number of input image rows are first row transformed, right across the image, and then the resulting transformed rows are (partially) vertically transformed to produce some number of lines of each of the LL, HL, LH and HH subbands. Then a next segment of input image rows is transformed horizontally, and the resulting lines, plus perhaps some of the previous horizontally transformed lines, are vertically transformed to produce a next set of output subband lines, and so on. A lifting implementation of a line-based DWT has been proposed that requires fewer lines to be buffered, and less computation, than a convolution line-based DWT. These line-based DWTs are two-pass in that in one dimension the full transform is carried out (that is, for the entire signal length in that dimension).
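The segment-by-segment operation of a line-based DWT can be sketched as follows. This is a simplified illustration under a strong assumption: a two-tap average and half-difference (unnormalised Haar) filter pair is used, so each segment is exactly two input rows and no previously transformed lines need to be retained. Longer filters, as noted above, would require some earlier horizontally transformed lines to be kept in the line buffer. The names `analyse` and `line_based_dwt` are hypothetical.

```python
def analyse(x):
    """1D DWT with an assumed (1/2, 1/2) lowpass and (1/2, -1/2) highpass filter."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high


def line_based_dwt(rows):
    """Consume image rows two at a time; yield one (LL, HL, LH, HH) line set per segment."""
    rows = iter(rows)
    for top in rows:
        bottom = next(rows, None)
        if bottom is None:
            raise ValueError("this sketch assumes an even number of rows")
        # Horizontal pass: row transform each line of the segment, right across the image.
        top_low, top_high = analyse(top)
        bottom_low, bottom_high = analyse(bottom)
        # Vertical pass: filter down the columns of the two row-transformed lines.
        ll = [(a + b) / 2 for a, b in zip(top_low, bottom_low)]
        lh = [(a - b) / 2 for a, b in zip(top_low, bottom_low)]    # row low, column high
        hl = [(a + b) / 2 for a, b in zip(top_high, bottom_high)]  # row high, column low
        hh = [(a - b) / 2 for a, b in zip(top_high, bottom_high)]
        yield ll, hl, lh, hh


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
for ll, hl, lh, hh in line_based_dwt(image):
    print(ll, hl, lh, hh)
```

Because `rows` may be any iterator, only two input rows and one set of output lines need to be held at any time in this particular sketch, rather than the whole image.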
A line-based DWT typically requires significantly less memory (line buffering) than the full-image DWT, but the bandwidth between the processor and external memory (or typically a secondary level cache) still remains high. Further, for compression purposes, the high frequency subband code blocks typically need to be buffered in external memory before quantisation and entropy encoding, thus adding to memory and bandwidth costs.
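The difference in buffering between the two approaches can be made concrete with a rough calculation. The figures of 2 bytes per sample and 10 buffered lines used below are hypothetical values chosen only for illustration; the actual number of buffered lines depends on the filter lengths and implementation, and the calculation ignores both the code block buffering and the bandwidth costs noted above.

```python
def full_image_buffer_bytes(width, height, bytes_per_sample):
    """Memory needed to buffer the whole image before transforming it."""
    return width * height * bytes_per_sample


def line_buffer_bytes(width, buffered_lines, bytes_per_sample):
    """Memory needed to buffer a given number of full-width lines."""
    return width * buffered_lines * bytes_per_sample


# Hypothetical parameters: 2000 x 2000 image, 2 bytes per sample, 10 buffered lines.
full = full_image_buffer_bytes(2000, 2000, 2)
lines = line_buffer_bytes(2000, 10, 2)
print(full)           # 8000000
print(lines)          # 40000
print(full // lines)  # 200
```

Under these assumed figures the line buffer is 200 times smaller than the full-image buffer, although every sample still crosses the external memory interface at least once.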