1. Field of the Inventions
The present invention relates to a method of multi-level iterative filtering of data structures of two dimensions or more and to a filtering system for carrying out the method which may be included in an encoder and/or a decoder. The present invention is particularly suitable for the filtering of image data.
2. Description of the Related Technology
In image processing systems, memory represents a high cost in size, power and speed, especially in multi-pass processing (e.g. iterative processing on multi-resolution data). In low-cost VLSI implementation styles, only limited-sized memories can be put on-chip, since for example, 10 kB already cover a silicon area of 11 mm² in 0.5 μm MIETEC CMOS triple layer metal technology. Off-chip memories, on the other hand, also represent a considerable cost, because read/write operations to and from external memory engender a power consumption that is typically orders of magnitude higher than the power consumption emanating from arithmetic operations. Furthermore, accesses to external memories are slow, compared to on-chip memory accesses, causing an impediment to the overall speed of the system. Real-time, power-efficient systems should therefore minimize the on-chip memory size and off-chip memory accesses.
Texture mapping on 3D objects in a virtual reality scene requires different texture resolutions, depending on the viewing distance. Current Discrete Cosine Transform (DCT) coding of textures only supports two levels of scalability (base layer + enhancement layer). Extending the number of resolution levels in a DCT scheme to more than two can be achieved with the multi-level Laplace Pyramid representation, at the expense of a 33% increase in the number of pixels to be coded. On the other hand, the wavelet texture coding, based on the Discrete Wavelet Transform (DWT), achieves an unlimited number of resolution levels, while providing excellent compression performance, and is therefore better suited for applications requiring a large range of spatial scalability. FIGS. 1(a) and 2(a) show the algorithmic flow graphs of the multi-level DCT and Wavelet codings, respectively. Both schemes use essentially the same approach: a first stage transforms the image into a multi-resolution representation by successive filtering operations, and a second stage performs the actual coding: parent-children coding for DWT, 8×8 block-oriented transform (DCT) coding for DCT. With reference to FIG. 1(a), in multi-level DCT coding the input image 10 is filtered in the first filtering step 1 to form a high pass subimage 4 and a low pass subimage 11. High pass subimage 4 is output to the interface memory (IM) 8. The low pass subimage 11 is filtered in the second level filtering step 2 to form a high pass subimage 5 and a low pass subimage 12. Each filtering step 1, 2, 3 outputs a high pass subimage 4, 5, 6 to the IM 8. The low pass subimage 13 from the last filtering step (the highest level) is also output to the IM 8. Parent-children trees are indicated at 7. The stored subimages are compressed by DCT compression circuits 9 to form the transmitted compressed image.
With reference to FIG. 2(a), in multi-level DWT coding input image 10 is filtered in the first step 31 to form four subimages 11, 34-36. These subimages are referred to as LL (11), LH (36), HL (35) and HH (34). The LL subimage 11 contains the low frequency image information from both the vertical and the horizontal wavelet convolutions. The LH and HL subimages 36, 35 each contain the high frequency image information from one of the vertical and horizontal wavelet convolutions and the low frequency image information from the other. The HH subimage 34 contains the high frequency image information from both the vertical and horizontal wavelet convolutions. The LL subimage 11 is filtered in the second filtering step 32 to again form four LL, HH, HL and LH subimages 12, 37, 38, 39 respectively. The LL image 13 from the last filtering step (in the last level) is stored in the IM 8. The subimages 34-42 in the three levels are stored in the IM 8 before being compressed by the compression circuits 43, 44 for the HL, LH and HH subimages 34-42 and the LL subimage 13 respectively. Parent-children trees are shown at 7.
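By way of illustration only, the successive filtering of the LL subimage described with reference to FIG. 2(a) may be sketched as follows. The sketch assumes simple Haar (averaging and differencing) filters, which the text does not prescribe, and all function names are hypothetical. Which of LH and HL is associated with the horizontal or the vertical direction is likewise an assumption made for the sketch.

```python
def haar_1d(seq):
    """One Haar analysis step on an even-length sequence: (lowpass, highpass)."""
    low = [(seq[i] + seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    high = [(seq[i] - seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    return low, high


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def filter_columns(half):
    """Apply haar_1d to every column; return (low, high) subimages as lists of rows."""
    pairs = [haar_1d(col) for col in transpose(half)]
    return transpose([p[0] for p in pairs]), transpose([p[1] for p in pairs])


def dwt2_level(image):
    """Split an image with even dimensions into LL, LH, HL and HH subimages."""
    # Horizontal pass: filter each row into a lowpass half and a highpass half.
    row_pairs = [haar_1d(row) for row in image]
    lows = [p[0] for p in row_pairs]
    highs = [p[1] for p in row_pairs]
    # Vertical pass on each half (label assignment is an assumption).
    ll, lh = filter_columns(lows)
    hl, hh = filter_columns(highs)
    return ll, lh, hl, hh


def dwt2_multilevel(image, levels):
    """Repeat the split on the LL subimage; return the final LL and per-level details."""
    details = []
    ll = image
    for _ in range(levels):
        ll, lh, hl, hh = dwt2_level(ll)
        details.append((lh, hl, hh))
    return ll, details
```

For an 8 by 8 input and three levels, the final LL subimage and the nine detail subimages together hold 64 coefficients, the same number as the input image.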
Note that the DWT coding requires information throughout the levels of the multi-resolution representation, while the DCT coding codes the blocks in each level separately. However, the DCT decoding does require a parent-children tree approach for the decoder memory optimization: all the DCT blocks that after decoding correspond to one particular 8×8 block in the decompressed image are preferably processed in the decoder simultaneously and should therefore be transmitted to the decoder as one cluster. Thus, the DCT encoding does not require the parent-children trees, but a memory optimized decoding process may exploit the data-dependencies of a parent-children tree. As a consequence, the data processing in the DWT and DCT encoders is essentially similar as seen from the memory optimization point of view: a successive filtering stage for obtaining the multi-resolution representation is followed by a coding stage with a parent-children data-dependency graph used at least in the decoding. Differences between the DCT and the DWT can be summarized as follows:
1. The parent-children data-dependency in the DCT codec is larger than in the wavelet codec: in the latter, the parent represents only one pixel, while in the former, the parent extends over an 8×8 block.
2. The DWT inherently uses the multi-resolution representation for the image coding, while in the scalable DCT coding, the multi-resolution representation is an awkward pre-processing step that does not prepare the actual coding stage, i.e. the inter-relation between the levels is not exploited.
3. The number of pixels increases by 33% in the multi-resolution representation of the DCT codec, compared to the original image size, while the multi-level wavelet transformed image has the same size as the input image.
4. The arithmetic complexity of the multi-level DWT is typically smaller than that of its DCT counterpart.
These reasons indicate that DCT coding is not optimal for scalable coding.
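The 33% figure of item 3 can be checked with a short calculation. The sketch below assumes that each pyramid level keeps a high pass subimage at the resolution of that level and passes on a low pass subimage decimated by two in each direction, so that the extra pixels follow the series 1/4 + 1/16 + ..., which tends to 1/3. The function names are hypothetical.

```python
def pyramid_pixels(width, height, levels):
    """Pixels to code in a multi-level pyramid under the stated assumption."""
    total = 0
    w, h = width, height
    for _ in range(levels):
        total += w * h           # high pass subimage at this level's resolution
        w, h = w // 2, h // 2    # low pass subimage, decimated by two per direction
    return total + w * h         # final low pass subimage


def wavelet_pixels(width, height):
    """A multi-level wavelet transform keeps the coefficient count of the input."""
    return width * height


def overhead(width, height, levels):
    """Relative increase of the pyramid over the original image size."""
    return pyramid_pixels(width, height, levels) / (width * height) - 1.0
```

For a 512 by 512 image the overhead is 25% with one level and approaches one third as the number of levels grows.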
In many applications it would be desirable to be able to change the resolution of not only the whole but also a part of a transmitted image. For instance in medical diagnosis many parts of an X-ray image or photograph are irrelevant whereas certain areas may be vitally important and require maximum resolution (preferably loss-free) and size. Where these images are transmitted via a telecommunications network (e.g. via the Internet), the availability of multiresolutional part images creates a difficulty. It is desirable to transmit an image with a reasonable resolution and size quickly which allows the viewer to decide generally on the suitability or correctness of the image. This initial transmission is preferably carried out at a high data compression of the image so as to provide high-speed transmission. Subsequently, the image resolution is preferably increasable selectively, i.e. it should be possible to change the resolution of a local area of the image without introducing artefacts at the borders of the local area and the main image. Dividing the image into blocks and compressing the image using the Discrete Cosine Transform (DCT) provides a method of transmitting a low resolution image quickly; however, the subsequent high fidelity areas may suffer from block artefacts. Using the Discrete Wavelet Transform (DWT) each level of resolution may be transmitted separately. Maximum resolution requires transmitting all the data derived from the image to the destination, which has the disadvantage that maximum resolution can only be obtained after waiting for everything to arrive, although the method does have the advantage that subsequent image improvement may be carried out at the destination and does not require additional transmissions. No currently available system provides both features: rapid transmission of a low resolution image followed by transmission of a limited amount of data to provide quick and efficient loss-free display of selectable zones of the image.
T. C. Denk, K. K. Parhi describe in an article entitled: “Calculation of minimum number of registers in 2-D discrete wavelet transforms using lapped block processing,” IEEE Int. Symposium on Circuit and Systems, Vol. 3, pp. 77-80, London, England, May 1994 a technique for minimizing the on-chip memory requirements for the execution of the 2D wavelet transform iterative filtering process in a multi-processor architecture. No indication is given of how to adapt this technique to use fewer processors than one per level.
Aim of the Invention
It is an object of the present invention to provide a method and apparatus for efficient use of memory and/or memory accesses in the digital filtering of multi-dimensional data structures.
It is a further object of the present invention to provide a method and apparatus for digital filtering of multi-dimensional data structures which requires fewer processors than one per level.
It is still a further object of the present invention to provide a method and apparatus for digital filtering of multi-dimensional data structures which may be conveniently placed on a single chip.
The present invention may provide a method of multi-level iterative digital filtering of a data structure, whereby the elements of the data structure form the zero layer in the zero level and the data layer in each subsequent level is given by the results of one iteration, comprising the steps of: subdividing each level into a plurality of regions, there being data dependency between the data in one data layer in one level and the data layers in any other level of a region; filtering each level by lapped-region processing; and scheduling the data processing of each level to provide substantially regional synchronization of the filtering step at each level.
The present invention may also provide a method of multi-level iterative digital filtering of a data structure, whereby the elements of the data structure form the zero layer in the zero level and the data layer in each subsequent level is given by the results of one iteration, comprising the steps of: subdividing each level into a plurality of regions, there being data dependency between the data in one data layer in one level and the data layers in any other level of a region; filtering each level by lapped-region processing; and selecting the sequence for traversing the regions so that outputs from processing the regions are scheduled to occur at substantially equal time intervals.
The present invention may also provide a method of multi-level iterative digital filtering of a data structure, whereby the elements of the data structure form the zero layer in the zero level and the data layer in each subsequent level is given by the results of one iteration, comprising the steps of: subdividing each level into a plurality of regions, there being data dependency between the data in one data layer in one level and the data layers in any other level of a region; filtering each level by lapped-region processing; stopping the processing at the end of one region; and storing the data related to data dependencies included in adjacent unprocessed regions.
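The lapped-region processing with storage of data dependencies described above may be pictured with the following one-dimensional sketch, given by way of example only. It assumes a single low pass filter with at least two taps and decimation by two per level; the high pass branch is omitted, and all names are hypothetical. For each region, all levels are processed as far as the available data allows before the next region is read, and the few samples that later regions still depend on are kept in one small overlap memory per level.

```python
def analyze_full(x, taps, levels):
    """Reference: filter the whole signal level by level (needs every level in memory)."""
    lows = []
    for _ in range(levels):
        count = (len(x) - len(taps)) // 2 + 1
        x = [sum(t * x[2 * i + k] for k, t in enumerate(taps)) for i in range(count)]
        lows.append(x)
    return lows


def analyze_by_regions(x, taps, levels, region_size):
    """Same result, computed region by region with one overlap memory per level.

    Assumes len(taps) >= 2, so that two samples can always be dropped per output.
    """
    overlap = [[] for _ in range(levels)]   # data needed by regions not yet processed
    lows = [[] for _ in range(levels)]
    for start in range(0, len(x), region_size):
        fresh = x[start:start + region_size]          # input samples of this region
        for lvl in range(levels):
            buf = overlap[lvl] + fresh                # stored dependencies + new data
            fresh = []
            while len(buf) >= len(taps):
                fresh.append(sum(t * buf[k] for k, t in enumerate(taps)))
                buf = buf[2:]                         # decimation by two
            overlap[lvl] = buf                        # at most len(taps) - 1 samples
            lows[lvl].extend(fresh)                   # outputs feed the next level
    return lows
```

In this sketch each overlap memory never holds more than one sample fewer than the number of taps, regardless of the length of the signal, whereas the reference version holds a complete level at a time.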
The present invention may also provide a filtering apparatus for multi-level iterative digital filtering of a data structure, whereby the elements of the data structure form the zero level and each subsequent level is defined by the results of one iteration, comprising: a control means for subdividing the data layer of each level into a plurality of regions, there being data dependency between the data in one data layer in one level and the data layers in any other level of a region; a filtering module for filtering each level by lapped-region processing, said filter module being adapted to schedule the data processing of each level to provide substantially regional synchronization of the filtering at each level.
The present invention may also provide a filtering apparatus for multi-level iterative digital filtering of a data structure, whereby the elements of the data structure form the zero level and each subsequent level is defined by the results of one iteration, comprising: a control means for subdividing the data layer of each level into a plurality of regions, there being data dependency between the data in one data layer in one level and the data layers in any other level of a region; a filtering module for filtering each level by lapped-region processing, said filter module being adapted to stop the processing at the end of one region and to store the data relating to data dependencies included in adjacent unprocessed regions.
Each of the above apparatuses may be used in an encoder. Further, each of the above apparatuses may include means for individually carrying out any of the method steps of the appended method claims. Lapped regional processing may include zero tree coding.
The present invention may further provide a filtering apparatus for multi-level iterative digital filtering of a multi-level representation of a data structure to reconstruct the data structure, the multi-level representation including data clusters, comprising: a filtering module for filtering the multi-level representation by lapped-cluster processing; a controller for controlling the flow of data through said filtering module, said controller being adapted to schedule the data processing in said filtering module so that substantially only the data which is required for reconstructing a region of the data structure is processed before beginning with the filtering process to reconstruct the next region of the data structure. The apparatus may be used in a decoder. A cluster may be a tree or part of a tree.
The present invention may provide a filtering apparatus for multi-level iterative digital filtering of a multi-level representation of a data structure to reconstruct the data structure, the multi-level representation including data clusters, comprising a filtering module for filtering the multi-level representation by lapped-cluster processing; a controller for controlling the flow of data through said filter module, said controller being adapted to stop the processing at the end of one region and to store the data relating to data dependencies included in adjacent non-reconstructed regions. The apparatus may be used in a decoder. Clusters can be trees or parts of trees.
Any of the above apparatuses (whether for encoding or decoding) may include at least one of an overlap memory, a tree memory and an inter pass memory.
The present invention may provide a method of multi-level iterative filtering of a multi-level representation of a data structure to reconstruct the data structure, the multi-level representation including data clusters, comprising the steps of: receiving the multi-level representation; filtering the representation by lapped cluster processing; scheduling the filtering process so that substantially only the data which is required for reconstructing a region of the data structure is processed before beginning with the filtering process to reconstruct the next region of the data structure.
The present invention may also provide a method and an apparatus for carrying out the method, for multi-level iterative filtering of a data structure in which the Lowpass and Highpass values of the iteration are created in couples, are treated as couples during arithmetic processing and are interleaved in memory, so that locality of reference is maintained.
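The interleaving of Lowpass and Highpass values may be pictured with the following minimal sketch. It assumes Haar filtering on an even-length sequence purely for illustration, and the function names are hypothetical. Each Lowpass/Highpass couple is computed from the same two inputs and written to adjacent memory locations, rather than to two separate halves of the output.

```python
def haar_couples(x):
    """Return [L0, H0, L1, H1, ...] for an even-length input sequence."""
    out = [0.0] * len(x)
    for i in range(0, len(x) - 1, 2):
        a, b = x[i], x[i + 1]
        out[i] = (a + b) / 2        # Lowpass value of the couple
        out[i + 1] = (a - b) / 2    # Highpass value, stored next to it
    return out


def split_couples(interleaved):
    """Recover the conventional separate Lowpass and Highpass sequences."""
    return interleaved[0::2], interleaved[1::2]
```

The conventional layout remains available through strided reads, as `split_couples` shows, while each couple stays local in memory during arithmetic processing.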
The present invention may provide the advantage of the memory cost reduction obtained by algorithmic data reorganizations achieving a better behavioural match between the successive modules of the system. The inter-module buffer memories and their associated cost are therefore reduced. A reduction in memory size of one or two orders of magnitude can be obtained, while being very close to the minimal number of external (off-chip) memory accesses, ensuring high-speed, low-power capabilities. The present invention is particularly suitable for an application-specific, single-processor implementation of multi-resolution texture codecs.
The present invention may also provide a method and an apparatus for minimizing the memory size and access costs in a single-processor, scalable texture codec, which could be used in virtual world walkthroughs or facial animation scenes, e.g. in an MPEG-4 system. The present invention includes a method and an apparatus for carrying out the method, for optimising memory size and accesses during multi-level iterative filtering of a data structure, comprising the steps of:
subdividing the data structure into regions;
filtering the data structure by lapped region processing, the filtering step comprising the steps of:
determining which pixels in which levels of the multi-level iterative filtering are involved in the processing of a first region;
determining which pixels in which levels are involved in the processing of one or more second regions adjacent to the first region; and
temporarily storing information generated while processing the first region which is required for processing the second regions.
Regional synchronisation may be seen as the process of clustering an optimised number of pixels in the relevant levels of the multi-level iteration at the relevant time stamps. Memory size is minimised by reducing the total number of pixels involved in the processing of any region. The number of memory accesses is reduced by jumping as little as possible from one level to another within one region, and by storing only the least possible amount of data relevant for the processing of any region other than the one currently being processed, thus avoiding recalculation of this data when the other region is processed.
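The two determining steps above may be sketched in one dimension as follows, by way of example only. The sketch assumes that output i of a level depends on inputs 2i to 2i + n - 1 of the level below, where n is the number of filter taps; this indexing convention and all names are assumptions made for illustration. The first function lists which samples in which levels are involved in producing a range of outputs at the top level, and the second lists the samples involved in both of two regions.

```python
def footprint(first, last, top_level, taps_len):
    """Index range needed in each level to compute outputs first..last of top_level."""
    ranges = {top_level: (first, last)}
    for lvl in range(top_level - 1, -1, -1):
        # Each output i depends on inputs 2*i .. 2*i + taps_len - 1 one level down.
        first, last = 2 * first, 2 * last + taps_len - 1
        ranges[lvl] = (first, last)
    return ranges


def shared(a, b):
    """Per-level index range present in both footprints (None where they are disjoint)."""
    result = {}
    for lvl in a:
        lo = max(a[lvl][0], b[lvl][0])
        hi = min(a[lvl][1], b[lvl][1])
        result[lvl] = (lo, hi) if lo <= hi else None
    return result
```

With three taps and three levels, the first top-level output involves input samples 0 to 14 and the second involves input samples 8 to 22, so samples 8 to 14 of the zero level are involved in both regions. Which of the shared data is actually kept, at which level, is a separate design choice that this sketch does not address.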
The dependent claims define further individual embodiments of the present invention. The present invention, its advantages and embodiments will now be described with reference to the following drawings.