1. Field of the Invention
The present invention generally relates to a data processing apparatus and an operation method thereof, and in particular to an apparatus that concurrently performs an overlap filter and a core transform, and an operation method thereof.
2. Description of Related Art
Still image compression usually involves three main steps, namely transform, quantization, and entropy coding. Referring to FIG. 1, the conventional JPEG standard applies the Discrete Cosine Transform (DCT) to each 8×8 block as an individual transform unit. Although the DCT has a favourable energy compaction characteristic and may achieve nearly optimal compression of the data, a block effect cannot be prevented from occurring at the boundaries of the tiled blocks after the blocks have been transformed individually.
To address this, Microsoft introduced a new still image compression format, namely the HD Photo format. At present, the format is being considered as a new JPEG international standard under the name JPEG-XR. The HD Photo format employs a lapped transform (LT) in units of 4×4 blocks in order to reduce the block effect caused by individual block transform. More specifically, an overlap filter is first applied across the junctures of the 4×4 blocks, and then a core transform is performed on the 4×4 blocks. Both the overlap filter and the core transform employ a lifting structure to ensure that lossless compression is possible.
FIG. 2 shows the technique of US Patent Application Publication No. 2006/013682, entitled “Reversible Overlap Operator for Efficient Lossless Data Compression,” which describes the HD Photo format introduced by Microsoft. First, the 2-dimensional (2-D) input data is tiled as shown in the figure, and a lapped transform, such as the forward overlap filter transform shown in the figure, is performed so as to reduce the block effect caused by the individual block transform. Then, a block transform, i.e., the HD Photo Core Transform (PCT), is performed on the originally tiled blocks, thereby obtaining one DC coefficient and fifteen AC coefficients for each block. The HD Photo format adopts a two-stage transform, in which the DC values are collected to form blocks and the overlap filter transform and the block transform are performed again.
The aforementioned overlap filter transform and core transform both adopt a lifting structure to ensure that lossless compression is possible. Since each step of the lifting structure is exactly reversible, if the encoding process keeps the signal lossless in the transform domain, a picture identical to the original picture may be recovered by first performing the inverse core transform in the decoding process and then performing the inverse overlap filter transform. The HD Photo format allows selecting whether to perform the first-stage overlap filter transform and the second-stage overlap filter transform. After the DC coefficient and AC coefficients undergo quantization, entropy coding, and packetization, a compressed bitstream is obtained.
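The reversibility of a lifting structure can be illustrated with a minimal Python sketch. The two-step operator below is a hypothetical example chosen for brevity; it is not one of the actual HD Photo overlap or core transform operators, and the names forward_lift and inverse_lift are assumptions introduced here. Each step modifies one value using only the other, so the inverse simply undoes the steps in the opposite order, even though the shift discards a bit.

```python
def forward_lift(a: int, b: int) -> tuple[int, int]:
    """Hypothetical two-step integer lifting (not an HD Photo operator)."""
    b = b - a          # step 1: modify b using only a
    a = a + (b >> 1)   # step 2: modify a using only the new b (rounded shift)
    return a, b


def inverse_lift(a: int, b: int) -> tuple[int, int]:
    """Undo forward_lift by reversing its steps in the opposite order."""
    a = a - (b >> 1)   # undo step 2: b is unchanged, so the same shift is subtracted
    b = b + a          # undo step 1: a is now the original a
    return a, b


if __name__ == "__main__":
    coded = forward_lift(7, 10)
    print(coded, inverse_lift(*coded))  # (8, 3) (7, 10)
```

Because the rounding term is recomputed from a value that the step itself did not change, no information is lost, which is why integer-exact reconstruction is possible.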
FIG. 3A is a schematic view of the overlap filter transform according to the HD Photo format. An image is first tiled into 4×4 blocks, which are the unit used by the lapped transform, as shown by solid lines 310. Then, the overlap filter transform is performed across the junctures of the 4×4 blocks; for example, a 4×1 filter transform (4×1 filter 330 in FIG. 3A) is performed at the boundaries of the image, and a 4×4 filter transform (e.g., 4×4 filter 320) is performed inside the image.
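The geometry of the interior 4×4 filters can be sketched as follows. This is an illustration under stated assumptions, not the layout defined by FIG. 3A: it assumes the image width and height are multiples of 4, that each interior filter window is centred on a corner shared by four tiled blocks (that is, offset by two samples from the block grid), and it omits the 4×1 boundary filters. The function names are hypothetical.

```python
BLOCK = 4  # block size used by the lapped transform


def interior_filter_windows(width: int, height: int) -> list[tuple[int, int]]:
    """Top-left (x, y) of each 4x4 window centred on an interior block corner.

    Assumes width and height are multiples of 4 (illustrative simplification).
    """
    half = BLOCK // 2
    return [(x, y)
            for y in range(half, height - BLOCK, BLOCK)
            for x in range(half, width - BLOCK, BLOCK)]


def blocks_touched(x: int, y: int) -> set[tuple[int, int]]:
    """(column, row) indices of the tiled blocks covered by the window at (x, y)."""
    return {((x + dx) // BLOCK, (y + dy) // BLOCK)
            for dy in range(BLOCK)
            for dx in range(BLOCK)}


if __name__ == "__main__":
    for window in interior_filter_windows(8, 8):
        print(window, sorted(blocks_touched(*window)))
```

Under these assumptions, every interior window covers a 2×2 corner region of each of four tiled blocks, so a tiled block is not fully filtered until all windows touching it have been processed.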
Next, as shown in FIG. 3B, after the overlap filter transform, the core transform is performed on the originally tiled 4×4 blocks (4×4 PCT as shown in FIG. 3B), and each 4×4 block yields one DC value and fifteen AC values. The HD Photo format adopts the two-stage transform, in which the DC values are collected to form 4×4 blocks and the overlap filter is performed again. As noted above, the HD Photo format allows selecting whether to perform the first-stage overlap filter transform and the second-stage overlap filter transform.
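The collection of DC values for the second stage can be sketched as below. The sketch assumes, purely for illustration, that the coefficients are held in a 2-D list and that each 4×4 block keeps its DC value at the block's top-left position; the function name collect_dc and this storage layout are assumptions and are not taken from the HD Photo format.

```python
def collect_dc(coeffs: list[list[int]], block: int = 4) -> list[list[int]]:
    """Gather one value per block into a smaller plane.

    Assumes (illustration only) that each block stores its DC value at
    the block's top-left position and that the plane is a multiple of
    the block size in both directions.
    """
    return [[coeffs[y][x] for x in range(0, len(coeffs[0]), block)]
            for y in range(0, len(coeffs), block)]


if __name__ == "__main__":
    # A 16x16 plane of first-stage coefficients holds sixteen 4x4 blocks,
    # so its DC values form exactly one 4x4 block for the second stage.
    plane = [[y * 16 + x for x in range(16)] for y in range(16)]
    dc_plane = collect_dc(plane)
    print(len(dc_plane), len(dc_plane[0]))  # 4 4
```

The resulting DC plane can then be tiled into 4×4 blocks and passed through the overlap filter and core transform again, as the second stage requires.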
Compared with the conventional DCT, the lifting-based lapped transform needs to read and write data many times and in a more complicated manner. In order to solve this problem, Taiwan Patent Application No. 95128032 (corresponding to US Patent Publication No. 2007/0036225A1) has disclosed a method of re-arranging signals for the convenience of single-instruction multiple-data (SIMD) processor operation. As shown in FIGS. 3A and 3B, the blocks processed by the 4×4 overlap filter and the blocks processed by the 4×4 core transform overlap each other in 2×2 sub-blocks. However, the method is mainly suitable for implementation on a processor adopting SIMD operation.
In the aforementioned conventional architecture, the core transform cannot be performed until the overlap filter has finished. As shown in FIGS. 3A and 3B, the core transform is performed on the originally tiled 4×4 blocks only after the 4×4 and 4×1 overlap filter transforms. The processing time and efficiency therefore need to be improved.