With the proliferation of the Internet and the rapid deployment of multimedia information technology in e-commerce and across various business sectors, the cost of digital storage is falling at an unprecedented rate. It is now possible to store rich content not only in text form, but also as digital images, digital video, 3-D computer graphics and many other digital data formats. Since the arrival of the World-Wide-Web, information exchange and creation have undergone revolutionary change, such that storehouses of Internet-accessible information are becoming increasingly common. One impact upon the whole sector of information-related disciplines is the need to develop new technologies to handle, manage and archive the content of multimedia information with efficiency, effectiveness and robustness. In this process, millions of images and video clips may need to be handled, searched, indexed and retrieved, yet the majority of them may already be in compressed format. Amongst the compression schemes presently used, those based on the discrete cosine transform (DCT) are widely adopted on the grounds that: (a) the discrete cosine transform is close to the optimal Karhunen-Loève transform (KLT); (b) the DCT is signal independent, eliminating the principal shortcoming of the KLT; and (c) it has real coefficients, and fast algorithms are readily available for efficient implementation in practice. Hence, the DCT is widely used in image/video compression standards (JPEG/MPEG, and H.261/H.263) (see J. Jiang, ‘A parallel algorithm for 4×4 DCT’, Journal of Parallel and Distributed Computing, Vol. 57, 1999, pp. 257-269, ISSN: 0743-7315, and V. Bhaskaran and K. 
Konstantinides, Image and Video Compression Standards: Algorithms and Architectures, Kluwer Academic Publishers, Boston, 1997). In this context, a new wave of research on image processing in the compressed domain, or content description within compressed data, is being launched in the worldwide research community, as described in A. Abdel-Malek and J. E. Hershey, “Feature cueing in the discrete cosine domain”, Journal of Electronic Imaging, Vol. 3, pp. 71-80, January 1994; B. Shen and Ishwar K. Sethi, “Direct feature extraction from compressed images”, Proc. of SPIE: Storage and Retrieval for Image and Video Databases IV, Vol. 2670, 1996; and R. Reeve, K. Kubik and W. Osberger, “Texture characterization of compressed aerial images using DCT coefficients”, Proc. of SPIE: Storage and Retrieval for Image and Video Databases V, Vol. 3022, pp. 398-407, February 1997.
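The DCT properties cited above can be checked numerically. The following sketch (an illustration of ours; the variable names are our own) constructs the 8-point orthonormal DCT-II basis matrix, verifies that it is orthonormal — its real-valued basis is fixed and signal independent, unlike the KLT basis, which must be derived from the data — and shows its energy compaction on a smooth ramp signal.

```python
import numpy as np

# 8-point DCT-II basis matrix in orthonormal form.
N = 8
C = np.array([[np.sqrt((1 if k == 0 else 2) / N) *
               np.cos((2 * n + 1) * k * np.pi / (2 * N))
               for n in range(N)] for k in range(N)])

# Orthonormality: C @ C.T is the identity matrix, so the inverse
# transform is simply the transpose.
assert np.allclose(C @ C.T, np.eye(N))

# Energy compaction: for a smooth (ramp) signal, the energy
# concentrates in the low-order coefficients -- the property that
# DCT-based compression exploits.
x = np.arange(N, dtype=float)
y = C @ x
print(np.round(y, 2))
```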
As will be apparent from the above, MPEG and JPEG encoding of moving and still images is well known in the art, and a common encoder and decoder architecture is shown in FIG. 1. To encode MPEG (and JPEG) images, the source pixel image is first split into 8×8 blocks, each of which is then subjected to a discrete cosine transform (DCT). This results in 64 DCT coefficients, with the DC component in the top-left corner and increasingly higher-order AC components distributed outwards from the top-left throughout the block. These coefficients are then quantised, using a binary code from a codebook to represent each of the real-valued DCT coefficients and to discard those coefficients whose values fall below a quantisation threshold. The quantised coefficients are then converted to a serial data word by taking them in a zig-zag pattern (as shown in FIG. 2), so that the components which have been quantised to zero can be efficiently run-length encoded and then entropy encoded prior to transmission. This process is repeated for every 8×8 pixel block in the image.
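The per-block encode path just described can be sketched as follows. This is an illustrative reconstruction, not production encoder code: the uniform quantisation step size (16) stands in for a standard quantisation table, and the zig-zag ordering reproduces the scan pattern of FIG. 2.

```python
import numpy as np

# Orthonormal 8-point DCT-II basis matrix.
N = 8
C = np.array([[np.sqrt((1 if k == 0 else 2) / N) *
               np.cos((2 * n + 1) * k * np.pi / (2 * N))
               for n in range(N)] for k in range(N)])

def dct2(block):
    # Separable 2-D DCT: transform the rows, then the columns.
    return C @ block @ C.T

def zigzag_order(n=8):
    # Visit coefficients along anti-diagonals, alternating direction,
    # so zero-valued high-frequency terms form long runs.
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

# Encode one smooth 8x8 block: level shift, transform, quantise, scan.
block = np.add.outer(np.arange(N), np.arange(N)) * 8.0
coeffs = dct2(block - 128.0)          # JPEG-style level shift
quantised = np.round(coeffs / 16.0)   # coarse uniform quantisation
serial = [int(quantised[i, j]) for i, j in zigzag_order()]
print(serial)   # a few low-order terms, then long zero runs
```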
To decode an image at the decoder, conventionally the reverse process has been performed: entropy and run-length decoding of the received data, reconstruction of the DCT coefficients into 8×8 blocks by reversing the zig-zag pattern of FIG. 2, and an inverse DCT (IDCT) to obtain the actual pixel values. The inverse DCT step is computationally intensive, requiring time and power to perform. Typically, standard full decompression (IDCT) costs 4096 multiplications and 4032 additions per 8×8 pixel block, although some practical implementations may require only 1024 multiplications and 896 additions per block (see the table in our “Results” section for further comparison).
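The cost figures above can be illustrated with a separable-IDCT sketch of our own (the operation counts follow from two 8×8 matrix products, not from any particular fast-IDCT implementation):

```python
import numpy as np

# Orthonormal 8-point DCT-II basis; its transpose is its inverse.
N = 8
C = np.array([[np.sqrt((1 if k == 0 else 2) / N) *
               np.cos((2 * n + 1) * k * np.pi / (2 * N))
               for n in range(N)] for k in range(N)])

def idct2(coeffs):
    # Separable inverse DCT: two 8x8 matrix products
    # (inverse-transform columns, then rows).
    return C.T @ coeffs @ C

# Direct 2-D IDCT: each of the 64 pixels is a weighted sum of all
# 64 coefficients.
direct_mults = 64 * 64          # 4096
direct_adds = 64 * 63           # 4032

# Separable form: each 8x8 matrix product costs 8*8*8 multiplications
# and 8*8*7 additions, and two products are needed.
sep_mults = 2 * N * N * N       # 1024
sep_adds = 2 * N * N * (N - 1)  # 896
print(direct_mults, direct_adds, sep_mults, sep_adds)
```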
Although the area of image processing has been a focus of research and development for many years (typified by tasks such as enhancement, segmentation, feature extraction and pattern classification), such development has all been in the pixel domain. While DCT-based data compression greatly improves transmission efficiency and the management of limited storage space, compressed visual data has to be processed back to the pixel domain before being displayed, further processed or printed. Some of the frequently employed processing functions include scaling, filtering, rotation, translation, feature extraction and classification. To this end, conventional approaches have to convert (decompress) the data from the DCT domain to the pixel domain before such existing algorithms can be applied. Such processing leads to significant increases in overhead computing cost and storage expense in the entire chain of image processing and compression, as will be apparent from the computational intensity figures given earlier. There is therefore a strong need within the industry for a less computationally intensive means of allowing mass image media to be processed, without having to fully decompress each image using an IDCT operation whenever an image operation (searching, filtering, displaying, etc.) has to be performed.
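As one concrete illustration of such compressed-domain processing (a sketch under our own assumptions — the function name and the dequantisation step size are illustrative, not part of any standard API): because the DC coefficient of each 8×8 block in the orthonormal DCT is proportional to the block's mean pixel value, a 1/8-scale thumbnail can be produced directly from the entropy-decoded DC terms, with no IDCT at all.

```python
import numpy as np

def dc_thumbnail(dc_grid, dc_step=16.0, level_shift=128.0):
    # dc_grid holds the quantised DC coefficient of each 8x8 block
    # (one entry per block). For the orthonormal 2-D DCT, the DC term
    # equals (1/8) * (sum of the 64 pixels), so the block mean is DC / 8.
    # Dequantise, rescale, and undo the JPEG-style level shift.
    return dc_grid * dc_step / 8.0 + level_shift

# A uniform block of pixel value 200 has quantised DC = 8*(200-128)/16 = 36,
# so the thumbnail recovers the block mean exactly:
dc = np.array([[36.0]])
print(dc_thumbnail(dc))   # prints [[200.]]
```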