As interest and advances in the field of digital picture processing continue to increase, so does the need to perform image coding efficiently. A general aspect of image coding is the transmission of pictures over digital communication media using bandwidth reduction techniques such as image compression, in which as few bits as possible are used. The discrete cosine transform (DCT) has emerged as a data compression tool in image coding. It is especially useful for obtaining the picture quality required by various image compression standards, for storing and retrieving images in digital storage media, and for real-time image processing with VLSI implementations. One ongoing problem in implementing the DCT is that it is computationally intensive: it depends on a large number of multiplication operations, and the associated hardware is correspondingly large and complex to design and implement.
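To give a rough sense of the multiplication count, the following minimal sketch counts the data-by-coefficient multiplications in a direct 8x8 two-dimensional DCT computed by the row-column method. It is an illustration only, not any particular prior-art circuit. The function names, the 8x8 block size, the orthonormal scaling, and the choice not to count the scale factor (assumed here to be folded into a coefficient table) are all assumptions made for the example.

```python
import math

def dct_1d_direct(x):
    """Direct N-point DCT-II with orthonormal scaling.

    Returns (coefficients, number of data-by-coefficient multiplications).
    The per-output scale factor is not counted; it is assumed to be
    folded into the coefficient table in a hardware realization.
    """
    n = len(x)
    mults = 0
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = 0.0
        for i in range(n):
            acc += x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            mults += 1  # one multiplication per input sample per output
        out.append(scale * acc)
    return out, mults

def dct_2d_row_column(block):
    """Separable 2-D DCT: 1-D DCT on every row, then on every column."""
    n = len(block)
    total = 0
    rows = []
    for row in block:
        y, m = dct_1d_direct(row)
        rows.append(y)
        total += m
    cols = []
    for c in range(n):
        y, m = dct_1d_direct([rows[r][c] for r in range(n)])
        cols.append(y)
        total += m
    # cols[c][r] holds the coefficient for vertical index r, horizontal index c
    result = [[cols[c][r] for c in range(n)] for r in range(n)]
    return result, total

# Hypothetical 8x8 block of sample values
block = [[float((r * 8 + c) % 17) for c in range(8)] for r in range(8)]
_, mults_2d = dct_2d_row_column(block)
print(mults_2d)  # prints 1024
```

Each direct 8-point transform needs 64 multiplications, and the row-column method applies 16 of them per block, which is where the 1024 comes from. Fast algorithms reduce this figure, but the direct form shows why multiplier count dominates a naive implementation.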
The large number of multipliers required for conventional DCT implementations is particularly problematic given the tight placement and routing constraints of VLSI, ASIC, and System-on-Chip (SoC) applications used in increasingly smaller and more streamlined multimedia-based devices and appliances. As hand-held portable devices equipped to handle multimedia video formats continue to shrink, it would be ideal if the video compression hardware that provides DCT processing were streamlined. That is, the logic would be designed so that the number of necessary multipliers is reduced, and so that the routing, placement, and layout of logic and circuit components are compact and uncomplicated in design, yet still able to accommodate complex calculations.
Additionally, conventional DCT techniques are also problematic in that the errors generated relative to floating point results may be large. One factor contributing to this drawback is that prior art approaches adopt 12-bit internal address precision. Frequently, additional stages of multipliers are needed to improve the precision of the DCT and inverse DCT (IDCT) results, but this increases the size and area of the chip and results in increased power consumption. Accordingly, there is a need for an approach that improves the precision of the DCT results without these drawbacks.
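The link between internal word length and output error can be illustrated with a small sketch. The text above does not specify how the 12 bits are allocated, so the example assumes, purely for illustration, that the DCT coefficients are rounded to a given number of fractional bits while the input samples are integers. All names (`dct_fixed`, `max_error`, `error_bound`) and the sample values are hypothetical.

```python
import math

N = 8

def dct_coeff(k, i):
    """Orthonormal DCT-II coefficient for output k, input i."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos(math.pi * (2 * i + 1) * k / (2 * N))

def dct_float(x):
    """Floating point reference DCT."""
    return [sum(x[i] * dct_coeff(k, i) for i in range(N)) for k in range(N)]

def dct_fixed(x, frac_bits):
    """DCT with coefficients rounded to frac_bits fractional bits.

    x must hold integers; accumulation is exact integer arithmetic,
    so the only error source is coefficient rounding (an assumption
    of this sketch, not a model of any specific circuit).
    """
    one = 1 << frac_bits
    out = []
    for k in range(N):
        acc = 0
        for i in range(N):
            acc += x[i] * round(dct_coeff(k, i) * one)
        out.append(acc / one)
    return out

def max_error(x, frac_bits):
    """Largest absolute deviation from the floating point reference."""
    ref = dct_float(x)
    approx = dct_fixed(x, frac_bits)
    return max(abs(a - b) for a, b in zip(ref, approx))

def error_bound(x, frac_bits):
    """Worst case: every coefficient is off by at most half an LSB."""
    return sum(abs(v) for v in x) * 2.0 ** -(frac_bits + 1)

samples = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical pixel values
for bits in (12, 16):
    print(bits, max_error(samples, bits), error_bound(samples, bits))
```

Under these assumptions the worst-case error bound shrinks by a factor of two for each added fractional bit, which is why wider datapaths or extra multiplier stages improve precision at the cost of area and power.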
Also, many proposed image compression standards have DCT-based algorithms that require the IDCT (DCT⁻¹). The additional circuitry required for the IDCT makes it difficult to keep chip size as small as possible. It would therefore be beneficial to have a DCT hardware design in which the same architecture could be shared to perform both the DCT and the IDCT. Such a design would reduce the chip area required for DCT and IDCT processing.
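One reason sharing is plausible is a standard mathematical property: with orthonormal scaling, the IDCT uses the same coefficient table as the DCT, read in transposed order. The sketch below illustrates that property in software only. It does not describe any particular shared hardware architecture, and the names `C`, `transform`, and the sample values are assumptions made for the example.

```python
import math

N = 8

# Orthonormal DCT-II coefficient table C[k][i], built once and
# used by both the forward and the inverse transform.
C = [[(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
      * math.cos(math.pi * (2 * i + 1) * k / (2 * N))
      for i in range(N)] for k in range(N)]

def transform(x, inverse=False):
    """Forward DCT reads C row by row; the inverse reads it column by column."""
    if inverse:
        return [sum(C[k][i] * x[k] for k in range(N)) for i in range(N)]
    return [sum(C[k][i] * x[i] for i in range(N)) for k in range(N)]

samples = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]  # hypothetical data
coeffs = transform(samples)
restored = transform(coeffs, inverse=True)
max_err = max(abs(a - b) for a, b in zip(samples, restored))
print(max_err < 1e-9)  # prints True
```

Because only the indexing order differs between the two directions, a single coefficient store and a single multiply-accumulate structure can in principle serve both, which is the kind of reuse the paragraph above calls for.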