The present invention generally relates to the 2-dimensional (2-D) discrete cosine transform (DCT), and more particularly to implementing the 2-D forward and inverse DCT on an FPGA using a polynomial transform.
The 2-D DCT is at the heart of many low-rate codecs for video compression. For example, the DCT is an integral part of the MPEG and H.261 standards. The DCT's time-efficient computation is also of great interest in various communications and multi-media applications.
There are several strategies available to the digital signal processing (DSP) system engineer for realizing a DCT-based codec. One option is to use a software programmable DSP processor such as the TMS320C5x from Texas Instruments. This brings high flexibility to a design at the expense of performance. At the other end of the implementation spectrum is an ASIC solution, which provides high performance with little or no flexibility. A third option includes field programmable gate arrays (FPGAs). FPGAs offer high performance without sacrificing design flexibility.
The conventional technique for realizing a 2-D DCT is to exploit the transform separability and decompose the problem into a sequence of 1-D sub-problems. That is, first a 1-D DCT is performed on the rows, followed by a 1-D DCT on the columns. For high-resolution N×N-pixel (N ≥ 1024) color images, a parallel architecture that incorporates row and column processors as well as a matrix transposition engine must be used to accommodate real-time data rates. Using distributed arithmetic (as described in U.S. Pat. No. 3,77,130 entitled, "Digital Filter for PCM Encoded Signals" to Croisier et al.) to implement a 1-D DCT on an FPGA can greatly reduce the number of configurable logic blocks (CLBs) used for the DCT.
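The conventional row-column decomposition described above can be illustrated with a minimal software sketch. It is offered only as an illustration of transform separability, not as the circuit arrangement of the invention. The function names (`dct_1d`, `transpose`, `dct_2d_row_column`, `dct_2d_direct`) and the orthonormal DCT-II scaling are assumptions chosen for the example.

```python
import math


def dct_1d(x):
    """1-D DCT-II of a sequence, with orthonormal scaling (an assumed convention)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out


def transpose(m):
    """Software stand-in for the matrix transposition engine."""
    return [list(col) for col in zip(*m)]


def dct_2d_row_column(block):
    """2-D DCT computed as 1-D DCTs on the rows, then 1-D DCTs on the columns."""
    row_pass = [dct_1d(row) for row in block]
    col_pass = [dct_1d(col) for col in transpose(row_pass)]
    return transpose(col_pass)


def dct_2d_direct(block):
    """Direct double-sum 2-D DCT-II, used here only to check the separable form."""
    n = len(block)

    def a(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    return [[a(u) * a(v) * sum(
                block[i][j]
                * math.cos(math.pi * (2 * i + 1) * u / (2 * n))
                * math.cos(math.pi * (2 * j + 1) * v / (2 * n))
                for i in range(n) for j in range(n))
             for v in range(n)]
            for u in range(n)]
```

For a square block, `dct_2d_row_column(block)` and `dct_2d_direct(block)` agree to within floating-point rounding, while the separable form needs 2N 1-D transforms of length N instead of an N×N double sum for every output coefficient.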
While distributed arithmetic reduces the number of CLBs of an FPGA that are used to implement the 2-D DCT, it is desirable for economic reasons to further reduce the number of CLBs used to implement the 2-D DCT.
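As a rough illustration of why distributed arithmetic saves logic, the following sketch computes a fixed-coefficient inner product, the core operation of a 1-D DCT, using only a lookup table, shifts, and additions. It is a generic textbook form of the technique, not the specific circuit of the cited patent or of the invention. The names `build_lut` and `da_inner_product`, the integer coefficients, and the 8-bit two's-complement sample width are all assumptions made for the example; in practice the DCT cosine coefficients would first be quantized to fixed point.

```python
def build_lut(coeffs):
    """Precompute every partial sum of the fixed coefficients.

    Entry `addr` holds the sum of coeffs[i] for each bit i set in `addr`.
    On an FPGA this table would map onto function-generator lookup tables.
    """
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << k)]


def da_inner_product(lut, samples, bits=8):
    """Bit-serial inner product sum(coeffs[i] * samples[i]) without multipliers.

    Each sample must fit in `bits`-bit two's complement.
    """
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]  # two's-complement bit patterns
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into one table address.
        addr = 0
        for i, w in enumerate(words):
            addr |= ((w >> b) & 1) << i
        term = lut[addr] << b  # weight the partial sum by 2**b
        # The sign bit of a two's-complement word carries negative weight.
        acc += -term if b == bits - 1 else term
    return acc
```

With coefficients `[5, -3, 2, 7]` and samples `[3, -2, 1, 0]`, `da_inner_product(build_lut([5, -3, 2, 7]), [3, -2, 1, 0])` returns 23, matching the direct sum 15 + 6 + 2 + 0. The table grows as 2^K for K coefficients, which is one reason a further reduction in resources remains attractive.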
The present invention includes a circuit arrangement that implements a 2-D forward and inverse DCT using a polynomial transform. In one embodiment, an input permutation processor reorders input sample data, wherein the reordered data samples logically form a matrix. A plurality of 1-D DCT processors are coupled to the input permutation processor. Extended diagonals passing through the matrix reference data that are provided as input to respective 1-D DCT processors, which operate in parallel. Output data from the 1-D DCT processors are provided as input data to a polynomial transform processor. The polynomial transform processor applies a polynomial transform to data from the 1-D DCTs, and output data from the polynomial transform processor are re-ordered in a prescribed order for 2-D DCT outputs.
In another embodiment, the various processors and storage elements are implemented on FPGA function generators. Since the 1-D DCT processors and polynomial transform are multiplier-free, usage of FPGA resources is minimized.
Various embodiments are set forth in the Detailed Description and Claims which follow.