1. Field of the Invention
This invention relates generally to calculating the 2-Dimensional 8xc3x978 Discrete Cosine Transform (2-D DCT) and the Inverse Discrete Cosine Transform (2-D IDCT), and its very large scale integrated (VLSI) implementation. Specifically, the present invention is well suited to meet the real time digital processing requirements of digital High-Definition Television (HDTV)
2. Related Art
1.0 Overview of Video Coding and MPEG Implementations
1.1 Video Compression
1.2 MPEG Video Compression: A Quick Look
1.2.1 MPEG Video Sequences, Groups and Pictures
1.2.2 MPEG Video Slice, Macroblock and Block
1.2.3 The Motion Estimation/Compensation in MPEG
1.2.4 The Discrete Cosine Transform in MPEG
1.2.5 The Quantization in MPEG
1.2.6 The Zigzag Scan and Variable Length Coding in MPEG
1.2.7 MPEG Video Encoding Process
1.2.8 MPEG Video Decoding Process
1.3 MPEG-1 Video Standard
1.4 MPEG-2 Video Standard
1.4.1 Fields, Frames and Pictures
1.4.2 Chrominance Sampling
1.4.3 Scalability
1.4.4 Profiles and Levels
1.5 Hybrid Implementation Scheme for MTEG-2 Video System
2.0 DCT/IDCT Algorithms and Hardware Implementations
2.1 Introduction
2.2 1-D DCT/IDCT Algorithms and Implementations
2.2.1 Indirect 1-D DCT via Other Discrete Transforms
2.2.2 1-D DCT via Direct Factorizations
2.2.3 1-D DCT Based on Recursive Algorithms
2.2.4 1-D DCT/IDCT Hardware Implementations
2.3 2-D DCT/IDCT Algorithms and Implementations
2.3.1 2-D DCT via Other Discrete Transforms
2.3.2 2-D DCT by Row-Column Method (RCM)
2.3.3 2-D DCT Based on Direct Matrix Factorization/Decomposition
2.3.4 2-D DCT/IDCT Hardware Implementations
2.4 Summary
In this section, a brief overview of video compression, Moving Pictures Experts Group (MPEG) video protocols and different implementation approaches are presented. A list of references cited in this application is included in an Appendix. Each of these reference listed is incorporated herein by reference in its entirety.
The reduction of transmission and storage requirements for digitized video signals has been a research and development topic all over the world for more than 30 years.
Many efforts have been made trying to deliver or store digital television signals, which have a bit-rate of more than 200 Mbit/s in an uncompressed format and must be brought down to a level that can be handled economically by current video processing technology. For example, suppose the pictures in a sequence are digitized as discrete grids or arrays with 360 pels (picture elements) per raster line, 288 lines/picture (a typical resolution for MPEG-1 video compression), three-color separation and sampled with 8-bit precision for each color, the uncompressed video sequence at 24 pictures/second is roughly 60 Mbit/s, and a one-minute video clip requires 448 Mbytes of storage space.
The International Standardization Organization (ISO) started its moving picture standardization process in 1988 with a strong emphasis on real-time decoding of compressed data stored on digital storage devices. A Moving Pictures Experts Group (MPEG) was formed in May 1988 and a consensus was reached to target the digital storage and real-time decoding of video with bit-rates around 1.5 Mbit/s (MPEG-1 protocol) [MPEG1]. At the MPEG meeting held in Berlin, Germany on December 1990, a MPEG-2 proposal was presented that primarily targeted for higher bit-rates, larger picture sizes, and interlaced frames. The MPEG-2 proposal attempted to address a much more broader set of applications than MPEG-I (such as television broadcasting, digital storage media, digital high-definition TV (HDTV) and video communication) while maintaining all of the MPEG-1 video syntax. Moreover, extensions were adopted to add flexibility and functionality to the standard. Most importantly, a spatial scalable extension was added to allow video data streams with multiple resolutions to provide support for both normal TV and HDTV. Other scalable extensions allow the data stream to be partitioned into different layers in order to optimize transmission and reception over existing and future networks [MPEG2].
An overview of MPEG video compression techniques, MPEG-1""s video layers and MPEG-2""s video layers is presented in section 1.2, 1.3 and 1.4, respectively. A proposed hybrid implementation scheme for MPEG-2 video codec is shown in section 1.5. An outline of rest of the thesis is presented in section 1.6.
An MPEG video codec specifically designed for compression of video sequences. Because a video sequence is simply a series of pictures taken at closely spaced time intervals, these pictures tend to be quite similar from each other except for when a scene change takes place. The MPEG1 and MPEG2 codecs are designed to take advantage of this similarity using both past and future temporal information (inter-frame coding). They also utilize commonality within each frame, such as a uniform background, to lower the bit-rate (intra-frame coding) [MPEG1, MPEG2].
1.2.1 MPEG Video Sequences, Groups and Pictures
An MPEG video sequence is made up of individual pictures occurring at fixed time increments. Except for certain critical timing information in the MPEG systems layers, an MPEG video sequence bitstream is completely self-constrained and is independent of other video bitstreams.
Each video sequence is divided into one or more groups of pictures, and each group of pictures is composed of one or more pictures of three different types: I-, P- and B-type. I-pictures (intra-coded pictures) are coded independently, entirely without reference to other pictures. P- and B-pictures are compressed by coding the differences between the reference picture and the current one, thereby exploiting the similarities from the current to reference picture to achieve high compression ratio. One example of a typical MPEG I-, P- and B-pictures arrangement in display order is illustrated in FIG. 1.
The first coded picture in each video sequence must be an I-picture. I-pictures may be occasionally inserted in different positions of a video sequence to prevent the coding error propagation. For I-pictures, the coding method used by MPEG is similar to that defined by JPEG [JPEG].
P-pictures (predictive-coded pictures) obtain predictions from temporally preceding I- or P-pictures in the sequence and B-pictures (bidirectionally predictive-coded pictures) obtain predictions from the nearest preceding and/or upcoming I- or P-pictures in the sequence. B-pictures may predict from preceding pictures, upcoming pictures, both, or neither. Similarly, P-pictures may predict from a preceding picture or use intra-coding.
A given sequence of pictures is encoded in a different order which they are displayed when viewing the sequence. An example of the encoding sequence of MPEG I-, P- and B-pictures is illustrated in FIG. 2.
Each component of a picture is made up of a two-dimensional (2-D) array of samples. Each horizontal line of samples in this 2-D grid is called a raster line, and each sample in a raster line is a digital representation of the intensity of the component at that point on the raster line. For color sequences, each picture has three components: a luminance component and two chrominance components. The luminance provides the intensity of the sample point, whereas the two chrominance components express the equivalent of color hue and saturation at the sample point. They are mathematically equivalent to RGB primaries representation but are better suited for efficient compression. RGB can be used if less efficient compression is acceptable.
The equivalent counterpart of a picture in broadcast video (for example analog NTSQ) is a frame, which is further divided into two fields. Each field has half the raster lines of the full frame and the fields are interleaved such that alternate raster lines in the frame belong to alternate fields.
1.2.2 MPEG Video Slice, Macroblock and Block
The basic building block of an MPEG picture is the macroblock. The macroblock consists of one 16xc3x9716 array of luminance samples plus one, two or four 8xc3x978 blocks of samples for each of two the chrominance components. The 16xc3x9716 luminance array is actually composed of four 8xc3x978 blocks of samples. The 8xc3x978 block is the unit structure of the MPEG video codec and is the quantity that is processed as an entity in the codec.
Each MPEG picture is composed of slices, where each slice is a contiguous sequence of macroblocks in raster scan order. The slice starts at a specific address or position in the picture specified in the slice header. Slices can continue from one macroblock row to the next inxe2x80x94MPEGxe2x80x94I, but not in MPEG-2.
1.2.3 The Motion Estimation/Compensation in MPEG
If there is motion in the sequence, a better prediction is often obtained by coding differences relative to reference areas that are shifted with respect to the area being coded; a process known as motion compensation. The process of determining the motion vectors in the encoder is called motion estimation, and the unit area being predicted is a macroblock.
The motion vectors describing the direction and amount of motion of the macroblocks are transmitted to the decoder as part of the bitstream. The decoder then knows which area of the reference picture was used for each prediction, and sums the decoded difference with this motion compensated prediction to get the final result. The encoder must follow the same procedure when the reconstructed picture will be used for predicting other pictures. The vectors are the same for each pel in a same macroblock, and the vector precision is either a full pel or a half-pel accuracy.
1.2.4 The Discrete Cosine Transform in MPEG
The discrete cosine transform (DCT) is the critical part of both intra and inter coding for MPEG video compression. The DCT has certain properties that simplify coding models and make the coding more efficient in terms of perceptual quality measures.
Basically, the DCT is a method of decomposing the correlation of a block of data into the spatial frequency domain. The amplitude of each data in the spatial (coefficient) domain represents the contribution of that spatial frequency pattern in the block of data being analyzed. If only the low-frequency DCT coefficients are nonzero, the data in the block vary slowly with position. If high frequencies are present, the block intensity changes rapidly from pel to pel.
1.2.5 The Quantization in MPEG
When the DCT is computed for a block of pels, it is desirable to represent the high spatial frequency coefficients with less precision and the low spatial frequency ones with more precision. This is done by a process called quantization. A DCT coefficient is quantized by dividing it by a nonzero positive integer called a quantization value and rounding it to the nearest integer. The bigger the quantization value is, the lower the precision is of the quantized DCT coefficient. Lower-precision coefficients can be transmitted or stored with fewer bits. Generally speaking, the human eye is more sensitive to lower spatial frequency effects than higher ones, which is why the lower frequencies are quantized with higher precision.
As noted above, a macroblock may be composed of four W blocks of luminance samples and two 8xc3x978 blocks of chrominance samples. A lower resolution is used here for the chrominance blocks because the human eye can resolve higher spatial frequencies in luminance than in chrominance.
In intra coding, the DCT coefficients are almost completely decorrelatedxe2x80x94that is, they are independent of one another, and therefore can be coded independently. Decorrelation is of great theoretical and practical interest in terms of construction of the coding model. The coding performance is also actually influenced profoundly by the visually-weighted quantization.
In non-intra coding, the DCT does not greatly improve the decorrelation, since the difference signal obtained by subtracting the prediction from the reference pictures is already fairly well decorrelated. However, quantization is still a powerful compression technique for controlling the bit-rate, even if decorrelation is not improved very much by the DCT.
Since the DCT coefficient properties are actually quite different for intra and inter pictures, different quantization tables are used for intra and inter coding.
1.2.6 The Zigzag Scan and Variable Length Coding in MPEG
The quantized 2-D DCT coefficients are arranged according to a 1-D sequence known as the zigzag scanning order. In most case, the scan orders the coefficients in ascending spatial frequencies, which is illustrated in FIG. 1.3. By using a quantization table which strongly deemphasizes higher spatial frequencies, only a few low-frequency coefficients are nonzero in a typical block which results in a very high compression.
After the quantization, the 1-D sequence is coded losslessly so that the decoder can reconstruct exactly the same results. For MPEG, an approximately optimal coding technique, based on Huffman coding, is used to generate the tables of variable length codes needed for this task. Variable length codes are needed to achieve good coding efficiency, as very short codes must be used for highly probable events. The run-length-coding and some special defined symbols (such as end-of-block, EOB) permit efficient coding of DCTs with mostly zero coefficients.
1.2.7 MPEG Video Encoding Process
The MPEG video encoding is a process that reads a stream of input picture samples and produces a valid coded bitstream as defined in the specification. The high-level coding system diagram shown in FIG. 4 illustrates the structure of a typical encoder system 400. The-MPEG video divides the pictures in a sequence into three basic categories: I-, P- and B-pictures as described previously.
Since I-pictures are coded without reference to neighboring pictures in the sequence, the encoder only exploits the correlation within the picture. The incoming picture 405 will go directly through switch 410 into 2-D DCT module 420 to get the data in each block decomposed into underlying spatial frequencies. Since the response of the human visual system is much more sensitive to low spatial frequencies than high ones, the frequencies are quantized with a quantization table with 64 entries in quantization module 430, in which each entry is a function of spatial frequency for each DCT coefficient. In zigzag scan module 470, the quantized coefficients are then arranged qualitatively from low to high spatial frequency following a exact same or similar zigzag scan order shown in FIG. 3. The rearranged 1-D sequence data is further processed with an entropy coding (Huffman coding) scheme to achieve further compression. Simultaneously, the quantized coefficients are also used to reconstruct the decoded blocks using inverse quantization (module 440) and an inverse 2-D DCT (reconstruction module 450). The reconstructed blocks stored in frame store memory 455 is used as references for future differential coding for P- and B-pictures.
In contrast, P- and B-pictures are coded as the differences between the current macroblocks and the ones in preceding and/or upcoming reference pictures. If the image does not change much from one picture to the next, the difference will be insignificant and can be coded very effectively. If there is motion in the sequence, a better prediction can be obtained from pels in the reference picture that are shifted relative to the current picture pels (see, motion estimation module 460). The differential results will be further compressed by a 2-D DCT, quantization, zigzag and variable length coding modules (420, 430, 470, 480) similar to the I-picture case. Although the decorrelation is not improved much by the DCT for the motion compensated case, the quantization is still an effective way to improve the compression rate. So MPEG""s compression gain arises from three fundamental principles: prediction, decorrelation, and quantization.
1.2.8 MPEG Video Decoding Process
The MPEG video decoding process, which is the exact inverse of the encoding process, is shown in FIG. 5. The decoder 500 accepts the compressed video bitstream 485 generated from MPEG video encoder 400 and produces output pictures 565 according to MPEG video syntax.
The variable length decoding and inverse zigzag scan modules (51-, 520) reverse the results of the zigzag and variable length coding to reconstruct the quantized DCT coefficients. The inverse quantization and inverse 2-D DCT modules (530, 540) are exact the same modules as those in the encoder. The motion compensation in motion compensation module 550 will only be carried out for nonintra macroblocks in P- and B-pictures.
The MPEG-1 video standard is primarily intended for digital storage applications, such as compact disk (CD), DAT, and magnetic hard disks. It supports a continuous transfer rate up to 1.5 Mbit/s, and is targeted for non-interlaced video formats having approximately 288 lines of 352 pels and picture rates around 24 Hz to 30 Hz. The coded representation of MPEG-1 video supports normal speed forward playback, as well as special functions such as random access, fast play, fast reverse play, normal speed reverse playback, pause, and still pictures. The standard is compatible with standard 525 and 625-line television formats, and it provides flexibility for use with personal computer and workstation displays [MPEG1].
Each picture of MPEG-1 consists of three rectangular matrices of eight-bit numbers: a luminance matrix (Y) and two chrominance matrices (Cb and Cr). The Y-matrix must have an even number of rows and columns and the Cb and Cr matrices are one half the size of the Y-matrix in both horizontal and vertical dimensions.
The MPEG-1 video standard uses all the MPEG video compression concepts and techniques listed in section 1.2. The MPEG-1 video standard only defines the video bitstream, syntax and decoding specifications for the coded video bitstream, and leaves a number of issues undefined in the encoding process.
The MPEG-2 video standard evolved from the MPEG-1 video standard and is aimed at more diverse applications such as television broadcasting, digital storage media, digital high-definition television (HDTV), and communication [MPEG2].
Additional requirements are added into the MPEG-2 video standard. It has to work across asynchronous transfer mode (ATM) networks and therefore needs improved error resilience and delay tolerance. It has to handle more programs simultaneously without requiring a common time base. It also has to be backwards compatible with the MPEG-1. Furthermore it is also targeted to code interlaced video signals, such as those used by the television industry. Much higher data transfer rates can be achieved by the MPEG-2 system.
As a continuation of the original MPEG-1 standard, MPEG-2 borrows a significant portion of its technology and terminology from MPEG-1. Both MPEG-2 and MPEG-1 use the same layer structure concepts (i.e. sequence, group, picture, slice, macroblock, block, etc.). Both of them only specify the coded bitstream syntax and decoding operation. Both of them invoke motion compensation to remove the temporal redundancies and use the DCT coding to compress the spatial information. Also, the basic definitions of I-, P- and B-pictures remain the same in both standards. However, the fixed eight bits of precision for the quantized DC coefficients defined in the MPEG-1 is extended to three choices in the MPEG-2: eight, nine and ten bits.
1.4.1 Fields, Frames and Pictures
At the higher bit-rates and picture rates that the MPEG-2 video targets, fields and interlaced video become important. The MPEG-2 video types are expanded from MPEG-1""s I-, P- and B-pictures to I-field picture, I-frame picture, Meld picture, P-frame picture, B-field picture, and B-frame picture.
In an interlaced analog frame composed of two fields, the top field occurs earlier in time than the bottom field. In MPEG-2, coded frames may be composed of any adjacent pairs of fields. A coded I-frame may consist of a I-frame picture, a pair of I-field pictures, or an I-field picture followed by a Meld picture. A coded P-frame may consist of a P-frame picture or a pair of Meld pictures. A coded B-frame may consist of a B-frame picture or a pair of B-field pictures. In contrast to MPEG-I that allows only progressive pictures, MPEG-2 allows both interlaced and progressive pictures.
1.4.2 Chrominance Sampling
Comparing with MPEG-1""s single chrominance sampling format, MPEG-2 defines three chrominance sampling formats. These are labeled 4:2:0, 4:2:2 and 4:4:4.
For 4:2:0 format, the chrominance is sampled 2:1 horizontally and vertically as in MPEG-1. For 4:2:2 format, the chrominance is subsampled 2:1 horizontally but not vertically. For 4:4:4 format, the chrominance has the same sampling for all three components and the decomposition into interlaced fields is the same for all three components.
1.4.3 Scalability
In order to cope with services like asynchronous transfer mode (ATM) networks and HDTV with conventional TV backward compatibility, more than one level of resolution and display quality are needed in the MPEG-2 video standard. MPEG-2 has several types of scalability enhancements that allow low-resolution or smaller images to be decoded from only part of the bitstream. MPEG-2 coded images can be assembled into several layers. The standalone base layer may use the nonscalable MPEG-1 syntax. One or two enhancement layers are then used to get to the higher resolution or quality. This generally requires fewer bits than independent compressed images at each resolution and quality, and at the same time achieve higher error resilience for network transmission.
There are four different scalability schemes in the MPEG-2 standard:
SNR scalability uses the same luminance resolution in the lower layer and a single enhancement layer. The enhancement layer contains mainly coded DCT coefficients and a small overhead. In high-error transmission environments, the base layer can be protected with good error correcting techniques, while the enhancement layer is allowed to be less resilient to errors.
Spatial scalability defines a base layer with a lower resolution and adds an enhancement layer to provide the additional resolution. In the enhancement layer, the difference between an interpolated version of the base layer and the source image is coded in order to accommodate two applications with different resolution requirements like conventional TV and HDTV.
Temporal scalability provides an extension to higher temporal picture rates while maintaining backward compatibility with lower-rate services. The lower temporal rate is coded by itself as the basic temporal rate. Then, additional pictures are coded using temporal prediction relative to the base layer. Some systems may decode both layers and multiplex the output to achieve the higher temporal rate.
Data partitioning split the video bitstream into two channels: the first one contains all of the key headers, motion vectors, and low-frequency DCT coefficients. The second one carries less critical information such as high frequency DCT coefficients, possibly with less error protection.
1.4.4 Profiles and Levels
Profiles and levels provide a means of defining subsets of the syntax and semantics of MPEG-2 video specification and thereby give the decoder the information required to decode a particular bitstream. A profile is a defined sub-set of the entire MPEG-2 bitstream. A level is a defined set of constraints imposed on parameters in the bitstream.
MPEG-2 defines five distinct profiles: simple profile (SP), main profile (MP), SNR scalable profile (SNR), spatial scalable profile (SPT) and high profile (HP). Four levels are also defined in MPEG-2: low (LL), main (ML), high-1440 (H-14) and high (HL) to put constraints on some of the parameters in each profile because the parameter ranges are too large to insist on compliance over the full ranges even with the four profile subsets defined in the MPEG-2 video syntax. Only some of combinations among the profiles and levels are valid. The permissible level combinations with the main profile and their parameter values are listed in Table 1.1.
The permissible level/layer combinations with high profile and their parameter values are listed in Table 1.2.
From FIGS. 4 and 5 in section 1.2 one can see that the encoding and decoding systems for MPEG video consist of several function modules. The modules can be classified by their computational requirements:
(1) A vast amount of the operations are paralleled ones in nature and are best suitable for implementation on a parallel structured hardware component. These modules include the 2-D DCT, 2-D IDCT in both encoding/decoding processes, motion estimation, and motion compensation modules.
(2) The computations carried out are serial in nature and can only be carried out with a serial structure. These modules include zigzag scan, inverse scan, variable length coding and variable length decoding modules.
(3) The computations carried out are parallel in nature, but they can be easily carried out with serial structure without suffering much performance penalty. These modules include quantization and inverse quantization modules.
So far there has been a lot of different approaches for the implementing a of MPEG video encoding/decoding system. Table 1.3 gives a brief summary of some MPEG-1 and MPEG-2 video codec system implementations from some major video vendors [Joa96].
One can see from Table 1.3 that all the MPEG-2 video codec implementations so far have been limited to main level MP@ML. For the MPEG-2 encoding process, the biggest obstacles for real-time encoding are motion estimation and 2-D DCT/IDCT. For the decoding process the 2-D IDCT is the most computation intensive task that every real-time decoding scheme needs to overcome. The huge amount of computations required by motion estimation and 2-D DCT/IDCT prevent the current hardware and software implementations of MPEG-2 video to move from MP@ML to higher levels. Table 1.4 shows just how computational intensive the 2-D IDCT is for MPEG-2 video decoding process.
From Table 1.4, it is clear that the number of 2-D IDCTs in the decoding process will increase from only 486,000 8xc3x978 blocks per second for MP@ML to 2,937,600 blocks per second for MP@HL. Considering that most 8xc3x978 2-D IDCT chips developed so far can carry out about 1,500,000 block transforms per second, and that the most powerful video digital signal processor (DSP) chip (TMS320C80 by TI) can only carry out 800,000 8xc3x978 2-D IDCT per second [May95], a challenge for providing real-time MPEG-2 High-level hardware exists.
In Section 2, existing 1-D DCT/1DCT and 2-D DCT IDCT algorithms, as well the hardware implementation of these algorithms are reviewed. It is shown that all the existing 2-D DCT/1DCT chip implementations have made use of the separability property of the 2-D DCT/IDCT since very simple communication interconnection can be achieved by this approach. The algorithms that require fewer multiplications through direct matrix factorization/decomposition are not necessarily suitable for hardware implementation. Instead, the regularity of design and feasibility of layout implied by the row-column method seem to be the main concern for chip implementation.
In this section, some of the most commonly used one-dimensional and two-dimensional Discrete Cosine Transform (DCT) and Inverse Discrete Cosine Transform (EDCT) algorithms are evaluated. Detailed implementation schemes of some algorithms are also presented.
The development of fast algorithms for the Discrete Fourier Transform (DFT) by Cooley and Tukey [CT65] in 1965 has led to phenomenal growth in its applications in digital signal processing. Similarly, the discovery of the Discrete Cosine Transform (DCT) in 1974 [ANR74] and its potential applications have caused a significant impact in audio and video signal processing. Since 1974, the DCT/IDCT have been widely used in the image and speech data analysis, recognition and compression. They have become an integral part of several standards such as JPEG, MPEG, CCITT Recommendation H.261 and other video conference protocols.
A lot of fast algorithms and hardware architectures have been introduced for one-dimensional (1-D) and two-dimensional (2-D) DCT/IDCT computation. In section 2.2, an overview of major one-dimensional DCT/IDCT algorithms is presented. In section 2.3, the focus is on two-dimensional DCT/IDCT methods and their implementations. A summary is presented in section 2.4. Some typical methods to demonstrate how the 1-D DCT or 2-D DCT computation can be simplified are discussed. These methods can also apply to the 1-D IDCT or 2-D IDCT computation in general.
Given N data point x(0),x(1), . . . , x(Nxe2x88x921), the 1-D N-point DCT and IDCT(or DCT-II and IDCT-II defined by Wang) are defined as [Wan84]:                               X          ⁡                      (            k            )                          =                                            2              N                                ⁢                      C            ⁡                          (              k              )                                ⁢                                    ∑                              n                =                0                                            N                -                1                                      ⁢                                          x                ⁡                                  (                  n                  )                                            ⁢              cos              ⁢                                                                    (                                                                  2                        ⁢                        n                                            +                      1                                        )                                    ⁢                  k                  ⁢                                      xe2x80x83                                    ⁢                  π                                                  2                  ⁢                                      xe2x80x83                                    ⁢                  N                                                                                        (        2.1        )                                          x          ⁡                      (            n            )                          =                                            2              N                                ⁢                                    ∑                              k                =                0                                            N                -                1                                      ⁢                                          C                ⁡                                  (                  k                  )                                            ⁢                              X                ⁡                                  (                  k                  )                                            ⁢              cos              ⁢                                                                    (                                                                  2                        ⁢                        n                                            +                      1                                        )                                    ⁢                  k                  ⁢                                      xe2x80x83                                    ⁢                  π                                                  2                  ⁢                                      xe2x80x83                                    ⁢                  N                                                                                        (        2.2        )            
where       C    ⁡          (      k      )        =      {                                                      1              /                              2                                      ,                                                k            =            0                                                            1            ,                                    otherwise                    
for k=0, 1, . . . , Nxe2x88x921.
Intrinsically, for N-point data sequences, both 1-D DCT and 1-D IDCT require N2 real multiplications and N(Nxe2x88x921) real additions/subtractions. In order to reduce the number of multiplications and additions/subtractions required, various fast algorithms have been developed for computing the 1-D DCT and 1-D IDCT. The development of efficient algorithms for the computation of DCT/IDCT began immediately after Ahmed et al. reported their work on the DCT [ANR74].
One initial approach for the computation of DCT/IDCT is via Fourier Cosine Transform and its relations to the Discrete Fourier Transform (DFT) were exploited in the initial developments of its computational algorithms. The approach of computing the DCT/IDCT indirectly using the FFT is also borrowed by other researchers to obtain fast DCT/IDCT algorithms via other kinds of discrete transforms (such as Walsh-Hadamard Transform, Discrete Hartley Transform, etc.).
In addition, fast DCT/IDCT algorithms can also be obtained by direct factorization of the DCT/IDCT coefficient matrices. When the components of this factorization are sparse, the decomposition represents a fast algorithm. Since the factorization is not unique, there exist a lot of different forms of fast algorithms. The factorization schemes often fall into the decimation-in-time (DIT) or the decimation-in-frequency (DEF) category [RY90].
Furthermore, there also exist other approaches to develop fast DCT/IDCT algorithms. The fast computation can be obtained through recursive computation [WC95, AZK95], planar rotations [LLM89], prime factor decomposition [YN85], filter-bank approach [Chi94] and Z-transform [SL96], etc.
2.2.1 Indirect 1-D DCT via Other Discrete Transforms
The Fourier Cosine Transform can be calculated using the Fourier Transform of an even function. Since there exist a lot of Fast Fourier Transform (FFT) algorithms, it is natural to first look at the existing FFT algorithms to compute DCT.
Let x(0),x(1), . . . , x(Nxe2x88x921) be a given sequence. Then an extended sequence {y(n)}, which is symmetric about the (2Nxe2x88x921)/2 point, can be constructed as [RY90]:                                                                                           y                  ⁡                                      (                    n                    )                                                  =                                  x                  ⁡                                      (                    n                    )                                                                                                                          =                                  x                  ⁡                                      (                                                                  2                        ⁢                        N                                            -                      n                      -                      1                                        )                                                                                      ⁢                  xe2x80x83                ⁢                                                                              n                  =                  0                                ,                1                ,                …                ⁢                                  xe2x80x83                                ,                                  N                  -                  1                                                                                                                          n                  =                  N                                ,                                  N                  +                  1                                ,                …                ⁢                                  xe2x80x83                                ,                                                      2                    ⁢                                          xe2x80x83                                        ⁢                    N                                    -                  1                                                                                        (        2.3        )            
Since the N-point Discrete Fourier Transform is defined as:                                           Z            ⁡                          (              k              )                                =                                    ∑                              n                =                0                                            N                -                1                                      ⁢                                          z                ⁡                                  (                  n                  )                                            ⁢                              W                N                nk                                                    ,                  k          =          0                ,        1        ,        …        ⁢                  xe2x80x83                ,                  N          -          1                                    (        2.4        )            
The 2N-point sequence {y(n)} defined above can be used to calculate the 2N-point DFT as:                                           Y            ⁡                          (              k              )                                =                                    ∑                              n                =                0                                                              2                  ⁢                  N                                -                1                                      ⁢                                          y                ⁡                                  (                  n                  )                                            ⁢                              W                                  2                  ⁢                  N                                nk                                                    ,                  xe2x80x83                ⁢                  k          =          0                ,        1        ,        …        ⁢                  xe2x80x83                ,                              2            ⁢            N                    -          1                                    (        2.5        )            
where W2N denotes exp(xe2x88x92j2xcfx80/2N). The above formula can be easily decomposed to                                                                         Y                ⁡                                  (                  k                  )                                            =                                                                    ∑                                          n                      =                      0                                                              N                      -                      1                                                        ⁢                                                            x                      ⁡                                              (                        n                        )                                                              ⁢                                          W                                              2                        ⁢                        N                                            nk                                                                      +                                                      ∑                                          n                      =                      N                                                                                      2                        ⁢                        N                                            -                      1                                                        ⁢                                                            y                      ⁡                                              (                        n                        )                                                              ⁢                                          W                                              2                        ⁢                        N                                            nk                                                                                                                                              =                                                                    ∑                                          n                      =                      0                                                              N                      -                      1                                                        ⁢                                                            x                      ⁡                                              (                        n                        )                                                              ⁢                                          W                                              2                        ⁢                        N                                            nk                                                                      +                                                      ∑                                          n                      =                      N                                                                                      2                        ⁢                        N                                            -                      1                                                        ⁢                                                            x                      ⁡                                              (                                                                              2                            ⁢                            N                                                    -                          n                          -                          1                                                )                                                              ⁢                                          W                                              2                        ⁢                        N                                            nk                                                                                                                                              =                                                                    ∑                                          n                      =                      0                                                              N                      -                      1                                                        ⁢                                                            x                      ⁡                                              (                        n                        )                                                              ⁢                                          W                                              2                        ⁢                        N                                            nk                                                                      +                                                      ∑                                          n                      =                      0                                                              N                      -                      1                                                        ⁢                                                            x                      ⁡                                              (                        n                        )                                                              ⁢                                          W                                              2                        ⁢                        N                                                                                              (                                                                                    2                              ⁢                              N                                                        -                            n                            -                            1                                                    )                                                k                                                                                                                                                                                    =                                                      ∑                                          n                      =                      0                                                              N                      -                      1                                                        ⁢                                                            x                      ⁡                                              (                        n                        )                                                              ⁡                                          [                                                                        W                                                      2                            ⁢                            N                                                    nk                                                +                                                  W                                                      2                            ⁢                            N                                                                                -                                                                                          (                                                                  n                                  +                                  1                                                                )                                                            k                                                                                                                          ]                                                                                  ,                              k                =                0                            ,              1              ,              …              ⁢                              xe2x80x83                            ,                                                2                  ⁢                  N                                -                1                                                                        (        2.3        )            
Multiplying both sides of Eq. (2.6) by a factor of                     2        N              ⁢          C      ⁡              (        k        )              ⁢          1      2        ⁢          W              2        ⁢        N                    k        /        2              ,
where C(k) is defined in Eq. (2.1) and (2.2), we directly obtain the N-point DCT results as                               X          ⁡                      (            k            )                          =                                                            2                N                                      ⁢                          C              ⁡                              (                k                )                                      ⁢                                          ∑                                  n                  =                  0                                                  N                  -                  1                                            ⁢                                                x                  ⁡                                      (                    n                    )                                                  ⁢                cos                ⁢                                                                            (                                                                        2                          ⁢                          n                                                +                        1                                            )                                        ⁢                    k                    ⁢                                          xe2x80x83                                        ⁢                    π                                                        2                    ⁢                    N                                                                                =                                                    2                N                                      ⁢                          C              ⁡                              (                k                )                                      ⁢                          1              2                        ⁢                          W                              2                ⁢                N                                            k                /                2                                      ⁢                          Y              ⁡                              (                k                )                                                                        (        2.7        )            
for k=0, 1, . . . , Nxe2x88x921. Thus, the N-point DCT X(k) can easily be calculated from 2N-point DFT Y(k) by multiplying by the scale factor             2      N        ⁢      C    ⁡          (      k      )        ⁢      1    2    ⁢            W              2        ⁢        N                    k        /        2              .  
When the sequence {x(n)} is real, {y(n)} is real and symmetric. In this case, {Y(k)} I can be obtained via two N-point FFTs rather than by a single 2N-point FFT [Sor87, RY90]. Since an N-point FFT requires N log2 N complex operations in general, the N-point DCT X(k) can be computed with 2N log2 N complex operations plus the scaling with             2      N        ⁢      C    ⁡          (      k      )        ⁢      1    2    ⁢      W          2      ⁢      N              k      /      2      
In the same spirit, the N-point DCT computation may also be calculated via other transforms such as Walsh-Hadamard Transform (WHT) [Ven88] for Nxe2x89xa616 and Discrete Hartley Transform (DHT) [Mal87]. The WHT is known to be fast since the computation involves no multiplications. Thus an algorithm for DCT via WHT may well utilize this advantage. The DHT, on the other hand, is very similar to DFT. The detailed implementation of these two transforms can be found in [RY90].
2.2.2 1-D DCT via Direct Factorizations
Consider the computation of the DCT of an input sequence {x(n)} I and let this sequence be represented by a (Nxc3x971) column vector x, then the transformed sequence (in vector form) of DCT computation can be expressed in vector notation as follows [RY90]:                     X        =                              [                                                                                X                    ⁡                                          (                      0                      )                                                                                                                    ⋮                                                                              ⋮                                                                                                  X                    ⁡                                          (                                              N                        -                        1                                            )                                                                                            ]                    =                                                                      2                  N                                            ⁢                              A                N                            ⁢              x                        =                                                            2                  N                                            ⁢                                                A                  N                                ⁡                                  [                                                                                                              x                          ⁡                                                      (                            0                            )                                                                                                                                                              ⋮                                                                                                            ⋮                                                                                                                                      x                          ⁡                                                      (                                                          N                              -                              1                                                        )                                                                                                                                ]                                                                                        (        2.8        )            
where AN is an Nxc3x97N coefficient matrix and each element of AN is defined as:                               a          ij                =                              C            ⁡                          (              i              )                                ⁢          cos          ⁢                      xe2x80x83                    ⁢                                                    (                                                      2                    ⁢                                          xe2x80x83                                        ⁢                    j                                    +                  1                                )                            ⁢                              xe2x80x83                            ⁢              i              ⁢                              xe2x80x83                            ⁢              π                                      2              ⁢                              xe2x80x83                            ⁢              N                                                          (        2.9        )            
When the matrix AN is factored into sparse matrices, the number of computations is reduced.
One way to achieve a fast 1-D DCT computation by sparse matrix factorizations is as follows: Assume N is a power of 2, AN can then be decomposed in the form                               A          N                =                                                            P                N                            ⁡                              [                                                                                                    A                                                  N                          /                          2                                                                                                            0                                                                                                  0                                                                                                                R                          _                                                                          N                          /                          2                                                                                                                    ]                                                    N              xc3x97              N                                ⁢                      B            N                                              (        2.10        )            
where AN/2 is the coefficient matrix for a N/2-point DCT; PN is a permutation matrix which permutes the even rows in increasing order in the top half and the odd rows in decreasing order in the bottom half; BN is a butterfly matrix which can be expressed in terms of the identity matrix IN/2 and the opposite identity matrix ĨN/2 (i.e. the elements position on the opposite diagonal are equal to 1, others are 0) as follows:                               B          N                =                              [                                                                                I                                          N                      /                      2                                                                                                                                  I                      ~                                                              N                      /                      2                                                                                                                                                              I                      ~                                                              N                      /                      2                                                                                                            -                                          I                                              N                        /                        2                                                                                                                  ]                                N            xc3x97            N                                              (        2.11        )            
RN/2 is the remaining (N/2xc3x97N/2) block in the factor matrix which can be obtained by reversing the orders of both the rows and columns of an intermediate matrix RN/2, where the definition of each element of RN/2 is:                                           r            ij                    =                      cos            ⁢                          xe2x80x83                        ⁢                                                            (                                                            2                      ⁢                                              xe2x80x83                                            ⁢                      i                                        +                    1                                    )                                ⁢                                  (                                                            2                      ⁢                                              xe2x80x83                                            ⁢                      j                                        +                    1                                    )                                ⁢                π                                            2                ⁢                                  xe2x80x83                                ⁢                N                                      ⁢                          xe2x80x83                        ⁢            i                          ,                  j          =          0                ,        1        ,        …        ⁢                  xe2x80x83                ,                              N            /            2                    -          1                                    (        2.12        )            
The factorization of Eq. (2.10) is only partly recursive because the matrix RN/2 can not be recursively factored. However, there is regularity in its factorization, where it can be decomposed into five types of inatrixfactors and all of them have no more that two non-zero elements in each row [Wan83]. And only   (            N      ⁢              xe2x80x83            ⁢              log        2            ⁢      N        -                  3        ⁢                  xe2x80x83                ⁢        N            2        +    4    )
real multiplications and                     3        ⁢                  xe2x80x83                ⁢        N            2        ⁢          (                                    log            2                    ⁢          N                -        1            )        +  2
real additions are required by this approach.
The key of this approach is that the AN is reduced in terms of AN/2. Take a 4-point sequence for example, the matrix A4 can be decomposed as:                               A          4                =                                            [                                                                    1                                                        0                                                        0                                                        0                                                                                        0                                                        0                                                        0                                                        1                                                                                        0                                                        1                                                        0                                                        0                                                                                        0                                                        0                                                        1                                                        0                                                              ]                        ⁡                          [                                                                                          1                      /                                              2                                                                                                                        1                      /                                              2                                                                                                  0                                                        0                                                                                                              cos                      ⁡                                              (                                                  π                          /                          4                                                )                                                                                                                        cos                      ⁡                                              (                                                  π                          /                          4                                                )                                                                                                  0                                                        0                                                                                        0                                                        0                                                                              -                                              cos                        ⁡                                                  (                                                      π                            /                            8                                                    )                                                                                                                                                cos                      ⁡                                              (                                                  3                          ⁢                                                      xe2x80x83                                                    ⁢                                                      π                            /                            8                                                                          )                                                                                                                                  0                                                        0                                                                              cos                      ⁡                                              (                                                  3                          ⁢                                                      π                            /                            8                                                                          )                                                                                                                        cos                      ⁡                                              (                                                  π                          /                          8                                                )                                                                                                        ]                                ⁡                      [                                                            1                                                  0                                                  0                                                  1                                                                              0                                                  1                                                  1                                                  0                                                                              0                                                  1                                                                      -                    1                                                                    0                                                                              1                                                  0                                                  0                                                                      -                    1                                                                        ]                                              (        2.13        )            
where                                           P            4                    =                      [                                                            1                                                  0                                                  0                                                  0                                                                              0                                                  0                                                  0                                                  1                                                                              0                                                  1                                                  0                                                  0                                                                              0                                                  0                                                  1                                                  0                                                      ]                          ,                              B            4                    =                      [                                                            1                                                  0                                                  0                                                  1                                                                              0                                                  1                                                  1                                                  0                                                                              0                                                  1                                                                      -                    1                                                                    0                                                                              1                                                  0                                                  0                                                                      -                    1                                                                        ]                                              (        2.14        )                                                      A            2                    =                      [                                                                                1                    /                                          2                                                                                                            1                    /                                          2                                                                                                                                        cos                    ⁡                                          (                                              π                        /                        4                                            )                                                                                                            cos                    ⁡                                          (                                              3                        ⁢                                                  xe2x80x83                                                ⁢                                                  π                          /                          4                                                                    )                                                                                            ]                          ,                                            R              _                        2                    =                      [                                                                                -                                          cos                      ⁡                                              (                                                  π                          /                          8                                                )                                                                                                                                  cos                    ⁡                                          (                                              3                        ⁢                                                  xe2x80x83                                                ⁢                                                  π                          /                          8                                                                    )                                                                                                                                        cos                    ⁡                                          (                                              π                        /                        4                                            )                                                                                                            cos                    ⁡                                          (                                              3                        ⁢                                                  π                          /                          4                                                                    )                                                                                            ]                                              xe2x80x83            
Alternatively, some factorization schemes have adopted decimation-in-time (DIT) or decimation-in-frequency (DIF) approach, which achieve fast computation through rearranging the input sequence {x(n)} or output sequence {X(k)}, respectively.
Looking at the DIT approach for example. If the scale factors in Eq. (2.1) are left out for convenience, the transformed sequence X(k) can be expressed as:                                           X            ⁡                          (              k              )                                =                                    ∑                              n                =                0                                            N                -                1                                      ⁢                                          x                ⁡                                  (                  n                  )                                            ⁢              cos              ⁢                              xe2x80x83                            ⁢                                                                    (                                                                  2                        ⁢                        n                                            +                      1                                        )                                    ⁢                  k                  ⁢                                      xe2x80x83                                    ⁢                  π                                                  2                  ⁢                                      xe2x80x83                                    ⁢                  N                                                                    ,                  xe2x80x83                ⁢                  k          =          0                ,        1        ,        …        ⁢                  xe2x80x83                ,                  N          -          1                                    (        2.15        )            
There are two steps in the DIT approach, and their objective is to reduce an N-point DCT to an N/2-point DCT by permutation of the input sample points in the time domain. The first step in the DIT algorithm consists of a rearrangement of the input sample points. The second step reduces the N-pomit transform to two N/2-point transforms to establish the recursive aspect of the algorithm [RY90].
Deefining                                                                         G                ⁡                                  (                  k                  )                                            =                              xe2x80x83                            ⁢                                                ∑                                      n                    =                    0                                                        N                    /                    2                                                  ⁢                                                      [                                                                  x                        ⁡                                                  (                                                      2                            ⁢                            n                                                    )                                                                    +                                              x                        ⁡                                                  (                                                                                    2                              ⁢                              n                                                        -                            1                                                    )                                                                                      ]                                    ⁢                  cos                  ⁢                                                            k                      ⁢                                              xe2x80x83                                            ⁢                      n                      ⁢                                              xe2x80x83                                            ⁢                      π                                                              N                      /                      2                                                        ⁢                                      xe2x80x83                                    ⁢                  and                                                                                                                                          H                  ⁡                                      (                    k                    )                                                  =                                  xe2x80x83                                ⁢                                                      ∑                                          n                      =                      0                                                                                      N                        /                        2                                            -                      1                                                        ⁢                                                            [                                                                        x                          ⁡                                                      (                                                          2                              ⁢                              n                                                        )                                                                          +                                                  x                          ⁡                                                      (                                                                                          2                                ⁢                                                                  xe2x80x83                                                                ⁢                                n                                                            +                              1                                                        )                                                                                              ]                                        ⁢                    cos                    ⁢                                          xe2x80x83                                        ⁢                                                                                            (                                                                                    2                              ⁢                                                              xe2x80x83                                                            ⁢                              n                                                        +                            1                                                    )                                                ⁢                        k                        ⁢                                                  xe2x80x83                                                ⁢                        π                                            N                                                                                  ,                                                                                          xe2x80x83                            ⁢                              "AutoLeftMatch"                                                      k                    =                    0                                    ,                  1                  ,                  …                  ⁢                                      xe2x80x83                                    ,                                                            N                      /                      2                                        -                    1                                                                                                          (        2.16        )            
with x(xe2x88x921)=x(N)=0 as the initial conditions for x(n).
Using the properties of the cosine functions, it is easy to see that Eq. (2.16) can be substituted into Eq. (2.15), resulting in:                                                                         X                ⁡                                  (                  k                  )                                            =                              xe2x80x83                            ⁢                                                1                  2                                ⁢                                                      [                                                                  G                        ⁡                                                  (                          k                          )                                                                    +                                              H                        ⁡                                                  (                          k                          )                                                                                      ]                                                        cos                    ⁡                                          (                                              k                        ⁢                                                  xe2x80x83                                                ⁢                                                  π                          /                          N                                                                    )                                                                      ⁢                                  xe2x80x83                                ⁢                and                                                                                                                          X                  ⁡                                      (                                          N                      -                      k                      -                      1                                        )                                                  =                                  xe2x80x83                                ⁢                                                      1                    2                                    [                                                                                    G                        ⁡                                                  (                                                      k                            +                            1                                                    )                                                                    +                                              H                        ⁡                                                  (                                                      k                            +                            1                                                    )                                                                                                            cos                      ⁡                                              (                                                                              (                                                          N                              -                              k                              -                              1                                                        )                                                    ⁢                                                      π                            /                            2                                                    ⁢                                                      xe2x80x83                                                    ⁢                          N                                                )                                                                                                        ,                                                                                          xe2x80x83                            ⁢                                                k                  =                  0                                ,                1                ,                …                ⁢                                  xe2x80x83                                ,                                                      N                    /                    2                                    -                  1                                                                                        (        2.17        )            
In Eq. (2.16), the sequence {H(k)} is obtained as a DCT with N/2 sample points and {G(k)} is obtained as a DCT with (N/2+1) sample points. Each of these smaller transform can be further reduced, which leads to the desired recursive structure. Excluding scaling and normalization, it is found that for an N-point (N being power of 2) sequence {x(n)}, the DIT algorithm for DCT requires ((N/2)log2 N+N/4) real multiplications and ((3N/2xe2x88x921)log2 N+N/4+1) real additions [RY90].
When the rearrangement of the sample points results in the transformed sequence being grouped into even- and odd-frequency indexed portions, the decomposition is said to constitute a DIF algorithm. For an N-point (radix-2) sequence {x(n)}, the DIF algorithm for DCT requires (N/2) log2 N real multiplications and ((3N/2)log2 Nxe2x88x92N+1) real additions [RY90].
2.2.3 1-D DCT Based on Recursive Algorithms
In addition to the algorithms described in the previous sections, there exist many more other different kind approaches. Using some well-known recursive algorithms to compute DCT/IDCT is one of them which can achieve the goal of fast computation. Two typical ones are shown here: Chebyshev Polynomial recurrence and Clenshaw""s recurrence formula.
One fast recursive algorithm for computing the DCT based on the Chebyshev Polynomial factorization is proposed by Wang and Chen [WC95]. Recall the following trigonometric identity:
cos(xcex3xcex1)=2 cos xcex1 cos[(xcex3xe2x88x921)xcex1]xe2x88x92cos[(xcex3xe2x88x922)xcex1]xe2x80x83xe2x80x83(2.18)
which, by the way, can be proved using the Chebyshev polynomial. If one leaves out the scale factors in Eq. (2.1) for convenience (i.e. use Eq. (2.15) as the definition of the 1-D DCT) and define the recursive variables as                                                         P              ⁡                              (                                  k                  ,                  n                                )                                      =                                          cos                ⁢                                  xe2x80x83                                ⁢                                                                            (                                                                        2                          ⁢                          n                                                +                        1                                            )                                        ⁢                    k                    ⁢                                          xe2x80x83                                        ⁢                    π                                                        2                    ⁢                                          xe2x80x83                                        ⁢                    N                                                              =                              cos                ⁢                                                                            (                                                                        2                          ⁢                          n                                                +                        1                                            )                                        ⁢                    α                                    2                                                              ,                      α            =                          k              ⁢                              xe2x80x83                            ⁢                              π                /                N                            ⁢                              xe2x80x83                            ⁢              and                                      ⁢                  
                ⁢                              A            ⁡                          (                              k                ,                n                            )                                =                                                    ∑                                  n                  =                  0                                m                            ⁢                                                x                  ⁡                                      (                    n                    )                                                  ⁢                cos                ⁢                                  xe2x80x83                                ⁢                                                                            (                                                                        2                          ⁢                          n                                                +                        1                                            )                                        ⁢                    k                    ⁢                                          xe2x80x83                                        ⁢                    π                                                        2                    ⁢                                          xe2x80x83                                        ⁢                    N                                                                        =                                          ∑                                  n                  =                  0                                m                            ⁢                                                x                  ⁡                                      (                    n                    )                                                  ⁢                                  P                  ⁡                                      (                                          k                      ,                      n                                        )                                                                                                          (        2.19        )            
the 1-D DCT can be computed using the following Chebyshev polynomial recurrence:
P(k,xe2x88x921)=P(k,0)=cos(xcex1/2) and
P(k,n+1)=2 cos(xcex1)P(k,n)xe2x88x92P(k,nxe2x88x921)xe2x80x83xe2x80x83(2.20)
A(k,xe2x88x921)=0 and
A(k,n)=A(k,nxe2x88x921)+x(n)P(k,n)xe2x80x83xe2x80x83(2.21)
where X(k)=A(k,Nxe2x88x921), k=0, 1, . . . , Nxe2x88x921. Thus the X(k) can be calculated in N recursive steps from the input sequence x(n) using Eq. (2.20) and (2.21). For an N-point sequence {x(n)}, this recursive algorithm requires 2N(Nxe2x88x921) real multiplications and real additions.
In addition, Aburdene et al. proposed another fast recursive algorithm for computing the DCT based on the Clenshaw""s (or Goertzel""s as called in other papers) recurrence formula [AZK95]. The Clenshaw""s recurrence formula states that considering a linear combination of the form                               f          ⁡                      (            x            )                          =                              ∑                          n              =              0                        N                    ⁢                                    c              ⁡                              (                n                )                                      ⁢                          F              ⁡                              (                                  x                  ,                  n                                )                                                                        (        2.22        )            
in which F(x,n) obeys a recurrence relation
F(x,n+1)=xcex1(x,n)F(x,n)+xcex2(x,n)F(x,nxe2x88x921)xe2x80x83xe2x80x83(2.23)
for some functions xcex1(x,n) and xcex2(x,n), then the sum f(x) can be computed as
f(x)=c(N)F(x,N)xe2x88x92xcex2(x,n)F(x,Nxe2x88x921)xe2x88x92F(x,N)"psgr"(Nxe2x88x922)xe2x80x83xe2x80x83(2.24)
where {"psgr"(n)} can be obtained from the following recurrence relations:
"psgr"(xe2x88x922)="psgr"(xe2x88x921)=0 and
"psgr"(n)=["psgr"(nxe2x88x922)xe2x88x92xcex1(x,n)"psgr"(nxe2x88x921)xe2x88x92c(n)]/xcex2(x,n+1), n=0, 1, . . . , Nxe2x88x921xe2x80x83xe2x80x83(2.25)
Defining
xcexk=kxcfx80/N and
F(xcexk,n)=cos[(n+xc2xd)(kxcfx80/N)]=cos[(n+1/2)xcexk]xe2x80x83xe2x80x83(2.26)
the 1-D DCT can be expressed as                                           f            ⁡                          (                              λ                k                            )                                =                                    X              ⁡                              (                k                )                                      =                                          ∑                                  n                  =                  0                                                  N                  -                  1                                            ⁢                                                x                  ⁡                                      (                    n                    )                                                  ⁢                                  F                  ⁡                                      (                                                                  λ                        k                                            ,                      n                                        )                                                                                      ,                  xe2x80x83                ⁢                  k          =          0                ,                  1          ⁢                      xe2x80x83                    ⁢          …                ⁢                  xe2x80x83                ,                  N          -          1                                    (        2.27        )            
The calculation of F(xcexk,n) can be made recursively using the identity
cos[(n+3/2)xcexk]=2 cos(xcexk)cos[(n+1/2)xcexk]xe2x88x92cos[(nxe2x88x921/2)xcexk]xe2x80x83xe2x80x83(2.28)
to generate the recurrence expression for F(xcexk, n+1) as
F(xcexk,n+1)=2 cos(xcexk)F(xcexk, n)xe2x88x92F(xcexk,nxe2x88x921)xe2x80x83xe2x80x83(2.29)
Comparing Eq. (2.23) and (2.29), one can see that the terms xcex1(x,n) and xcex2(x,n) in Eq. (2.23) should be chosen as 2 cos (xcexk) and xe2x88x921.
Substitute Eq. (2.24) in Eq. (2.27), we can find   "AutoLeftMatch"                                                                                          X                  ⁡                                      (                    k                    )                                                  =                                                                            x                      ⁡                                              (                                                  N                          -                          1                                                )                                                              ⁢                                          F                      ⁡                                              (                                                                              λ                            k                                                    ,                                                      N                            -                            1                                                                          )                                                                              -                                                            F                      ⁡                                              (                                                                              λ                            k                                                    ,                                                      N                            -                            2                                                                          )                                                              ⁢                                          ψ                      ⁡                                              (                                                  N                          -                          2                                                )                                                                              -                                                            F                      ⁡                                              (                                                                              λ                            k                                                    ,                                                      N                            -                            1                                                                          )                                                              ⁢                                          ψ                      ⁡                                              (                                                  N                          -                          3                                                )                                                                                                                                                                    =                                                                            (                                              -                        1                                            )                                        k                                    ⁡                                      [                                                                                            x                          ⁡                                                      (                                                          N                              -                              1                                                        )                                                                          ⁢                                                  cos                          ⁡                                                      (                                                                                          λ                                k                                                            /                              2                                                        )                                                                                              +                                                                        cos                          ⁡                                                      (                                                          3                              ⁢                                                              xe2x80x83                                                            ⁢                                                                                                λ                                  k                                                                /                                2                                                                                      )                                                                          ⁢                                                  ψ                          ⁡                                                      (                                                          N                              -                              2                                                        )                                                                                              -                                                                        cos                          ⁡                                                      (                                                                                          λ                                k                                                            /                              2                                                        )                                                                          ⁢                                                  ψ                          ⁡                                                      (                                                          N                              -                              3                                                        )                                                                                                                ]                                                                                                                          =                                                                            (                                              -                        1                                            )                                        k                                    ⁢                                                            cos                      ⁡                                              (                                                                              λ                            k                                                    /                          2                                                )                                                              ⁡                                          [                                                                        ψ                          ⁡                                                      (                                                          N                              -                              1                                                        )                                                                          -                                                  ψ                          ⁡                                                      (                                                          N                              -                              2                                                        )                                                                                              ]                                                                                                                                (          2.30          )                    
where "psgr"(n) is obtained from Eq. (2.25) as
"psgr"(xe2x88x922)="psgr"(xe2x88x921)=0 and
"psgr"(n)=2 cos(xcexk)"psgr"(nxe2x88x921)xe2x88x92"psgr"(nxe2x88x922)+x(n), n=0, 1, . . . , Nxe2x88x921xe2x80x83xe2x80x83(2.31)
Thus, "psgr"(n) can be recursively generated from the input sequence x(n) according Eq. (2.31). And at the Nth step, X(k) can be evaluated by Eq. (2.30) for k=0, 1, . . . , Nxe2x88x921. For an N-point sequence {x(n)}, this recursive algorithm requires about N2 real multiplications and real additions.
2.2.4 1-D DCT/IDCT Hardware Implementations
The algorithms that compute the DCT/IDCT indirectly via other discrete transforms are normally not the good candidate for hardware implementation. The conversion between the input and output data of two different transforms is generally complicated. Many transforms, like FFT and WHT, use complex architectures, which make the hardware implementations of the 1-D DCT even less efficient. The advantage of computing the 1-D DCT via DFT is that the standard FFT routines and implementations are available that can be directly used in the DCT/IDCT.
The algorithms that compute the DCT/IDCT via direct factorizations have the advantages that they are reasonably fast and recursive in some degree. These algorithms make full use of the sparseness of the DCT/IDCT coefficient matrix and require much fewer multiplications and additions/subtractions. But the complicated index mapping of global interconnection from the input and to the output data makes the hardware implementations rather difficult.
Alternatively, although the DCT/IDCT algorithms based on recursive approaches do not necessarily use fewer operations than other discrete transforms, the recursive nature makes them easy to be implemented with relatively simple processing elements (PE) and simple interconnections among the PEs. Identical or similar structured PEs in a hardware implementation can greatly reduce the cost of the design and layout process. It has been shown that time recursive algorithms and the resulting DCT/IDCT architectures are well suited for VLSI implementation.
One of the recursive schemes that can be easily adopted for the 1-D DCT hardware implementation is the Chebyshev polynomial method (described in section 2.2.3). The basic function cell to compute the 1-D DCT based on this method is shown in FIG. 6 [WC95]. For N-point input sequence, total N cells are required for k=0, 1, . . . , Nxe2x88x921. Since these N cells have identical structure, functional design and layout cost can be reduced correspondingly.
Another example of the 1-D DCT hardware implementation using recursive scheme is based on Clenshaw""s recurrence formula (described in section 2.2.4). The hardware structure of the implementation is shown in FIG. 7 [AZK95].
Similar to the definitions of the 1-D DCT/IDCT, the forward and inverse 2-D Discrete Cosine Transform (2-D DCT/IDCT) of an input sequence x(m,n), 0xe2x89xa6m,n less than N, are defined as:                               X          ⁡                      (                          k              ,              l                        )                          =                              2            N                    ⁢                      C            ⁡                          (              k              )                                ⁢                      C            ⁡                          (              l              )                                ⁢                                    ∑                              m                =                0                                            N                -                1                                      ⁢                                          ∑                                  n                  =                  0                                                  N                  -                  1                                            ⁢                                                x                  ⁡                                      (                                          m                      ,                      n                                        )                                                  ⁢                cos                ⁢                                                                            (                                                                        2                          ⁢                          m                                                +                        1                                            )                                        ⁢                    k                    ⁢                                          xe2x80x83                                        ⁢                    π                                                        2                    ⁢                                          xe2x80x83                                        ⁢                    N                                                  ⁢                cos                ⁢                                  xe2x80x83                                ⁢                                                                            (                                                                        2                          ⁢                          n                                                +                        1                                            )                                        ⁢                    l                    ⁢                                          xe2x80x83                                        ⁢                    π                                                        2                    ⁢                    N                                                                                                          (        2.32        )            
where                                           x            ⁡                          (                              m                ,                n                            )                                =                                    2              N                        ⁢                                          ∑                                  k                  =                  0                                                  N                  -                  1                                            ⁢                                                ∑                                      l                    =                    0                                                        N                    -                    1                                                  ⁢                                                      C                    ⁡                                          (                      k                      )                                                        ⁢                                      C                    ⁡                                          (                      l                      )                                                        ⁢                                      X                    ⁡                                          (                                              k                        ,                        l                                            )                                                        ⁢                  cos                  ⁢                                                                                    (                                                                              2                            ⁢                            m                                                    +                          1                                                )                                            ⁢                      k                      ⁢                                              xe2x80x83                                            ⁢                      π                                                              2                      ⁢                                              xe2x80x83                                            ⁢                      N                                                        ⁢                  cos                  ⁢                                      xe2x80x83                                    ⁢                                                                                    (                                                                              2                            ⁢                            n                                                    +                          1                                                )                                            ⁢                      l                      ⁢                                              xe2x80x83                                            ⁢                      π                                                              2                      ⁢                      N                                                                                                          ⁢                  
                ⁢                              where            ⁢                          xe2x80x83                        ⁢                          C              ⁡                              (                k                )                                              ,                                    C              ⁡                              (                l                )                                      =                          {                                                                                                                  1                        /                                                  2                                                                    ,                                                                                                  k                      ,                                              l                        =                        0                                                                                                                                                        1                      ,                                                                            otherwise                                                                                                          (        2.33        )            
For an Nxc3x97N point input sequence, both the 2-D DCT and 2-D IDCT require O(N4) real multiplications and corresponding additions/subtractions, assuming the computations are carried out by brute force. In order to improve the efficiency of 2-D DCT and 2-D IDCT computations, various fast computational algorithms and corresponding architectures have been proposed. In general, all of these algorithms can be broadly classified into 3 basic categories: 1) compute the 2-D DCT/IDCT indirectly via other discrete fast transforms, 2) decompose the 2-D DCT/IDCT into two 1-D DCT/IDCTs, and 3) compute the 2-D DCT/IDCT based on direct matrix factorization or decomposition.
Computation of the 2-D DCT/IDCT via other discrete fast transforms manages to take advantage of the existence of other kinds 2-D discrete transform algorithms and architectures. The best candidates that can be employed to perform the 2-D DCT/IDCT, for example, are the 2-D FFT and 2-D WHT [NK83, Vet85].
However, the decomposition of a 2-D DCT/IDCT into two 1-D DCT/IDCTs, which conventionally is also called the Row-Column Method (RCM), evaluates the 1-D DCT/IDCT in row-column-wise or column-row-wise form. That is, it starts by processing the row (or column) elements of input data block as a 1-D DCT/IDCT and store the results in an intermediate memory; it then processes the transposed column (or row) elements of the intermediate results to further yield the 2-D DCT/IDCT results [CW95, SL96, MW95, Jan94]. Since the RCM reduces the 2-D DCT into two separate 1-D DCTs, existing 1-D algorithms listed in section 2.2 can be directly used so that the computational complexity can be simplified.
The direct 2-D factorization methods work directly on the 2-D data set and coefficient matrices. This kind of approach mainly concentrates on reducing the redundancy within the 2-D DCT/IDCT computations so that much fewer multiplications would be required [DG90, CL91, Lee97].
2.3.1 2-D DCT via Other Discrete Transforms
The close relationship between the DCT and the DFT can also be exploited in the two-dimensional case.
As shown by Nasrabadi and King [NK83], a rearrangement of the input matrix elements easily leads to expressions involving evaluation of two-dimensional DFTs. Leaving the scale factors out of Eq. (2.32) and Eq. (2.33), and treating x(m,n) and X(k,l) as scaled and normalized two-dimensional input and output data as                               X          ⁡                      (                          k              ,              l                        )                          =                              ∑                          m              =              0                                      N              -              1                                ⁢                                    ∑                              n                =                0                                            N                -                1                                      ⁢                                          x                ⁡                                  (                                      m                    ,                    n                                    )                                            ⁢              cos              ⁢                                                                    (                                                                  2                        ⁢                        m                                            +                      1                                        )                                    ⁢                  k                  ⁢                                      xe2x80x83                                    ⁢                  π                                                  2                  ⁢                                      xe2x80x83                                    ⁢                  N                                            ⁢              cos              ⁢                              xe2x80x83                            ⁢                                                                    (                                                                  2                        ⁢                        n                                            +                      1                                        )                                    ⁢                  l                  ⁢                                      xe2x80x83                                    ⁢                  π                                                  2                  ⁢                  N                                                                                        (        2.34        )                                          x          ⁡                      (                          m              ,              n                        )                          =                              ∑                          k              =              0                                      N              -              1                                ⁢                                    ∑                              l                =                0                                            N                -                1                                      ⁢                                          X                ⁡                                  (                                      k                    ,                    l                                    )                                            ⁢              cos              ⁢                                                                    (                                                                  2                        ⁢                        m                                            +                      1                                        )                                    ⁢                  k                  ⁢                                      xe2x80x83                                    ⁢                  π                                                  2                  ⁢                                      xe2x80x83                                    ⁢                  N                                            ⁢              cos              ⁢                              xe2x80x83                            ⁢                                                                    (                                                                  2                        ⁢                        n                                            +                      1                                        )                                    ⁢                  l                  ⁢                                      xe2x80x83                                    ⁢                  π                                                  2                  ⁢                  N                                                                                        (        2.35        )            
Define an intermediate Nxc3x97N transform sequence
y(m,n)=x(2m,2n), m,n=0, 1, . . . , N/2xe2x88x921
y(m,n)=x(2Nxe2x88x922mxe2x88x921,2n), m=N/2, . . . , Nxe2x88x921, n=0, 1, . . , N/2xe2x88x921
y(m,n)=x(2m,2Nxe2x88x922nxe2x88x921), m=0, 1, . . . , N/2xe2x88x921, n=N/2, . . . , Nxe2x88x921 and
y(m,n)=x(2Nxe2x88x922mxe2x88x921,2Nxe2x88x922nxe2x88x921), m,n=N/2, . . . , Nxe2x88x921xe2x80x83xe2x80x83(2.36)
where the 2-D DFT of y(m,n) can be calculated as:                               Y          ⁡                      (                          k              ,              l                        )                          =                              ∑                          m              =              0                                      N              -              1                                ⁢                                    ∑                              n                =                0                                            N                -                1                                      ⁢                                          y                ⁡                                  (                                      m                    ,                    n                                    )                                            ⁢                              W                N                mk                            ⁢                              W                N                nj                                                                        (        2.37        )            
and the WNk denotes exp(xe2x88x92j2kxcfx80/N).
Furthermore, using a simple compound angle formula for the cosine functions, it is possible to derive the following similarly to Eq. (2.7) as
X(k,l)=(1/2)Re{W4Nk[W4NjY(k,l)+W4Nxe2x88x92jY(k,Nxe2x88x921)]}xe2x80x83xe2x80x83(2.38)
Above Eq. (2.38) is sometimes referred to as representing xe2x80x9cphasor-modifiedxe2x80x9d DFT components. And it can be further simplified as
X(k,l)=(xc2xd)Re{A(k,l)+jA(k,Nxe2x88x921)}xe2x80x83xe2x80x83(2.39)
where
A(k,l)=W4NkW4NjY(k,l) and
A(k,Nxe2x88x921)=W4NkW4NjY(k,Nxe2x88x921)xe2x80x83xe2x80x83(2.40)
Since Y(k,l) is the 2-D DFT, its implementation can be realized by using any of the available 2-D algorithms. One of the most efficient methods proposed by Nussbaumer is to compute the 2-D real DFT by means of the polynomial transforms [Nus81]. The reduction in computational complexity is obtained by mapping the DFT on the index m to polynomial transform. Overall, an Nxc3x97N point DCT is mapped onto N DFTs of lengths N. For real Nxc3x97N input sequence {x(m,n)}, the 2-D DCT requires ((N2/2xe2x88x921)log2 N+N2/3xe2x88x922Nxe2x88x928/3) complex multiplications and ((5N2/2)log2 N+N2/3xe2x88x926Nxe2x88x9262/3) complex additions.
Besides, the 2-D DCT can also be carried out via the 2-D Walsh-Hadamard Transform (WHT) [Vet85].
2.3.2 2-D DCT by Row-Column Method (RCM)
Like some other discrete transforms, such as DFT, WHT, ST, HT, etc., the 2-D DCT is a separable transform. And Eq. (2.32) can also be expressed as                                           X            ⁡                          (                              k                ,                l                            )                                =                                                    2                N                                      ⁢                          C              ⁡                              (                k                )                                      ⁢                                          ∑                                  m                  =                  0                                                  N                  -                  1                                            ⁢                                                {                                                                                    2                        N                                                              ⁢                                          C                      ⁡                                              (                        l                        )                                                              ⁢                                                                  ∑                                                  n                          =                          0                                                                          N                          -                          1                                                                    ⁢                                                                        x                          ⁡                                                      (                                                          m                              ,                              n                                                        )                                                                          ⁢                        cos                        ⁢                                                  xe2x80x83                                                ⁢                                                                                                            (                                                                                                2                                  ⁢                                  n                                                                +                                1                                                            )                                                        ⁢                            l                            ⁢                                                          xe2x80x83                                                        ⁢                            π                                                                                2                            ⁢                                                          xe2x80x83                                                        ⁢                            N                                                                                                                                }                                ⁢                cos                ⁢                                  xe2x80x83                                ⁢                                                                            (                                                                        2                          ⁢                          m                                                +                        1                                            )                                        ⁢                    k                    ⁢                                          xe2x80x83                                        ⁢                    π                                                        2                    ⁢                    N                                                                                      ⁢                  
                ⁢                              where            ⁢                          xe2x80x83                        ⁢            k                    ,                      1            =            0                    ,          1          ,          …          ⁢                      xe2x80x83                    ,                      N            -            1.                                              (        2.41        )            
The inner summation             2      N        ⁢      C    ⁢          (      l      )        ⁢            ∑              n        =        0                    N        -        1              ⁢                  x        ⁢                  (                      m            ,            n                    )                    ⁢      cos      ⁢                                    (                                          2                ⁢                n                            +              1                        )                    ⁢          l          ⁢                      xe2x80x83                    ⁢          π                          2          ⁢                      xe2x80x83                    ⁢          N                    
is an N-point 1-D DCT of the rows of x(m,n), whereas the outer summation represents the N-point 1-D DCT of the columns of the xe2x80x9csemi-transformedxe2x80x9d matrix, whose
elements are                     2        N              ⁢          C      ⁢              (        l        )              ⁢                  ∑                  n          =          0                          N          -          1                    ⁢              xe2x80x83            ⁢                        x          ⁢                      (                          m              ,              n                        )                          ⁢        cos        ⁢                                            (                                                2                  ⁢                  n                                +                1                            )                        ⁢            l            ⁢                          xe2x80x83                        ⁢            π                                2            ⁢            N                                ,
where m,l=0, 1, . . . , Nxe2x88x921.
This implies that a 2-D Nxc3x97N DCT can be implemented by N""s N-point DCTs along the columns of x(m,n), followed by N""s N-point DCTs along the rows of the results after the column transformations. In practice, the order in which the row transform and the column transform are done is theoretically immaterial.
All 1-D DCT fast algorithms discussed in section 2.2 can be used here to simplify the 2-D DCT computation, which requires totally 2N""s 1-D DCTs. For example, if the 1-D DCT is carried out via the 1-D FFT, approximate 2Nxc3x97(2N log2 N) complex operations plus the scaling are required.
2.3.3 2-D DCT Based on Direct Matrix Factorization/Decomposition
In the RCM, the computation reduction applies only to one 1-D array at a time. That makes these algorithms less efficient and not quite modular in structure. Haque reported a 2-D fast DCT algorithm based on a rearrangement of the elements of the two-dimensional input matrix into a block matrix form [Haq851. Each block of the matrix is then calculated via a xe2x80x9chalf-sizexe2x80x9d 2-D DCT.
The Nxc3x97N DCT block decomposition of Eq. (2.34) is based upon the following procedures:
(1) Decompose the Nxc3x97N input data x(m,n) into four (N/2)xc3x97(N/2) sub-blocks:
p(m,n)=x(m,n), q(m,n)=x(m,Nxe2x88x92nxe2x88x921),
r(m,n)=x(Nxe2x88x92mxe2x88x921,n), s(m,n)=x(Nxe2x88x92mxe2x88x921,Nxe2x88x92nxe2x88x921)xe2x80x83xe2x80x83(2.42)
(2) Arrange the sub-blocks as a block column vector and multiply by a 4xc3x974 block Walsh-Hadamard matrix H to get four (N/2)xc3x97(N/2) sub-blocks:                               [                                                                      [                                      g1                    xe2x80x2                                    ]                                                                                                      [                                      g2                    xe2x80x2                                    ]                                                                                                      [                                      g3                    xe2x80x2                                    ]                                                                                                      [                                      g4                    xe2x80x2                                    ]                                                              ]                =                              [                                                            I                                                  I                                                  I                                                  I                                                                              I                                                                      -                    I                                                                    I                                                                      -                    I                                                                                                I                                                  I                                                                      -                    I                                                                                        -                    I                                                                                                I                                                                      -                    I                                                                                        -                    I                                                                    I                                                      ]                    ⁢                      xe2x80x83                    [                                                                      [                                      p                    ⁡                                          (                                              m                        ,                        n                                            )                                                        ]                                                                                                      [                                      q                    ⁡                                          (                                              m                        ,                        n                                            )                                                        ]                                                                                                      [                                      r                    ⁡                                          (                                              m                        ,                        n                                            )                                                        ]                                                                                                      [                                      s                    ⁡                                          (                                              m                        ,                        n                                            )                                                        ]                                                              ]                                    (        2.43        )            
(3) Scale these sub-blocks with a diagonal scale matrix W:
[g1]=[g1xe2x80x2], [g2]=[g2xe2x80x2]W,
[g3]=[g3xe2x80x2], [g4]=[g4xe2x80x2]Wxe2x80x83xe2x80x83(2.44)
xe2x80x83where W=(1/2) diag [1/cos(xcfx80/2N), 1/cos(3xcfx80/2N), . . . , 1/cos((Nxe2x88x921)xcfx80/2N)].
(4) Compute the four (N/2)xc3x97(N/2) 2-D DCTs of the scaled sub-blocks [g1], [g2], [g3] and [g4] to obtain the results [G1(k,l)], [G2(k,l)], [G3(k,l)] and [G4(k,l)].
(5) Denote the even-even, even-odd, odd-even and odd-odd elements of the transformed matrix [X(k,l)] in Eq. (2.35) by four (N/2)xc3x97(N/2) sub-blocks:
P(k,l)=X(2k,2l), Q(k,l)=X(2k,2l+1),
R(k,l)=X(2k+1,2l), S(k,l)=X(2k+1,2l+1)xe2x80x83xe2x80x83(2.45)
(6) Convert [G1(k,l)], [G2(k,l)], [G3(k,l)] and [G4(k,l)] to [P(k,l)], [Q(k,l)], [R(k,l)] and [S(k,l)] by:
[P(k,l)]=[G1(k,l)], [Q(k,l)]=L[G2(k,l)],
[Q(k,l)]=[G2(k,l)]LT[Q(k,l)]=L[G2(k,l)]LTxe2x80x83xe2x80x83(2.46)
xe2x80x83where L is an (N/2)xc3x97(N/2) lower triangular matrix of the form                     L        =                  [                                                    1                                            0                                            0                                            …                                            …                                            0                                                                                      -                  1                                                            1                                            0                                            …                                            …                                            0                                                                    …                                            …                                            …                                            …                                            …                                            0                                                                                      ±                  1                                                            …                                            …                                            …                                                              -                  1                                                            1                                              ]                                    (        2.47        )            
The computation of the 2-D DCT based on the Haque""s algorithm requires ((3/4)N2 log2 N) multiplications and (3N2 log2 Nxe2x88x922N2+2N) additions [Haq85, RY90].
As an alternation, Cho and Lee proposed another approach for decomposing a 2-D DCT [CL91]. Using the following trigonometric relation                               cos          ⁢                                                    (                                                      2                    ⁢                    m                                    +                  1                                )                            ⁢              k              ⁢                              xe2x80x83                            ⁢              π                                      2              ⁢              N                                ⁢          cos          ⁢                      xe2x80x83                    ⁢                                                    (                                                      2                    ⁢                    n                                    +                  1                                )                            ⁢              l              ⁢                              xe2x80x83                            ⁢              π                                      2              ⁢              N                                      =                              1            2                    ⁡                      [                                          cos                ⁢                                  xe2x80x83                                ⁢                                                                                                    (                                                                              2                            ⁢                            m                                                    +                          1                                                )                                            ⁢                      k                      ⁢                                              xe2x80x83                                            ⁢                      π                                        +                                                                  (                                                                              2                            ⁢                            n                                                    +                          1                                                )                                            ⁢                      l                      ⁢                                              xe2x80x83                                            ⁢                      π                                                                            2                    ⁢                    N                                                              +                              cos                ⁢                                  xe2x80x83                                ⁢                                                                                                    (                                                                              2                            ⁢                            m                                                    +                          1                                                )                                            ⁢                      k                      ⁢                                              xe2x80x83                                            ⁢                      π                                        -                                                                  (                                                                              2                            ⁢                            n                                                    +                          1                                                )                                            ⁢                      l                      ⁢                                              xe2x80x83                                            ⁢                      π                                                                            2                    ⁢                    N                                                                        ]                                              (        2.48        )            
the 2-D DCT in Eq. (2.34) can be rewritten as
X(k,l)=xc2xd[A(k,l)+B(k,l)]xe2x80x83xe2x80x83(2.49)
where                                           A            ⁡                          (                              k                ,                l                            )                                =                                    ∑                              m                =                0                                            N                -                1                                      ⁢                                          ∑                                  n                  =                  0                                                  N                  -                  1                                            ⁢                              xe2x80x83                            ⁢                                                x                  ⁡                                      (                                          m                      ,                      n                                        )                                                  ⁢                cos                ⁢                                  xe2x80x83                                ⁢                                                                                                    (                                                                              2                            ⁢                            m                                                    +                          1                                                )                                            ⁢                      k                      ⁢                                              xe2x80x83                                            ⁢                      π                                        +                                                                  (                                                                              2                            ⁢                            n                                                    +                          1                                                )                                            ⁢                      l                      ⁢                                              xe2x80x83                                            ⁢                      π                                                                            2                    ⁢                    N                                                  ⁢                                  xe2x80x83                                ⁢                and                                                    ⁢                  
                ⁢                              B            ⁡                          (                              k                ,                l                            )                                =                                    ∑                              m                =                0                                            N                -                1                                      ⁢                                          ∑                                  n                  =                  0                                                  N                  -                  1                                            ⁢                              xe2x80x83                            ⁢                                                x                  ⁡                                      (                                          m                      ,                      n                                        )                                                  ⁢                cos                ⁢                                  xe2x80x83                                ⁢                                                                                                    (                                                                              2                            ⁢                            m                                                    +                          1                                                )                                            ⁢                      k                      ⁢                                              xe2x80x83                                            ⁢                      π                                        -                                                                  (                                                                              2                            ⁢                            n                                                    +                          1                                                )                                            ⁢                      l                      ⁢                                              xe2x80x83                                            ⁢                      π                                                                            2                    ⁢                    N                                                                                                          (        2.50        )            
After some complicated data reordering and manipulations, Cho and Lee have shown that A(k,l) and B(k,l) can be expressed in terms of N""s 1-D DCTs so that an Nxc3x97N DCT can be obtained from N""s separate 1-D DCTs [CL91].
2.3.4 2-D DCT/IDCT Hardware Implementations
A lot of papers have been written lately on the development of VLSI and chip implementation of the 2-D DCT/IDCT. Most of the works have concentrated on the basic block size of W, because the W block has been found to be able to provide sufficient details and localized activities of the image such that it has been adopted as the standard 2-D DCT/IDCT size in almost all existing image and video processing and compression protocols.
Because of the limitation of areas and interconnections in VLSI implementation, not much of the chip development work has included the mapping of fast, two-dimensional algorithms onto silicon directly. Instead, regularity of design and feasibility of layout seem to be the primary concern, together with a realistic throughput rate for real-time applications. However, there have been attempts to map Lee""s algorithm [Lee84] onto silicon [RY90]. As well, chips based on a single processor rotation [LLM89] are also being reported [RY90]. But all of them are limited to 1-D DCT/IDCT applications.
In practice, the 2-D DCT algorithms based on other discrete transforms suffer the same setbacks as their 1-D counterparts: complex arithmetic operations. complicated conversion between the two different transforms and complex index mapping, which make the hardware implementations via other discrete transforms rather difficult.
Generally speaking, the 2-D DCT algorithms based on direct matrix factorization or decomposition are much more suitable for software implementation, because they usually require fewer multiplications than other approaches and the complex index mapping involved is not a problem for software. The high communication complexity and global interconnection involved in these algorithms make them difficult to be implemented using VLSI technology.
The 2-D DCT algorithms based on the RCM approach, however, can be realized using a very simple and regular structure, since the RCM reduces the 2-D DCT into two stages of 1-D DCTs and the existing 1-D DCT algorithms listed in section 2.2 can be employed directly. The relative simple localized interconnections of the RCM is another key feature making it suitable for VLSI implementation. The block diagram of the xe2x80x9crow-columnxe2x80x9d transform approach to realize an Nxc3x97N 2-D DCT is illustrated in FIG. 8.
Needless to say, variations on this basic block structure are many. Some use special devices for the intermediate memory transposition operation. Some use a single, 1-D DCT processor to perform both row and column transformations one-by-one in order to reduce the die size [MW95]. Others use time-recursive algorithms and architectures to achieve regular and modular structure [SL96, WC95]. Some proposed systolic array architecture of RCM can even avoid using an intermediate matrix transposition circuitry with the extra expense of data synchronization and input sequence reordering [CW95].
It is worth notice that all the chip developments have one common ground. Almost all the 2-D DCT or IDCT processors developed so far have made use of the separability property of the 2-D DCT or IDCT by decomposing it into two separate 1-D transforms. None have attempted to directly map a specific 2-D DCT or IDCT algorithm to silicon.
In this section, various conventional approaches for computing the 1-D DCT have been examined, as well as some of the algorithms designed to implement the 2-D DCT have also been investigated. The 1-D algorithms can be loosely classified as the DCT via other transforms, via sparse matrix factorization and via time-recursive approaches. Similar, the 2-D algorithms can be classified as DCT via other transforms, via direct matrix factorization/decomposition and via Row-Column methods. Both the 1-D IDCT and 2-D IDCT can be computed and implemented with approaches similar to the 1-D DCT and 2-D DCT.
The most prominent property is the separability property of the 2-D DCT or IDCT, which has been exploited both in the algorithms and in the chip designs. Almost all existing 2-D DCT or IDCT processors are based on the reduction of 2-D DCT or IDCT to a lexicographically ordered 1-D transforms (i.e. RCM).
Compared with the Row-Column methods, the direct 2-D DCT or IDCT matrix factorization/decomposition is more computation efficient and generally requires fewer multiplications. But the complex global communication interconnection of existing direct 2-D DCT or IDCT algorithms has prevented them from being implemented in VLSI chips due to design and layout concerns.
The present invention provides a method and system for computing 2-D DCT/IDCT which is easy to implement with VLSI technology to achieve high throughput to meet the requirements of high definition video processing in real time.
The present invention is based on a direct 2-D matrix factorization approach. The present invention computes the 8xc3x978 DCT/IDCT through four 4xc3x974 matrix multiplication sub-blocks. Each sub-block is half the size of the original 8xc3x978 size and therefore requires a much lower number of multiplications. Additionally, each sub-block can be implemented independently with localized interconnection so that parallelism can be exploited and a much higher DCT/IDCT throughput can be achieved.