The present invention relates generally to methods and apparatus for data compression and decompression, and in particular, is directed to methods and systems for performing the discrete cosine transform (DCT) and weighting processes of a digital video encoder and the inverse weighting and inverse discrete cosine transform (IDCT) processes of a digital video decoder.
Digital video is the term used to describe video signals represented in digital form. Digital video offers several advantages over traditional analog systems. For example, recordings of digital video signals can be copied indefinitely without significant loss of quality. Also, compressed digital video signals may require less storage space than analog recordings for the same or better picture quality. Finally, digital formats allow audio, video, and other data to be easily combined, edited, stored, and transmitted.
A direct conversion into digital format of analog video signals having fast frame rates, many colors, and high resolution, however, results in digital video signals with a high data rate, creating difficulties for storage and transmission. Many digital video systems, therefore, reduce the amount of digital video data by employing data compression techniques that are optimized for particular applications. Digital compression devices are commonly referred to as encoders; devices that perform decompression are referred to as decoders. Devices that perform both encoding and decoding are referred to as codecs.
The "DV" format is an industry digital video format specification for use primarily in consumer-level video tape recorders (VTRs). The specification for DV format has been adopted by most of the major manufacturers of high-quality digital video cassette recorders (DVCRs) and digital video camcorders. See Specifications of Consumer-Use Digital VCRs, HD Digital VCR Conference, December 1994. The DV format is currently used in such commercially available products as digital camcorders.
Video displays have traditionally consisted of a series of still pictures, or "frames", painted by scan lines and sequentially displayed at a rate of, for example, thirty frames per second to provide the illusion of continuous motion. Each frame consists of a pair of interlaced "fields." A field contains half the number of lines of a frame. Fields are interleaved with lines from either a previous or subsequent field to create a frame. When storing or transmitting video data, the amount of data may be reduced by taking advantage of redundancy within fields (intrafield) or between neighboring fields (interfield). DV format uses both intrafield and interfield data reduction.
FIG. 1 is a basic flow diagram showing the encoding, or data compression, process of a prior art digital video codec. Codecs employing DV format use a DCT-based data compression method. In the blocking step, the image frame is divided into N by N blocks of pixel information including, for example, brightness and color data for each pixel (Step 100). A common block size is eight pixels horizontally by eight pixels vertically. The pixel blocks are then "shuffled" so that several blocks from different portions of the image are grouped together (Step 110). Shuffling enhances the uniformity of image quality.
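The blocking step (Step 100) can be illustrated with a minimal sketch. The function name split_into_blocks and the list-of-rows frame layout are assumptions made for illustration only. The shuffling pattern of Step 110 is defined by the DV format specification and is not reproduced here.

```python
def split_into_blocks(frame, n=8):
    """Split a frame (list of pixel rows) into n-by-n blocks in row-major order.

    Hypothetical helper: the frame layout and return format are assumptions.
    """
    height, width = len(frame), len(frame[0])
    if height % n or width % n:
        raise ValueError("frame dimensions must be multiples of the block size")
    blocks = []
    for top in range(0, height, n):
        for left in range(0, width, n):
            # Take n consecutive rows and slice n consecutive columns from each.
            blocks.append([row[left:left + n] for row in frame[top:top + n]])
    return blocks


# A 16 by 16 test frame whose pixel value encodes its own position.
frame = [[y * 16 + x for x in range(16)] for y in range(16)]
blocks = split_into_blocks(frame)
```

In this example the 16 by 16 frame yields four 8 by 8 blocks, ordered left to right and then top to bottom.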
Different fields are recorded at different time instants. If a video scene contains a large amount of motion, the two fields within a frame contain significantly different image information, and DV encoders use an intrafield data reduction process to remove redundancy within a field.
In video images without substantial motion, the two fields of a frame contain similar image information, and DV encoders use an interfield data reduction process to remove redundancy between fields. For each block of pixel data, a motion detector looks for the difference between two fields of a frame (Step 115). The motion information is sent to the next processing step (Step 120).
In step 120, pixel information is transformed using a DCT. There are at least two common DCT modes: 8-8 DCT mode and 2-4-8 DCT mode. The 8-8 DCT mode refers to a DCT that takes eight inputs and returns eight outputs in both the vertical and horizontal directions. In the 2-4-8 DCT mode, an 8 by 8 block of data is divided into two 4 by 8 fields, each field consisting of 4 horizontal lines of 8 components. A two-dimensional 4 by 8 transform is performed on each field, each 4×8 transform consisting of a one-dimensional transform taking 4 inputs and returning 4 outputs in the vertical direction, and a one-dimensional transform taking 8 inputs and returning 8 outputs in the horizontal direction. The DV format specification recommends that the 8-8 DCT mode be used when the difference between two fields is small. By contrast, the 2-4-8 DCT mode should be used when two fields differ greatly.
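The mode decision described above can be sketched as follows. The difference measure (a sum of absolute differences between the lines of the two fields) and the threshold value are assumptions chosen for illustration; the source does not state how the motion detector measures the difference between fields.

```python
def field_difference(block):
    """Sum of absolute differences between each even line and the odd line below it.

    Hypothetical measure: even lines are taken to belong to one field and
    odd lines to the other.
    """
    total = 0
    for m in range(4):
        even_line, odd_line = block[2 * m], block[2 * m + 1]
        total += sum(abs(a - b) for a, b in zip(even_line, odd_line))
    return total


def choose_dct_mode(block, threshold=256):
    """Return "2-4-8" when the fields differ greatly, otherwise "8-8".

    The threshold is an arbitrary illustrative value, not taken from the
    DV format specification.
    """
    return "2-4-8" if field_difference(block) > threshold else "8-8"


static_block = [[100] * 8 for _ in range(8)]
moving_block = [[0] * 8 if y % 2 == 0 else [255] * 8 for y in range(8)]
```

With these inputs, static_block has identical fields and selects the 8-8 mode, while moving_block has strongly differing fields and selects the 2-4-8 mode.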
In the 2-4-8 DCT mode, 8×8 blocks of pixel information are divided into two 4×8 blocks of pixel information. The first block represents the sums of adjacent pairs of rows; the second block represents the differences of those pairs of rows. Each 4×8 block is transformed into a 4×8 matrix of corresponding frequency coefficients using a two-dimensional DCT. In the following equations, P(x,y) represents an input block of pixel information with symbols x and y representing pixel coordinates in the DCT block. Q'(h,v) represents the resulting output block of DCT coefficients for the sum information and Q'(h,v+4) represents the output block of DCT coefficients for the difference information. The DCT in the 2-4-8 mode may be described mathematically as follows:
For h=0, 1, ..., 7 and v=0, 1, ..., 3, ##EQU1##
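A direct, unoptimized sketch of the 2-4-8 DCT follows. Because the equation itself is not reproduced in this text, the scale factors are an assumption: C(0) = 1/(2√2) and C(k) = 1/2 otherwise, the convention commonly used for DV. The names c and dct_248 and the block[y][x] and coeffs[v][h] layouts are likewise hypothetical.

```python
import math


def c(k):
    """Assumed DV-style scale factor: 1/(2*sqrt(2)) for k == 0, else 1/2."""
    return 1 / (2 * math.sqrt(2)) if k == 0 else 0.5


def dct_248(block):
    """Transform block[y][x] (8 rows of 8 pixels) into coeffs[v][h].

    Rows 0-3 of the result hold Q'(h, v) (sum information); rows 4-7 hold
    Q'(h, v + 4) (difference information).
    """
    sums = [[block[2 * m][x] + block[2 * m + 1][x] for x in range(8)]
            for m in range(4)]
    diffs = [[block[2 * m][x] - block[2 * m + 1][x] for x in range(8)]
             for m in range(4)]

    def dct_4x8(field):
        # 4-point DCT vertically, 8-point DCT horizontally, evaluated directly.
        out = [[0.0] * 8 for _ in range(4)]
        for v in range(4):
            for h in range(8):
                acc = 0.0
                for m in range(4):
                    for x in range(8):
                        acc += (field[m][x]
                                * math.cos(math.pi * v * (2 * m + 1) / 8)
                                * math.cos(math.pi * h * (2 * x + 1) / 16))
                out[v][h] = c(v) * c(h) * acc
        return out

    return dct_4x8(sums) + dct_4x8(diffs)


flat_coeffs = dct_248([[16] * 8 for _ in range(8)])
```

Under the assumed scaling, a flat block of value 16 produces a single nonzero coefficient, Q'(0,0) = 128, and an all-zero difference block. This direct form uses far more multiplications than a fast algorithm and is intended only to make the sum and difference structure concrete.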
The DCT coefficients are then weighted by multiplying each block of DCT coefficients by weighting constants (Step 124). This process may be described mathematically as follows:
Q(h,v) = W(h,v) Q'(h,v)
The following weighting coefficients are standard for the DV format. ##EQU2##
where
w(0) = 1
w(1) = CS4/(4×CS7×CS2)
w(2) = CS4/(2×CS6)
w(3) = 1/(2×CS5)
w(4) = 7/8
w(5) = CS4/CS3
w(6) = CS4/CS2
w(7) = CS4/CS1
and CSm = cos(mπ/16).
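The one-dimensional weights w(0) through w(7) listed above can be evaluated numerically as a check. The helper names cs and dv_weights are hypothetical. The two-dimensional weighting matrix W(h,v) is not reproduced in this text, so the sketch stops at the one-dimensional values.

```python
import math


def cs(m):
    """CSm = cos(m * pi / 16), as defined in the text."""
    return math.cos(m * math.pi / 16)


def dv_weights():
    """Return [w(0), ..., w(7)] evaluated from the expressions in the text."""
    return [
        1.0,
        cs(4) / (4 * cs(7) * cs(2)),
        cs(4) / (2 * cs(6)),
        1 / (2 * cs(5)),
        7 / 8,
        cs(4) / cs(3),
        cs(4) / cs(2),
        cs(4) / cs(1),
    ]


weights = dv_weights()
```

Evaluated this way, the weights decrease steadily from w(0) = 1 to w(7), so higher-frequency coefficients are scaled down more strongly.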
The weighted DCT coefficients, Q(h,v), are stored to a buffer (Step 125).
The weighted DCT coefficients are quantized in the next step (Step 140). Quantization increases the efficiency of video data transmission, but may result in error propagation. To reduce the magnitude of errors, each DCT block is classified into one of four activity classes described in the DV format specification (Step 130). The four classes represent four different quantizing schemes. The amount of data in the variable length codeword using each quantizer is estimated (Step 135), and the quantizer that will best compress one or more successive weighted DCT coefficients into a block of the same size as a synchronization block is selected.
Quantization rounds off each DCT coefficient within a certain range of values to be the same number (Step 140). Quantizing tends to set the higher frequency components of the frequency matrix to zero, resulting in much less data to be stored. Since the human eye is most sensitive to lower frequencies, however, very little perceptible image quality is lost by this step.
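The rounding effect of quantization can be shown with a simple uniform quantizer. This is a sketch only: the single step size and the truncation toward zero are assumptions, and the actual DV quantizers depend on the activity class and coefficient position as defined in the DV format specification.

```python
def quantize(coeffs, step):
    """Map every coefficient in a range of width `step` to the same integer level.

    Hypothetical uniform quantizer; int() truncates toward zero.
    """
    return [[int(value / step) for value in row] for row in coeffs]


def dequantize(levels, step):
    """Approximate reconstruction: scale each level back by the step size."""
    return [[level * step for level in row] for row in levels]


coeffs = [[100.0, 7.0, -3.0, 0.4]]
levels = quantize(coeffs, 8)
approx = dequantize(levels, 8)
```

Here the large low-frequency value survives as a nonzero level, while the three small values all collapse to zero, which is the behavior that produces long runs of zeros in the higher frequencies.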
Quantization step 140 includes converting the two-dimensional matrix of quantized coefficients to a one-dimensional linear stream of data by reading the matrix values in a zigzag pattern and dividing the one-dimensional linear stream of quantized coefficients into segments, where each segment consists of a string of zero coefficients followed by a non-zero quantized coefficient. Variable length coding (VLC) then is performed by transforming each segment, consisting of the number of zero coefficients and the amplitude of the non-zero coefficient in the segment, into a variable length codeword (Step 145). Finally, a framing process packs every 30 blocks of variable-length coded quantized coefficients into five fixed-length synchronization blocks (Step 150).
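The zigzag read-out and segmentation can be sketched as follows. The scan order generated here is the conventional 8 by 8 zigzag, which is an assumption; the DV format specification defines its own scan orders, including a separate one for the 2-4-8 mode. The function names and the (run, amplitude) tuple format are also hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, column) pairs in a conventional zigzag order over an n-by-n matrix."""
    order = []
    for s in range(2 * n - 1):
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even-numbered diagonals are traversed upward, odd-numbered ones downward.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


def run_length_segments(stream):
    """Split a stream into (zero_run, amplitude) segments.

    Returns the segments and the count of trailing zeros left after the
    last non-zero coefficient.
    """
    segments, run = [], 0
    for value in stream:
        if value == 0:
            run += 1
        else:
            segments.append((run, value))
            run = 0
    return segments, run


matrix = [[0] * 8 for _ in range(8)]
matrix[0][0], matrix[1][0], matrix[0][2] = 12, 3, -1
stream = [matrix[r][col] for r, col in zigzag_order()]
segments, trailing_zeros = run_length_segments(stream)
```

For this matrix the stream begins 12, 0, 3, 0, 0, -1, giving the segments (0, 12), (1, 3), and (2, -1), followed by 58 trailing zeros. Each segment is the unit that the variable length coder of Step 145 maps to one codeword.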
FIG. 2 shows a basic flow diagram of a prior art DV codec decoding process. Decoding is essentially the reverse of the encoding process described above. The digital stream is first deframed (Step 200). Variable length decoding (VLD) then unpacks the data so that it may be restored to the individual coefficients (Step 210).
After inverse quantizing the coefficients (Step 220), inverse weighting (Step 230) and an inverse discrete cosine transform (IDCT) (Step 235) are applied to the result. The inverse weights W'(h,v) are the multiplicative inverses of the weights W(h,v) that were applied in the encoding process. The inverse weighting process may be described mathematically as follows, where Q(h,v) represents the input coefficients:
Q'(h,v) = W'(h,v) Q(h,v)
The following inverse weighting coefficients are standard for the DV format. ##EQU3##
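where
w(0) = 1
w(1) = CS4/(4×CS7×CS2)
w(2) = CS4/(2×CS6)
w(3) = 1/(2×CS5)
w(4) = 7/8
w(5) = CS4/CS3
w(6) = CS4/CS2
w(7) = CS4/CS1
and CSm = cos(mπ/16).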
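The relationship between weighting and inverse weighting can be illustrated with an element-wise product. The 2 by 2 weight matrix below is an arbitrary stand-in chosen for illustration, not the DV weighting matrix, and the helper names are hypothetical.

```python
def apply_weights(coeffs, weights):
    """Element-wise product of a coefficient matrix and a weight matrix."""
    return [[q * w for q, w in zip(q_row, w_row)]
            for q_row, w_row in zip(coeffs, weights)]


def invert_weights(weights):
    """Inverse weights: the multiplicative inverse of each forward weight."""
    return [[1.0 / w for w in row] for row in weights]


# Hypothetical weight values, for illustration only.
forward_weights = [[0.25, 0.5], [0.5, 0.875]]
unweighted = [[128.0, -6.0], [3.0, 1.0]]

weighted = apply_weights(unweighted, forward_weights)
restored = apply_weights(weighted, invert_weights(forward_weights))
```

Applying the inverse weights to the weighted coefficients recovers the unweighted coefficients to within floating-point rounding. Each direction costs one multiplication per weighted coefficient when performed as a separate step.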
The output, Q'(h,v), of the inverse weighting function is then processed by the IDCT. The IDCT process is described mathematically as follows:
If y=2m, then for m=0, 1, ..., 3 and x=0, 1, ..., 7, ##EQU4##
Also, if y=2m+1, then for m=0, 1, ..., 3 and x=0, 1, ..., 7, ##EQU5##
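A direct, unoptimized sketch of the 2-4-8 IDCT follows. The equations themselves are not reproduced in this text, so the form used here is an assumption: even rows (y=2m) are reconstructed from the sum of the sum and difference coefficients, odd rows (y=2m+1) from their difference, with scale factors C(0) = 1/(2√2) and C(k) = 1/2 otherwise. The names c and idct_248 and the array layouts are hypothetical.

```python
import math


def c(k):
    """Assumed DV-style scale factor: 1/(2*sqrt(2)) for k == 0, else 1/2."""
    return 1 / (2 * math.sqrt(2)) if k == 0 else 0.5


def idct_248(coeffs):
    """Reconstruct block[y][x] from coeffs[v][h].

    Rows 0-3 of coeffs hold Q'(h, v) (sum information); rows 4-7 hold
    Q'(h, v + 4) (difference information).
    """
    block = [[0.0] * 8 for _ in range(8)]
    for m in range(4):
        for x in range(8):
            even = odd = 0.0
            for v in range(4):
                for h in range(8):
                    kernel = (c(v) * c(h)
                              * math.cos(math.pi * v * (2 * m + 1) / 8)
                              * math.cos(math.pi * h * (2 * x + 1) / 16))
                    even += (coeffs[v][h] + coeffs[v + 4][h]) * kernel
                    odd += (coeffs[v][h] - coeffs[v + 4][h]) * kernel
            block[2 * m][x] = even       # row y = 2m
            block[2 * m + 1][x] = odd    # row y = 2m + 1
    return block


dc_only = [[0.0] * 8 for _ in range(8)]
dc_only[0][0] = 128.0
flat_block = idct_248(dc_only)
```

Under the assumed scaling, a lone coefficient Q'(0,0) = 128 reconstructs a flat block of value 16, and a lone coefficient Q'(0,4) produces rows that alternate in sign, which reflects the field difference it encodes.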
The result is then deshuffled (Step 240) and deblocked (Step 250) to form the full image frame.
There has been much emphasis on producing efficient implementations of DCT-based data compression and decompression techniques. U.S. Pat. No. 4,385,363 to Widergren et al., for example, discloses a DCT processor for transforming 16 by 16 pixel blocks. The five-stage process described in the '363 patent is optimized for a hardware implementation using 16 inputs and is not readily adaptable for four inputs, nor is it efficient when implemented using software.
U.S. Pat. No. 5,574,661 to Cismas describes an apparatus and method for calculating an inverse discrete cosine transform. The '661 patent discloses primarily a hardware implementation in which the process of performing the IDCT on 8 inputs requires only one multiplier. Neither the '363 patent nor the '661 patent discusses combining the weighting and transform processes. Furthermore, neither of these patents is easily extensible to applications intended for the DV format.
Some researchers have suggested that combining the weighting and DCT processes together reduces the number of multiplications, thereby increasing the speed of most software and hardware implementations of encoders and decoders. See, e.g., C. Yamamitsu et al., "An Experimental Study for Home-Use Digital VTR," IEEE Transactions on Consumer Electronics, Vol. 35, No. 3, August 1989, pp. 450-456. Yamamitsu et al. discuss combining a sample weighting function with DCT and IDCT processes to produce a "modified DCT" and "modified IDCT," each of which requires fewer multiplications than its two component functions when performed consecutively. The modified DCT and IDCT disclosed in the Yamamitsu et al. paper, however, do not use the standard weights for the DV format.
The traditional way to implement 2-4-8 DCT and weighting, or inverse weighting and 2-4-8 IDCT, is to use two steps. The weighting and inverse weighting processes each require three multiplications in the vertical direction and seven multiplications in the horizontal direction. The 4-point DCT or 4-point IDCT each requires three multiplications. Each 8-point DCT or 8-point IDCT requires eleven multiplications. See, e.g., C. Loeffler et al., "Practical Fast 1-D DCT Algorithms with 11 Multiplications," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-89, pp. 988-991, 1989.
The present invention reduces DCT-based compression time and decompression time by providing methods and apparatus for a combined weighting/DCT function of a DV encoder that minimize the number of multiplications. The present invention also provides methods and apparatus for a combined IDCT/inverse weighting function of a DV decoder that minimize the number of multiplications.