FIG. 1 shows an illustrative image compression circuit 10 according to JPEG recommendations. A digital image to be compressed is inputted to a discrete cosine transform circuit (DCT) 12 which outputs blocks of DCT transformed coefficients (herein "DCT coefficients" refers to the transformed image data). Illustratively, each block is an eight by eight matrix of DCT coefficients. The DCT coefficient blocks are received at a quantizer 14 which outputs blocks of quantized coefficients.
Each block of quantized coefficients is received at a zig-zag RAM 16. The coefficients are stored in the zig-zag RAM 16 according to a zig-zag scheme such as the zig-zag scheme shown in FIG. 2. To that end, a zig-zag address generator 15 may be provided to cause the coefficients of each block to be stored as in the scheme depicted in FIG. 2. FIG. 3 is a timing diagram illustrating the input and output of the zig-zag RAM 16. As shown, a first block is written into the zig-zag RAM 16 according to a zig-zag scheme during a first time period t=0 to T. Then, the quantized DCT coefficients of the block are sequentially read out, e.g., row by row, during a second subsequent time period t=T to 2T. A counter 17 may be provided for generating sequential addresses for sequentially shifting out the quantized DCT coefficients. After the block is shifted out, the zig-zag RAM 16 is ready to have the next block written therein during a third subsequent time period t=2T to 3T.
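The addressing performed by the zig-zag address generator 15 can be illustrated with a minimal sketch, assuming that the scheme of FIG. 2 is the conventional JPEG zig-zag scan (the figure is not reproduced here). The function names `zigzag_order` and `zigzag_write_addresses` are hypothetical and used for illustration only.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag scan order.

    Assumption: the conventional JPEG scan, which starts at (0, 0) and
    moves first to (0, 1).
    """
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            # Even diagonals are traversed from bottom-left to top-right.
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_write_addresses(n=8):
    """addr[r][c] is the RAM address at which coefficient (r, c) is written."""
    addr = [[0] * n for _ in range(n)]
    for k, (r, c) in enumerate(zigzag_order(n)):
        addr[r][c] = k
    return addr
```

If each coefficient is written at the address given by `zigzag_write_addresses()`, then reading addresses 0 through 63 in sequence, as a simple counter would, returns the coefficients in zig-zag order.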
The quantized DCT coefficients are then variable length encoded using encoder circuits 27. Quantized coefficients can be classified into four types of coefficients: DC coefficients, AC coefficients equal to zero, non-zero AC coefficients with zero run length, and non-zero AC coefficients without zero run length. The encoding of a coefficient depends on its type. In addition, an end of block (EOB) delimiter is inserted after each block, and this EOB delimiter is also processed by the encoder circuits 27.
DC coefficients are received at a differential pulse code modulator (DPCM) 18 which converts each DC coefficient into a variable length integer (VLI). The VLI is inputted to a Huffman encoder 20 which outputs a variable length code word (VLC). The VLI is also inputted to, and stored in, a latch 24. The VLC, when outputted, is stored in a latch 22.
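The conversion of a DC coefficient into a VLI can be sketched as follows. The text above does not specify the VLI format, so this sketch assumes the baseline JPEG convention: the difference from the preceding block's DC coefficient is represented by a size (number of bits) and that many additional bits, with negative differences offset so that their leading bit is zero. The function name `dc_to_vli` is hypothetical.

```python
def dc_to_vli(dc, prev_dc):
    """Return (size, bits) for the DC difference, per the assumed JPEG convention.

    size -- number of additional bits (the quantity a Huffman coder would encode)
    bits -- the additional bits themselves, as a non-negative integer
    """
    diff = dc - prev_dc              # DPCM: encode the change, not the value
    size = abs(diff).bit_length()    # 0 for diff == 0
    if diff >= 0:
        bits = diff
    else:
        # Negative values map to the low `size` bits of (diff - 1).
        bits = diff + (1 << size) - 1
    return size, bits
```

For example, a difference of 5 gives a size of 3 and bits 101, while a difference of -5 gives a size of 3 and bits 010.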
As shown, the AC coefficients are inputted as a VLI to a latch 32 and to a zero run length counter 26. The zero run length counter 26 counts the "zero run length" or total number of AC coefficients equal to zero in a continuous run or subsequence of zeros in the block. Alternatively, if the inputted AC coefficient is not equal to zero, the zero run length counter 26 simply outputs the inputted non-zero AC coefficient. The zero run length (ZRL) or the non-zero AC coefficient is then inputted to a Huffman encoder 28. The Huffman encoder 28 outputs a VLC which is stored in a latch 30.
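The behavior of the zero run length counter 26 can be approximated in software by the following sketch, which pairs each non-zero AC coefficient with the number of zeros that precede it. This is an illustration under simplifying assumptions, not the circuit itself: the function name `run_length_pairs` is hypothetical, and the splitting of runs longer than fifteen zeros required by baseline JPEG is omitted.

```python
def run_length_pairs(ac_coeffs):
    """Group AC coefficients (in zig-zag order) into (zero_run, value) pairs.

    Returns the list of pairs and the number of trailing zeros, which
    would be covered by the end of block (EOB) delimiter.
    """
    pairs = []
    run = 0
    for coeff in ac_coeffs:
        if coeff == 0:
            run += 1                    # extend the current run of zeros
        else:
            pairs.append((run, coeff))  # non-zero value ends the run
            run = 0
    return pairs, run
```

For the sequence 5, 0, 0, -3, 0, 0, 0 the sketch yields the pairs (0, 5) and (2, -3), with three trailing zeros left for the EOB delimiter. Note that the loop still visits every zero coefficient one at a time, as the circuit 10 does.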
A multiplexer 34 is provided which selects either the DC-VLI, DC-VLC pair stored in the latches 22 and 24 or the AC-VLI, AC-VLC pair stored in the latches 30 and 32. The multiplexer 34 then outputs the selected VLI,VLC pair in sequence (VLC followed by VLI) to a barrel shifter-FIFO circuit 36. The barrel shifter-FIFO circuit 36 then shifts out the VLI,VLC pairs.
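The packing of each pair, VLC first and VLI second, into a continuous bit stream can be sketched in software as follows. The class `BitPacker` and its methods are hypothetical names; the sketch only illustrates why the two portions of a pair must arrive adjacent and in order, and it omits details of a real JPEG stream such as byte stuffing and padding of the final byte.

```python
class BitPacker:
    """Accumulate variable length fields and emit whole bytes, MSB first."""

    def __init__(self):
        self.acc = 0             # bits not yet emitted
        self.nbits = 0           # number of valid bits in acc
        self.out = bytearray()   # completed bytes

    def put(self, value, length):
        """Append the low `length` bits of `value` to the stream."""
        self.acc = (self.acc << length) | (value & ((1 << length) - 1))
        self.nbits += length
        while self.nbits >= 8:   # shift out every completed byte
            self.nbits -= 8
            self.out.append((self.acc >> self.nbits) & 0xFF)
        self.acc &= (1 << self.nbits) - 1  # keep only the leftover bits

    def put_pair(self, vlc, vlc_len, vli, vli_len):
        """Append one pair in the order described above: VLC, then VLI."""
        self.put(vlc, vlc_len)
        self.put(vli, vli_len)
```

For instance, a 3-bit VLC of 110 followed by a 5-bit VLI of 10101 fills exactly one byte, 11010101.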
Analysis of several images compressed by the circuit 10 reveals the following average count of each type of processing state of the encoder circuits 27 for blocks containing sixty-four coefficients each:
TABLE 1
______________________________________
          DC    zero AC    non-zero AC    ZRL     EOB
______________________________________
count     1     50.24      11.78          2.11    1
cycles    4     2          9              12      4
______________________________________
Table 1 also shows the number of cycles used to process a coefficient or EOB delimiter in each state. Thus, with the counts rounded to whole numbers, the average execution time per block = 1×4 + 50×2 + (12-2)×9 + 2×12 + 1×4 = 222 cycles. Thus, if the circuit 10 is incorporated into an IC chip with a 20 MHz clock, then 11 frames having a size 512×512×24 bits can be compressed each second.
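The per-block cycle estimate can be checked with a short calculation. The sketch below uses the rounded counts from Table 1 and assumes, following the form of the sum above, that the 9-cycle cost applies to non-zero AC coefficients without a zero run and the 12-cycle cost to those with a zero run. The dictionary keys are illustrative labels only.

```python
# Rounded average counts per 64-coefficient block (from Table 1).
counts = {
    "DC": 1,
    "zero AC": 50,
    "non-zero AC without ZRL": 12 - 2,
    "non-zero AC with ZRL": 2,
    "EOB": 1,
}

# Cycles spent per coefficient (or delimiter) in each state (from Table 1).
cycles = {
    "DC": 4,
    "zero AC": 2,
    "non-zero AC without ZRL": 9,
    "non-zero AC with ZRL": 12,
    "EOB": 4,
}

total_cycles = sum(counts[k] * cycles[k] for k in counts)
zero_cycles = counts["zero AC"] * cycles["zero AC"]
zero_share = zero_cycles / total_cycles

print(total_cycles)            # average cycles per block
print(round(zero_share, 2))    # fraction of cycles spent on zero coefficients
```

Under these assumptions the zero coefficients account for 100 of the 222 cycles, or about 45% of the encoding time per block.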
The architecture of the circuit 10 has three disadvantages:
(1) The zig-zag RAM 16 reduces the bandwidth of the circuit 10 by 1/2. This is illustrated in the timing diagram of FIG. 3. Between t=0 and t=T, a first block is written in the zig-zag RAM 16. However, between times t=T and t=2T, the second block is not written in the zig-zag RAM 16. Rather, the first block is read out of the zig-zag RAM 16. The second block is not written into the zig-zag RAM 16 until after t=2T and is not read out until after t=3T. Thus, even though a quantizer 14 can be designed to output a data block every T cycles, the data blocks can only be processed by (and outputted from) the zig-zag RAM 16 every 2T cycles.
(2) A great deal, i.e., almost 1/2, of the processing time of the encoder circuits 27 is utilized in processing zero coefficients. The processing of zero coefficients delays the processing of non-zero coefficients and therefore reduces the throughput of the circuit 10.
(3) The processing of each non-zero coefficient produces a VLI,VLC pair. The VLI and VLC of each pair must be entered in sequence into the barrel shifter-FIFO circuit 36 so that they are adjacent to each other. Typically, the VLI portion is computed much faster than the VLC portion. Thus, the circuit 10 utilizes latches for storing each portion of a pair as it is computed. However, this architecture delays the computation of a subsequent coefficient until the VLI,VLC pair of a preceding coefficient is inputted to the barrel shifter-FIFO circuit 36. Thus, the throughput of the circuit 10 is reduced.
(1) The ping-pong zig-zag RAM doubles the bandwidth of the zig-zag processing.
(2) The zig-zag FIFO predetermines zero run lengths in each block of coefficients prior to encoding, thereby decreasing coding processing time by up to 67%.
(3) The VLI,VLC mixer permits fully pipelined operation without increasing the latency of the encoding of DC and AC coefficients.
It is therefore an object of the present invention to overcome the disadvantages of the prior art.