Fields of technology are Telecommunications, Digital Signal Processing and Compression and Decompression of Image Data.
Structures and processes are provided for encoding and decoding a residual layer of advanced video codec standards such as AVS. AVS is a Chinese video codec standard. In conventional AVS, the scan order and the entropy coding order are opposite to each other.
A block diagram of a residual encoder for a video codec is depicted in FIG. 1. The unit of encoding and decoding in a residual layer is called a block, as illustrated by block 110 in FIG. 1. Each residual coefficient in a Macroblock is either zero (counted toward a Run) or non-zero (a Level). In some video codec standards, residual coefficients are compressed into a bitstream by the following steps:
(1) The encoder scans 120 the residual coefficients in a block 110 of quantized coefficients, starting from the DC coefficient and scanning toward the AC coefficients. The term “DC coefficient” refers to the single image transform coefficient representing the unvarying component or term in an image transform resulting from electronic computation, and often this term represents the average image intensity of the image to which the image transform is applied. AC coefficients refer to any or all of the image transform coefficients that represent the amplitude or intensity of each spatially varying component or term of the image transform of an image.
(2) During scanning, the encoder counts the number of consecutive zero coefficients between Levels (the Scan section 120 in FIG. 1).
(3) When the encoder finds a Level, the Level value and the number of consecutive zeros preceding it (the Run-length) are converted into a symbol value. In this step, a conversion table based on Huffman coding theory is applied (the Entropy Encoder 130 in FIG. 1).
(4) The symbol value is converted into a bitstream and fed to a stream buffer 140.
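By way of illustration only, steps (1) through (3) above can be sketched in Python as follows. The function name `run_level_pairs` and the sample coefficients are hypothetical and are not taken from any figure; the conversion of each pair into a symbol value by a Huffman-based table is not shown.

```python
def run_level_pairs(scanned_coeffs):
    """Turn coefficients (already in scan order, DC first) into (level, run) pairs.

    'run' is the count of consecutive zeros seen just before each non-zero 'level'.
    Trailing zeros after the last Level produce no pair in this sketch.
    """
    pairs = []
    run = 0
    for coeff in scanned_coeffs:
        if coeff == 0:
            run += 1                    # step (2): count consecutive zeros
        else:
            pairs.append((coeff, run))  # step (3): a Level was found
            run = 0
    return pairs


# Hypothetical scanned coefficients, DC coefficient first.
example = [18, 0, 0, 3, -1, 0, 1, 0, 0]
print(run_level_pairs(example))
```

Each resulting pair would then be mapped to a symbol value by the entropy encoder.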
In FIG. 2, blocks and a Macroblock and their relationship are illustrated, showing blocks inside a Macroblock. A block has 64 coefficients (8 for horizontal, and 8 for vertical), and a Macroblock has 6 blocks (4 blocks are luminance and 2 blocks are chrominance) in a 4:2:0 format. Then, 64 coefficients/block×6 block/Macroblock=384 coefficients/Macroblock.
Clock cycles for encoding or decoding a Macroblock are distributed among three categories of tasks. The first category is interface control, which prepares information for a current Macroblock and a neighboring Macroblock and accomplishes data transactions at system level. It can consume around 50 clocks. The second is a Macroblock header process that generates a motion vector predictor and processes syntax elements in the Macroblock header, such as electronic computation of motion vector, Macroblock type, CBP (Coded Block Pattern read: on/off bit), and quantization parameters; it can consume around 300 clocks. The third is a residual layer process that processes 384 coefficients in each Macroblock and can be time consuming enough to exceed the available real-time processing budget.
As described hereinabove for FIG. 1, encoding the residual layer starts with scanning coefficients in a block. In order to start entropy coding of a block, all coefficients in the block are scanned to get the Level value and Run-length. A rule or order for scanning coefficients is illustrated in FIG. 3. Thus, FIG. 3 depicts a scanning order of residual coefficients in a block. The method in FIG. 3 is called a Zig-Zag Scan. The Level value and Run-length are integrated into one word, and encoded into a bit-stream according to a particular rule or order adopted for the entropy coding.
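As a minimal sketch, and assuming that the order of FIG. 3 is the conventional 8×8 zig-zag pattern that traverses anti-diagonals in alternating directions, the scan positions can be generated as follows. The names `zigzag_order` and `scan_block` are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in conventional zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals are walked from bottom-left to top-right,
        # odd diagonals from top-right to bottom-left.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


def scan_block(block):
    """Read a square block (list of rows) into a 1-D list, DC coefficient first."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


print(zigzag_order()[:6])
```

The first position is the DC coefficient at (0, 0) and the last is the highest-frequency AC coefficient at (7, 7).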
In an AVS residual encoder processing flow, the scan order and the entropy coding order are opposite to each other, according to a method for encoding the residual layer as illustrated in FIG. 4.
In conventional AVS encoding, the encoder scans the residual coefficients in a block from DC to AC. This method is called a Zig-Zag scan. The encoder checks for non-zero coefficients and counts the number of consecutive zero coefficients before each non-zero coefficient. Here, the non-zero coefficient is called a Level, and the number of consecutive zero coefficients is called a Run. When the encoder encounters a Level, the Level and Run are stored into a buffer memory as in FIG. 4. The encoder needs to store the Level and the Run because the entropy coding order (AC-to-DC) is opposite to the DC-to-AC scanning order, which is recognized herein as problematic and is emphasized by the oppositely-directed vertical arrows of FIG. 4. Put another way, the order of scan that fills the Run & Level Buffer in FIG. 4 is opposite to the order in which the entropy coder consumes the contents of that Run & Level Buffer. The entropy encoder starts encoding from the last entry of the Run & Level buffer as shown in FIG. 4.
As illustrated in FIGS. 4 and 5, the processing flow of an AVS residual encoder has the following steps:
(1) The encoder scans all coefficients inside a block from the DC position to the last AC (64th) position.
(2) Whenever the encoder finds a Level during scanning, it stores the Level and the Run-length preceding that Level.
(3) After the encoder has finished scanning all coefficients in the block, the encoder starts entropy coding of the block from the last Level toward the first Level. In FIG. 4, ID=13 (Level=1, Run=2) is encoded first. Then ID=12, ID=11 . . . and finally ID=0 (Level=18, Run=0) is encoded.
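The reversal in step (3) can be sketched as follows. The function name `entropy_coding_order` is hypothetical, and the sample buffer contains only illustrative entries; its first and last entries echo the Level and Run values quoted above for ID=0 and for the last entry, while the middle entry is invented for the example.

```python
def entropy_coding_order(run_level_buffer):
    """Given (level, run) entries stored in scan order, return (id, level, run)
    tuples in the order the entropy coder consumes them: last entry first."""
    indexed = list(enumerate(run_level_buffer))
    return [(idx, level, run) for idx, (level, run) in reversed(indexed)]


# Illustrative buffer contents in DC-to-AC scan order.
buffer_in_scan_order = [(18, 0), (3, 1), (1, 2)]
for entry in entropy_coding_order(buffer_in_scan_order):
    print("ID=%d Level=%d Run=%d" % entry)
```

Because the first entry to be coded is the last one written, coding of a block cannot begin until that block has been scanned completely, which is why the whole block must be buffered.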
In a first time slot of FIG. 5, the encoder scans block-0 and does not activate entropy coding. In a second slot, the encoder of FIGS. 3A/3B and FIG. 4 scans block-1 and activates entropy coding of block-0, per step (3) of the previous paragraph. Consequently, the encoding pipeline latency is equal to the time period consumed by scanning the previous block. In an AVS residual encoder, the pipeline latency is equal to 64 clocks without any overhead. A functional image of this pipelined architecture is illustrated in FIG. 5 and shows the pipeline latency of AVS.
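The slot-by-slot behavior described for FIG. 5 can be modeled with the following sketch. It assumes, for illustration, six blocks per Macroblock, 64 clocks per slot with no overhead, and entropy coding of a block that fits within one slot; the function name `pipeline_schedule` is hypothetical.

```python
def pipeline_schedule(num_blocks=6, clocks_per_slot=64):
    """Return (slots, total_clocks), where each slot is
    (slot_index, block_being_scanned, block_being_entropy_coded)."""
    slots = []
    for t in range(num_blocks + 1):  # one extra slot drains the pipeline
        scanning = t if t < num_blocks else None
        coding = t - 1 if t >= 1 else None  # coding lags scanning by one block
        slots.append((t, scanning, coding))
    return slots, len(slots) * clocks_per_slot


slots, total_clocks = pipeline_schedule()
for slot in slots:
    print(slot)
print(total_clocks)
```

Under these assumptions, the schedule occupies seven slots for six blocks; the additional slot is the one-block pipeline latency.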
To implement the encoder in hardware, a ping-pong buffer 335 is provided in FIG. 6. The encoder stores each Level and Run-length, as illustrated in FIG. 4, in the ping-pong buffer 335 of FIG. 6. FIG. 6 is a block diagram of the encoder with two ping-pong buffers, 335.1 for data transfer in and 335.0 for data transfer out, operating in parallel. Each buffer 335.0 and 335.1 is capable of storing 64 coefficients. Because the maximum Run-length is 63 (represented in 6 bits) and each Level occupies 16 bits, each buffer has a capacity of at least 64 coefficients times (16 bits + 6 bits) per coefficient, or 1408 bits. Total ping-pong buffer size for the parallelized operation of FIG. 6 is therefore 2 buffers times 1408 bits/buffer, or 2816 bits. Thus, almost 3 Kbits of memory area is involved to do AVS in the way just described.
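The buffer-size arithmetic above can be checked with a short sketch. The function name `pingpong_buffer_bits` and its parameter names are hypothetical; the default values are the ones stated in the preceding paragraph.

```python
def pingpong_buffer_bits(coeffs_per_block=64, level_bits=16, max_run=63, num_buffers=2):
    """Return (run_bits, bits_per_buffer, total_bits) for the ping-pong buffers."""
    run_bits = max_run.bit_length()  # 63 needs 6 bits
    bits_per_buffer = coeffs_per_block * (level_bits + run_bits)
    return run_bits, bits_per_buffer, num_buffers * bits_per_buffer


print(pingpong_buffer_bits())
```

With the stated values this yields 6 bits per Run, 1408 bits per buffer, and 2816 bits in total.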
The buffer memory and the pipeline latency are problematic and disadvantageous from the standpoints of performance, power and area. A buffer memory of 3 Kbit consumes electric power, and respective control logic must also be provided for each buffer. As illustrated in FIG. 5, the processing time for the 6 blocks of a Macroblock is equal to that of encoding 7 blocks, because of the one-block pipeline latency.
Note in FIG. 4 that the scanning order and the coding order of the residual layer are opposite to each other. In this conventional approach, the encoder first scans the coefficients in block-0, and the results are stored into a buffer-A 210 in the Buffer Area 335.1 of FIG. 6. After the encoder finishes scanning (320) block-0, the encoder starts scanning (320) block-1 (335.1) and performs entropy coding (330) of block-0 (335.0), which was stored in buffer-A. The result of scanning block-1 is stored in buffer-B in the Buffer Area of FIG. 6. Scan 320 sends Encoder core 330 the number of Level coefficients in each block and a CBP value of the block. In this manner, all blocks from block-0 to block-5 in a Macroblock are encoded in FIG. 5. Thus, the pair of space-consuming buffers not only expends chip real estate but also introduces a pipeline latency of one block (e.g., 70 clocks), keeping the total encode clock cycles unacceptably high.
Accordingly, some ways of providing improved encoders and decoders, processes and systems would be very desirable in the art.