In most recent video encoding methods, each frame is divided into small areas, and a differential image based on a predicted image is subjected to orthogonal transformation, quantization, and then entropy encoding, thereby compressing video data.
In the H.264 video coding standard (see Non-Patent Document 1), which is a current mainstream video coding format, it is possible to select not only a context-adaptive variable length coding (“CAVLC”) method for performing entropy encoding by referring to a table, but also a context-adaptive binary arithmetic coding (“CABAC”) method which can further improve the encoding efficiency.
The above CABAC is a coding method which can compress a stationary signal to the theoretical limit, and thus is an essential technique for highly efficient encoding. However, in comparison with CAVLC, the computation cost of CABAC is very high (see Non-Patent Document 2).
When encoding a video image and generating a stream which may be distributed on a network having a limited transmission band, it is necessary to produce a constant amount of generated code per unit time so as not to exceed the limited band. Generally, rate control for controlling the amount of generated code by varying the quantization step size (“Qstep”) is executed.
For example, each encoding target block is encoded; the corresponding amount of generated code is computed; and Qstep of the next block is adjusted based on the computed result, thereby keeping a constant amount of generated code.
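The block-by-block feedback described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `next_qstep`, the proportional update rule, the gain, and the Qstep limits are hypothetical and are not taken from the cited documents.

```python
def next_qstep(qstep, generated_bits, target_bits, gain=0.5,
               qstep_min=0.625, qstep_max=224.0):
    """Return Qstep for the next block (hypothetical proportional rule).

    If the block just encoded produced more bits than the target, Qstep is
    raised (coarser quantization, fewer bits); if it produced fewer bits,
    Qstep is lowered.
    """
    # Relative deviation of the generated code amount from the target.
    error = (generated_bits - target_bits) / target_bits
    updated = qstep * (1.0 + gain * error)
    # Clamp to an assumed valid range of quantization step sizes.
    return max(qstep_min, min(qstep_max, updated))
```

For instance, `next_qstep(10.0, 1200, 1000)` returns a value larger than 10.0, so the next block is quantized more coarsely.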
When using CABAC, a considerable amount of time is required for obtaining the amount of generated code, which increases a delay in encoding. In a known proposed method for reducing the delay, the relationship between Qstep and the amount of generated code is approximated using a function, so as to estimate the amount of generated code (see Patent Document 1).
However, because the function is only an approximation, the estimation accuracy varies from one video image to another. In order to perform estimation with improved accuracy, CAVLC, which has a smaller computation cost than CABAC, may be used for estimating the amount of code (i.e., code amount estimation). In such a case, a result obtained by actually performing variable length encoding is used, and thus more accurate code amount estimation can be executed.
FIGS. 7A and 7B show a flowchart of an encoding operation by which CAVLC can be used for code amount estimation of CABAC. Here, FIG. 7A shows a main routine, and FIG. 7B shows a CABAC process.
First, the main routine (steps S101 to S111) in FIG. 7A will be explained.
The inter prediction mode and the intra prediction mode are first determined (see steps S101 and S102).
Next, the prediction mode is determined by performing intra/inter determination (see step S103), and a prediction residual is computed for the determined mode (see step S104) and is subjected to DCT (see step S105).
Quantization is applied to DCT transform coefficients by using a supplied Qstep (see step S106).
The quantized transform coefficients are arranged in a one-dimensional form, and coefficient information is supplied to a CABAC computation unit. Simultaneously, code amount estimation is performed based on the coefficient information (pre-encoding process) (see step S107).
The quantized coefficients are also subjected to inverse quantization (see step S108) and IDCT (see step S109), and are then added to a predicted image, thereby generating a decoded image (see step S110).
Finally, the decoded image is subjected to a filtering process (see step S111).
Next, the CABAC process (see steps S121 to S125) in FIG. 7B will be explained.
First, the process waits for the coefficient information generated in the pre-encoding process (S107) (see steps S121 to S122). When the relevant data is received, a CABAC step is performed (see step S123), and a generated stream is transmitted (see step S124). Finally, the amount of generated code is sent to a code amount controller (see step S125).
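The way the CABAC process runs separately from the main routine can be illustrated with a queue and a worker thread. This is a sketch under stated assumptions, not the structure of FIG. 7B itself: the names `cabac_worker` and `placeholder_encode`, the `None` end marker, and the fixed cost of 8 bits per Run-Level set are all hypothetical, and the placeholder encoder does not perform CABAC.

```python
import queue
import threading

def cabac_worker(coeff_queue, stream_out, code_amounts, encode):
    """Wait for coefficient information, encode it, report the code amount."""
    while True:
        info = coeff_queue.get()         # wait for data (cf. S121 to S122)
        if info is None:                 # assumed end marker: no more blocks
            break
        bits = encode(info)              # entropy coding step (cf. S123)
        stream_out.append(bits)          # transmit the stream (cf. S124)
        code_amounts.append(len(bits))   # report generated code (cf. S125)

def placeholder_encode(info):
    """NOT CABAC: a stand-in that emits 8 bits per Run-Level set."""
    return "0" * (8 * len(info))

coeff_queue = queue.Queue()
stream, amounts = [], []
worker = threading.Thread(
    target=cabac_worker,
    args=(coeff_queue, stream, amounts, placeholder_encode),
)
worker.start()
# The main routine hands over coefficient information and continues at once.
coeff_queue.put([(0, 5), (0, 3), (2, 1)])
coeff_queue.put(None)
worker.join()
```

Because the main routine only places data on the queue, it does not have to wait for the slow entropy coding step before moving on to the next block.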
FIG. 8 shows an example of the structure for implementing the above operation.
The shown apparatus has an inter prediction mode determination unit 101, an intra prediction mode determination unit 102, a prediction mode selector 103, a switch 104, a subtractor 105, a DCT unit 106, a quantizer 107, a code amount controller 108, a pre-encoding processor 109, an entropy encoder 110, an inverse quantizer 111, an IDCT unit 112, an adder 113, a decoded image storage buffer 114, a filter 115, and a reference image storage buffer 116.
The inter prediction mode determination unit 101 performs motion-compensated prediction using a reference image in the reference image storage buffer 116, determines the inter prediction mode, sends prediction mode information to the prediction mode selector 103, and also sends a predicted image to the switch 104.
The intra prediction mode determination unit 102 determines the intra prediction mode by using a decoded image in the decoded image storage buffer 114, sends prediction mode information to the prediction mode selector 103, and also sends a predicted image to the switch 104.
The prediction mode selector 103 determines the prediction mode, and selects one of the intra prediction mode and the inter prediction mode by sending a control signal to the switch 104.
Based on the control signal from the prediction mode selector 103, the switch 104 selects one of an inter predicted image sent from the inter prediction mode determination unit 101 and an intra predicted image sent from the intra prediction mode determination unit 102.
The subtractor 105 generates a predicted residual image by computing the difference between an original image and a predicted image, and sends the generated image to the DCT unit 106.
The DCT unit 106 applies DCT transform to the sent predicted residual image, and sends the image to the quantizer 107.
The quantizer 107 performs quantization of the DCT transform coefficients by using the quantization step size Qstep sent from the code amount controller 108, and sends the quantized result to the pre-encoding processor 109 and the inverse quantizer 111.
Based on an estimated amount of code (estimated code amount) sent from the pre-encoding processor 109, the code amount controller 108 computes Qstep of the next macroblock, and sends the computed Qstep to the quantizer 107 and the inverse quantizer 111. The code amount controller 108 also receives the amount of generated code sent from the entropy encoder 110, and corrects the difference from the estimated amount of code.
The pre-encoding processor 109 computes the estimated amount of code based on the quantized DCT coefficients sent from the quantizer 107, and sends the computed value to the code amount controller 108. The pre-encoding processor 109 also generates coefficient information by arranging the quantized DCT coefficients (two-dimensional data) in a one-dimensional form, and sends the generated information to the entropy encoder 110.
The entropy encoder 110 encodes the coefficient information, which is sent from the pre-encoding processor 109, by means of CABAC, and outputs the encoded data as an encoded stream.
The inverse quantizer 111 performs inverse quantization by multiplying the relevant quantized value by Qstep, and sends the result to the IDCT unit 112.
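The relationship between the quantizer 107 and the inverse quantizer 111 can be illustrated with a simplified uniform scalar quantizer. This is a sketch only: actual H.264 implementations use integer arithmetic in which scaling factors are folded into the transform, and the function names and the round-to-nearest rule here are assumptions.

```python
import math

def quantize(coeff, qstep):
    """Simplified quantization: divide by Qstep, round to nearest integer."""
    level = math.floor(abs(coeff) / qstep + 0.5)
    # Restore the sign; a magnitude of 0 is returned as plain 0.
    return int(math.copysign(level, coeff)) if level else 0

def dequantize(level, qstep):
    """Inverse quantization: multiply the quantized value by Qstep."""
    return level * qstep

# A coefficient of 37 with Qstep 10 becomes level 4 and is restored as 40.
level = quantize(37, 10)
restored = dequantize(level, 10)
```

A larger Qstep maps more coefficients to 0, which lengthens the runs of zeros and reduces the amount of generated code at the cost of a larger reconstruction error.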
The IDCT unit 112 applies IDCT to the received data, and sends the result to the adder 113.
The adder 113 adds the predicted residual image sent from the IDCT unit 112 to the predicted image sent from the switch 104, and sends the result as a decoded image to the decoded image storage buffer 114.
The decoded image storage buffer 114 stores the decoded image sent from the adder 113, and sends the image to the filter 115. The decoded image storage buffer 114 also sends adjacent pixel information to the intra prediction mode determination unit 102.
The filter 115 applies a filtering process to the decoded image stored in the decoded image storage buffer 114, and sends the filtered image to the reference image storage buffer 116.
The reference image storage buffer 116 stores the filtered decoded image, and sends the image as a reference image to the inter prediction mode determination unit 101.
In accordance with the above functions, the operation shown in FIGS. 7A and 7B is implemented.
Below, the pre-encoding processor 109, to which the present invention can be applied, will be explained.
The pre-encoding processor 109 arranges the two-dimensional data of the quantized DCT coefficients in a one-dimensional form, generates coefficient information, sends the information to the entropy encoder 110, and estimates the amount of code by referring to a table.
First, the method of generating coefficient information from two-dimensional data will be explained.
In an example in which the DCT coefficients have a 4×4 block form, the coefficients are arranged in a one-dimensional form in the order as shown in FIG. 9, and the coefficient values are sequentially examined from the 0-th coefficient so that the number of successive coefficients having a value of 0 and the coefficient (non-zero coefficient) which follows the coefficients and has a value other than 0 are stored as a set. Here, the number of successive “0” coefficients is called Run and the coefficient other than 0 is called Level. Such an operation of scanning the coefficient values in a zigzag form so as to arrange them in a one-dimensional form and convert them into Run-Level data is called “zigzag scanning”.
A specific example is shown in FIG. 10, where no “0” exists before coefficients “5” and “3”, and 0 (as Run) is assigned to them.
Additionally, in the table reference in H.264, not only Run and Level, but also (i) the number of the non-zero coefficients and (ii) the number of final succession of “1” or “−1” coefficients and the relevant sign are necessary. Based on the necessary data, the amount of code is estimated by referring to a table. In addition, the Run-Level information is encoded by means of arithmetic encoding.
FIG. 11 shows an example of a flowchart of the above operation.
First, zigzag scanning of the relevant 4×4 block is performed, and the Run-Level sets are obtained (see step S151). The results are sent to the entropy encoder 110 (see step S152).
For the obtained Run-Level sets, the number of non-zero coefficients, the number of final succession of “1” or “−1” coefficients, and the positive or negative sign therefor are determined (see step S153), and the relevant amount of code is computed using a variable-length coding table (called a “VLC table”) (see step S154).
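The quantities determined in step S153 can be derived from the Run-Level sets as sketched below. The function name `cavlc_symbols` and the return layout are hypothetical. The limit of three trailing coefficients reflects the CAVLC rule in H.264 and is exposed as a parameter. The VLC table lookup of step S154 is not reproduced here.

```python
def cavlc_symbols(run_level, max_trailing=3):
    """Return (non-zero count, trailing-ones count, trailing-ones signs).

    run_level is a list of (run, level) sets in scanning order. The signs
    are listed starting from the last coefficient in scanning order.
    """
    levels = [level for _, level in run_level]
    total_coeffs = len(levels)  # every stored Level is non-zero
    signs = []
    # Walk backwards from the last non-zero coefficient while it is +1 or -1.
    for level in reversed(levels):
        if abs(level) != 1 or len(signs) == max_trailing:
            break
        signs.append(1 if level > 0 else -1)
    return total_coeffs, len(signs), signs
```

For the sets `[(0, 5), (0, 3), (2, 1), (1, -1)]`, there are four non-zero coefficients, and the final succession of “1” or “−1” coefficients has length two, with signs negative and then positive when read from the end.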
The computed amount of code is sent as an estimated amount of code (estimated code amount) to the code amount controller 108 (see step S155).
FIG. 12 shows a flowchart of zigzag scanning.
First, counters i and n are each initialized at 0 (see step S201). Additionally, the variable “run” is also initialized at 0 (see step S202).
Next, coordinates S_i(x, y) of the i-th coefficient in scanning are obtained by referring to a table (see step S203), and the coefficient value at the obtained coordinates is stored into k[i] (see step S204). In an example of processing a 4×4 block, the coefficients are sequentially input into k[i] in the order shown in FIG. 9.
If k[i]=0 (see step S205), run is incremented by 1 (see step S206), and i is also incremented by 1 (see step S209).
If k[i] is not zero (see step S205), the value of run is stored in Run[n] for storing Run information, and the non-zero coefficient k[i] is stored in Level[n] for storing Level information (see step S207). So that the next set starts from a fresh count, run is then reset to 0 and n is incremented by 1. Then i is incremented by 1 (see step S209).
When the scanning has reached the last coefficient, the operation is completed (see step S210). When the scanning has not yet reached the last coefficient, the above process from step S203 to S210 is repeated.
In accordance with the above operation, the Run-Level sets can be obtained by means of zigzag scanning.
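The scanning loop of FIG. 12 can be sketched as follows. This is an illustration under stated assumptions: the scanning order of FIG. 9 is assumed to be the standard 4×4 zigzag order, the names `SCAN_4X4` and `zigzag_run_level` are hypothetical, and the reset of run after each stored set follows the behavior described for the run counter 202.

```python
# Scanning order as (x, y) pairs; assumed to match FIG. 9.
SCAN_4X4 = [(0, 0), (1, 0), (0, 1), (0, 2), (1, 1), (2, 0), (3, 0), (2, 1),
            (1, 2), (0, 3), (1, 3), (2, 2), (3, 1), (3, 2), (2, 3), (3, 3)]

def zigzag_run_level(block):
    """Convert a 4x4 block (block[y][x]) into a list of (Run, Level) sets."""
    runs, levels = [], []
    run = 0                              # number of successive zeros so far
    for i in range(len(SCAN_4X4)):       # i counts the scanned coefficients
        x, y = SCAN_4X4[i]               # coordinates from the order table
        k = block[y][x]                  # coefficient value at (x, y)
        if k == 0:
            run += 1
        else:
            runs.append(run)             # Run[n]
            levels.append(k)             # Level[n]
            run = 0                      # start counting zeros afresh
    return list(zip(runs, levels))

block = [[5, 3, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 0],
         [-1, 0, 0, 0]]
sets = zigzag_run_level(block)  # [(0, 5), (0, 3), (2, 1), (4, -1)]
```

Zeros that follow the last non-zero coefficient produce no set, so a block that is mostly zero yields only a few Run-Level sets.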
FIG. 13 shows an example of the structure of the pre-encoding processor 109 in FIG. 8.
The structure includes a quantized value storage buffer 201, a run counter 202, a pre-encoding process controller 203, a 4×4 scanning counter 204, a 4×4 scanning order reference table 205, a Run-Level information storage buffer 206, a code amount estimation controller 207, a code amount estimation unit 208, and a VLC table storage memory 209.
The quantized value storage buffer 201 stores the quantized (values of) DCT coefficients. When receiving coordinate information from the 4×4 scanning order reference table 205, the quantized value storage buffer 201 sends the quantized value corresponding to the relevant coordinates to the run counter 202. When the quantized value is received, the quantized value storage buffer 201 sends an operation start signal to the pre-encoding process controller 203.
The run counter 202 stores the variable “run” and receives the quantized value from the quantized value storage buffer 201. When the received quantized value is 0, the run counter 202 increments run by 1. When the received quantized value is not 0, the run counter 202 sends the relevant coefficient and the currently stored value of run to the Run-Level information storage buffer 206 as Run-Level information, and resets run to 0. The run counter 202 also resets run to 0 when receiving a reset signal from the pre-encoding process controller 203.
When the pre-encoding process controller 203 receives a start signal from the quantized value storage buffer 201, the pre-encoding process controller 203 sends a reset signal to the run counter 202 and the Run-Level information storage buffer 206 so as to reset them, and then sends an operation start signal to the 4×4 scanning counter 204. In addition, when receiving an end signal from 4×4 scanning counter 204, the pre-encoding process controller 203 sends an estimation start signal to the code amount estimation controller 207.
When receiving the operation start signal from the pre-encoding process controller 203, the 4×4 scanning counter 204 sequentially sends numeric values from 0 to 15 to the 4×4 scanning order reference table 205. When the last “15” has been sent, the 4×4 scanning counter 204 sends an end signal to the pre-encoding process controller 203.
The 4×4 scanning order reference table 205 receives the numeric values from the 4×4 scanning counter 204, obtains the coordinates corresponding to each numeric value, and sends the coordinates to the quantized value storage buffer 201.
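The contents of such a scanning order reference table can be generated rather than written by hand. The sketch below assumes that the order is the standard zigzag order, which walks the anti-diagonals of the block in alternating directions. The function name `zigzag_order` is hypothetical.

```python
def zigzag_order(n=4):
    """Return the zigzag scanning order of an n x n block as (x, y) pairs."""
    order = []
    for s in range(2 * n - 1):           # s = x + y selects one anti-diagonal
        # Valid x positions on this anti-diagonal, in increasing order.
        xs = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 1:
            xs = reversed(xs)            # odd diagonals run with x decreasing
        order.extend((x, s - x) for x in xs)
    return order

# Index i (0 to 15, as sent by the scanning counter) maps to table[i].
table = zigzag_order(4)
```

Here `table[0]` is `(0, 0)` and `table[15]` is `(3, 3)`, so the scan starts at the lowest-frequency coefficient and ends at the highest-frequency one.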
When receiving Run-Level information from the run counter 202, the Run-Level information storage buffer 206 stores the information, and sends it to the code amount estimation unit 208 in accordance with a control signal from the code amount estimation controller 207. The Run-Level information storage buffer 206 also sends the Run-Level information to the entropy encoder 110. Additionally, when receiving a reset signal from the pre-encoding process controller 203, the Run-Level information storage buffer 206 clears the contents of the buffer.
When the code amount estimation controller 207 receives an estimation start signal from the pre-encoding process controller 203, the code amount estimation controller 207 sends an estimation start signal to the code amount estimation unit 208, and also sends a control signal to the Run-Level information storage buffer 206 so as to send Run-Level information to the code amount estimation unit 208.
When receiving the estimation start signal from the code amount estimation controller 207, the code amount estimation unit 208 receives VLC information from the VLC table storage memory 209 based on the Run-Level information sent from the Run-Level information storage buffer 206, and estimates and outputs an amount of code.
The VLC table storage memory 209 stores a VLC table, and sends it as the VLC information to the code amount estimation unit 208.
In accordance with the above structure, the operation as shown in FIG. 11 can be implemented.
Non-Patent Document 1: Sakae Okubo, Shinya Kadono, Yoshihiro Kikuchi, and Teruhiko Suzuki, “H.264/AVC TEXTBOOK”, Impress, pp. 144-146, 2004
Non-Patent Document 2: Detlev Marpe, Heiko Schwarz, and Thomas Wiegand, “Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 620-636, July 2003
Patent Document 1: Japanese Unexamined Patent Application, First Publication No. H07-264579