In video encoding methods, the enormous amount of information in an original signal is compressed by removing redundancies in the temporal and spatial directions. Specifically, motion compensation, which takes a difference between preceding and succeeding frames by using a motion vector, is adopted for the temporal direction. For the spatial direction, an orthogonal transform, which transforms the plane on which pixels are distributed on a screen, namely, the horizontal and vertical directions, into frequency components, and quantization, which rounds an orthogonal transform coefficient to a representative value, are adopted. Moreover, variable-length encoding (entropy encoding) is used as a technique of arithmetic information compression.
Conventional video encoding methods, particularly those adopting motion compensation with motion vectors, fundamentally perform encoding in processing units of macroblocks (MBs) of 16×16 pixels. Encoding methods such as H.263 and MPEG-4, however, also enable encoding in units of blocks of 8×8 pixels. With the latest video encoding method, H.264/AVC (Advanced Video Coding), the number of division shapes further increases to 16×16, 16×8, 8×16 and 8×8, and blocks of 8×8 pixels can be further divided into sub-blocks of 8×8, 8×4, 4×8 and 4×4.
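The division shapes listed above can be tabulated with a short sketch. The names MACROBLOCK_SHAPES, SUB_BLOCK_SHAPES, count_blocks and max_partitions are hypothetical and chosen only for illustration; the sketch simply counts how many blocks of each shape tile a 16×16 macroblock or an 8×8 block.

```python
# Division shapes named in the text, as (width, height) in pixels.
MACROBLOCK_SHAPES = [(16, 16), (16, 8), (8, 16), (8, 8)]
SUB_BLOCK_SHAPES = [(8, 8), (8, 4), (4, 8), (4, 4)]


def count_blocks(region, shape):
    """Number of blocks of `shape` that tile a square region x region area."""
    width, height = shape
    return (region // width) * (region // height)


def max_partitions():
    """Finest division: four 8x8 blocks, each split into four 4x4 sub-blocks."""
    return count_blocks(16, (8, 8)) * count_blocks(8, (4, 4))


if __name__ == "__main__":
    for shape in MACROBLOCK_SHAPES:
        print("macroblock", shape, "->", count_blocks(16, shape), "block(s)")
    for shape in SUB_BLOCK_SHAPES:
        print("8x8 block", shape, "->", count_blocks(8, shape), "sub-block(s)")
```

Under this counting, the finest division yields 16 blocks of 4×4 pixels per macroblock.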
Conventionally, the processing unit is fixed not only in information compression using motion compensation in the temporal direction but also, for example, in the orthogonal transform, where DCT (Discrete Cosine Transform) is implemented only in units of 8×8 pixels. With H.264/AVC, however, switching can be made between the processing units of 4×4 and 8×8 for each macroblock, although this switching is limited to the High profile and higher profiles.
FIG. 1 is a block diagram illustrating a configuration example of functional blocks of a video encoding apparatus (sometimes referred to as an encoder) for implementing the above described video encoding method.
As illustrated in FIG. 1, the functional blocks of the video encoding apparatus include a frame memory 11, an original image macroblock buffer 12, a reference block buffer 13, a motion vector searching unit 21, a prediction determining unit 22, a subtractor 31, a first switch 32, an orthogonal transform (DCT) unit 33, a quantization (Q) unit 34, a variable-length encoding (ENT) unit 51, an inverse quantization (IQ) unit 44, an inverse orthogonal transform (IDCT) unit 43, a second switch 42, and an adder 41.
The frame memory 11 stores past and future images for use in motion estimation.
The original image macroblock buffer 12 stores macroblocks of an original frame to be encoded of each frame stored in the frame memory 11, whereas the reference block buffer 13 stores reference blocks for the macroblocks of the original frame.
The motion vector searching unit 21 searches for a motion vector by using the macroblocks of the original frame and their reference blocks.
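The text does not state how the motion vector searching unit 21 performs its search. As one possible illustration, the following sketch assumes an exhaustive search that minimizes the sum of absolute differences (SAD) between a block of the original frame and candidate reference blocks. The function names, the SAD criterion, and the search range parameter are all assumptions made for this example, not details taken from the apparatus of FIG. 1.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def crop(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(cur_block, ref_frame, top, left, search_range):
    """Return (cost, dx, dy) of the best match within +/- search_range pixels.

    (top, left) is the position of cur_block in the original frame.
    Candidates that fall outside the reference frame are skipped.
    """
    size = len(cur_block)
    height, width = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(cur_block, crop(ref_frame, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

An exhaustive search is the simplest strategy to state; practical encoders typically use faster search patterns, which are outside the scope of this sketch.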
The prediction determining unit 22 evaluates motion estimation for all of the division shapes of a macroblock illustrated in FIG. 3A to decide a division shape, and determines whether encoding is to be performed with inter-frame prediction or with intra-frame prediction.
The subtractor 31 calculates a difference between a macroblock and a predicted macroblock.
The first switch 32 and the second switch 42 are switched depending on whether encoding is performed with inter-frame prediction or with intra-frame prediction.
The orthogonal transform (DCT) unit 33 obtains an orthogonal transform coefficient by performing an orthogonal transform (such as DCT) for image data the information of which is compressed in a temporal direction, and compresses the information in a spatial direction.
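The spatial compression performed by an orthogonal transform can be illustrated with a floating-point two-dimensional DCT. This is a minimal sketch under stated assumptions: it uses the orthonormal DCT-II on a square block, whereas H.264/AVC itself specifies an integer transform that approximates the DCT. The helper names dct_matrix, matmul, transpose, dct2 and idct2 are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        )
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """Forward 2-D transform of a square block: C * block * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2-D transform: C^T * coeffs * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)
```

For a flat block, all of the energy ends up in the single top-left (DC) coefficient, which is why smooth image regions compress well after the transform.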
The quantization (Q) unit 34 quantizes the orthogonal transform coefficient, and the variable-length encoding (ENT) unit 51 produces the encoded output by further applying arithmetic compression to the information.
The inverse quantization (IQ) unit 44 restores the orthogonal transform coefficient, to within the quantization error, by performing inverse quantization on the quantized orthogonal transform coefficient. The inverse orthogonal transform (IDCT) unit 43 restores the data as it was before the orthogonal transform by performing an inverse orthogonal transform on that coefficient.
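The relationship between quantization and inverse quantization can be sketched with a uniform scalar quantizer. This is a simplification assumed for illustration: the step size qstep is passed in directly, whereas H.264/AVC derives the step from a quantization parameter and uses its own scaling rules. The function names quantize and dequantize are hypothetical.

```python
def quantize(coeff, qstep):
    """Round an integer coefficient to the nearest multiple of qstep, as a level."""
    sign = -1 if coeff < 0 else 1
    return sign * ((abs(coeff) + qstep // 2) // qstep)


def dequantize(level, qstep):
    """Map a level back to its representative coefficient value."""
    return level * qstep


if __name__ == "__main__":
    for coeff in (37, -37, 3):
        level = quantize(coeff, 10)
        print(coeff, "->", level, "->", dequantize(level, 10))
```

With a step of 10, the coefficient 37 becomes level 4 and is restored as 40, so the restored value is a representative value rather than the exact input. A coefficient of 3 becomes level 0 and carries no information after quantization.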
The adder 41 restores an original image by adding predicted image data to difference data that is the output of the inverse orthogonal transform (IDCT) unit 43 if encoding is performed with inter-frame prediction.
FIG. 2 is a flowchart of a macroblock process executed in the conventional example.
The flow of FIG. 2 illustrates the process in the order in which the items of information generated by processing a macroblock with the encoder are set and transmitted as encoding information. This order conforms to that laid down for the decoding syntax elements in H.264. Table 1, provided later, is a syntax table of the macroblock layer and its lower-level layers in H.264.
Initially, macroblock type information is set as the first item of the encoding information in step S21. This information includes information indicating whether encoding is performed with inter-frame prediction or with intra-frame prediction, and information about the division shape of the macroblock. As the next item of the encoding information, motion vector information is set in step S22. Since the division shape of a macroblock varies depending on the type of the macroblock, the motion vector information is set once for each division, as indicated by step S23.
Next, a quantization parameter value is set in the encoding information in step S24. This value is set for each macroblock.
Then, a flag indicating whether an orthogonal transform is performed in units of either 8×8 or 4×4 is set as orthogonal transform information in the encoding information in step S25.
Lastly, in step S26, the coefficients resulting from the orthogonal transform in units of 8×8 or 4×4 are obtained, and transform coefficient information obtained by quantizing the coefficients with the quantization parameter set in step S24 is generated and transmitted in units of sub-blocks. This process is repeated once for each division, as indicated by step S27. At this time, a flag cbp (coded block pattern) indicating validity/invalidity for each sub-block is set after the motion vector information and before the quantization parameter information within the encoding information. Only the coefficient information of the sub-blocks indicated as valid by the flag is transmitted.
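The ordering of steps S21 to S27 and the role of the flag cbp can be sketched as follows. This is an illustrative model of the order described in the text, not a bitstream writer conforming to the H.264 syntax: the item names, the list-of-tuples representation, and the function build_macroblock_info are hypothetical.

```python
def build_macroblock_info(mb_type, motion_vectors, qp, transform_8x8, sub_block_levels):
    """Assemble the encoding information of one macroblock in the described order.

    sub_block_levels holds one list of quantized coefficients per sub-block.
    """
    # A sub-block is valid if it contains at least one non-zero coefficient.
    cbp = [any(level != 0 for level in levels) for levels in sub_block_levels]

    info = [("mb_type", mb_type)]                       # step S21
    info += [("mv", mv) for mv in motion_vectors]       # steps S22 and S23
    info.append(("cbp", cbp))                           # after the vectors, before qp
    info.append(("qp", qp))                             # step S24
    info.append(("transform_8x8", transform_8x8))       # step S25
    info += [                                           # steps S26 and S27
        ("coeff", levels)
        for levels, valid in zip(sub_block_levels, cbp)
        if valid
    ]
    return info
```

Sub-blocks whose quantized coefficients are all zero contribute only their cbp flag, which is how the flag saves bits for blocks with no remaining valid coefficients.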
FIGS. 3A to 3C are explanatory views of conventional macroblock divisions in video encoding. FIG. 3A is an explanatory view of dividing a macroblock in motion estimation. As illustrated in this figure, the macroblock can be divided into 16×16, 16×8, 8×16 and 8×8, and the divided portions of 8×8 can be further divided into 8×4, 4×8 and 4×4.
FIG. 3B is an explanatory view of dividing a macroblock in an orthogonal transform. As illustrated in this figure, the macroblock can be divided into blocks of 8×8 and 4×4.
FIG. 3C illustrates the case of quantization. As illustrated in this figure, quantization is performed in units of 16×16.
The encoding process is further described next with reference to FIGS. 1 and 3A to 3C.
Motion estimation for all of the divisions of 16×16, 16×8, 8×16 and 8×8 illustrated in FIG. 3A is evaluated by the prediction determining unit 22 illustrated in FIG. 1, and a prediction mode (macroblock type) is decided by determining the most efficient division shape and which of inter-frame prediction and intra-frame prediction is to be selected.
Next, the size (orthogonal transform information) of a block to be orthogonal-transformed (DCT) is decided depending on which of the units of 8×8 and 4×4 illustrated in FIG. 3B yields the smaller prediction error, and the orthogonal transform (DCT) unit 33 performs the orthogonal transform process.
Then, the quantization (Q) unit 34 rounds each transformed coefficient to a representative value by using a quantization parameter value decided from the viewpoint of distributing the amount of information, and transmits the remaining valid (non-zero) coefficients. At this time, the flag cbp indicating whether or not a valid coefficient exists among the quantization coefficient values is calculated in units of sub-blocks. Then, the flag cbp and only the quantization coefficient information of the sub-blocks having a valid coefficient, as indicated by the flag cbp, are transmitted as encoding information.
Patent Documents 1 to 4 related to video encoding technology are introduced next.
Patent Document 1 particularly refers to the prediction encoding technique used within a screen. Patent Document 2 particularly refers to the division of a macroblock with an arbitrary line segment. Patent Document 3 particularly refers to the technique of quick re-encoding when an encoding method is converted. Patent Document 4 refers to the technique of performing an orthogonal transform by again dividing into small blocks.
However, none of the documents refer to quantization performed by dividing a macroblock.
Patent Document 1: Japanese Laid-open Patent Publication No. 2005-318468
Patent Document 2: Japanese Laid-open Patent Publication No. 2005-277968
Patent Document 3: Japanese Laid-open Patent Publication No. 2005-236584
Patent Document 4: Japanese Laid-open Patent Publication No. H8-79753