In recent years, video encoding technology has become essential owing to the growth of video streaming content that has accompanied the development of broadband networks, and to the use of large-screen video display equipment and high-capacity storage media such as DVDs. Further, as image pickup devices and display devices have moved to higher resolutions, techniques for encoding moving pictures at high resolution have also become essential.
An encoding process is a process of converting an original image that is inputted to a video encoding device into a stream with a smaller amount of data. One video encoding technique capable of achieving encoding with high image quality and high resolution is H.264/AVC (Advanced Video Coding), which is an international standard. In the H.264/AVC encoding scheme, encoding is performed using prediction techniques such as intra-frame prediction and inter-frame prediction. Further, in the H.264/AVC encoding process, processing is generally performed on the basis of a macroblock (hereinafter, appropriately referred to as “MB”) consisting of 16×16 pixels of the original image.
H.264/AVC encoding mainly uses two prediction schemes, i.e., intra-frame prediction and inter-frame prediction. In the intra-frame prediction, a plurality of prediction schemes are provided in accordance with the size of a block serving as a unit of prediction or a combination of prediction directions. In the inter-frame prediction as well, a plurality of prediction schemes are provided in accordance with the size of a block serving as a unit of prediction. In H.264/AVC, the prediction scheme is selected dynamically in accordance with the code amount or target image quality, thereby realizing an encoding scheme with high image quality and high compression.
Hereinafter, an outline of H.264/AVC encoding will be described with reference to FIG. 18. FIG. 18 is a diagram showing a configuration of a conventional video encoding device for performing an H.264/AVC encoding process.
In an encoding process using intra-frame prediction, a mode selection unit 930 selects an intra-frame prediction unit 910. Then, a stream 91 is obtained from an original image 90 through the intra-frame prediction unit 910, an orthogonal transformation unit 940, a quantization unit 950 and a variable length encoding unit 980. Further, in an encoding process using inter-frame prediction, the mode selection unit 930 selects an inter-frame prediction unit 920. Then, the stream 91 is obtained from the original image 90 through the inter-frame prediction unit 920, the orthogonal transformation unit 940, the quantization unit 950 and the variable length encoding unit 980.
The original image 90 and a reconstructed image 92 are inputted to the intra-frame prediction unit 910. The reconstructed image 92 is an image configured by combining a restored difference image 97 outputted from an inverse orthogonal transformation unit 970 and a prediction image 95 outputted from the mode selection unit 930.
The intra-frame prediction unit 910 selects an appropriate intra-frame prediction mode from the original image 90 and the reconstructed image 92 by an intra-frame prediction process, and generates intra-frame prediction information D81 representing mode information of the intra-frame prediction mode, an intra-frame prediction image 93 that is a prediction result, and an intra-frame prediction error D82 representing a difference between the original image 90 and the intra-frame prediction image 93. The intra-frame prediction information D81 includes intra-frame prediction mode information representing the direction of the intra-frame prediction, and an intra-frame prediction block type representing the block size when the intra-frame prediction is performed.
The inter-frame prediction unit 920 receives the original image 90 and the reconstructed image 92 generated from an original image before or after the original image 90 (in the past or future), and generates inter-frame prediction information D83, an inter-frame prediction image 94, and an inter-frame prediction error D84 representing a difference between the original image 90 and the inter-frame prediction image 94. The inter-frame prediction information D83 includes motion vector information as a result of performing motion compensation, and an inter-frame prediction block type representing the block size when the inter-frame prediction is performed.
An encoding controller 990 determines an encoding mode, i.e., either intra-frame prediction or inter-frame prediction, in accordance with an encoding mode selection algorithm based on the intra-frame prediction error D82 inputted from the intra-frame prediction unit 910, the inter-frame prediction error D84 inputted from the inter-frame prediction unit 920, and code amount information D86 (which will be described later) inputted from the variable length encoding unit 980. Then, the encoding controller 990 outputs, to the mode selection unit 930, encoding mode selection information D87 indicating the determined encoding mode. The encoding controller 990 also determines a quantization coefficient D88 in accordance with a rate control algorithm, and outputs the quantization coefficient D88 to the quantization unit 950.
Since the encoding mode selection algorithm and the rate control algorithm have a great influence on the code amount of the stream 91 and on image quality, various algorithms are used depending on the content of the original image 90 to be encoded or the application of video coding.
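As one simple possibility among the various algorithms mentioned above, the following Python sketch chooses the encoding mode by comparing the sum of absolute differences (SAD) of the two prediction errors. The function names `sad` and `select_encoding_mode` and the sample error blocks are assumptions introduced only for illustration. The sketch ignores the code amount information D86 and is not the algorithm of any particular encoder.

```python
def sad(error_block):
    """Sum of absolute values of a 2-D prediction-error block."""
    return sum(abs(v) for row in error_block for v in row)


def select_encoding_mode(intra_error, inter_error):
    """Return 'intra' or 'inter', whichever error block has the smaller SAD.

    Assumption: a smaller prediction error means a cheaper encoding.
    A real controller would also use the code amount information D86.
    """
    return "intra" if sad(intra_error) <= sad(inter_error) else "inter"


# Hypothetical 2x2 error blocks standing in for D82 and D84.
intra_err = [[3, -2], [1, 4]]    # SAD = 10
inter_err = [[1, 0], [-1, 2]]    # SAD = 4
mode = select_encoding_mode(intra_err, inter_err)
```

With these sample values the inter-frame prediction error is smaller, so the inter-frame prediction mode is selected.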
In accordance with the encoding mode selection information D87 inputted from the encoding controller 990, the mode selection unit 930 outputs the intra-frame prediction image 93 as a prediction image 95 if the intra-frame prediction unit 910 has been selected, and outputs the inter-frame prediction image 94 as the prediction image 95 if the inter-frame prediction unit 920 has been selected.
The orthogonal transformation unit 940 generates frequency components D89 from a difference image 96 corresponding to a difference between the original image 90 and the prediction image 95 by using an orthogonal transform process.
The quantization unit 950 performs a quantization process on the frequency components D89 inputted from the orthogonal transformation unit 940 based on the quantization coefficient D88 inputted from the encoding controller 990, and outputs quantization values D90 with a reduced amount of information.
An inverse quantization unit 960 performs an inverse quantization process on the quantization values D90 to generate restored frequency components D91.
The inverse orthogonal transformation unit 970 performs an inverse orthogonal transform process on the restored frequency components D91 to generate the restored difference image 97. Then, the generated restored difference image 97 and the prediction image 95 outputted from the mode selection unit 930 are combined and stored as the reconstructed image 92.
The variable length encoding unit 980 encodes the quantization values D90 and the intra-frame prediction information D81 or inter-frame prediction information D83 into a data string having a smaller amount of data, and outputs the data string as the stream 91. Also, the variable length encoding unit 980 outputs the code amount information D86 to the encoding controller 990. The code amount information D86 indicates the code amount of the stream 91 after variable length encoding.
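The local decoding loop formed by the quantization unit 950, the inverse quantization unit 960 and the inverse orthogonal transformation unit 970 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the orthogonal transform is replaced by an identity (the difference values are quantized directly), the quantization coefficient D88 is modeled as a plain uniform step size, and all function names and sample values are hypothetical. The actual H.264/AVC transform and quantizer are more elaborate.

```python
def quantize(values, step):
    """Uniform quantization with rounding half away from zero."""
    return [[(abs(v) + step // 2) // step * (1 if v >= 0 else -1) for v in row]
            for row in values]


def dequantize(levels, step):
    """Inverse quantization: scale the levels back by the step size."""
    return [[level * step for level in row] for row in levels]


def add_images(a, b):
    """Element-wise sum of two equally sized 2-D images."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


original = [[52, 60], [61, 70]]      # stands in for original image 90
prediction = [[50, 50], [60, 60]]    # stands in for prediction image 95
difference = [[o - p for o, p in zip(ro, rp)]
              for ro, rp in zip(original, prediction)]   # difference image 96

step = 4                                    # assumed stand-in for D88
levels = quantize(difference, step)         # quantization values D90
restored_diff = dequantize(levels, step)    # restored difference image 97
reconstructed = add_images(prediction, restored_diff)   # reconstructed image 92
```

The reconstructed image differs slightly from the original because quantization discards information. The encoder nevertheless uses the reconstructed image as its prediction reference, since it is the image a decoder can also reproduce.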
In the conventional video encoding device shown in FIG. 18, the original image 90 to be inputted is one of two types, i.e., an interlaced image or a progressive image. FIGS. 19A and 19B are diagrams illustrating an interlaced image and a progressive image.
As shown in FIG. 19A, the interlaced image is configured by extracting ½ of the progressive image in a vertical direction, and the whole image is obtained by alternately arranging a top field obtained by extracting only odd-numbered lines and a bottom field obtained by extracting only even-numbered lines from the top of the screen.
On the other hand, as shown in FIG. 19B, the progressive image is a full-size image that is not subjected to an extraction process.
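The relationship between a full picture and its two fields can be sketched in Python as follows. This is only an illustration: the function names `split_fields` and `weave_fields` are hypothetical, a picture is modeled as a list of lines, and line 1 (index 0) is taken to be the first line from the top of the screen.

```python
def split_fields(picture):
    """Split a picture (list of lines) into its top and bottom fields.

    Odd-numbered lines counted from 1 (indices 0, 2, 4, ...) form the
    top field; even-numbered lines (indices 1, 3, 5, ...) form the bottom field.
    """
    return picture[0::2], picture[1::2]


def weave_fields(top, bottom):
    """Rebuild the whole picture by alternately arranging the field lines."""
    picture = []
    for top_line, bottom_line in zip(top, bottom):
        picture.append(top_line)
        picture.append(bottom_line)
    return picture


picture = [[1, 1], [2, 2], [3, 3], [4, 4]]   # four lines of two pixels each
top_field, bottom_field = split_fields(picture)
rebuilt = weave_fields(top_field, bottom_field)
```

Each field holds half of the lines in the vertical direction, and weaving the two fields restores the original line order.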
In H.264/AVC, when the original image 90 is an interlaced image, an encoding tool called adaptive field/frame coding (hereinafter, referred to as “AFF”) is provided, which is capable of improving the encoding efficiency (see, e.g., Patent Document 1).
FIG. 20 is a diagram illustrating adaptive field/frame coding (AFF) of conventional H.264/AVC. In the case where the input image is the interlaced image, the AFF is an encoding process method in which encoding is performed while switching, on a frame-by-frame basis, between frame coding for encoding the top field and bottom field as one frame as shown in (a) of FIG. 20, and field coding for encoding the top field and bottom field as separate pictures as shown in (b) of FIG. 20.
For example, if there is no change in the brightness or if the motion of the image is small, i.e., if the difference is small between the top field and the bottom field, the pixel density in the vertical direction in frame coding becomes two times as large as that in field coding, and thus pixel correlation in the image increases in frame coding. Accordingly, it can be expected that the accuracy of the intra-frame prediction is improved, and the encoding efficiency is improved. Also in the inter-frame prediction, in frame coding compared to field coding, prediction efficiency in a pixel block is improved due to an increase in the pixel density. Accordingly, a larger prediction block can be easily selected, and thus the encoding efficiency may be improved.
On the other hand, if a change in brightness or motion in the image is large, and a change occurs between the images of the top field and the bottom field, pixel correlation in the image decreases in frame coding. Thus, in both the intra-frame prediction and the inter-frame prediction, the encoding efficiency is lower when performing frame coding than when performing field coding.
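The effect of motion on vertical pixel correlation can be illustrated with a rough Python sketch. The measure used here, the sum of absolute differences between vertically adjacent lines, is an assumption chosen for illustration and is not taken from H.264/AVC or its reference software; the function names and sample pictures are likewise hypothetical. A lower value indicates higher vertical correlation.

```python
def vertical_activity(lines):
    """Sum of absolute differences between vertically adjacent lines."""
    return sum(abs(a - b)
               for upper, lower in zip(lines, lines[1:])
               for a, b in zip(upper, lower))


def frame_and_field_activity(picture):
    """Return (activity as one frame, summed activity of the two fields)."""
    top, bottom = picture[0::2], picture[1::2]
    return (vertical_activity(picture),
            vertical_activity(top) + vertical_activity(bottom))


# Still scene: a smooth vertical gradient shared by both fields.
still = [[10, 10], [20, 20], [30, 30], [40, 40]]
# Changing scene: the two fields were captured at different brightness.
changing = [[10, 10], [90, 90], [10, 10], [90, 90]]

still_frame, still_field = frame_and_field_activity(still)
changing_frame, changing_field = frame_and_field_activity(changing)
```

For the still picture the frame shows the lower activity (60 against 80), whereas for the changing picture each field is uniform and the frame activity is far higher (480 against 0). The two totals are summed over different numbers of line pairs, so the comparison is indicative only.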
Therefore, in the case of using the AFF of H.264/AVC, in order to improve the encoding efficiency, it is important to appropriately perform switching between field coding and frame coding depending on the status of the image.
Further, FIG. 21 is a diagram showing an outline of an encoding mode determination method of the conventional AFF. In the JM reference software for conventional H.264/AVC, a multi-pass technique has been used to determine the encoding mode of the AFF.
Specifically, the original image 90 is encoded by both a video encoding unit 810 for frame coding and a video encoding unit 820 for field coding. Then, an output stream of either one is selected by an AFF mode selection unit 830, and it is outputted finally as the stream 91.
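The multi-pass selection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the selection criterion is taken to be the shorter output stream, and `toy_encode` is a hypothetical stand-in for a real encoder that stores the first line and then only the non-zero line-to-line differences. None of the names come from the JM reference software.

```python
def toy_encode(lines):
    """Toy encoder: first line, then only non-zero vertical differences."""
    stream = list(lines[0])
    for upper, lower in zip(lines, lines[1:]):
        stream.extend(b - a for a, b in zip(upper, lower) if b != a)
    return stream


def encode_as_frame(picture):
    """Role of the video encoding unit 810: both fields as one frame."""
    return toy_encode(picture)


def encode_as_fields(picture):
    """Role of the video encoding unit 820: each field as a separate picture."""
    return toy_encode(picture[0::2]) + toy_encode(picture[1::2])


def select_aff_stream(picture):
    """Role of the AFF mode selection unit 830: keep the shorter stream."""
    frame_stream = encode_as_frame(picture)
    field_stream = encode_as_fields(picture)
    if len(frame_stream) <= len(field_stream):
        return "frame", frame_stream
    return "field", field_stream


still = [[10, 10], [10, 10], [10, 10], [10, 10]]
changing = [[10, 10], [90, 90], [10, 10], [90, 90]]
still_mode, still_stream = select_aff_stream(still)
changing_mode, changing_stream = select_aff_stream(changing)
```

Note that both encoders run for every picture even though only one of the two streams is finally outputted.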
Further, the AFF includes a method of switching between field coding and frame coding for each picture, and a method of switching between field coding and frame coding for each macroblock. The former is called picture adaptive field/frame coding (PAFF), and the latter is called macroblock adaptive field/frame coding (MBAFF).
FIG. 22 is a diagram showing an outline of an encoding mode determination method in conventional PAFF and MBAFF. In the H.264/AVC standard, the PAFF and MBAFF can be nested, and the MBAFF can be used only when frame coding is selected in the PAFF. In the case of using the PAFF, two units, i.e., a video encoding unit 860 and a video encoding unit 870, are used so that frame coding is performed by the video encoding unit 860 and field coding is performed by the video encoding unit 870. Then, a PAFF mode selection unit 880 determines which of the AFF modes is advantageous in encoding efficiency based on the encoding results for each frame, and selects the encoding result with higher efficiency.
If frame coding is selected in the PAFF, the MBAFF can also be used. Similarly, in the case of using the MBAFF, two units, i.e., video encoding units 861 and 862, are used so that frame coding is performed by one video encoding unit 861 and field coding is performed by the other video encoding unit 862. Then, an MBAFF mode selection unit 863 determines which of the AFF modes is advantageous in encoding efficiency from the encoding results for each super MB of 16×32 pixels (the number of horizontal pixels×the number of vertical pixels, the same applies to the following), and selects the encoding result with higher efficiency.
Then, the MBAFF mode selection unit 863 selects the output stream of one of the modes for each super MB of 16×32 pixels, the PAFF mode selection unit 880 selects the output stream of one of the modes for each frame, and the selected output stream is finally outputted as the stream 91.
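The nested PAFF and MBAFF decisions can be sketched in Python as follows. This is a simplified illustration under stated assumptions: encoding efficiency is represented by a hypothetical per-unit cost (for example, a code amount), the cost of each super MB is treated as independent of its neighbors, and the function names and cost values are invented for the example.

```python
def mbaff_select(frame_costs, field_costs):
    """Role of the MBAFF mode selection unit 863.

    For each super MB, keep whichever of frame coding or field coding
    has the lower cost. Returns the per-super-MB modes and the total cost.
    """
    modes = ["frame" if frame <= field else "field"
             for frame, field in zip(frame_costs, field_costs)]
    total = sum(min(frame, field)
                for frame, field in zip(frame_costs, field_costs))
    return modes, total


def paff_select(frame_picture_cost, field_picture_cost):
    """Role of the PAFF mode selection unit 880: lower picture-level cost."""
    return "frame" if frame_picture_cost <= field_picture_cost else "field"


# Hypothetical costs for a picture made of three super MBs.
frame_mb_costs = [100, 250, 90]    # from the video encoding unit 861
field_mb_costs = [120, 200, 95]    # from the video encoding unit 862
mb_modes, frame_picture_cost = mbaff_select(frame_mb_costs, field_mb_costs)

field_picture_cost = 400           # from the video encoding unit 870
picture_mode = paff_select(frame_picture_cost, field_picture_cost)
```

In this example the second super MB is field coded while the others are frame coded, and the resulting frame-coded picture (cost 390) is preferred over the field-coded picture (cost 400). Obtaining these costs requires three separate encodings of the same picture.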
[Patent Document 1] Japanese Patent Application Publication No. 2008-283595
However, in the conventional encoding method, it is necessary to perform different types of encoding processes, i.e., frame coding and field coding, multiple times on a frame basis or on a pixel block basis of 16×32 pixels for one input image. Accordingly, in the case of using the AFF, the processing amount may increase two or more times as compared to when the AFF is not used.