In recent years, moving image encoding techniques have come into wide use in applications such as digital broadcasting, video content distribution on optical disks, and video distribution via the Internet and the like. International standards for generating encoded data by encoding a moving image signal at a low bit rate, a high compression ratio, and a high image quality, and for decoding the encoded moving image, include H.261 and H.263, which are standardized by the ITU (International Telecommunication Union); MPEG-1, MPEG-2, and MPEG-4, which are ISO (International Organization for Standardization) standards; and VC-1, which is a SMPTE (Society of Motion Picture and Television Engineers) standard.
In addition, H.264/MPEG-4 AVC (hereinafter referred to as “H.264”), which has been recently standardized by the ITU and ISO, is known (Non Patent Literature 1). H.264 is known to further improve the compression efficiency and image quality as compared with moving image encoding techniques of related art.
To meet the demand for improved video quality and a reduced transmission rate, encoding techniques have become more complicated and the load of encoding processing has increased. Accordingly, for example, real-time encoding of a full high-definition (1920×1080 pixels) video in an H.264 system, which is the latest international standard system, cannot be achieved using only typical CPU software, and in practice an accelerator is used as well. As a desirable accelerator in a platform of a PC (personal computer), a GPGPU (General Purpose Computing on Graphics Processing Unit) is known. In the GPGPU, a GPU (Graphics Processing Unit), which has been used for three-dimensional graphics processing, is used for other purposes as well. The GPU exhibits an extremely high performance in large-scale vector operations, so the GPGPU can perform processing that matches these characteristics at a speed several times to several tens of times faster than a CPU.
FIG. 8 shows a typical example of a moving image encoding apparatus of an H.264 system. As shown in the figure, a moving image encoding apparatus 100 includes a motion estimation unit 101, a motion compensation unit 102, an intra prediction mode determination unit 103, an intra prediction unit 104, a selection unit 105, an integer transform unit 106, a quantization unit 107, an inverse quantization unit 108, an inverse discrete integer transform unit 109, a variable-length coding unit 110, a deblocking filter unit 111, a frame buffer 112, a subtraction unit 113, and an addition unit 114. The moving image encoding apparatus 100 sequentially encodes each image that is input (hereinafter referred to as “input image”) to obtain a bit stream and outputs the obtained bit stream. In the moving image encoding apparatus 100, processing of all functional blocks is executed by a CPU.
In order to improve the compression efficiency and image quality, the H.264 system also employs an intra prediction (in-screen prediction) technique that performs a prediction using information on neighboring pixels within an image, and a deblocking filtering technique for reducing encoding noise caused in an image obtained as a result of encoding. The frame buffer 112 stores image data of previously encoded frames. Encoding processing is performed on the input image in units of blocks of 16×16 pixels. Such a block is called a macroblock (MB).
The motion estimation (ME: Motion Estimation) unit 101 detects a change in the position of the corresponding image block between an input image and an encoded image stored in the frame buffer 112, and outputs motion vector information corresponding to the position change. The motion compensation (MC: Motion Compensation) unit 102 performs motion compensation processing using the encoded image stored in the frame buffer 112 and the motion vector information supplied from the motion estimation unit 101, and outputs a motion compensation prediction image.
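The motion estimation and motion compensation described above can be illustrated by the following minimal sketch. It assumes a full search over a small window using the sum of absolute differences (SAD) as the matching criterion; the function names, the search range, and the use of SAD are assumptions made for illustration and are not taken from the apparatus of FIG. 8.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def full_search(cur, ref, top, left, block=16, search=4):
    """Return (dy, dx, cost) of the best match in `ref` for the block of `cur`
    whose upper-left corner is (top, left). Hypothetical helper."""
    target = cur[top:top + block, left:left + block]
    h, w = ref.shape
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue  # candidate lies partly outside the reference image
            cost = sad(target, ref[y:y + block, x:x + block])
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best

def motion_compensate(ref, top, left, mv, block=16):
    """Build the motion compensation prediction block from a motion vector."""
    dy, dx = mv
    return ref[top + dy:top + dy + block, left + dx:left + dx + block]

# Usage: the current image is the reference image shifted by a known amount.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(48, 48), dtype=np.uint8)
cur = np.roll(ref, shift=(2, -3), axis=(0, 1))
dy, dx, cost = full_search(cur, ref, 16, 16)
pred = motion_compensate(ref, 16, 16, (dy, dx))
```

Every macroblock requires one SAD evaluation per candidate position, which is why this stage dominates the amount of computation and maps well onto large-scale vector hardware.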
The intra prediction mode determination unit 103 selects an appropriate intra prediction mode based on the input image and image information on the encoded macroblock within the input image, and outputs information (intra prediction mode information) indicating the selected mode. The intra prediction (IP: Intra Prediction) unit 104 performs intra prediction processing using the image information on the encoded macroblock within the input image and the intra prediction mode information supplied from the intra prediction mode determination unit 103, and outputs an intra prediction image.
The selection unit 105 selects an appropriate one of either the motion compensation prediction image, which is supplied from the motion compensation unit 102, or the intra prediction image, which is supplied from the intra prediction unit 104, and outputs the selected image as a predicted image. A mode for selecting the motion compensation prediction image is called an inter mode, and a mode for selecting the intra prediction image is called an intra mode.
The subtraction unit 113 subtracts the predicted image, which is output from the selection unit 105, from the input image, and outputs a prediction error image. The integer transform (DIT: Discrete Integer Transform) unit 106 performs orthogonal transform processing similar to that performed by DCT (Discrete Cosine Transform) on the prediction error image to obtain an orthogonal transform coefficient sequence, and outputs the obtained orthogonal transform coefficient sequence.
The quantization (Q: Quantize) unit 107 quantizes the orthogonal transform coefficient sequence supplied from the integer transform unit 106, and outputs the quantized orthogonal transform coefficient sequence.
The variable-length coding (VLC: Variable-Length Coding) unit 110 encodes the quantized orthogonal transform coefficient sequence, which is supplied from the quantization unit 107, according to a predetermined rule, and outputs a bit stream of encoding results. This bit stream is an output bit stream of the encoding apparatus of the H.264 system.
The orthogonal transform coefficient sequence quantized by the quantization unit 107 is also output to the inverse quantization (IQ: Inverse Quantization) unit 108. It is subjected to inverse quantization processing by the inverse quantization unit 108 and then to inverse discrete integer transform processing by the inverse discrete integer transform (IDIT: Inverse Discrete Integer Transform) unit 109, which yields a reconstructed prediction error image. The addition unit 114 then adds the reconstructed prediction error image to the predicted image output from the selection unit 105, and the result is further subjected to deblocking filtering processing by the deblocking filter unit 111. Data obtained by the deblocking filter unit 111 is a local decoded image, which is stored in the frame buffer 112 and used for encoding of the subsequent frame.
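The DIT-Q-IQ-IDIT path and the addition performed by the addition unit 114 can be sketched for a single 4×4 block as follows. The matrix `C` is the forward core transform matrix of the H.264 4×4 integer transform. Everything else is a simplification assumed for illustration: a single uniform quantization step replaces the position-dependent scaling of H.264, the inverse transform is the plain mathematical inverse of `C` rather than the integer inverse transform of the standard, and deblocking filtering is omitted.

```python
import numpy as np

# Forward core transform matrix of the H.264 4x4 integer transform.
C = np.array([[1,  1,  1,  1],
              [2,  1, -1, -2],
              [1, -1, -1,  1],
              [1, -2,  2, -1]], dtype=np.int64)
# Mathematical inverse, used here only for illustration.
C_INV = np.linalg.inv(C.astype(np.float64))

def forward_transform(residual):
    """DIT: transform a 4x4 prediction error block."""
    return C @ residual @ C.T

def quantize(coeffs, step):
    """Q: simplified uniform quantizer (assumption, not the H.264 tables)."""
    return np.rint(coeffs / step).astype(np.int64)

def dequantize(levels, step):
    """IQ: inverse of the simplified quantizer."""
    return levels * step

def inverse_transform(coeffs):
    """IDIT: invert the forward transform and round to integers."""
    return np.rint(C_INV @ coeffs @ C_INV.T).astype(np.int64)

def encode_and_reconstruct(block, predicted, step):
    """Return the quantized levels and the local decoded block."""
    predicted = predicted.astype(np.int64)
    residual = block.astype(np.int64) - predicted
    levels = quantize(forward_transform(residual), step)
    decoded_residual = inverse_transform(dequantize(levels, step))
    reconstructed = np.clip(predicted + decoded_residual, 0, 255)
    return levels, reconstructed
```

The encoder keeps the reconstructed block rather than the original one because the decoder only ever has access to the reconstructed data; predicting from the same data on both sides keeps the encoder and decoder in step.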
Various selection methods can be employed by the intra prediction mode determination unit 103 and the selection unit 105. In general, however, each of these units selects the candidate that yields the higher encoding efficiency.
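As one possible illustration of such a selection, the sketch below chooses between the motion compensation prediction image and the intra prediction image for one block. It assumes that the SAD between the input block and each predicted block serves as a stand-in for encoding efficiency; actual encoders may use other criteria, and the function name is hypothetical.

```python
import numpy as np

def select_prediction(block, inter_pred, intra_pred):
    """Return ("inter" or "intra", predicted block) with the smaller SAD.
    SAD as the efficiency measure is an assumption made for this sketch."""
    b = block.astype(np.int32)
    sad_inter = int(np.abs(b - inter_pred.astype(np.int32)).sum())
    sad_intra = int(np.abs(b - intra_pred.astype(np.int32)).sum())
    if sad_inter <= sad_intra:
        return "inter", inter_pred
    return "intra", intra_pred
```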
The contents of the above-mentioned processing of the functional blocks of the moving image encoding apparatus 100 are also disclosed in Non Patent Literature 2, for example, so a detailed description thereof is omitted.
Among the functional blocks of the moving image encoding apparatus 100 shown in FIG. 8, the amounts of computation required for the motion estimation performed by the motion estimation unit 101 and for the intra prediction mode determination performed by the intra prediction mode determination unit 103 are, in general, especially large. Accordingly, the processing of the motion estimation and intra prediction mode determination is off-loaded to an accelerator, such as a GPU, to speed up the processing. The case where the motion estimation and intra prediction mode determination are off-loaded to the GPU will be described with reference to FIG. 9.
FIG. 9 shows an example configuration of a moving image encoding apparatus in which the processing of the motion estimation unit 101, the intra prediction mode determination unit 103, and the motion compensation unit 102 (which uses the results of the motion estimation performed by the motion estimation unit 101) shown in FIG. 8 is off-loaded to the GPU, and the subsequent processing is executed by the CPU. To facilitate comparison with FIG. 8, functional blocks having the same function are denoted by the same reference numerals in FIGS. 8 and 9.
As shown in FIG. 9, in a moving image encoding apparatus 200, the GPU executes the processing of the motion estimation unit 101, an intra prediction mode determination unit 203, and the motion compensation unit 102. The remaining processing is executed by the CPU.
In the moving image encoding apparatus 200, the intra prediction mode determination unit 203 that performs an intra prediction mode determination is different from the intra prediction mode determination unit 103 of the moving image encoding apparatus 100. The reason for this will be described below.
Since data communication between a CPU and a GPU generally takes a long time, it is efficient to transfer the processing results from the GPU to the CPU collectively in units of a certain amount, for example, an amount corresponding to one screen. Specifically, the GPU performs motion estimation, motion compensation, and intra prediction mode determination processing for one screen, and collectively transfers the processing results for one screen to the CPU. The CPU performs the subsequent processing for the one screen. In this case, the encoded macroblocks of a screen are produced by the CPU only after the GPU has finished processing that entire screen, so the intra prediction mode determination unit 203 cannot use the image information on the encoded macroblocks within the same image. Accordingly, unlike the intra prediction mode determination unit 103 of the moving image encoding apparatus 100, the intra prediction mode determination unit 203 operates to select an appropriate intra prediction mode by using only the information on the input image.
In the moving image encoding apparatus 200, the intra prediction mode determination processing is performed by the GPU, but the intra prediction processing using the result is executed by the CPU as in the moving image encoding apparatus 100. This is because the result of DIT-Q-IQ-IDIT processing on an image block adjacent to the image block (having a size of 16×16 pixels, 8×8 pixels, or 4×4 pixels) which is being processed is required for intra prediction.
In many cases, patterns that are spatially analogous to each other are continuously formed in normal images. For this reason, in the H.264 intra prediction, the image data of the block adjacent to the block to be processed is duplicated to predict the image of the block to be processed, thereby obtaining a high prediction effect. To deal with various types of patterns, nine prediction modes as shown in FIG. 10 are used in the case of 4×4 blocks, for example. The intra prediction mode determination is processing for determining the mode that gives the optimum prediction result from among the nine modes. The prediction results of the nine modes are evaluated in each block of an image and the optimum mode is selected, which requires a large amount of computation. In this case, images located outside the screen cannot be used as a prediction source, so the operation of intra prediction is changed at an end of the screen and, when the screen is divided, at a boundary of the division. At an upper end of the screen, for example, the modes 0, 3, 4, 5, 6, and 7, in which the upper-side image is used, cannot be selected. When the mode 2 (DC prediction) is selected, a special operation is carried out.
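The mode determination and its boundary handling can be sketched as follows for 4×4 blocks. For brevity the sketch covers only three of the nine modes: mode 0 (vertical), mode 1 (horizontal), and mode 2 (DC). It takes its neighboring pixels from the input image itself, in the manner of the intra prediction mode determination unit 203, and uses SAD as the evaluation measure. The function names, the `slice_top` parameter (the first pixel row of the current slice, assuming that slices begin at the start of a row), and the choice of SAD are assumptions made for illustration.

```python
import numpy as np

def candidate_predictions(frame, top, left, slice_top=0, n=4):
    """Return {mode: predicted block} for the modes that may be selected."""
    up_ok = top > slice_top   # upper neighbor exists inside the same slice
    left_ok = left > 0        # left neighbor exists inside the screen
    up = frame[top - 1, left:left + n].astype(np.int32) if up_ok else None
    lf = frame[top:top + n, left - 1].astype(np.int32) if left_ok else None
    preds = {}
    if up_ok:
        preds[0] = np.tile(up, (n, 1))           # mode 0: copy the row above
    if left_ok:
        preds[1] = np.tile(lf[:, None], (1, n))  # mode 1: copy the left column
    # Mode 2 (DC): average only the neighbors that are available.
    if up_ok and left_ok:
        dc = (int(up.sum()) + int(lf.sum()) + n) // (2 * n)
    elif up_ok:
        dc = (int(up.sum()) + n // 2) // n
    elif left_ok:
        dc = (int(lf.sum()) + n // 2) // n
    else:
        dc = 128                                 # no neighbor: mid-gray
    preds[2] = np.full((n, n), dc, dtype=np.int32)
    return preds

def determine_mode(frame, top, left, slice_top=0, n=4):
    """Return (best mode, {mode: SAD}) for the block at (top, left)."""
    block = frame[top:top + n, left:left + n].astype(np.int32)
    preds = candidate_predictions(frame, top, left, slice_top, n)
    costs = {m: int(np.abs(block - p).sum()) for m, p in preds.items()}
    return min(costs, key=costs.get), costs

# Usage: an image made of vertical stripes is predicted best by mode 0.
frame = np.tile(np.arange(8) * 10, (8, 1)).astype(np.uint8)
best, costs = determine_mode(frame, 4, 4)
```

Calling `candidate_predictions(frame, 4, 4, slice_top=4)` on the same image no longer offers mode 0, which shows how a division boundary placed directly above a block restricts the available modes.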
As disclosed in Patent Literatures 1 and 2, for example, H.264 enables division of a screen into small regions, each of which is called a slice, so that each slice is encoded separately. In this case, images located outside each slice cannot be used as a prediction source, so the operation of intra prediction is also changed at the boundary between slices in the same manner as described above.
Processing for dividing a screen into slices to be encoded will be described in detail with reference to a moving image encoding apparatus 300 of related art shown in FIG. 11. For ease of understanding, in FIG. 11, the processing configuration is limited to that when the intra mode, i.e., the intra prediction image, is selected. The subtraction unit 113, the integer transform unit 106, the quantization unit 107, the inverse quantization unit 108, the inverse discrete integer transform unit 109, the addition unit 114, and the variable-length coding unit 110, which are provided in the moving image encoding apparatus 100 shown in FIG. 8, are integrated into a block encoding unit 303. A slice division structure control unit 301 is added to explain the operation using a slice.
As shown in FIG. 11, an intra prediction mode determination unit 310 includes an optimum mode determination unit 320. The optimum mode determination unit 320 determines an appropriate intra prediction mode by using information on an input image and information on the slice division structure of the screen supplied from the slice division structure control unit 301, and outputs intra prediction mode information.
An intra prediction unit 302 performs intra prediction processing using the intra prediction mode information supplied from the intra prediction mode determination unit 310, information on the slice division structure of the screen supplied from the slice division structure control unit 301, and information on an encoded image supplied from an encoded image storage unit 304, and outputs an intra prediction image.
The block encoding unit 303 performs a series of encoding processing, such as DIT-Q-IQ-IDIT, by using the input image and the intra prediction image supplied from the intra prediction unit 302, and outputs a bit stream and an encoded image. The encoded image storage unit 304 stores the encoded image supplied from the block encoding unit 303.
The slice division structure control unit 301 determines a slice division position by using the bit stream output from the block encoding unit 303, and outputs information on the slice division structure of the screen.
The encoding by slice division is effective for reducing the effect of transmission line errors when video communication is performed using a transmission line in which errors may occur. A variable-length code is used for the bit stream in H.264 and the like. Accordingly, if a bit error occurs due to a transmission line error or the like, the portion of the bit stream following the position where the bit error occurs cannot be normally decoded, so that the effect of the bit error propagates through the subsequent region of the screen. FIG. 12 is a diagram for explaining the range affected by the error.
The left side of FIG. 12 shows the range affected by the error when slice division is not performed. As shown in the figure, in this case, the effect of the error covers the whole region below the position where the error occurs. On the other hand, as shown on the right side of FIG. 12, when slice division is performed, the effect of the error is limited to the inside of the slice in which the error occurs.
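The difference between the two cases can be expressed with a small model. The sketch below numbers the macroblocks of one screen in raster order and assumes that decoding fails from the macroblock containing the error up to the end of the slice that contains it. The function name and the representation of slices by their first macroblock index are assumptions made for illustration, and propagation into later frames through inter prediction is not modeled.

```python
from bisect import bisect_right

def affected_macroblocks(total_mbs, error_mb, slice_starts=(0,)):
    """Return the range of macroblock indices that cannot be decoded.
    `slice_starts` lists the first macroblock index of every slice."""
    starts = sorted(slice_starts)
    nxt = bisect_right(starts, error_mb)  # index of the next slice start
    end = starts[nxt] if nxt < len(starts) else total_mbs
    return range(error_mb, end)

# Usage: 100 macroblocks, error in macroblock 37.
without_slices = affected_macroblocks(100, 37)
with_slices = affected_macroblocks(100, 37, slice_starts=(0, 25, 50, 75))
```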
In the case of using an IP (Internet Protocol) network as a transmission line, slice division is generally performed so that the data size of one slice falls within the Path MTU (the maximum data size that can be transmitted in one packet). This is because, if a slice extends across a plurality of packets, the loss of any one of those packets damages the slice, so the probability that the slice is affected by a packet loss increases. Therefore, in the case of encoding a moving image, dynamic slice division processing is performed in which the data size of a bit stream obtained as a result of encoding is monitored and, when the size of data included in one slice exceeds a predetermined value, the slice is further divided. The slice division structure control unit 301 of the moving image encoding apparatus 300 performs this dynamic slice division processing.
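The dynamic slice division can be sketched as follows. The sketch assumes that the encoded size of each macroblock is already known and starts a new slice whenever adding the next macroblock would make the current slice exceed the limit. The function name and the byte values are hypothetical. In an actual encoder the macroblock that opens a new slice would have to be encoded again, because its intra prediction changes at the new slice boundary and its size may change as a result; that step is not modeled here.

```python
def divide_into_slices(mb_sizes, max_slice_bytes):
    """Group macroblock indices into slices that fit within max_slice_bytes.
    A single macroblock larger than the limit is placed in a slice of its own."""
    slices, current, current_size = [], [], 0
    for index, size in enumerate(mb_sizes):
        if current and current_size + size > max_slice_bytes:
            slices.append(current)        # close the slice before it overflows
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        slices.append(current)
    return slices

# Usage: hypothetical macroblock sizes in bytes and a hypothetical 1400-byte limit.
slices = divide_into_slices([400, 500, 700, 300, 1200, 100], 1400)
```

Because the division depends on the sizes produced by the encoding itself, the slice boundaries are known only after the block encoding unit 303 has run, which is why the slice division structure control unit 301 takes the output bit stream as its input.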