Moving image compression encoding techniques are in widespread use for various purposes such as digital broadcasting, video content distribution on optical discs, and video distribution via the Internet or the like.
Techniques that generate encoded data by encoding a moving image signal at a low bit rate, at a high compression rate, and with high image quality, and that decode the encoded moving image, include, for example, H.261 and H.263 standardized by the International Telecommunication Union (ITU). There are also Moving Picture Experts Group (MPEG)-1, MPEG-2, and MPEG-4 by the International Organization for Standardization (ISO), and VC-1 by the Society of Motion Picture and Television Engineers (SMPTE). H.261, H.263, MPEG-1, MPEG-2, MPEG-4, and VC-1 are widely used as techniques conforming to international standards.
H.264/MPEG-4 Advanced Video Coding (AVC) (hereinafter, referred to as H.264), which is described in NPL 1 and was standardized by the ITU and the ISO in cooperation with each other, is also in widespread use. H.265/MPEG-H High Efficiency Video Coding (HEVC) (hereinafter, referred to as H.265), described in NPL 2, was standardized in 2013 as the latest moving image compression encoding standard. H.265 is able to compress data to about half the size achieved by H.264 while keeping substantially the same video quality as H.264. H.265 may come to be widely used in all fields.
The aforementioned moving image encoding techniques are each configured by combining a plurality of element techniques such as motion compensation prediction, orthogonal transformation of a prediction error image, quantization of orthogonal transformation coefficients, and entropy encoding of the quantized orthogonal transformation coefficients. Encoding that combines these element techniques is referred to as hybrid encoding.
The aforementioned moving image encoding techniques increase the compression rate of a moving image by performing inter-frame prediction, which exploits the high image correlation in the time axis direction that is one of the characteristics of a moving image. In inter-frame prediction, a motion compensation prediction technique is generally used, in which a prediction image is generated by correcting the motion and positional deviation of a subject, a background, and the like between images that are temporally close to each other.
FIG. 11 is a block diagram illustrating a configuration example of a general moving image encoding device in which inter-frame prediction is used as a prediction method.
A moving image encoding device 100 illustrated in FIG. 11 includes a subtraction unit 101, a transformation unit 102, a quantization unit 103, an inverse quantization unit 104, an inverse transformation unit 105, an adder unit 106, a filter unit 107, a frame buffer 108, a scan unit 109, an entropy encoding unit 110, a rate control unit 111, and a motion prediction unit 112.
When a new image is input to the moving image encoding device 100, the moving image encoding device 100 executes encoding processing separately on each image block of a predetermined size.
For example, when H.264 is used as a moving image compression encoding method, the moving image encoding device 100 executes encoding processing separately on each block of 16×16 pixels, which is referred to as a macroblock (MB). When H.265 is used as a moving image compression encoding method, the moving image encoding device 100 executes encoding processing on each block of 16×16 pixels, 32×32 pixels, 64×64 pixels, or the like, which is referred to as a coding tree unit (CTU).
When a new image is input, the motion prediction unit 112 detects a position change of an image block of an input image, which corresponds to an image block of an encoded image stored in the frame buffer 108. The frame buffer 108 stores image data of an already encoded frame.
The motion prediction unit 112 finds out motion vector information corresponding to the detected position change. The motion prediction unit 112 executes motion compensation prediction processing on the basis of the motion vector information that is found out, and outputs a motion compensation prediction image.
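The motion vector search described above can be illustrated by a minimal sketch of full-search block matching. The sketch assumes that the sum of absolute differences (SAD) is used as the matching cost and that frames are given as lists of rows of luma samples; the function names `sad` and `full_search` and the parameter `search_range` are hypothetical and are not taken from the moving image encoding device 100 or from any standard.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of the current frame at
    (bx, by) and the block of the reference frame displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size, search_range):
    """Try every displacement within +/- search_range and return the
    motion vector (dx, dy) with the smallest SAD, together with that SAD."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates whose reference block leaves the frame.
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Each candidate displacement is evaluated independently of the others, which is one reason this kind of search involves a large amount of computation and lends itself to parallel execution.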
The motion compensation prediction image is input from the motion prediction unit 112 into the subtraction unit 101. Next, the subtraction unit 101 subtracts the motion compensation prediction image from the input image. The subtraction unit 101 sets, as a prediction error image, an image obtained by subtracting the motion compensation prediction image, and outputs the prediction error image.
The prediction error image is input from the subtraction unit 101 into the transformation unit 102. The transformation unit 102 executes orthogonal transformation processing equivalent to Discrete Cosine Transform (DCT) on the input prediction error image. The transformation unit 102 generates a transformation coefficient sequence by executing orthogonal transformation processing, and outputs the generated transformation coefficient sequence.
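As an illustration of the orthogonal transformation, the following is a minimal sketch of a floating-point two-dimensional DCT-II applied to a square block. It assumes the orthonormal form of the DCT; practical H.264 and H.265 encoders use integer approximations of the transform, which this sketch does not reproduce. The helper names `dct_matrix`, `dct2`, and `idct2` are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(block):
    """Forward 2-D transform: C * block * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2-D transform: C^T * coeffs * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)
```

For a flat prediction error block, all of the energy is concentrated in the single coefficient at position (0, 0), which is the property that makes the subsequent quantization and entropy encoding effective.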
For example, when H.264 is used as a moving image compression encoding method, orthogonal transformation processing is executed independently on each block of 4×4 pixels or 8×8 pixels. When H.265 is used as a moving image compression encoding method, orthogonal transformation processing is executed independently on each block of 4×4 pixels, 8×8 pixels, 16×16 pixels, or 32×32 pixels.
The transformation unit 102 may transform a prediction error image by executing orthogonal transformation processing equivalent to DCT or another transformation processing such as a Wavelet transformation. The moving image encoding device 100 may not necessarily include a transformation unit that executes transformation processing on a prediction error image.
The transformation coefficient sequence is input from the transformation unit 102 into the quantization unit 103, and a quantization parameter (QP) is input from the rate control unit 111 into the quantization unit 103. The quantization unit 103 executes quantization processing on the input transformation coefficient sequence on the basis of the input quantization parameter, and generates a quantized transformation coefficient sequence. The quantization unit 103 outputs the generated quantized transformation coefficient sequence.
The quantized transformation coefficient sequence is input from the quantization unit 103 into the inverse quantization unit 104. The inverse quantization unit 104 executes inverse quantization processing on the input quantized transformation coefficient sequence, and generates a transformation coefficient sequence. The inverse quantization unit 104 outputs the generated transformation coefficient sequence.
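The relationship between quantization and inverse quantization can be sketched as follows. The sketch assumes a uniform scalar quantizer whose step size doubles for every increase of 6 in the quantization parameter, which approximates the behavior of H.264 and H.265; actual encoders use integer scaling tables and rounding offsets that are not reproduced here. The function names `qstep`, `quantize`, and `dequantize` are hypothetical.

```python
def qstep(qp):
    """Approximate quantization step size for a given QP (assumption:
    the step doubles every 6 QP and equals 1.0 at QP = 4)."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs, qp):
    """Map transformation coefficients to integer levels (round half away from zero)."""
    step = qstep(qp)
    levels = []
    for c in coeffs:
        level = int(abs(c) / step + 0.5)
        levels.append(-level if c < 0 else level)
    return levels


def dequantize(levels, qp):
    """Reconstruct approximate coefficients from integer levels."""
    step = qstep(qp)
    return [level * step for level in levels]
```

For example, with a quantization parameter of 28 the step size in this sketch is 16, so the coefficients 100, -37, 7, and 0 become the levels 6, -2, 0, and 0, and inverse quantization returns 96, -32, 0, and 0. The difference from the original coefficients is the quantization distortion, and a larger quantization parameter yields fewer bits at the cost of larger distortion.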
The transformation coefficient sequence is input from the inverse quantization unit 104 into the inverse transformation unit 105. The inverse transformation unit 105 executes inverse integer transformation processing on the input transformation coefficient sequence, and generates a prediction error image. The inverse transformation unit 105 outputs the generated prediction error image.
The prediction error image is input from the inverse transformation unit 105 into the adder unit 106, and a prediction image is input from the motion prediction unit 112 into the adder unit 106. The adder unit 106 adds the input prediction error image and the input prediction image, and outputs an image generated by adding.
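The data path from the subtraction unit 101 to the adder unit 106 can be sketched in simplified form as follows. The sketch assumes that the transformation and inverse transformation are omitted and that the prediction error is quantized directly with a fixed step, purely to keep the example short; the function names `quantize_value` and `local_decode` and the parameter `step` are hypothetical.

```python
def quantize_value(value, step):
    """Uniform scalar quantization, rounding half away from zero."""
    level = int(abs(value) / step + 0.5)
    return -level if value < 0 else level


def local_decode(cur, pred, step):
    """Return the quantized levels and the reconstructed samples for one block.

    cur and pred are flat lists of 8-bit samples of equal length.
    """
    # Subtraction unit: prediction error = input - prediction.
    residual = [c - p for c, p in zip(cur, pred)]
    # Quantization followed by inverse quantization (transform omitted).
    levels = [quantize_value(r, step) for r in residual]
    recon_residual = [level * step for level in levels]
    # Adder unit: prediction + reconstructed error, clipped to the 8-bit range.
    recon = [min(255, max(0, p + r)) for p, r in zip(pred, recon_residual)]
    return levels, recon
```

The reconstructed samples differ slightly from the input because quantization is lossy. A decoder receives only the levels and the prediction, so the encoder builds its reference from the same reconstructed samples rather than from the original input in order to keep both sides in step.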
The image obtained by adding the prediction error image and the prediction image is input from the adder unit 106 into the filter unit 107. The filter unit 107 generates a local decoded image by executing filter processing on the input image.
Filter processing executed by the filter unit 107 is processing of reducing distortion arising in an image because of encoding. For example, when H.264 or H.265 is used as a moving image compression encoding method, a deblocking filter is used in filter processing. When H.265 is used as a moving image compression encoding method, Sample Adaptive Offset (SAO) is also used in filter processing.
The local decoded image generated by the filter unit 107 is stored in the frame buffer 108. A local decoded image is used in encoding a succeeding frame.
The quantized transformation coefficient sequence is input from the quantization unit 103 into the scan unit 109. The scan unit 109 executes predetermined scan processing on the input quantized transformation coefficient sequence, and rearranges the quantized transformation coefficient sequence. The predetermined scan processing is, for example, zigzag scanning. The scan unit 109 outputs a rearranged transformation coefficient sequence.
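Zigzag scanning can be sketched as follows for a square block of quantized coefficients. The sketch assumes the classic zigzag order that starts at the top-left coefficient and traverses the anti-diagonals in alternating directions; the function names `zigzag_order` and `zigzag_scan` are hypothetical.

```python
def zigzag_order(n):
    """Return the (row, column) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):            # s = row + column of an anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)          # even diagonals run bottom-left to top-right
        for r in rows:
            order.append((r, s - r))
    return order


def zigzag_scan(block):
    """Rearrange a square block (list of rows) into a one-dimensional sequence."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

Because quantization tends to leave non-zero values near the top-left corner of the block, this ordering groups the non-zero coefficients at the start of the sequence and the zeros at the end, which suits the subsequent entropy encoding.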
The rearranged transformation coefficient sequence is input from the scan unit 109 into the entropy encoding unit 110. The entropy encoding unit 110 executes entropy encoding processing on the input rearranged transformation coefficient sequence in accordance with a predetermined rule, and generates a bit stream. The entropy encoding unit 110 outputs the generated bit stream.
For example, when H.264 or H.265 is used as a moving image compression encoding method, Context Adaptive Binary Arithmetic Coding (CABAC) is used in entropy encoding processing. When H.264 is used as a moving image compression encoding method, Context-based Adaptive Variable Length Coding (CAVLC) may be used in entropy encoding processing.
The bit stream is input from the entropy encoding unit 110 into the rate control unit 111. The rate control unit 111 calculates a quantization parameter used in quantization of a succeeding block on the basis of statistical information of the input bit stream.
The statistical information of the bit stream used by the rate control unit 111 is, for example, a code amount per block or a code amount per context. Specific processing content of the quantization parameter calculation processing is described in, for example, NPL 3 or NPL 4.
For example, it is often the case that a bit rate of a bit stream to be output is designated, and the moving image encoding device 100 is required to control a rate in such a manner that the rate coincides with the designated bit rate. The rate control unit 111 executes rate control processing of controlling a rate in such a manner that the rate coincides with the designated bit rate.
For example, a specific content of rate control processing in H.265 is described in NPL 4. In the rate control processing described in NPL 4, on the basis of a degree of importance or the like of each of the image frames, a target bit number (TCurrPic) of each of the frames is calculated. In frame encoding processing, feedback control is executed in such a manner that a bit number of a frame coincides with TCurrPic.
Specifically, when a predetermined CTU is encoded, the number of bits generated in the already encoded CTU group is subtracted from TCurrPic. The bit number remaining after the subtraction is distributed among the CTUs of the not-yet-encoded CTU group. A quantization parameter to be used in quantization processing of a CTU is calculated on the basis of the bit number assigned to that CTU.
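The distribution of the remaining bit number can be sketched as follows. The sketch assumes that the remaining budget is divided among the not-yet-encoded CTUs in proportion to a per-CTU weight; the weighting scheme, the function name `allocate_ctu_bits`, and its parameter names are hypothetical, and the subsequent mapping from an assigned bit number to a quantization parameter, which is described in NPL 4, is not reproduced.

```python
def allocate_ctu_bits(t_curr_pic, bits_spent, weights_uncoded):
    """Distribute the remaining frame budget among the not-yet-encoded CTUs.

    t_curr_pic      -- target bit number of the current frame (TCurrPic)
    bits_spent      -- bits actually generated by each already encoded CTU
    weights_uncoded -- relative weight of each remaining CTU (assumption)
    """
    # Remaining budget; never negative even if the budget is already overspent.
    remaining = max(t_curr_pic - sum(bits_spent), 0)
    total_weight = sum(weights_uncoded)
    if total_weight == 0:
        return [0.0 for _ in weights_uncoded]
    return [remaining * w / total_weight for w in weights_uncoded]
```

For example, with a frame target of 1000 bits, two encoded CTUs that generated 300 and 200 bits, and three remaining CTUs weighted 1, 1, and 2, the remaining 500 bits are assigned as 125, 125, and 250 bits. Recomputing the allocation after every CTU is what makes the control a feedback loop: a CTU that overshoots its share reduces the budget of the CTUs that follow it.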
As described above, in a moving image encoding device, a series of image processing and signal processing, i.e., motion prediction processing, transformation processing, quantization processing, inverse quantization processing, inverse transformation processing, and filter processing, are executed for multiple images. In other words, generally, an enormous amount of computation is required for execution of moving image encoding processing.
Therefore, when moving image encoding processing is executed by software running on a general-purpose processor, a processor operating at a high operating frequency is required. Using a processor operating at a high operating frequency causes a problem of increased cost and electric power consumption.
It is often the case that even a processor operating at a high operating frequency does not have enough computing power to process an input moving image in real time, that is, as the image is input. Therefore, when such a processor processes a high-resolution moving image as it is input, there occurs a problem that processing time gets longer.
Even when encoding processing is executed by dedicated hardware, a processing circuit operative at a high speed is required, and consequently, there occurs a problem of increase in cost and electric power consumption. Using a processing circuit operative at a high speed is highly likely to cause hardware designing to become complicated, and a development period to be extended.
In order to solve the aforementioned problems, a technique has been developed in which image processing and signal processing are executed at a high efficiency and at a high speed by using, in addition to a general-purpose processor, another processor, a processing circuit, or the like having a configuration appropriate for moving image encoding processing as an accelerator.
As an accelerator, for example, a Graphics Processing Unit (GPU) is used. A Digital Signal Processor (DSP) or a Field Programmable Gate Array (FPGA) may also be used as an accelerator.
A GPU is a processor normally used for three-dimensional graphics processing. However, a GPU is configured as a large-scale parallel processor in which several hundred to several thousand processor cores are integrated. Therefore, as long as the processing suits the characteristics of a GPU, the GPU is able to execute the processing several times to several tens of times faster than a general-purpose processor.
Generally, a computation amount relating to motion vector search processing, which is executed by the motion prediction unit 112 of the moving image encoding device 100 illustrated in FIG. 11, is large. Therefore, when the motion prediction unit 112 is implemented by a GPU or the like, motion vector search processing is executed at a high speed, as compared with when the motion prediction unit 112 is implemented by a CPU.
A computation amount relating to processing such as transformation processing, quantization processing, and scan processing is also large. When more sophisticated and complicated quantization processing including adaptive QP selection processing, RD optimal quantization processing, or the like described in NPL 4 is executed as quantization processing, for example, a computation amount relating to quantization processing further increases.
PTL 1 describes an efficient technique of executing, in parallel by using a GPU, transformation processing, quantization processing, and scan processing that require large computation amounts for execution.
An information processing device described in PTL 1 executes transformation processing, quantization processing, and scan processing in parallel by a GPU. Next, the GPU inputs intermediate data that is a processing result of parallel processing into a CPU that is a general-purpose processor. The CPU executes lossless compression processing on the intermediate data. The lossless compression processing corresponds to entropy encoding processing executed by the entropy encoding unit 110 illustrated in FIG. 11.
As described above, the information processing device described in PTL 1 distributes the processing to be executed between a GPU and a CPU. Specifically, transformation processing, quantization processing, and scan processing, which are predetermined computations that can be executed for each block in parallel, are assigned to a GPU, which has high capability for parallel computation. Lossless compression processing, in which the data size after compression fluctuates and which is not easily executed in parallel, is assigned to a CPU, which has high capability for complicated bit analysis or the like. When each type of processing is assigned to the GPU or the CPU in accordance with their respective characteristics, the processing efficiency of the device as a whole is enhanced.