The present invention relates to a video decoding processing apparatus and an operating method thereof, and particularly to a technology effective in reducing overhead for the start of parallel decoding processing.
As is well known, the general moving-picture compression system based on the MPEG-2 standard, standardized as international standard ISO/IEC 13818-2, rests on the principle that the video storage capacity and the required bandwidth are reduced by removing redundant information from a video stream. Incidentally, MPEG is an abbreviation of Moving Picture Experts Group.
Since the MPEG-2 standard defines only the syntax of a bit stream (a rule for the compressed and encoded data sequence, or a construction method of a bit stream of encoded data) and the decode process, it is flexible enough to be used in a wide variety of circumstances such as satellite broadcasting services, cable television, interactive television, and the Internet.
In the encode process of MPEG-2, a video signal is first sampled and quantized to define the color difference components and the brightness component of each pixel in digital video. Values indicative of the color difference and brightness components are stored in a structure known as a macro block. The values stored in the macro block are transformed to frequency values using the discrete cosine transform (DCT). The transform coefficients obtained by the DCT have different frequencies according to the brightness and color difference of each picture. The quantized DCT transform coefficients are encoded by variable length coding (VLC) to further compress the video stream.
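The transform and quantization steps described above can be illustrated with a minimal sketch. It assumes an 8×8 block, an orthonormal DCT-II, and a single uniform quantizer step of 16; the names `dct_2d`, `quantize` and `flat_block` are hypothetical, and an actual MPEG-2 encoder uses quantization matrices and a fast transform rather than this direct form.

```python
import math

N = 8  # assumed block size: MPEG-2 applies the DCT to 8x8 blocks of a macro block

def dct_2d(block):
    """Direct (slow) 2-D DCT-II of an N x N block with orthonormal scaling."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * acc
    return out

def quantize(coeffs, step):
    """Uniform quantization with one step size (an illustrative simplification)."""
    return [[int(round(value / step)) for value in row] for row in coeffs]

# A flat block carries no spatial detail, so only the DC coefficient survives.
flat_block = [[128] * N for _ in range(N)]
levels = quantize(dct_2d(flat_block), step=16)
```

For the flat block, all of the energy falls into the single DC coefficient and every other quantized level is zero, which is the redundancy that the subsequent variable length coding exploits.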
Additional compression based on a motion compensation technique is defined in the encode process of MPEG-2. Three kinds of pictures or frames exist in the MPEG-2 standard: the I frame, the P frame and the B frame. The I frame is an intra-coded frame, meaning that it is reproduced without referring to any other pictures or frames in the video stream. The P and B frames are inter-coded frames, meaning that they are reproduced by referring to other pictures or frames. For example, the P and B frames include motion vectors indicative of motion estimation with respect to a reference frame. With the use of the motion vectors, it is possible for an MPEG encoder to reduce the bandwidth necessary for a specific video stream. Incidentally, the I frame is called an intra-coded frame, the P frame is called a predictive-coded frame, and the B frame is called a bi-directional predictive-coded frame, respectively.
Accordingly, an MPEG-2 encoder is comprised of a frame memory, a motion vector detector, a motion compensator, a subtractor, a DCT transformer, a quantizer, an inverse quantizer, an inverse DCT transformer, a variable length encoder, and an adder. A video signal to be encoded is stored in the frame memory in order to encode the P and B frames and detect motion vectors. Thereafter, the video signal to be encoded is read from the frame memory, and a motion compensation prediction signal from the motion compensator is subtracted from it by the subtractor. DCT transformation processing and quantization processing are performed on the result by the DCT transformer and the quantizer respectively. A quantized DCT transform coefficient is subjected to variable length coding processing by the variable length encoder and is also subjected to local decoding processing by the inverse quantizer and the inverse DCT transformer. Afterwards, the result of this local decoding processing is directly supplied to the adder and supplied to the subtractor via the motion compensator.
On the other hand, an MPEG-2 decoder is comprised of a buffer memory, a variable length decoder, an inverse quantizer, an inverse DCT transformer, a motion compensator, an adder, and a frame memory. A coded bit stream based on MPEG-2 is stored in the buffer memory and thereafter subjected to variable length decoding processing, inverse quantization processing, and inverse DCT transformation processing by the variable length decoder, the inverse quantizer and the inverse DCT transformer respectively. A prediction image obtained by the motion compensator from the motion vectors subjected to the variable length decoding processing and the result of the inverse DCT transformation processing are added together by the adder. A reproduced image signal is generated from the output of the adder. The reproduced image signal is stored in the frame memory and used for prediction of another frame.
There has been proposed a moving picture or video compression system based on the MPEG-4 (H. 263) standard, which follows the MPEG-2 standard and is standardized as international standard ISO/IEC 14496 for low-rate encoding of a television telephone or the like. The compression system based on the MPEG-4 (H. 263) standard is called a “hybrid type” system that uses inter-frame prediction and the discrete cosine transform, as with MPEG-2. Further, motion compensation in half-pel units has been introduced therein. The compression ratio of this system has been improved by introducing a technique called three-dimensional variable length coding (3-D VLC), which uses Huffman codes for entropy coding as in MPEG-2 but newly encodes run, level and last simultaneously. Incidentally, the run and level relate to a coefficient of a run length, and the last indicates whether or not it is the last coefficient. Further, the MPEG-4 (H. 263) standard includes a basic portion called Baseline and extended standards called Annexes.
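The run/level/last events that 3-D VLC encodes jointly can be sketched as follows. This is an illustrative sketch only: the function name `run_level_last` and the sample coefficient list are hypothetical, the input is assumed to be already scanned into a one-dimensional order, and the actual codeword tables of the standard are not reproduced.

```python
def run_level_last(coeffs):
    """Convert scanned coefficients into (last, run, level) events.

    run   = number of zero coefficients preceding a nonzero coefficient
    level = value of that nonzero coefficient
    last  = 1 if it is the final nonzero coefficient of the block, else 0
    """
    nonzero_positions = [i for i, value in enumerate(coeffs) if value != 0]
    if not nonzero_positions:
        return []
    last_position = nonzero_positions[-1]
    events = []
    run = 0
    for i, value in enumerate(coeffs[:last_position + 1]):
        if value == 0:
            run += 1
        else:
            events.append((1 if i == last_position else 0, run, value))
            run = 0
    return events

# Hypothetical scanned coefficients of one block.
events = run_level_last([5, 0, 0, -2, 1, 0, 0, 0])
```

Each triple would then be mapped to a single variable length codeword, so trailing zeros after the event with last = 1 need not be transmitted at all.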
In order to bring the compression system based on MPEG-4 (H. 263) to higher encoding efficiency, the MPEG-4 AVC (H. 264) standard has been standardized as international standard ISO/IEC 14496-10. Incidentally, AVC is an abbreviation of Advanced Video Coding. The MPEG-4 AVC (H. 264) standard is called the H. 264/AVC standard.
Video coding based on the H. 264/AVC standard is comprised of a video coding layer (VCL) and a network abstraction layer. That is, the video coding layer is designed to represent the video content effectively. The network abstraction layer serves to format the VCL representation of the video and to provide header information in a manner suitable for transfer by various transfer layers or storage media.
In the international standard video encoding method based on MPEG-2, MPEG-4, H. 264/AVC standard or the like, inter-frame prediction coding has been used to realize high encoding efficiency utilizing correlation in the time direction. As for frame encoding modes, there are an I frame encoded without using the correlation between frames, a P frame predicted from one frame encoded in the past, and a B frame capable of prediction from two frames encoded in the past.
In the inter-frame prediction coding, a reference image (prediction image) subjected to motion compensation is subtracted from a moving picture, and the predictive residual from this subtraction is encoded. Processing for the encoding includes orthogonal transformation such as the DCT (Discrete Cosine Transform), quantization and variable length coding. The motion compensation (motion correction) includes processing of spatially moving a reference frame in inter-frame prediction. The processing of the motion compensation is carried out in block units of the frame to be encoded. When there is no motion in the image contents, no displacement is applied and a pixel in the same position as the pixel to be predicted is used. When motion exists, the most suitable block is searched for, and the movement amount is taken as a motion vector. A motion compensation block is a block of 16 pixels×16 pixels/16 pixels×8 pixels in a coding method based on the MPEG-2. In a coding method based on the MPEG-4, it is a block of 16 pixels×16 pixels/16 pixels×8 pixels/8 pixels×8 pixels. In a coding method based on the H. 264/AVC standard, the motion compensation block is a block of 16 pixels×16 pixels/16 pixels×8 pixels/8 pixels×16 pixels/8 pixels×8 pixels/8 pixels×4 pixels/4 pixels×8 pixels/4 pixels×4 pixels.
The above-described coding processing is performed for every video screen (frame or field). Each of the blocks into which the screen is subdivided (normally 16 pixels×16 pixels, and called macro blocks (MB) in MPEG) serves as a processing unit. That is, for every block to be encoded, the most similar block (prediction image) is selected from the reference images already encoded, and a differential signal between the image (block) to be encoded and the prediction image is encoded (orthogonal transformation, quantization or the like). The difference in relative position between the block to be encoded in the screen and the prediction signal is called a motion vector.
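The block search described above can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD). The matching criterion, the 4×4 block size, the ±2 pixel search range and all names (`sad`, `find_motion_vector`, `ref_frame`, `cur_frame`) are assumptions chosen for illustration; the standards do not prescribe how an encoder performs this search.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
               for y in range(size) for x in range(size))

def find_motion_vector(cur, ref, bx, by, size=4, search=2):
    """Full search over displacements within +/- search pixels; returns ((dx, dy), cost)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + size <= height and
                      0 <= bx + dx and bx + dx + size <= width)
            if not inside:
                continue  # skip candidates that fall outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

# Synthetic frames: the current frame is the reference shifted by (2, 1) pixels.
SIZE = 16
ref_frame = [[(7 * x + 13 * y) % 256 for x in range(SIZE)] for y in range(SIZE)]
cur_frame = [[ref_frame[y + 1][x + 2] if y + 1 < SIZE and x + 2 < SIZE else 0
              for x in range(SIZE)] for y in range(SIZE)]
mv, cost = find_motion_vector(cur_frame, ref_frame, bx=4, by=4)
```

Here the search recovers the displacement (2, 1) with zero residual; in real video the best match usually leaves a nonzero residual, which is what the transform and quantization stages then encode.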
Further, a video coding layer (VCL) based on the H. 264/AVC standard has been described in the following Patent Document 1 as following an approach called block-based hybrid video coding. A VCL design is comprised of a macro block, a slice and a slice block. Each picture is divided into a plurality of macro blocks each having a fixed size. Each macro block includes a square picture region of 16×16 samples as the brightness component, and square sample regions for each of the two corresponding color difference components. One picture can include one or more slices. Each slice is self-contained in that it provides an active sequence and a picture parameter set. Since a slice representation can basically be decoded without using information from other slices, the syntax elements can be parsed from the bit stream and the values of the samples in the picture region can be decoded. However, in order to apply a deblocking filter across a slice boundary for more complete decoding, some information from other slices is required. Further, since each slice is encoded and decoded independently from the other slices of a picture, the ability to use slices in parallel processing has also been described in the following Non-Patent Document 1.
On the other hand, systems that handle moving picture codes, such as a digital HDTV (High Definition Television) broadcasting receiver and a digital video camera capable of capturing HDTV signals, have increased in screen size. High processing performance has increasingly been demanded of the video encoders and video decoders that handle these signals.
Against such a background, there has been proposed a new standard H. 265 (ISO/IEC 23008-2) that follows the H. 264/AVC standard. This new standard is also called HEVC (High Efficiency Video Coding). This standard is excellent in compression efficiency owing to optimization of the block size and the like, and has compression performance about four times that of the MPEG-2 standard and approximately twice that of the H. 264/AVC standard.
On the other hand, the following Patent Document 1 has described that in widely-adopted various coding compression standards such as MPEG-1/2/4, H. 261/H. 263/H. 264-AVC, etc., one macro block comprised of 16×16 pixels has been used as a processing unit for motion compensation and subsequent processing, whereas in the H. 265/HEVC standard, a more flexible block structure has been adopted as a processing unit. The unit of the flexible block structure is called a coding unit (CU) and is adaptively divided into small blocks using a quadtree to achieve satisfactory performance starting with the largest coding unit (LCU). The size of the largest coding unit (LCU) is 64×64 pixels much larger than the size of a macro block of 16×16 pixels. An example of coding unit division based on the quadtree is shown in FIG. 1 of the following Patent Document 1 and the disclosure related to it. At its depth “zero”, the first coding unit (CU) is a largest coding unit (LCU) comprised of 64×64 pixels. A split flag “0” indicates that a coding unit (CU) at that time is not divided, whereas a split flag “1” indicates that a coding unit (CU) at that time is divided into four small coding units by a quadtree. The following Patent Document 1 has also described that the post-division coding unit (CU) is further quadtree-divided until it reaches a pre-specified smallest coding unit (CU) size.
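The split-flag mechanism described above can be sketched as a recursive traversal. The sketch assumes that the flags arrive in depth-first order with the four sub-units visited in raster (Z) order, and that no flag is read once the smallest coding unit size is reached; the function name `parse_cu_quadtree`, the 8×8 minimum size and the example flag sequence are hypothetical.

```python
def parse_cu_quadtree(split_flags, lcu_size=64, min_cu_size=8):
    """Return the leaf coding units as (x, y, size) tuples for one LCU."""
    flags = iter(split_flags)
    leaves = []

    def visit(x, y, size):
        # A flag is consumed only while the CU is still larger than the minimum size.
        if size > min_cu_size and next(flags) == 1:
            half = size // 2
            for offset_y in (0, half):
                for offset_x in (0, half):
                    visit(x + offset_x, y + offset_y, half)
        else:
            leaves.append((x, y, size))

    visit(0, 0, lcu_size)
    return leaves

# Depth 0 is split; of the four 32x32 units only the second is split again.
cus = parse_cu_quadtree([1, 0, 1, 0, 0, 0, 0, 0, 0])
```

In this example the 64×64 LCU ends up as three 32×32 coding units and four 16×16 coding units, which together cover the LCU exactly once.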
The following Patent Document 2 has described that a first video processing unit and a second video processing unit are operated in parallel for video encoding/decoding processing based on the H. 264/AVC standard. During the parallel operation, the first video processing unit sequentially processes first plural macro blocks arranged in one row of one picture, and the second video processing unit sequentially processes second plural macro blocks arranged in the next row. In particular, the operation timing of the second video processing unit is delayed by two macro blocks relative to that of the first video processing unit. As a result, the result of processing of the first plural macro blocks arranged in one row by the first video processing unit can be used for intra-frame prediction of the second plural macro blocks arranged in the next row by the second video processing unit.
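Why a delay of two macro blocks is sufficient can be checked with a small timing model. The model assumes that every macro block takes one time slot and that intra-frame prediction of a macro block may refer to its left, upper-left, upper and upper-right neighbors; the names `start_slot` and `delay_is_sufficient` are hypothetical and the model is a simplification of the scheme in Patent Document 2.

```python
def start_slot(row, col, delay):
    """Time slot in which the macro block at (row, col) is processed."""
    return col + delay * row

def delay_is_sufficient(width, delay):
    """True if every neighbor needed by the second row is finished before it is used."""
    for col in range(width):
        # Neighbors of macro block (1, col): left, upper-left, upper, upper-right.
        needed = [(1, col - 1), (0, col - 1), (0, col), (0, col + 1)]
        for row, neighbor_col in needed:
            if not 0 <= neighbor_col < width:
                continue  # neighbor lies outside the picture
            if start_slot(row, neighbor_col, delay) >= start_slot(1, col, delay):
                return False  # neighbor would not be complete in time
    return True

two_mb_delay_ok = delay_is_sufficient(width=8, delay=2)
one_mb_delay_ok = delay_is_sufficient(width=8, delay=1)
```

Under these assumptions a delay of one macro block fails because the upper-right neighbor would be processed in the same slot, whereas a delay of two macro blocks satisfies every dependency.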
The outline of the H. 265/HEVC standard has been described in the following Non-Patent Document 2. The core of the coding layer in the previous standards is a macro block comprising a 16×16 block of brightness samples and two 8×8 blocks of color difference samples, whereas in the H. 265/HEVC standard it is a coding tree unit (CTU), which is larger than a traditional macro block and whose size is selected by the encoder. The coding tree unit (CTU) is comprised of a brightness coding tree block (CTB), the corresponding color difference coding tree blocks (CTB), and syntax elements. The quad-tree syntax of the coding tree unit (CTU) designates the size and position of each of the brightness and color difference coding tree blocks (CTB). The decision as to whether inter-picture or intra-picture prediction is used for encoding a picture region is made at the level of the coding unit (CU). The split structure of a prediction unit (PU) has its root at the level of the coding unit (CU). Depending on the basic prediction type decision, the brightness and color difference coding blocks (CB) can be further divided in size and predicted from the brightness and color difference prediction blocks (PB). The H. 265/HEVC standard supports variable prediction block (PB) sizes from 64×64 samples down to 4×4 samples. A prediction residual is encoded by block transformation, and the tree structure of a transform unit (TU) has its root at the level of the coding unit (CU). The residual of a brightness coding block (CB) can be identical to a brightness transform block (TB) or can be divided into smaller brightness transform blocks (TB). The same applies to the color difference transform blocks (TB). Integer basis functions analogous to those of the discrete cosine transform (DCT) are defined for square transform blocks (TB) of 4×4, 8×8, 16×16 and 32×32 samples. Uniform Reconstruction Quantization (URQ) is used in the H. 265/HEVC standard as with the H. 264/AVC standard. That is, the range of the value of a quantization parameter (QP) is defined between 0 and 51, and the mapping of quantization parameters (QP) approximately corresponds to the logarithm of a quantization scaling matrix.
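The logarithmic relationship between the quantization parameter (QP) and the quantization step can be illustrated with the commonly cited approximation in which the step size doubles for every increase of 6 in QP. The closed form below, with a step size of 1 at QP = 4, is an approximation used here for illustration and is not the normative integer scaling tables of the standards; the name `approx_qstep` is hypothetical.

```python
def approx_qstep(qp):
    """Approximate quantization step size for a QP in the range 0 to 51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must lie between 0 and 51")
    return 2.0 ** ((qp - 4) / 6.0)

# Increasing QP by 6 doubles the step size, so QP is roughly a log2 scale of the step.
ratio = approx_qstep(28) / approx_qstep(22)
```

Because of this relationship, equal increments of QP correspond to equal ratios of the step size across the whole range from 0 to 51.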
Further, the following Non-Patent Document 2 has described that a slice based on the H. 265/HEVC standard is a data structure capable of being encoded independently from other slices of the same picture. Furthermore, the following Non-Patent Document 2 has also described that novel features called tiles and wavefront parallel processing (WPP) have been introduced in the H. 265/HEVC standard to modify the structure of slice data with a view to enhancing parallel processing capability or to packetization. The tiles divide a picture into rectangular regions. A principal objective of the tiles is to increase the capability of parallel processing rather than to provide error-recovery capability. A plurality of tiles are independently decodable regions of one picture. These are encoded with shared header information. With the wavefront parallel processing (WPP), one slice is divided into rows of a plurality of coding tree units (CTU). The first row is processed by the normal method, and the processing of the second row can be started after only a few decisions have been made in the first row. After only a few decisions have been made in the second row, the processing of the third row can be started.
The following Non-Patent Document 3 has described that a block structure based on the H. 265/HEVC standard is based on a coding unit (CU) including a prediction unit (PU) and a transform unit (TU), and each frame is divided into a set of largest coding units (LCU) having a maximum size of 64×64 samples. There has also been described in the following Non-Patent Document 3 that each largest coding unit (LCU) is recursively split into smaller coding units (CU) by a general quad-tree split structure.
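The staggered start of the rows under wavefront parallel processing can be sketched by deriving start times from the dependencies. The sketch assumes one time slot per coding tree unit (CTU), one processing unit per row, and that a row may proceed once the row above is two CTUs ahead (the `lag` parameter, an assumption standing in for the small number of decisions mentioned above); the name `wavefront_schedule` is hypothetical.

```python
def wavefront_schedule(rows, cols, lag=2):
    """Earliest start slot of each CTU when every row has its own processing unit."""
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            earliest = 0
            if c > 0:
                # CTUs of one row are processed strictly left to right.
                earliest = max(earliest, start[r][c - 1] + 1)
            if r > 0:
                # The row above must be `lag` CTUs ahead (clamped at the row end).
                dep_col = min(c + lag - 1, cols - 1)
                earliest = max(earliest, start[r - 1][dep_col] + 1)
            start[r][c] = earliest
    return start

schedule = wavefront_schedule(rows=3, cols=5)
total_slots = max(max(row) for row in schedule) + 1
```

With three rows of five CTUs, the rows start in slots 0, 2 and 4 and the picture region completes in 9 slots instead of the 15 needed for purely sequential processing. The idle slots before each row can start are the kind of start-up overhead that grows with the number of rows.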
Parallel processing related to the H. 265/HEVC standard has been described in the following Non-Patent Document 3. Parallel processing at a function level is configured by, for example, different parallel stages using a frame-level pipeline approach in a video decoder. A four-stage pipeline can be implemented by parsing (syntax interpretation), entropy decoding, LCU reproduction and filtering. Intra prediction has a strong data dependence that prevents parallel processing at a block level, since data reproduced from an adjacent block is used to generate the current block. A proposal to partially remove this dependence is known as a “parallel prediction unit for parallel intra coding”. In parallel processing at a data level, several programs are applied to different portions of the data set. In a video codec, the parallel processing at the data level is applied at different data granularities, for example the frame level, the macroblock (or LCU) level, the block level and the sample level. The parallel processing at the LCU (or macroblock) level can be utilized inside each frame or between frames if the data dependence of the different kernels is satisfied. In a kernel that refers to adjacent data at the LCU level, such as intra prediction, the parallel processing of LCUs can be utilized by processing the LCUs along an inclined wavefront. Further, parallel processing at a slice level has also been described in the following Non-Patent Document 3.
The following Non-Patent Document 4 has described that, in relation to the progress of image coding standards, and since dual-core and quad-core computers can be utilized, parallel processing of coding processing and decoding processing has been attempted from the different viewpoints described next. These include a GOP (Group Of Pictures) approach, a frame approach, a pipeline approach, a slice division approach, a macroblock relocation approach and the like. The macroblock relocation approach proposes processing the macro blocks (MB) in a wavefront arrangement. As a result, when the adjacent macro blocks (MB) are available, the macro blocks (MB) of each inclined line are encoded simultaneously. The macroblock relocation approach is widely used at present owing to its fine-grained parallelism at the macroblock (MB) level.
The following Non-Patent Document 4 has described that in order to achieve a more flexible coding system, the H. 265/HEVC standard makes use of a quad-tree-based coding structure which supports macroblocks (MB) having sizes of 64×64 pixels, 32×32 pixels, 16×16 pixels, 8×8 pixels, and 4×4 pixels. The following Non-Patent Document 4 has described that the H. 265/HEVC standard defines three separate block concepts: a coding unit (CU), a prediction unit (PU), and a transform unit (TU). Further, the following Non-Patent Document 4 has described that after the size of a largest coding unit (LCU) and the hierarchical depth of a coding unit (CU) have been defined, the overall structure of the codec is characterized by the sizes of the coding unit (CU), prediction unit (PU) and transform unit (TU).
Further, there has been described in the following Non-Patent Document 4, a method called a block-based parallel intra prediction in relation to the H. 265/HEVC standard. The largest coding unit (LCU) of 64×64 pixels is divided into four blocks of a block 0, a block 1, a block 2 and a block 3. The block 0 and the block 1 configure a first set block, and the block 2 and the block 3 configure a second set block. The blocks 0 and 1 of the first set block are predicted in parallel using pixel values adjacent to the upper and left parts of the first set block. The blocks 2 and 3 of the second set block are predicted in parallel using pixel values adjacent to the upper and left parts of the second set block. In contrast, in a prediction system based on the H. 265/HEVC standard, the pixel values adjacent to the upper and left parts of the block 1 are used for prediction of the block 1. The pixel values adjacent to the upper and left parts of the block 3 are used for prediction of the block 3. Therefore, the blocks 0, 1, 2 and 3 are sequentially predicted.
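The difference between the two prediction orders can be sketched by counting the number of sequential prediction steps implied by the block dependencies. The dependency lists below are assumptions made for illustration: in the parallel method the second set is assumed to wait for the first set, and in the conventional method each block is assumed to wait for the preceding block. The names `prediction_steps`, `sequential_deps` and `parallel_deps` are hypothetical.

```python
def prediction_steps(deps):
    """Number of sequential steps needed when independent blocks run in parallel."""
    step = {}

    def resolve(block):
        if block not in step:
            # A block can start one step after the latest block it depends on.
            step[block] = 1 + max((resolve(d) for d in deps[block]), default=0)
        return step[block]

    return max(resolve(block) for block in deps)

# Conventional order: blocks 0, 1, 2 and 3 are predicted one after another.
sequential_deps = {0: [], 1: [0], 2: [1], 3: [2]}
# Block-based parallel intra prediction: blocks 0 and 1 together, then blocks 2 and 3.
parallel_deps = {0: [], 1: [], 2: [0, 1], 3: [0, 1]}

sequential_steps = prediction_steps(sequential_deps)
parallel_steps = prediction_steps(parallel_deps)
```

Under these assumptions the conventional order needs four steps per largest coding unit, while the set-based method needs two.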