The present invention relates to a moving-picture decoding processing apparatus, a moving-picture coding processing apparatus, and an operating method of the same and, more particularly, to a technique effective to lessen deterioration in processing capability in parallel processing.
As is well known, a general method of compressing a moving picture according to the MPEG-2 standard, standardized as international standard ISO/IEC 13818-2, is based on the principle of reducing video storage capacity and necessary bandwidth by eliminating redundant information from a video stream. MPEG stands for Moving Picture Experts Group.
Since the MPEG-2 standard specifies only the syntax of a bit stream (the rules of a compression-coded data string or a method of configuring a bit stream) and decoding process, it is flexible so that it can be sufficiently used in various situations such as satellite broadcasting and service, cable television, interactive television, internet, and the like.
In an encoding process of MPEG-2, first, a video signal is sampled and quantized to specify the color difference and luminance components of each pixel of a digital video. The values of the color difference and luminance components are accumulated in a macro block. The values accumulated in the macro block are transformed to frequency values by using a discrete cosine transform (DCT). The transform coefficients obtained by the DCT have frequencies which differ between the luminance and the color difference of a picture. The quantized DCT transform coefficients are encoded by variable length coding (VLC), which further compresses the video stream.
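As a minimal sketch of the transform and quantization steps described above, the following Python code applies an orthonormal two-dimensional DCT to a square block and then quantizes the coefficients. The function names and the single uniform step size `qstep` are assumptions made for illustration only; the MPEG-2 standard uses its own quantization matrices and scale factors, which are not reproduced here.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        # Normalization factor that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def quantize(coeffs, qstep):
    """Uniform quantization with one step size (an illustrative assumption)."""
    return [[int(round(c / qstep)) for c in row] for row in coeffs]
```

For a flat 8×8 block, all of the energy falls into the single coefficient at index (0, 0), which is why the subsequent variable length coding can represent such a block compactly.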
In the encoding process of MPEG-2, additional compression by a motion compensation technique is specified. In the MPEG-2 standard, three kinds of frames (also called pictures) exist: the I frame, the P frame, and the B frame. The I frame refers to a frame which is intra-coded and means a frame to be reproduced without referring to any other frames in a video stream. The P frame and the B frame refer to frames which are inter-coded and mean frames to be reproduced with reference to other frames. For example, each of the P frame and the B frame includes a motion vector indicative of motion estimation on a reference frame. By using the motion vector, an MPEG encoder can reduce the bandwidth necessary for a specific video stream. The I frame is called an intra-coded frame, the P frame is called a predictive-coded frame, and the B frame is called a bi-directionally predictive-coded frame.
Therefore, a moving-picture coding apparatus (encoder) of MPEG-2 includes a frame memory, a motion vector detecting unit, a motion compensating unit, a subtracting unit, a DCT unit, a quantizing unit, an inverse quantizing unit, an inverse DCT unit, a variable-length coding unit, and an adding unit. A moving picture signal to be coded is stored in the frame memory for coding of the P frame and the B frame and for detection of a motion vector, and is then read from the frame memory, and a motion compensation prediction signal from the motion compensating unit is subtracted from it by the subtracting unit. A prediction residual generated by the subtraction is subjected to a DCT process and a quantizing process in the DCT unit and the quantizing unit, respectively. The quantized DCT coefficient is subjected to a variable-length coding process by the variable-length coding unit, and is also subjected to a local decoding process in the inverse quantizing unit and the inverse DCT unit, and the result of the local decoding process is supplied directly to the adding unit and supplied to the subtracting unit via the motion compensating unit.
On the other hand, a moving-picture decoding apparatus (decoder) of MPEG-2 includes a buffer memory, a variable-length decoding unit, an inverse quantizing unit, an inverse DCT unit, a motion compensating unit, an adding unit, and a frame memory. A coded bit stream of MPEG-2 is stored in the buffer memory and, after that, subjected to a variable-length decoding process, an inverse quantizing process, and an inverse DCT process in the variable-length decoding unit, the inverse quantizing unit, and the inverse DCT unit, respectively. A prediction image obtained by the motion compensating unit from the motion vector subjected to the variable-length decoding process and the result of the inverse DCT process are added by the adding unit, and a reproduction image signal is generated from the output of the adding unit. The reproduction image signal is stored in the frame memory and used for prediction of other frames.
Subsequent to the MPEG-2 standard, a moving picture compressing method according to the MPEG-4 standard (H.263) standardized by the international standard ISO/IEC 14496 for coding at low rate for a television telephone or the like is also proposed. A compression method according to the MPEG-4 standard (H.263) is called a “hybrid type” using inter-frame prediction and discrete cosine transform like the MPEG-2 and, further, in which motion compensation in the ¼ pixel (quarter pel) unit is introduced. The compression method uses, like the MPEG-2, a Huffman code as entropy coding. By newly introducing a technique called three-dimensional variable length coding (three-dimensional VLC) which codes “run”, “level”, and “last” at the same time, the compression ratio is largely improved. The “run” and “level” relate to a run-length coefficient, and “last” indicates whether the coefficient is the last one or not. The MPEG-4 standard (H.263) further includes a basic part called baseline and an extended standard called annex.
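The grouping of "run", "level", and "last" into a single event can be sketched as follows. This is an illustrative Python example only: the function name is hypothetical, the input is assumed to be a list of quantized coefficients already arranged in scan order, and the mapping of each event to a Huffman codeword uses tables defined by the standard that are not reproduced here.

```python
def run_level_last(coeffs):
    """Convert scan-ordered quantized coefficients into (last, run, level) events.

    run   : number of zero coefficients preceding the nonzero coefficient
    level : value of the nonzero coefficient
    last  : 1 if this is the final nonzero coefficient of the block, else 0
    """
    nonzero_positions = [i for i, c in enumerate(coeffs) if c != 0]
    events = []
    prev = -1  # position of the previous nonzero coefficient
    for n, pos in enumerate(nonzero_positions):
        run = pos - prev - 1
        last = 1 if n == len(nonzero_positions) - 1 else 0
        events.append((last, run, coeffs[pos]))
        prev = pos
    return events
```

Because "last" is carried inside the final event, no separate end-of-block symbol is needed for the trailing zeros.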
To make the coding efficiency of the compression method according to the MPEG-4 standard (H.263) higher, the MPEG-4 AVC standard (H.264) is standardized by the international standard ISO/IEC 14496-10. AVC stands for advanced video coding, and the MPEG-4 AVC standard (H.264) is called the H.264/AVC standard.
Video coding according to the H.264/AVC standard includes a video coding layer (VCL) and a network abstraction layer (NAL). Specifically, the video coding layer is designed to express video content efficiently, and the network abstraction layer formats the VCL representation of the video and provides header information in a manner suitable for conveyance by various transport layers and storage media.
In international standard moving-picture coding methods such as MPEG-2, MPEG-4, H.264/AVC standard, and the like, to realize high coding efficiency, inter-frame predictive coding is used. A frame coding mode includes an I frame which is coded without using correlation of frames, a P frame predicted from one frame coded in the past, and a B frame which can be predicted from two frames coded in the past.
In the inter-frame predictive coding, a reference picture (predictive picture) which is motion-compensated is subtracted from a moving picture, and a prediction residual obtained by the subtraction is coded. The coding process includes processes of orthogonal transform such as DCT (Discrete Cosine Transform), quantization, and variable-length coding. The motion compensation (motion correction) includes a process of spatially moving a reference frame of inter-frame prediction. The motion compensation process is performed on a block unit basis of frames to be coded. In the case where there is no motion in an image, no movement is applied and the pixel in the same position as that of a pixel to be predicted is used. In the case where there is a motion, the best-matching block is searched for and its movement amount is used as a motion vector. A motion compensation block is a block of 16 pixels×16 pixels/16 pixels×8 pixels in the MPEG-2 coding method and is a block of 16 pixels×16 pixels/16 pixels×8 pixels/8 pixels×8 pixels in the MPEG-4 coding method. The motion compensation blocks are blocks of 16 pixels×16 pixels/16 pixels×8 pixels/8 pixels×16 pixels/8 pixels×8 pixels/8 pixels×4 pixels/4 pixels×8 pixels/4 pixels×4 pixels in the coding method of the H.264/AVC standard.
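The search for the best-matching block can be illustrated by the following Python sketch of an exhaustive integer-pixel search. The matching criterion (sum of absolute differences), the function names, and the parameters `size` and `search` are assumptions chosen for illustration; the standards define only the bit stream and the decoding process, so the search strategy of an actual encoder may differ.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def find_motion_vector(cur, ref, bx, by, size, search):
    """Full search over displacements within +/- search pixels.

    Returns ((dx, dy), cost). A zero vector is the starting candidate,
    which corresponds to the case where there is no motion."""
    h, w = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference picture.
            if by + dy < 0 or by + dy + size > h:
                continue
            if bx + dx < 0 or bx + dx + size > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost
```

The returned cost is the magnitude of the prediction residual that would remain to be transformed and quantized for that block.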
The above-described coding process is performed for every video image screen (frame or field), and a block obtained by subdividing a screen (usually 16 pixels×16 pixels, called the macro block (MB) in MPEG) is the process unit. That is, for every block to be coded, a most similar block (prediction picture) is selected from reference pictures already coded, and a difference signal between the picture (block) to be coded and the prediction picture is coded (orthogonal transform, quantization, and the like). The difference in relative positions between a block to be coded and a prediction signal in the screen is called a motion vector.
In the following non-patent literature 1, it is described that a video coding layer (VCL) according to the H.264/AVC standard follows an approach called block-based hybrid video coding. VCL design includes a macro block, a slice, and a slice block. Each picture is divided into a plurality of macro blocks of fixed size. Each macro block includes a rectangular picture region of 16×16 samples in the luminance component, and rectangular sample regions in each of the two color difference components corresponding to the luminance component. One picture can include one or more slices, and each slice is self-contained in the sense that, given the active sequence and picture parameter sets, its representation can basically be decoded without using information from other slices: syntax elements can be parsed from the bit stream, and the sample values in the picture area represented by the slice can be decoded. For more complete decoding, however, some information from other slices is necessary in order to apply a deblocking filter across slice borders. The non-patent literature 1 also describes that since each slice is encoded/decoded independently of other slices of a picture, the slices can be used for parallel processing.
On the other hand, the picture size handled by a system processing moving picture codes, such as a digital HDTV (High Definition Television) broadcast receiver or a digital video camera capable of capturing an HDTV signal, is becoming larger. A picture coding apparatus and a picture decoding apparatus processing those signals are required to have higher processing capability.
From such a background, the H.265 (ISO/IEC 23008-2) standard as a standard following the H.264/AVC standard was proposed. The new standard is also called the HEVC (High Efficiency Video Coding) standard. The HEVC standard has excellent compression efficiency realized by making the block size proper and has compression efficiency which is about four times as high as that of the MPEG-2 standard and is about twice as high as that of the H.264/AVC standard.
On the other hand, the patent literature 1 describes that one macro block made of 16×16 pixels is used as a process unit of motion compensation and subsequent processes in widely-adopted various coding compression standards such as MPEG-1, MPEG-2, MPEG-4, H.261, H.263, the H.264/AVC standard, and the like whereas, in the H.265/HEVC standard, a more flexible block structure is employed as a process unit. The unit of the flexible block structure is called a coding unit (CU). Starting from the largest coding unit (LCU), to achieve excellent performance, a picture is adaptively divided into small blocks using a quadtree. The size of the largest coding unit (LCU) is 64×64 pixels, which is much larger than the size of the macro block of 16×16 pixels. In FIG. 1 of the patent literature 1 and disclosure related to it, an example of coding unit division based on the quadtree is shown. At the depth “zero”, the initial coding unit (CU) is a largest coding unit (LCU) made of 64×64 pixels. The split flag “0” indicates that the coding unit (CU) at that time point is not split, and the split flag “1” indicates that the coding unit (CU) at that time point is split into four small coding units by the quadtree. The patent literature 1 also describes that the coding unit (CU) after splitting is further split by the quadtree until it reaches a preliminarily specified minimum coding unit (CU) size.
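The quadtree division of a largest coding unit can be sketched recursively as follows. This Python example is illustrative only: the function name is hypothetical, and the callback `should_split` is an assumed stand-in for the split flag ("1" to split, "0" to stop) that an actual bit stream would carry. The minimum coding unit size is passed as a parameter because the source states only that it is specified in advance.

```python
def split_cu(x, y, size, min_size, should_split):
    """Return the leaf coding units of one LCU as (x, y, size) tuples.

    x, y         : top-left position of the current coding unit
    size         : width and height of the current coding unit
    min_size     : preliminarily specified minimum coding unit size
    should_split : callable standing in for the split flag
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Visit the four sub-units: top-left, top-right, bottom-left, bottom-right.
        for oy in (0, half):
            for ox in (0, half):
                leaves += split_cu(x + ox, y + oy, half, min_size, should_split)
        return leaves
    return [(x, y, size)]
```

For example, splitting a 64×64 LCU once and then splitting only its top-left 32×32 unit yields four 16×16 units and three 32×32 units, which together cover the LCU exactly.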
The non-patent literature 2 describes the overview of the H.265/HEVC standard. The core of a coding layer in the previous standards is a macro block including a 16×16 block of luminance samples and two 8×8 blocks of chroma samples, whereas the core in the H.265/HEVC standard is a coding tree unit (CTU) which is larger than a traditional macro block and has a size selected by an encoder. The coding tree unit (CTU) includes a luminance coding tree block (CTB), chroma coding tree blocks, and syntax elements. The quadtree syntax of the coding tree unit (CTU) specifies the size and positions of its luminance and chroma coding tree blocks (CTB). The decision whether to use inter-picture or intra-picture prediction is made at the level of the coding unit (CU). The splitting structure of a prediction unit (PU) has its root at the level of the coding unit (CU). Depending on the basic prediction-type decision, the coding block (CB) of luminance and chroma can be split in size and predicted from prediction blocks (PB) of luminance and chroma. The H.265/HEVC standard supports variable sizes of the prediction blocks (PB) from 64×64 samples to 4×4 samples. The prediction residual is coded using block transforms. The tree structure of a transform unit (TU) has its root at the level of the coding unit (CU). The residual of the coding block (CB) of luminance can be identical to the transform block (TB) of luminance or can be further split into smaller luminance transform blocks (TB). The same applies to the transform blocks (TB) of chroma. Integer basis functions similar to those of a discrete cosine transform (DCT) are defined for the sizes of square transform blocks (TB) of 4×4, 8×8, 16×16, and 32×32 samples. In the H.265/HEVC standard, like in the H.264/AVC standard, uniform reconstruction quantization (URQ) is used.
That is, the range of the values of the quantization parameter (QP) is defined from 0 to 51, and the mapping of the quantization parameter (QP) approximately corresponds to logarithms of a quantization scaling matrix.
Further, the non-patent literature 2 also describes that a slice of the H.265/HEVC standard is a data structure that can be coded independently from other slices of the same picture. The non-patent literature 2 also describes that novel features of tiles and wavefront parallel processing (WPP) are introduced in the H.265/HEVC standard in order to modify the structure of slice data for enhancing the processing capability in the parallel process or for packetization purposes. Tiles are used to partition a picture into rectangular regions, and the main purpose of the tiles is to increase the capability for parallel processing rather than to provide error resilience. The tiles are regions of a single picture which can be decoded independently and which are coded with shared header information. With the wavefront parallel processing (WPP), a slice is divided into rows of coding tree units (CTU). The first row is processed in an ordinary way, the second row can begin to be processed after some decision is made in the first row, and the third row can begin to be processed after some decision is made in the second row.
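The staggered start of the rows in wavefront parallel processing can be illustrated with the following Python sketch, which computes the earliest finish time of every coding tree unit. It rests on several assumptions that are not stated in the text above: every CTU takes one time unit, each row has its own worker, and a row may start once `lag` CTUs of the row above are complete. The default `lag=2` is an assumed value for illustration, and the function name is hypothetical.

```python
def wpp_schedule(rows, cols, lag=2):
    """Earliest finish time of each CTU under a wavefront dependency.

    A CTU at (r, c) waits for its left neighbour in the same row and,
    for r > 0, for the CTU at column c + lag - 1 in the row above
    (clamped to the last column of that row)."""
    finish = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ready = finish[r][c - 1] if c > 0 else 0
            if r > 0:
                above = min(c + lag - 1, cols - 1)
                ready = max(ready, finish[r - 1][above])
            finish[r][c] = ready + 1  # one time unit per CTU (assumption)
    return finish
```

Under these assumptions a picture of 3 rows by 4 columns finishes after 8 time units instead of the 12 needed when the CTUs are processed one after another.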
Further, FIGS. 7, 8, and 9 of the patent literature 2 and the disclosure related to the literature illustrate an MPEG decoder performing parallel processing at the slice level on a bit stream coded by the MPEG-2 standard. Specifically, in the MPEG-2 standard, a slice includes only macro blocks (MB) of one row. By performing the parallel processing at the slice level, the MPEG decoder executes the parallel processing of the macro blocks (MB) of a plurality of rows.
The patent literature 2 also describes a problem that, since a unique code called a slice header as in the MPEG-2 standard does not exist in a picture called VOP (Video Object Plane) in the MPEG-4 (H.263) standard, the parallel processing at the slice level cannot be performed. To solve the problem, the image decoding apparatus according to the first embodiment of FIGS. 1 and 2 of the patent literature 2 has a bit stream analyzer, four VOP decoders, a frame memory, and a memory control unit. The bit stream analyzer executes decoding process start control on the four VOP decoders so that the decoding process start timing of each of macro blocks in the four VOP decoders becomes after completion of the decoding of a reference picture region needed by each of the macro blocks. In the first embodiment of FIGS. 1 and 2 of the patent literature 2, to concretely execute the decoding process start, an FCODE is used. In the case where FCODE=2, a reference picture region becomes ±32 pixels, so that the reference picture region needed by a process macro block in the picture to be coded lies in a range of two upper and lower macro blocks (MB) and two right and left macro blocks (MB) with respect to the position of the process macro block. With respect to the FCODE, as illustrated in FIG. 13 of the patent literature 3, in the case where FCODE=1, a motion vector search range becomes −16 to +15.5 pixels. In the case where FCODE=2, the motion vector search range becomes −32 to +31.5 pixels. In the case where FCODE=3, the motion vector search range becomes −64 to +63.5 pixels. In the case where FCODE=4, the motion vector search range becomes −128 to +127.5 pixels. In the case where FCODE=5, the motion vector search range becomes −256 to +255.5 pixels. In the case where FCODE=6, the motion vector search range becomes −512 to +511.5 pixels. In the case where FCODE=7, the motion vector search range becomes −1024 to +1023.5 pixels.
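The FCODE values listed above follow a doubling pattern, which the following Python sketch summarizes. The closed form is inferred from the listed values rather than quoted from the cited literature, and both function names and the `mb_size` parameter are hypothetical.

```python
def fcode_search_range(fcode):
    """Return (minimum, maximum) of the motion vector search range in pixels.

    The span doubles with each FCODE step: 16, 32, 64, ..., 1024.
    The upper bound is half a pixel short of the span."""
    if not 1 <= fcode <= 7:
        raise ValueError("FCODE is listed only for values 1 to 7")
    span = 16 << (fcode - 1)
    return (-float(span), span - 0.5)

def reference_reach_in_macroblocks(fcode, mb_size=16):
    """Number of macro blocks the reference region extends in each direction."""
    span = 16 << (fcode - 1)
    return span // mb_size
```

For FCODE=2 the second function returns 2, in agreement with the range of two macro blocks above, below, left, and right of the process macro block.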
Further, in the second embodiment of FIGS. 3 and 4 of the patent literature 2, it is described that in the case where a motion vector indicating the reference picture region needed by the process macro block in the picture to be coded indicates an unprocessed region, a decode control unit controls the apparatus to wait until decoding of the unprocessed region is completed.