As is well known, a typical moving image (moving picture) compression scheme conforming to the MPEG-2 standard, which is standardized as the international standard ISO/IEC 13818-2, is based on the principle of reducing the video storage capacity and the necessary bandwidth by removing redundant information from a video stream. MPEG is an abbreviation of Moving Picture Experts Group.
The MPEG-2 standard defines only the bitstream syntax (the rules for a compressed and encoded data sequence, or the method of constructing a bitstream of encoded data) and the decoding process. Accordingly, the MPEG-2 standard is flexible enough to be used in various situations such as satellite broadcasting services, cable television, interactive television, and the Internet.
In an encoding process of MPEG-2, video signals are sampled and quantized so as to initially define color and brightness components of each pixel of a digital video. Values which represent the color and brightness components are stored in a structure that is known as a macro block. The values of the color and the brightness which are accumulated in the macro block are transformed to frequency values by using discrete cosine transform (DCT). Transform coefficients obtained by DCT have frequencies different from each other in accordance with the brightness and the color of a picture. Quantized DCT transform coefficients are encoded in accordance with variable length coding (VLC) that is configured to further compress video streams.
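The transform and quantization steps described above can be illustrated with a minimal sketch. It assumes an 8×8 block and a single uniform quantization step (`step` is a hypothetical parameter; the MPEG-2 standard itself uses a quantization matrix and a quantizer scale, which are omitted here), and it uses a naive, unoptimized DCT-II only to show the arithmetic.

```python
import math

N = 8  # assumed DCT block size (8 samples x 8 samples)

def dct_2d(block):
    """Naive 2-D DCT-II with orthonormal scaling of an N x N block."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

def quantize(coeffs, step):
    """Uniform quantization; one hypothetical step size for every coefficient."""
    return [[int(round(v / step)) for v in row] for row in coeffs]

# A flat block (constant brightness) has energy only in the DC coefficient.
flat = [[128] * N for _ in range(N)]
coeffs = dct_2d(flat)
levels = quantize(coeffs, step=16)
print(levels[0][0])  # DC level; every other level is zero
```

For a flat block, only the coefficient at position (0, 0) survives quantization, which is why the subsequent variable length coding can represent such a block with very few bits.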
In the encoding process of MPEG-2, additional compression in accordance with a motion compensation scheme is defined. In the MPEG-2 standard, three kinds of pictures or frames exist, that is, an I frame, a P frame, and a B frame. The I frame is an intra-encoded frame which is reproduced without reference to other pictures or frames in a video stream. The P frame and the B frame are inter-encoded frames which are reproduced with reference to other pictures or frames. For example, the P frame and the B frame include motion vectors which represent estimated motion relative to a reference frame. By using the motion vectors, an MPEG encoder can reduce the bandwidth necessary for a specific video stream. In addition, the I frame is referred to as an independent (intra-coded) frame, the P frame is referred to as a unidirectional prediction (predictive-coded) frame, and the B frame is referred to as a bidirectional prediction (bi-directionally predictive-coded) frame.
Accordingly, a moving image encoding apparatus (encoder) of MPEG-2 includes a frame memory, a motion vector detection unit, a motion compensation unit, a subtraction unit, a DCT transform unit, a quantization unit, an inverse quantization unit, an inverse DCT transform unit, and a variable length encoding unit. A moving image signal to be encoded is stored in the frame memory for encoding of the B frame or detection of the motion vector, and is then read out from the frame memory. In the subtraction unit, a motion compensation prediction signal transmitted from the motion compensation unit is subtracted from the moving image signal, and the resultant signal is subjected to DCT transform processing and quantization processing in the DCT transform unit and the quantization unit, respectively. The quantized DCT transform coefficient is subjected to variable length encoding processing in the variable length encoding unit, and to local decoding processing in the inverse quantization unit and the inverse DCT transform unit. Then, the local decoding processing result is supplied to the subtraction unit through the motion compensation unit.
On the other hand, a moving image decoding apparatus (decoder) of MPEG-2 includes a buffer memory, a variable length decoding unit, an inverse quantization unit, an inverse DCT transform unit, a motion compensation unit, an addition unit, and a frame memory. An encoded bitstream of MPEG-2 is stored in the buffer memory, and is subjected to variable length decoding processing, inverse quantization processing, and inverse DCT transform processing in the variable length decoding unit, the inverse quantization unit, and the inverse DCT transform unit, respectively. In the addition unit, the resultant processing result is added to a reference image that is generated on the basis of a motion vector obtained by the variable length decoding processing, and thus a reproduced image signal is generated as the output of the addition unit. The reproduced image signal is stored in the frame memory, and is used for prediction of another frame.
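The relationship between the local decoding in the encoder and the reconstruction in the decoder can be sketched as follows. This is a simplified illustration, not the apparatus itself: the DCT and the variable length coding are omitted, the block is a flat list of samples, and the function names and the uniform `step` are assumptions introduced only for this example.

```python
def quantize(residual, step):
    """Uniform quantization of a predictive residual (hypothetical step size)."""
    return [int(round(r / step)) for r in residual]

def dequantize(levels, step):
    """Inverse quantization."""
    return [level * step for level in levels]

def encode_block(current, prediction, step):
    """Subtract the prediction, quantize, and locally decode the result."""
    residual = [c - p for c, p in zip(current, prediction)]
    levels = quantize(residual, step)
    # Local decoding: the encoder rebuilds exactly what the decoder will see.
    local_recon = [p + r for p, r in zip(prediction, dequantize(levels, step))]
    return levels, local_recon

def decode_block(levels, prediction, step):
    """Decoder side: add the dequantized residual to the prediction."""
    return [p + r for p, r in zip(prediction, dequantize(levels, step))]

current = [101, 104, 112, 88]
prediction = [98, 100, 100, 100]
levels, encoder_recon = encode_block(current, prediction, step=4)
decoder_recon = decode_block(levels, prediction, step=4)
print(levels, encoder_recon, decoder_recon)
```

Because the encoder predicts from its locally decoded result rather than from the original input, the reference images on the encoder side and the decoder side stay identical even though quantization is lossy.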
Subsequent to the MPEG-2 standard, a typical moving image compression scheme in accordance with the MPEG-4 (H. 263) standard, which is standardized as the international standard ISO/IEC 14496, has been proposed for low-rate encoding in a television telephone and the like. The compression scheme in accordance with the MPEG-4 (H. 263) standard is of a so-called “hybrid type” that uses inter-frame prediction and discrete cosine transform similarly to MPEG-2, and motion compensation in a half-pixel (half-pel) unit is additionally introduced to the compression scheme. The compression scheme uses a Huffman code for entropy encoding similarly to MPEG-2, and additionally introduces a three-dimensional variable length encoding (three-dimensional VLC) technology of simultaneously encoding run, level, and last, thereby greatly improving the compression ratio. Here, run and level are the run-length parameters of a coefficient (the number of zero coefficients that precede a non-zero coefficient, and the value of that non-zero coefficient), and last indicates whether or not the coefficient is the final non-zero coefficient. Additionally, the MPEG-4 (H. 263) standard includes a base portion that is referred to as Baseline, and extension standards that are referred to as Annexes.
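As an illustration of the run, level, and last symbols, the following sketch converts an already scanned list of quantized coefficients into events. The function name, the tuple order (last, run, level), and the sample data are assumptions for this example; the actual code tables that map each event to a variable length code are not reproduced.

```python
def run_level_last(scanned):
    """Turn a scanned coefficient list into (last, run, level) events.

    run   : number of zero coefficients preceding the non-zero coefficient
    level : value of the non-zero coefficient
    last  : 1 if this is the final non-zero coefficient in the block, else 0
    """
    nonzero_positions = [i for i, v in enumerate(scanned) if v != 0]
    events = []
    run = 0
    for i, value in enumerate(scanned):
        if value == 0:
            run += 1
            continue
        last = 1 if i == nonzero_positions[-1] else 0
        events.append((last, run, value))
        run = 0
    return events

events = run_level_last([5, 0, 0, -2, 1, 0, 0, 0])
print(events)
```

Since last is carried inside the final event, no separate end-of-block symbol is needed, and the trailing zeros after the final non-zero coefficient cost no bits.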
An efficiency improvement by the compression scheme in accordance with the standard of MPEG-4 (H. 263) is not sufficient, and thus the standard of MPEG-4 AVC (H. 264) is standardized in accordance with the international standard ISO/IEC 14496-10 so as to accomplish relatively higher encoding efficiency. In addition, AVC is an abbreviation of Advanced Video Coding, and the standard of MPEG-4 AVC (H. 264) is referred to as H. 264/AVC.
Video coding in accordance with the standard H. 264/AVC is constituted by a video coding layer (VCL) and a network abstraction layer (NAL). That is, the video coding layer is designed to efficiently express video content, and the network abstraction layer formats the VCL expression of a video, and provides header information by using a method appropriate for transmission over various transmission layers or for storage in various storage mediums.
In the international standard moving image encoding methods such as MPEG-2, MPEG-4, and MPEG-4 AVC (H. 264), inter-encoding, that is, inter-frame prediction encoding is used to realize high encoding efficiency by using correlation in a time direction. A frame encoding mode includes an I frame that uses intra-encoding without using correlation between frames, a P frame that performs inter-prediction from one frame that was encoded in the past, and a B frame that can perform inter-prediction from two frames which were encoded in the past.
In the inter-frame prediction encoding, subtraction between a moving image that is a target to be encoded and a reference image (prediction image) that is subjected to motion compensation is executed, and a predictive residual resulting from the subtraction is encoded. The encoding processing includes orthogonal transform such as discrete cosine transform (DCT), quantization, and variable length encoding processing. The motion compensation (motion correction) includes a process of spatially moving a reference frame for inter-frame prediction, and the motion compensation processing is executed in a block unit of a frame to be encoded. In a case where motion is not present in image content, a pixel at the same position as the pixel to be predicted is used without movement. In a case where motion is present, the most similar block is searched for, and its movement amount is set as a motion vector. A motion compensation block is a block of 16 pixels×16 pixels or 16 pixels×8 pixels in the encoding method of MPEG-2, and is a block of 16 pixels×16 pixels, 16 pixels×8 pixels, or 8 pixels×8 pixels in the encoding method of MPEG-4. The motion compensation block is a block of 16 pixels×16 pixels, 16 pixels×8 pixels, 8 pixels×16 pixels, 8 pixels×8 pixels, 8 pixels×4 pixels, 4 pixels×8 pixels, or 4 pixels×4 pixels in the encoding method of MPEG-4 AVC (H. 264).
The above-described encoding processing is executed for each video screen (frame or field), and a block obtained by subdividing the screen becomes the processing unit (the block is typically 16 pixels×16 pixels, and is referred to as a macro block (MB) in MPEG). That is, for each block to be encoded, the most similar block (prediction image) is selected from reference images which have already been encoded, and a difference signal between the image (block) to be encoded and the prediction image is encoded (through orthogonal transform, quantization, and the like). The difference in relative position between the block to be encoded in the screen and the prediction signal is referred to as a “motion vector”.
In addition, NPL 1 discloses that the video coding layer (VCL) in accordance with H. 264/AVC follows an approach that is called block-based hybrid video coding. The VCL design includes macro blocks and slices. Each picture is divided into a plurality of macro blocks having a fixed size, and each of the macro blocks includes a rectangular picture area of 16 samples×16 samples of the luminance component, and a corresponding rectangular sample area of each of the two color difference components. One picture may include one or more slices. Each of the slices is self-contained in the sense that, given the active sequence parameter set and picture parameter set, its syntax elements can be parsed from the bitstream and the sample values of the picture area which the slice represents can basically be decoded without using information from other slices. However, in order to apply a deblocking filter across a slice boundary for more complete decoding, several pieces of information from other slices are necessary.
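The search for the most similar block can be sketched as an exhaustive (full) search that minimizes the sum of absolute differences (SAD). The SAD criterion, the full search strategy, the function names, and the small test images are assumptions chosen for illustration; the standards do not prescribe how an encoder finds its motion vectors.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block and a displaced reference block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def full_search(cur, ref, bx, by, size, search_range):
    """Return ((dx, dy), cost) of the best match inside the reference frame."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that would read outside the reference frame.
            if not (0 <= bx + dx and bx + dx + size <= width
                    and 0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best

# Reference frame with distinct sample values; the 2x2 block at (2, 2) of the
# current frame is copied from position (3, 1) of the reference frame.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in range(2):
    for x in range(2):
        cur[2 + y][2 + x] = ref[1 + y][3 + x]

motion_vector, cost = full_search(cur, ref, bx=2, by=2, size=2, search_range=2)
print(motion_vector, cost)
```

The returned displacement is the motion vector of the block, and a cost of zero means that the predictive residual for this block is entirely zero.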
On the other hand, with regard to a moving image code handling system, in a digital high definition television (HDTV) broadcasting receiver, a digital video camera capable of capturing an image in terms of an HDTV signal, and the like, an image size increases. In an image encoding apparatus or an image decoding apparatus which processes the signals, there is a demand for higher processing performance.
From this kind of circumstance, a new standard H.265 (ISO/IEC 23008-2) that is a standard subsequent to the standard H. 264/MPEG-4 AVC is suggested, and the new standard is referred to as high efficiency video coding (HEVC). It is said that the HEVC standard has excellent compression efficiency due to optimization of a block size and the like, and the HEVC standard has compression performance four times that of the standard of the MPEG-2, and compression performance two times that of the standard H. 264/AVC.
On the other hand, PTL 1 discloses that in various widely employed encoding compression standards such as MPEG-1/2/4, and H. 261/H. 263/H. 264-AVC, one macro block of 16 pixels×16 pixels is used as a processing unit in the motion compensation and the subsequent processing, but in the HEVC standard, a more flexible block structure is employed as a processing unit. The unit of the flexible block structure is referred to as a coding unit (CU), and the coding unit is adaptively divided into a small block using quadtree so as to accomplish satisfactory performance starting from the largest coding unit (LCU). The size of the largest coding unit (LCU) is 64 pixels×64 pixels which is significantly greater than the size (16 pixels×16 pixels) of the macro block. In addition, the largest coding unit (LCU), which is described in PTL 1, corresponds to a coding tree block (CTB) or a coding tree unit (CTU) which is described in the HEVC standard.
An example of coding unit division based on the quadtree is illustrated in FIG. 1 and description relating to FIG. 1 in PTL 1, and at a depth of “zero”, a first coding unit (CU) is the largest coding unit (LCU) of 64 pixels×64 pixels. A split flag “0” represents that the coding unit (CU) at that point of time is not split. In contrast, a split flag “1” represents that the coding unit (CU) at that point of time is divided into four small coding units by the quadtree. PTL 1 also describes that the coding unit (CU) after division is additionally quadtree-divided until reaching the size of a minimum coding unit (CU) which is specified in advance.
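The split flag mechanism can be sketched as a recursive, depth-first parse. The function name, the order in which the four sub-units are visited, the minimum size of 8 pixels, and the rule that no flag is read once the minimum size is reached are assumptions made for this illustration.

```python
def parse_cu_quadtree(flags, x=0, y=0, size=64, min_size=8):
    """Consume split flags depth-first and return the leaf CUs as (x, y, size)."""
    flags = iter(flags)  # an iterator is returned unchanged, so recursion shares it
    if size > min_size and next(flags) == 1:
        half = size // 2
        leaves = []
        for oy in (0, half):
            for ox in (0, half):
                leaves += parse_cu_quadtree(flags, x + ox, y + oy, half, min_size)
        return leaves
    # Split flag 0, or the minimum CU size has been reached: this CU is a leaf.
    return [(x, y, size)]

# Depth 0: the 64x64 LCU is split. Its second 32x32 CU is split again into
# four 16x16 CUs; the other three 32x32 CUs are not split.
leaves = parse_cu_quadtree([1, 0, 1, 0, 0, 0, 0, 0, 0])
for leaf in leaves:
    print(leaf)
```

Whatever the sequence of flags, the leaf coding units tile the largest coding unit exactly, so the sum of their areas equals 64×64.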
NPL 2 describes an outline of the HEVC standard. The core of the coding layer in the previous standards is a macro block including a block of 16 samples×16 samples of luminance and two blocks of 8 samples×8 samples of chromaticity. In contrast, the analogous structure in the HEVC standard is a coding tree unit (CTU), which is larger than a typical macro block and of which the size is selected by an encoder. The coding tree unit (CTU) includes a luminance coding tree block (CTB), chromaticity coding tree blocks (CTB), and syntax elements. The quadtree syntax of the coding tree unit (CTU) designates the size and position of the coding blocks (CB) of the luminance and the chromaticity. Whether inter-picture prediction or intra-picture prediction is used for encoding of a picture area is determined at the level of the coding unit (CU). A division structure of a prediction unit (PU) has its root at the level of the coding unit (CU). Depending on the determination of the basic prediction type, the coding blocks (CB) of the luminance and the chromaticity can be further divided in size, and can be predicted from prediction blocks (PB) of luminance and chromaticity. The HEVC standard supports a size of a prediction block (PB) that is variable from 64 samples×64 samples to 4 samples×4 samples. A predictive residual is encoded by block transform, and a tree structure of a transform unit (TU) has its root at the level of the coding unit (CU). The residual of the coding block (CB) of luminance can be made identical to the transform block (TB) of luminance, or can be further divided into smaller transform blocks (TB) of luminance. The same applies to the transform block (TB) of chromaticity. Integer basis functions, which are similar to those of discrete cosine transform (DCT), are defined for square transform blocks (TB) of 4 samples×4 samples, 8 samples×8 samples, 16 samples×16 samples, and 32 samples×32 samples.
In addition, NPL 2 describes a configuration of a hybrid video encoder capable of generating a bitstream conforming to the standard of HEVC, and also describes that a deblocking filter similar to a deblocking filter, which is used in the standard of H.264/MPEG-4 AVC, is included in an inter-picture prediction loop.
PTL 2 describes a configuration in which shape information of an image signal input from an outer side is supplied to a padding unit, which fills an empty area having no image data in a block, so as to effectively encode an image having an arbitrary shape. It is necessary for the horizontal and vertical sizes of the input image signal to be an integral multiple of a block size for compression encoding, and thus the padding unit executes an operation of filling the empty area with an average value of an image area, or an operation of filling the empty area by copying an end pixel of the image area so as to encode an image having an arbitrary shape.
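The second padding operation mentioned above, copying an end pixel of the image area, can be sketched for the simple case of a rectangular image whose size is not an integral multiple of the block size. The function name and the list-of-rows image representation are assumptions; handling of arbitrary shapes by means of shape information is not covered by this sketch.

```python
def pad_to_block_multiple(image, block=16):
    """Extend an image to a multiple of `block` by repeating its end pixels."""
    height, width = len(image), len(image[0])
    padded_width = -(-width // block) * block    # round up to a multiple of block
    padded_height = -(-height // block) * block
    # Repeat the right-most pixel of every row, then repeat the bottom row.
    rows = [row + [row[-1]] * (padded_width - width) for row in image]
    rows += [list(rows[-1]) for _ in range(padded_height - height)]
    return rows

image = [[1, 2, 3, 4, 5],
         [6, 7, 8, 9, 10],
         [11, 12, 13, 14, 15]]
padded = pad_to_block_multiple(image, block=4)
for row in padded:
    print(row)
```

Here a 5×3 image is extended to 8×4, so that it divides evenly into blocks of 4 pixels×4 pixels; an image whose size is already a multiple of the block size is returned without added pixels.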
PTL 3 describes an encoding apparatus that solves the following problem: when an image signal of which a pixel value varies discontinuously at an end portion of a screen is encoded, a high-frequency component occurs due to the discontinuity of the signal, and thus a large code amount is generated. A weight coefficient determination unit calculates a position of the image signal on the screen on the basis of a synchronization signal, and outputs a weight coefficient w that becomes closer to 0 as the position approaches the end of the screen. A first multiplier multiplies the input image signal by the weight coefficient w, a second multiplier multiplies an output of a constant-value output unit by a coefficient 1−w, and an adder adds the output signals of the two multipliers. Then, the output of the adder is encoded by an encoder. PTL 3 describes that the image signal smoothly approaches a constant value at the end of the screen, and thus a surplus code amount is not necessary.
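The weighting described above computes w·x + (1−w)·c for an input sample x and a constant value c. The following sketch applies it to one line of samples. The linear shape of the weight, the `ramp` width, and the function names are assumptions for illustration; the passage only states that w approaches 0 toward the end of the screen.

```python
def edge_weight(pos, length, ramp):
    """Hypothetical weight: 0 at either end of the line, rising linearly to 1 over `ramp` pixels."""
    distance = min(pos, length - 1 - pos)
    return min(1.0, distance / ramp)

def fade_line(line, constant, ramp):
    """Blend each sample with a constant value: w * x + (1 - w) * constant."""
    n = len(line)
    out = []
    for i, x in enumerate(line):
        w = edge_weight(i, n, ramp)
        out.append(w * x + (1.0 - w) * constant)
    return out

faded = fade_line([200] * 9, constant=128, ramp=2)
print(faded)
```

The samples at both ends equal the constant value and the interior samples are unchanged, so the signal no longer jumps abruptly at the end of the screen.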
PTL 4 describes a television telephone apparatus. In the television telephone apparatus, when information indicating an important image position, which is included in transmission images, is set in an operation unit by an operator, an image encoding unit executes encoding in such a manner that image quality of image data in the important image position is further improved in comparison to image data in other positions.
PTL 5 describes padding processing of extrapolating pixel values in a screen to the outer side of the screen so that a motion vector can point to the outer side of the screen (an unrestricted motion vector (UMV)), which is employed in the MPEG-4 standard, that is, so that the outer side of the screen can be used as a reference image. In addition, the MPEG-4 standard does not specify whether the extrapolation initiation position in the padding processing is the end of the effective image area or the end of the encoded macro blocks, and thus the extrapolation initiation position may not be unified between an encoder and a decoder. PTL 5 discloses a method of preventing the noise that occurs in a decoder in this case.
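One common way to realize such extrapolation is to clamp the reference coordinates to the boundary of the stored picture, which has the same effect as repeating the boundary pixels outward. The sketch below assumes this clamping approach and hypothetical function names; it is not taken from PTL 5. The extrapolation starts from the boundary of whatever array is passed in, which is exactly the point on which an encoder and a decoder have to agree.

```python
def reference_pixel(frame, x, y):
    """Read a reference pixel, clamping coordinates that fall outside the frame."""
    height, width = len(frame), len(frame[0])
    cx = min(max(x, 0), width - 1)
    cy = min(max(y, 0), height - 1)
    return frame[cy][cx]

def fetch_block(frame, x0, y0, size):
    """Fetch a size x size reference block that may extend outside the frame."""
    return [[reference_pixel(frame, x0 + dx, y0 + dy) for dx in range(size)]
            for dy in range(size)]

frame = [[1, 2],
         [3, 4]]
# A block whose top-left corner lies one pixel outside the frame.
block = fetch_block(frame, -1, -1, 3)
for row in block:
    print(row)
```

If the encoder passes the effective image area and the decoder passes the area of the encoded macro blocks, the same motion vector yields different reference blocks, which is the mismatch described above.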