The present invention relates to a video encoder and an operation method thereof, and in particular, relates to a technique effective for reducing noise or the like generated on a boundary of tiles introduced in a video coding method in order to enhance parallel processing capability.
As is well known, the general video compression method of the MPEG-2 standard, standardized as international standard ISO/IEC 13818-2, is based on the principle that video storage capacity and necessary bandwidth are reduced by removing redundant information from a bit stream. Here, MPEG stands for Moving Picture Experts Group.
Since the MPEG-2 standard defines only a bit stream syntax (rule for a compressed coded data sequence or configuration method for a bit stream of the coded data) and a decoding process, the MPEG-2 standard is flexible so as to be utilized sufficiently well in various situations such as satellite broadcasting service, cable television, interactive television, and the internet.
In the coding process of MPEG-2, first, a video signal is sampled and quantized to define a color component and a brightness component for each pixel of a digital video. The values indicating the color and brightness components are stored into a structure known as a macro-block. The color and brightness values stored in the macro-block are transformed into frequency values through the use of the discrete cosine transform (DCT). The transform coefficients obtained by the DCT have different frequencies for the brightness and the color of a picture. After quantization, the DCT transform coefficients are coded by variable length coding (VLC), which further compresses the video stream.
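The transform and quantization steps described above can be illustrated by the following minimal sketch. It is not the MPEG-2 reference procedure: the 8×8 block size, the single uniform quantization step `step`, and the function names `dct_2d` and `quantize` are assumptions introduced only for illustration, whereas an actual encoder would typically apply frequency-dependent quantization.

```python
import math

N = 8  # block size; assumed here for illustration


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            acc = 0.0
            for y in range(N):
                for x in range(N):
                    acc += (block[y][x]
                            * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * acc
    return out


def quantize(coeffs, step):
    """Uniform quantization (a simplification): divide by one step size and round."""
    return [[round(c / step) for c in row] for row in coeffs]


# A flat block carries no spatial detail, so only the DC coefficient survives.
flat = [[128] * N for _ in range(N)]
levels = quantize(dct_2d(flat), step=16)
```

For the flat block in this sketch, all energy is concentrated in the single coefficient `levels[0][0]`, which is why blocks with little detail compress well once the many zero-valued coefficients are entropy coded.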
The MPEG-2 coding process defines an additional compression by a motion compensation technique. In the MPEG-2 standard, three kinds of pictures or frames exist: the I-frame, the P-frame, and the B-frame. The I-frame is a frame subjected to intra-coding, which means that the frame is reproduced without reference to any other pictures or frames in the video stream. The P-frame and the B-frame are frames subjected to inter-coding, which means that the frame is reproduced with reference to other pictures or frames. For example, each of the P-frame and the B-frame includes a motion vector indicating motion estimation from a reference frame. Through the use of the motion vector, it becomes possible to reduce the bandwidth necessary for a specific video stream in an MPEG encoder. Meanwhile, the I-frame is referred to as an intra-coded frame, the P-frame is referred to as a predictive-coded frame, and the B-frame is referred to as a bi-directionally predictive-coded frame.
Accordingly, a video encoder of MPEG-2 is constituted of a frame memory, a motion vector detection unit, a motion compensation unit, a subtraction unit, a DCT transform unit, a quantization unit, an inverse quantization unit, an inverse DCT transform unit, and a variable length coding unit. A video signal to be coded is stored in the frame memory for coding and motion vector detection of the B-frame and is then read out from the frame memory. A motion compensation prediction signal from the motion compensation unit is subtracted from the read-out signal in the subtraction unit, and DCT transform processing and quantization processing are executed in the DCT transform unit and the quantization unit, respectively. The quantized DCT transform coefficients are subjected to variable length coding processing in the variable length coding unit, and are also subjected to local decoding processing in the inverse quantization unit and the inverse DCT transform unit. The result of this local decoding processing is then supplied to the subtraction unit via the motion compensation unit.
On the other hand, a video decoder is constituted of a buffer memory, a variable length decoding unit, an inverse quantization unit, an inverse DCT transform unit, a motion compensation unit, an addition unit, and a frame memory. The MPEG-2 coded bit stream, after having been stored in the buffer memory, is subjected to variable length decoding processing, inverse quantization processing, and inverse DCT transform processing in the variable length decoding unit, the inverse quantization unit, and the inverse DCT transform unit, respectively. A prediction signal, obtained in the motion compensation unit by using the motion vector which has been subjected to the variable length decoding processing, is then added in the addition unit, and a reproduced image signal is generated from the output of the addition unit. This reproduced image signal is stored into the frame memory and is used for prediction of the other frames.
Following the MPEG-2 standard, there has also been proposed a general video compression method by the MPEG-4 standard (H.263) standardized in the international standard ISO/IEC 14496 for low-rate coding in a TV telephone and the like. The compression method by the MPEG-4 (H.263) standard is a compression method referred to as a “hybrid type” using inter-frame prediction and the discrete cosine transform in the same way as in MPEG-2, and further introduces motion compensation in a unit of a half pixel (half-pel). This compression method, while using a Huffman code for entropy coding in the same way as in MPEG-2, newly introduces a technique of three-dimensional variable length coding (three-dimensional VLC), which codes run, level, and last at the same time, and enhances the compression rate considerably. Here, run and level relate to run length coefficients and last indicates the last coefficient. Moreover, the MPEG-4 (H.263) standard includes a basic part referred to as Baseline and an extended standard referred to as Annex.
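The grouping of run, level, and last into one symbol can be illustrated by the following minimal sketch. It only forms the (run, level, last) events from an already scanned list of quantized coefficients and does not reproduce the code tables of any standard. The function name `run_level_last` and the input list are hypothetical; run is taken as the number of zero coefficients preceding a nonzero coefficient, level as the value of that coefficient, and last as a flag set on the final nonzero coefficient of the block.

```python
def run_level_last(coeffs):
    """Convert scanned quantized coefficients into (run, level, last) events."""
    nonzero_positions = [i for i, c in enumerate(coeffs) if c != 0]
    events = []
    run = 0  # zeros seen since the previous nonzero coefficient
    for i, c in enumerate(coeffs):
        if c == 0:
            run += 1
            continue
        last = 1 if i == nonzero_positions[-1] else 0
        events.append((run, c, last))
        run = 0
    return events


# Hypothetical scanned coefficients of one block.
events = run_level_last([5, 0, 0, -2, 1, 0, 0, 0])
# events == [(0, 5, 0), (2, -2, 0), (0, 1, 1)]
```

Because the last flag is carried inside the final event, the trailing zeros need no separate end-of-block symbol in this sketch; each triple would then be mapped to a single variable length code word.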
Because the efficiency improvement of the compression method in accordance with the MPEG-4 (H.263) standard was insufficient, the MPEG-4 AVC (H.264) standard was standardized as the international standard ISO/IEC 14496-10 for achieving a higher coding efficiency without consideration of compatibility with the existing methods. Here, AVC stands for Advanced Video Coding, and the MPEG-4 AVC (H.264) standard is referred to as H.264/AVC.
Video coding by the H.264/AVC standard is constituted of a video coding layer (VCL) and a network abstraction layer (NAL). That is, the video coding layer is designed so as to represent the video content efficiently, and the network abstraction layer formats the VCL representation of the video and also provides header information in a manner appropriate for conveyance by various transport layers and recording media.
In the international standard video coding methods such as MPEG-2, MPEG-4, and MPEG-4 AVC (H.264), inter coding, that is, inter-frame prediction coding, is used for realizing a high coding efficiency by using correlation in the time direction. Frame coding modes include the I-frame, which uses intra-coding without using correlation between the frames, the P-frame, which is inter-predicted from one frame coded in the past, and the B-frame, which can be inter-predicted from two frames coded in the past.
In this inter-frame prediction coding, a reference image (prediction image) subjected to motion compensation is subtracted from a video, and a residual error in this subtraction is coded. Coding processing includes processing of orthogonal transform such as the DCT (discrete cosine transform), the quantization, and the variable length coding. Motion compensation (motion correction) includes processing of spatially moving a reference frame of the inter-frame prediction, and the motion compensation processing is performed in a block unit of the frame to be coded. When image contents do not include motion, the movement is not necessary and a pixel at the same position as a pixel to be predicted is used. When motion exists, a block having the largest similarity is searched for and a movement amount is defined as a motion vector. The block for the motion compensation is a block of 16×16 or 16×8 pixels in the MPEG-2 coding method, and a block of 16×16, 16×8, or 8×8 pixels in the MPEG-4 coding method. In the MPEG-4 AVC (H.264) coding method, the motion compensation block is a block of 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, or 4×4 pixels.
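The search for the most similar block can be illustrated by the following minimal sketch of an exhaustive block-matching search. The similarity measure is not specified above; the sum of absolute differences (SAD) is assumed here, and the function names `sad` and `motion_search`, the frame contents, the block size, and the search range are all hypothetical. The motion vector is expressed as the displacement (dx, dy) from the block to be coded to the matching block in the reference frame.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def motion_search(cur, ref, bx, by, size, search_range):
    """Full search: return the displacement with the lowest SAD and that SAD."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, size)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that would read outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Reference frame with distinct pixel values; the current frame shows the
# same content moved one pixel to the right and two pixels down.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y - 2][x - 1] if y >= 2 and x >= 1 else 0 for x in range(8)]
       for y in range(8)]
mv, cost = motion_search(cur, ref, bx=2, by=2, size=4, search_range=2)
# mv == (-1, -2) and cost == 0: the block is found up and to the left in ref.
```

When the content has not moved, the zero displacement already gives the lowest cost and the pixel at the same position is used, which corresponds to the case without motion described above.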
The above-described coding processing is performed for each picture screen (frame or field), and a block obtained by dividing the screen (normally, 16 pixels×16 pixels, referred to as a macro-block (MB) in MPEG) is the processing unit. That is, for each of the blocks to be coded, the most similar block (prediction image) is selected from the already coded reference image, and a differential signal between the image to be coded (block) and the prediction image is subjected to the coding (orthogonal transform, quantization, and the like). The relative position difference in the screen between the block to be coded and the prediction image is referred to as the motion vector.
Furthermore, non-patent literature 1 (Gary J. Sullivan et al., “Video Compression—From Concept to the H.264/AVC Standard”, Proceedings of the IEEE, vol. 93, no. 1, January 2005, pp. 18-31) describes that the video coding layer (VCL) by H.264/AVC follows an approach referred to as block-based hybrid video coding. VCL is constituted of a macro-block, a slice, and a slice-block. Each picture is divided into a plurality of macro-blocks having a fixed size, and each of the macro-blocks includes a rectangular picture area of 16×16 samples for a brightness component and a rectangular sample area for each of the corresponding two color difference components. A picture may contain one or more slices. Each slice is self-contained, in the sense that, given the active sequence and picture parameter sets, its syntax elements can be parsed from the bit stream and the values of the samples in the area of the picture that the slice represents can basically be decoded without use of data from other slices of the picture. However, for completely exact decoding, some information from other slices may be needed in order to apply the deblocking filter across slice boundaries. In addition, it is also described in non-patent literature 1 that, since each of the slices is coded and decoded independently of the other slices of the picture, the slice can be used for parallel processing.
Meanwhile, the image size handled by systems that process coded video is increasing in HDTV (High Definition Television) broadcasting equipment, digital video cameras capable of capturing an HDTV signal, and the like. A still higher processing capability is therefore required of an image encoder and an image decoder which process such a signal.
Against this background, a new standard H.265 (ISO/IEC 23008-2), which succeeds the H.264/MPEG-4 AVC standard, has been proposed, and this new standard is called HEVC (High Efficiency Video Coding). This new standard is excellent in compression efficiency owing to optimization of the block size and the like, and is considered to have a compression capability approximately 4 times higher than the MPEG-2 standard and approximately 2 times higher than the H.264/AVC standard.
Meanwhile, patent literature 1 (US Patent Application Publication No. US2012/0106652A1 Specification) describes that, while one macro-block configured with 16×16 pixels is used as a processing unit for the motion compensation and the subsequent processing in various widely-employed coding compression standards such as MPEG-1/2/4 and H.261/H.263/H.264-AVC, a more flexible block structure is adopted as a processing unit in the next generation standard called HEVC. The unit of this flexible block structure is referred to as a coding unit (CU), and the coding unit starts with the largest coding unit (LCU) and is divided adaptively into smaller blocks through the use of a quadtree for achieving a better performance. The size of the largest coding unit (LCU) is 64×64 pixels, which is far larger than the macro-block size of 16×16 pixels. FIG. 1 and the disclosure relating to FIG. 1 of patent literature 1 show an example of the coding unit division based on the quadtree, and, at a depth “zero” thereof, the initial coding unit (CU) is the largest coding unit (LCU) constituted of 64×64 pixels. While split flag “0” shows that the underlying coding unit (CU) is not divided, split flag “1” shows that the underlying coding unit (CU) is divided into four smaller coding units by the quadtree. It is also described in patent literature 1 that the coding unit (CU) after the division is further divided by the quadtree until the size of a preliminarily specified smallest coding unit (CU) is reached.
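The quadtree division controlled by the split flags can be illustrated by the following minimal sketch, which reads split flags in depth-first order and returns the resulting coding units. It is not taken from patent literature 1: the function name `parse_cu_quadtree`, the smallest coding unit size of 8×8 pixels, the visiting order of the four sub-blocks (top-left, top-right, bottom-left, bottom-right), and the rule that no flag is read once the smallest size is reached are assumptions made for illustration.

```python
def parse_cu_quadtree(flags, x=0, y=0, size=64, min_size=8):
    """Return the leaf coding units as (x, y, size) tuples.

    flags is an iterator of split flags (0 = not divided, 1 = divided into four).
    """
    if size > min_size and next(flags) == 1:
        half = size // 2
        leaves = []
        for oy in (0, half):
            for ox in (0, half):
                leaves += parse_cu_quadtree(flags, x + ox, y + oy, half, min_size)
        return leaves
    # Split flag 0, or the smallest size has been reached: this CU is a leaf.
    return [(x, y, size)]


# Hypothetical flag sequence: the LCU is divided, and only its second
# 32x32 coding unit is divided once more into four 16x16 coding units.
cus = parse_cu_quadtree(iter([1, 0, 1, 0, 0, 0, 0, 0, 0]))
# cus == [(0, 0, 32), (32, 0, 16), (48, 0, 16), (32, 16, 16),
#         (48, 16, 16), (0, 32, 32), (32, 32, 32)]
```

In this sketch the leaf coding units always tile the 64×64 largest coding unit exactly, whatever flag sequence is supplied, since each division replaces one square by four squares of half the side length.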
The outline of the HEVC standard is described in non-patent literature 2 (Gary J. Sullivan et al., “Overview of the High Efficiency Video Coding (HEVC) Standard”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, December 2012, pp. 1649-1668). While the core of the coding layer in the previous standards is the macro-block including a 16×16 block of brightness samples and two 8×8 blocks of chromaticity samples, the analogous structure in the HEVC standard is a coding tree unit (CTU), which has a size selected by the encoder and can be larger than the traditional macro-block. The coding tree unit (CTU) is constituted of a brightness coding tree block (CTB), chromaticity coding tree blocks (CTB), and syntax elements. A quad-tree syntax of the coding tree unit (CTU) specifies the sizes and positions of the coding blocks (CB) of the brightness and the chromaticity. The determination whether inter-picture prediction or intra-picture prediction is used for coding a picture area is made at the level of the coding unit (CU). A dividing structure of a prediction unit (PU) has the root thereof at the level of the coding unit (CU). Depending on the determination of a basic prediction type, the brightness and chromaticity coding blocks (CB) can be further divided in size, and can be predicted from the brightness and chromaticity prediction blocks (PB). The HEVC standard supports variable prediction block (PB) sizes from 64×64 samples down to 4×4 samples. A prediction error is coded by block transform, and a tree structure of a transform unit (TU) has the root thereof at the level of the coding unit (CU). A residual error of the brightness coding block (CB) may be identical to the brightness transform block (TB), or may be further divided into smaller brightness transform blocks (TB). The same applies to the chromaticity transform block (TB).
Integer basis functions similar to those of a discrete cosine transform (DCT) are defined for square transform blocks (TB) of 4×4, 8×8, 16×16, and 32×32 samples.
In addition, non-patent literature 2 describes that a slice in the HEVC standard is a data structure which can be coded independently of the other slices in the same picture. Furthermore, non-patent literature 2 describes that new features of a tile and wavefront parallel processing (WPP) are introduced in the HEVC standard for modifying the slice data structure in order to enhance the parallel processing capability or to perform packetizing. The tile divides a picture into rectangular areas, and a main purpose of the tile is to enhance the parallel processing capability rather than to provide error resilience. A plurality of tiles are areas of one picture which can be decoded independently, and are coded with common header information. By the wavefront parallel processing (WPP), one slice is divided into a plurality of rows of the coding tree units (CTU). The first row is processed in the normal way, the second row can start to be processed after some decisions have been made in the first row, and the third row can start to be processed after some decisions have been made in the second row.
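The staggered start of the rows in the wavefront parallel processing can be illustrated by the following minimal scheduling sketch. The text above only states that a row can start after some decisions have been made in the preceding row; the sketch assumes that a coding tree unit can start once its left neighbour and the coding tree unit above and to the right of it have been finished (a lag of two coding tree units between rows), that every coding tree unit takes one time unit, and that one processing thread is available per row. The function name `wpp_start_times` and the parameter `lag` are hypothetical.

```python
def wpp_start_times(rows, cols, lag=2):
    """Earliest start time of each CTU under the assumed wavefront dependencies."""
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            t = 0
            if c > 0:
                # The left neighbour in the same row must be finished.
                t = max(t, start[r][c - 1] + 1)
            if r > 0:
                # The row above must be `lag` CTUs ahead (clamped at the row end).
                dep = min(c + lag - 1, cols - 1)
                t = max(t, start[r - 1][dep] + 1)
            start[r][c] = t
    return start


schedule = wpp_start_times(rows=3, cols=4)
# schedule == [[0, 1, 2, 3],
#              [2, 3, 4, 5],
#              [4, 5, 6, 7]]
```

Under these assumptions the three rows of four coding tree units are finished after 8 time units instead of the 12 needed when the rows are processed one after another, which illustrates how the wavefront enhances the parallel processing capability without dividing the picture into independent areas as the tiles do.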
Furthermore, non-patent literature 2 describes a configuration of a hybrid video encoder capable of generating a bit stream complying with the HEVC standard, and also describes that a de-blocking filter similar to the one used in the H.264/MPEG-4 AVC standard is included in an inter-picture prediction loop thereof.