<Introduction and Definitions of Basic Terms>
In a block-based video encoding system, an input video to be encoded is divided into predetermined units of processing referred to as “macroblocks” (hereinafter, “MB”), encoding processing is executed for each MB, and encoded data is thereby produced. When the video is reproduced, the encoded data to be decoded is processed for each MB, and a decoded image is produced.
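The division into MBs can be illustrated with a minimal sketch. The function name `split_into_macroblocks`, the 16×16 MB size (the size used by H.264/AVC), and the restriction to frame dimensions that are multiples of the MB size are assumptions made for illustration only.

```python
def split_into_macroblocks(frame, mb_size=16):
    """Yield (top, left, block) for each mb_size x mb_size tile of a 2-D list.

    Assumption: the frame height and width are multiples of mb_size.
    """
    height, width = len(frame), len(frame[0])
    for top in range(0, height, mb_size):
        for left in range(0, width, mb_size):
            block = [row[left:left + mb_size] for row in frame[top:top + mb_size]]
            yield top, left, block


# A synthetic 32x32 luma frame is divided into four 16x16 MBs in raster order.
frame = [[(x + y) % 256 for x in range(32)] for y in range(32)]
macroblocks = list(split_into_macroblocks(frame))
```

Each MB is then encoded or decoded independently of how the frame was tiled, which is what allows the per-MB processing described above.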
The system specified in Non-Patent Literature 1 (H.264/AVC (Advanced Video Coding)) is a block-based video encoding system that is widely prevalent at present. According to H.264/AVC, a predictive image that predicts the input video divided into MBs is produced, and a prediction residual, which is the difference between the input video and the predictive image, is calculated. A transform coefficient is derived by applying a frequency transform, typified by the discrete cosine transform (DCT), to the prediction residual. The derived transform coefficient is variable-length-encoded using a method referred to as “CABAC (Context-based Adaptive Binary Arithmetic Coding)” or “CAVLC (Context-based Adaptive Variable Length Coding)”. The predictive image is produced by intra prediction, which uses the spatial correlation of the video, or by inter prediction (motion-compensated prediction), which uses the temporal correlation of the video.
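The relation between the input block, the predictive image, and the prediction residual can be sketched as follows. The helper names `prediction_residual` and `reconstruct` are hypothetical, and the transform, the quantization, and the variable-length encoding are deliberately omitted.

```python
def prediction_residual(block, predicted):
    """Residual = input block minus predictive image, element by element."""
    return [[b - p for b, p in zip(block_row, pred_row)]
            for block_row, pred_row in zip(block, predicted)]


def reconstruct(predicted, residual):
    """Decoder side: adding the residual back to the prediction restores the block."""
    return [[p + r for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(predicted, residual)]


block = [[10, 12], [14, 16]]       # input pixels (toy 2x2 example)
predicted = [[9, 12], [15, 13]]    # predictive image for the same area
residual = prediction_residual(block, predicted)
restored = reconstruct(predicted, residual)
```

The better the prediction, the smaller the residual values, and the fewer bits the subsequent transform and variable-length encoding need.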
<Concept of Partition and Effects Thereof>
According to the inter prediction, an image that approximates the input video of an MB to be encoded is produced in units referred to as “partitions”. One or two motion vectors are associated with each partition. A predictive image is produced by referring, based on the motion vector(s), to an area that corresponds to the MB to be encoded on a local decoded image recorded in a frame memory. The local decoded image referred to in this case is called a “reference image”. According to H.264/AVC, the available partition sizes are “16×16”, “16×8”, “8×16”, “8×8”, “8×4”, “4×8”, and “4×4” in pixels. When a small partition size is used, a motion vector can be designated for each fine unit and, therefore, a predictive image close to the input video can be produced even when the spatial correlation of the motion is low. On the other hand, when a large partition size is used, the amount of codes necessary for encoding the motion vectors can be reduced when the spatial correlation of the motion is high.
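The following minimal sketch illustrates how a predictive image for one partition is fetched from a reference image, and why the partition size affects the number of motion vectors. The function names are hypothetical. The sketch assumes integer-pixel motion vectors, one motion vector per partition, a 16×16 MB, and a displaced area that lies entirely inside the reference image; H.264/AVC itself additionally supports fractional-pixel motion vectors, which are not modeled here.

```python
def motion_compensate(reference, top, left, height, width, mv):
    """Copy the area of the reference image displaced by mv = (dy, dx).

    Assumption: integer-pixel vector and no boundary handling.
    """
    dy, dx = mv
    return [row[left + dx:left + dx + width]
            for row in reference[top + dy:top + dy + height]]


def partitions_per_macroblock(part_w, part_h, mb_size=16):
    """Number of partitions of the given size that tile one MB."""
    return (mb_size // part_w) * (mb_size // part_h)


# Toy 8x8 reference image whose pixel value encodes its position.
reference = [[10 * y + x for x in range(8)] for y in range(8)]
predicted = motion_compensate(reference, top=2, left=2, height=2, width=2, mv=(1, -1))

# With one vector per partition, 16x16 needs 1 vector per MB and 4x4 needs 16.
vectors_large = partitions_per_macroblock(16, 16)
vectors_small = partitions_per_macroblock(4, 4)
```

The sixteen-fold difference in the number of vectors per MB is the cost that a small partition size pays for its finer motion description.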
<Concept of Transform Size and Effects Thereof>
In a prediction residual produced using a predictive image, the spatial or temporal redundancy of the pixel values of the input video is reduced. In addition, energy can be concentrated on the low-frequency components of the transform coefficients by applying a DCT to the prediction residual. Therefore, by executing variable-length encoding that exploits this bias of the energy, the amount of codes of the encoded data can be reduced compared to the case where neither a predictive image nor a DCT is used.
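The energy concentration can be observed with a small numerical sketch. It uses the textbook floating-point orthonormal DCT-II rather than the integer transform actually specified in H.264/AVC, and the smooth 4×4 test block and the function names are assumptions chosen for illustration.

```python
import math


def dct_1d(values):
    """Orthonormal one-dimensional DCT-II."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(values)))
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform the rows, then the columns."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # indexed [vertical][horizontal]


def energy(block):
    return sum(v * v for row in block for v in row)


# A smooth 4x4 block (a gentle gradient), standing in for a flat-area residual.
block = [[x + y for x in range(4)] for y in range(4)]
coeffs = dct_2d(block)
dc_fraction = coeffs[0][0] ** 2 / energy(coeffs)
```

Because the transform is orthonormal, the total energy is unchanged, but for this smooth block most of it ends up in the single lowest-frequency coefficient, which is the bias that the variable-length encoding exploits.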
According to H.264/AVC, a system (block-adaptive transform selection) is employed that selects a DCT adapted to the local property of the video from DCTs of plural transform sizes, for the purpose of increasing the energy concentration on the low-frequency components by the DCT. For example, when a predictive image is produced using the inter prediction, the DCT to be applied to the prediction residual can be selected from two kinds of DCTs, an 8×8 DCT and a 4×4 DCT. The 8×8 DCT is effective for a flat area having a relatively small amount of high-frequency components, because the 8×8 DCT can use the spatial correlation of the pixel values over a wide range. On the other hand, the 4×4 DCT is effective for an area having a large amount of high-frequency components, such as an area that includes a contour of an object. It can be said that, according to H.264/AVC, the 8×8 DCT is the DCT of a large transform size and the 4×4 DCT is the DCT of a small transform size.
According to H.264/AVC, either the 8×8 DCT or the 4×4 DCT can be selected when the area of a partition is equal to or larger than 8×8 pixels. Only the 4×4 DCT can be selected when the area of a partition is smaller than 8×8 pixels.
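The selection rule above can be written as a small lookup. The function name is hypothetical, and "equal to or larger than 8×8 pixels" is interpreted here as both the width and the height being at least 8, an interpretation that coincides with the area-based wording for the partition sizes available in H.264/AVC.

```python
def h264_transform_candidates(part_w, part_h):
    """Return the DCT sizes selectable for a partition of part_w x part_h pixels.

    Assumption: a partition counts as "8x8 or larger" when both of its
    dimensions are at least 8.
    """
    if part_w >= 8 and part_h >= 8:
        return ["8x8", "4x4"]
    return ["4x4"]


candidates_16x8 = h264_transform_candidates(16, 8)
candidates_8x4 = h264_transform_candidates(8, 4)
```

An encoder would typically choose among the returned candidates per block, for example by comparing the resulting amount of codes; that decision step is outside this sketch.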
As above, according to H.264/AVC, a suitable partition size and a suitable transform size can be selected corresponding to the degree of each of the spatial correlation of the pixel value and the spatial correlation of the motion vector, which are local properties of a video. Therefore, the amount of codes of the encoded data can be reduced.
<Description of Adaptive Transform Size Expansion and Partition Size Expansion>
Recently, high-definition videos having a resolution equal to or higher than “HD (1920 pixels×1080 pixels)” have become increasingly common. Compared to the case of a conventional low-resolution video, in the case of a high-definition video, the spatial correlation of the pixel value and the spatial correlation of the motion vector can vary over a wide range among local areas of the video. Above all, the high-definition video has the property that the spatial correlations of both the pixel value and the motion vector are high in a local area.
Non-Patent Literature 2 describes a video encoding system that reduces the amount of codes of the encoded data by expanding the partition sizes and the transform sizes of H.264/AVC so as to exploit the above property of the spatial correlation in a high-definition video.
More specifically, partition sizes such as “64×64”, “64×32”, “32×64”, “32×32”, “32×16”, and “16×32” are added to those specified in H.264/AVC. Furthermore, DCTs of three new transform sizes, “16×16 DCT”, “16×8 DCT”, and “8×16 DCT”, are added to those specified in H.264/AVC.
When the area of a partition is equal to or larger than 16×16 pixels, the 16×16 DCT, the 8×8 DCT, and the 4×4 DCT can be selected. When the partition size is 16×8, the 16×8 DCT, the 8×8 DCT, and the 4×4 DCT can be selected. When the partition size is 8×16, the 8×16 DCT, the 8×8 DCT, and the 4×4 DCT can be selected. When the partition size is 8×8, the 8×8 DCT and the 4×4 DCT can be selected. When the area of the partition is smaller than 8×8 pixels, only the 4×4 DCT can be selected.
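The expanded selection rule can be sketched as a lookup in the same manner. The function name is hypothetical. "Equal to or larger than 16×16 pixels" is interpreted as both dimensions being at least 16, and "smaller than 8×8 pixels" as the sizes 8×4, 4×8, and 4×4; any other size is rejected because the description above does not cover it.

```python
def extended_transform_candidates(part_w, part_h):
    """Return the DCT sizes selectable under the expanded rule described above.

    Assumption: partitions with both dimensions >= 16 count as
    "16x16 or larger"; sizes not named in the description raise ValueError.
    """
    size = (part_w, part_h)
    if part_w >= 16 and part_h >= 16:
        return ["16x16", "8x8", "4x4"]
    if size == (16, 8):
        return ["16x8", "8x8", "4x4"]
    if size == (8, 16):
        return ["8x16", "8x8", "4x4"]
    if size == (8, 8):
        return ["8x8", "4x4"]
    if size in {(8, 4), (4, 8), (4, 4)}:
        return ["4x4"]
    raise ValueError(f"partition size {part_w}x{part_h} is not covered")


candidates_64x32 = extended_transform_candidates(64, 32)
candidates_16x8 = extended_transform_candidates(16, 8)
```

Compared with the H.264/AVC rule, every partition of 16×8, 8×16, or 16×16 and larger gains one additional, larger transform candidate, while the smaller partitions keep the same choices as before.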
According to the system described in Non-Patent Literature 2, a partition size and a transform size adapted to the local property of the video can be selected by switching among the above various partition sizes and transform sizes, even for a high-definition video in which the spatial correlations of the pixel value and the motion vector have relatively wide dynamic ranges. Therefore, the amount of codes of the encoded data can be reduced.