1. Field of the Invention
The present invention relates to an image processing apparatus, an image processing method, and a program, and in particular, to an image processing apparatus, an image processing method, and a program capable of improving encoding efficiency when images are encoded.
2. Description of the Related Art
In recent years, encoding and decoding apparatuses based on MPEG (Moving Picture Experts Group) or similar standards have come into widespread use, both for distributing information at broadcasting stations and the like and for receiving information in ordinary homes. Such apparatuses treat image information as a digital signal and, to transmit and store that information efficiently, compress it by exploiting the redundancy peculiar to image information through an orthogonal transform, such as the discrete cosine transform, and motion compensation.
That is, encoding apparatuses that encode image information with a scheme such as MPEG or H.26x, using an orthogonal transform (for example, the discrete cosine transform or the Karhunen-Loeve transform) together with motion compensation, have come into widespread use. Likewise, decoding apparatuses have come into widespread use that receive encoded data (bit streams) produced by such encoding apparatuses over network media, such as satellite broadcasting, cable TV, or the Internet, or that reproduce encoded data recorded on recording media, such as optical discs, magnetic disks, or flash memory.
For example, MPEG2 (ISO/IEC 13818-2) is defined as a general-purpose image encoding standard covering both interlaced scan images (interlaced images) and sequential scan images (progressive images), as well as standard-resolution and high-definition images, and it is currently used in a wide range of professional and consumer applications. With MPEG2 compression, a high compression rate and satisfactory image quality can be realized by assigning a code quantity (bit rate) of 4 to 8 Mbps to standard-resolution interlaced images of, for example, 720 horizontal × 480 vertical pixels, and a code quantity of 18 to 22 Mbps to high-definition interlaced images of 1920 × 1088 pixels.
MPEG2 is mainly intended for high-quality encoding suitable for broadcasting, but it does not support a code quantity (bit rate) lower than that of MPEG1, that is, a compression rate higher than that of MPEG1. With the spread of mobile phones, demand for such encoding was expected to grow, and the MPEG4 encoding scheme was standardized in response. Its image encoding part was adopted as the international standard ISO/IEC 14496-2 in December 1998.
In recent years, H.26L (ITU-T Q6/16 VCEG) has been under standardization, initially for image encoding for videoconferencing. Compared with earlier encoding schemes such as MPEG2 and MPEG4, H.26L is known to require more arithmetic operations for encoding and decoding but to achieve higher encoding efficiency.
As part of the MPEG4 activities, a standardization effort that uses H.26L as a base and incorporates functions not supported by H.26L, to achieve still higher encoding efficiency, was carried out as the Joint Model of Enhanced-Compression Video Coding. The result was adopted as an international standard in March 2003 under the names H.264 and MPEG-4 Part 10 (Advanced Video Coding), hereinafter simply referred to as H.264/AVC or AVC.
In AVC, a plurality of pictures can be referenced as reference pictures when a predicted image is created during motion compensation.
In an AVC decoding apparatus, decoded pictures, including pictures used as reference pictures, are stored in a buffer called the DPB (Decoded Picture Buffer).
In the DPB, a picture which is referenced over a short term is marked as a short-term reference picture (used for short-term reference), a picture which is referenced over a long term is marked as a long-term reference picture (used for long-term reference), and a picture which is not referenced is marked as a non-reference picture (unused for reference).
The pictures (decoded pictures) stored in the DPB are reordered into display order and output (read out) at timings designated in advance.
The size of the DPB is defined for each profile and level as an amount of data, not as a number of pictures.
Consequently, even at the same profile and level, the number of pictures that the DPB can store varies with the frame size of the pictures.
For example, at Main profile, level 4, the DPB size MaxDPB is defined as MaxDPB = 12288.0 × 1024 bytes.
Thus, at Main profile, level 4, when the picture to be encoded (current picture in encoding process) is, for example, a YUV 4:2:0 picture of 1440 horizontal × 1088 vertical pixels, the DPB can store a maximum of five pictures.
Likewise, at Main profile, level 4, when the current picture in encoding process is, for example, a YUV 4:2:0 picture of 1920 horizontal × 1088 vertical pixels, the DPB can store a maximum of four pictures.
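The arithmetic behind these two figures can be reproduced with a short Python sketch. It assumes 8-bit samples, YUV 4:2:0 sampling, frame (not field) coding, and picture dimensions that are whole multiples of 16, so that each 16×16 macroblock occupies 384 bytes; the cap of 16 frames reflects the upper limit AVC places on the number of frames in the DPB. The names `max_dpb_frames` and `MAIN_LEVEL_4_MAX_DPB` are hypothetical.

```python
# Sketch: how many frames fit in the DPB for a given MaxDPB (in bytes).
# Assumptions: 8-bit samples, YUV 4:2:0, frame coding, and width/height
# that are multiples of 16 (whole macroblocks).

BYTES_PER_MB_420_8BIT = 384  # 16x16 luma (256 bytes) + two 8x8 chroma blocks (128 bytes)
MAX_DPB_FRAMES_CAP = 16      # AVC upper limit on frames held in the DPB


def max_dpb_frames(max_dpb_bytes: int, width: int, height: int) -> int:
    """Return the number of whole frames that fit in a DPB of max_dpb_bytes."""
    pic_width_in_mbs = width // 16
    frame_height_in_mbs = height // 16
    frame_bytes = pic_width_in_mbs * frame_height_in_mbs * BYTES_PER_MB_420_8BIT
    return min(max_dpb_bytes // frame_bytes, MAX_DPB_FRAMES_CAP)


MAIN_LEVEL_4_MAX_DPB = int(12288.0 * 1024)  # MaxDPB at Main profile, level 4

if __name__ == "__main__":
    print(max_dpb_frames(MAIN_LEVEL_4_MAX_DPB, 1440, 1088))  # 5
    print(max_dpb_frames(MAIN_LEVEL_4_MAX_DPB, 1920, 1088))  # 4
```

With MaxDPB = 12,582,912 bytes, a 1440×1088 frame occupies 2,350,080 bytes (five fit), while a 1920×1088 frame occupies 3,133,440 bytes (four fit).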
Taking into consideration the size of the DPB in the decoding apparatus, an AVC encoding apparatus has to perform encoding such that no contradiction arises either in the order in which pictures are output from the DPB or in the pictures referenced when predicted images are created.
Two methods of managing the DPB are known: a sliding window memory control process and an adaptive memory control process (see, for example, Shinya Kakuno, Yoshihiro Kikuchi, and Teruhiko Suzuki, “Impress Standard Textbook Series Third Revised Edition H.264/AVC Textbook”, Impress Corporation).
In the sliding window memory control process, the DPB is managed in a FIFO (First In First Out) manner: the pictures stored in the DPB are released (that is, become non-reference pictures) in ascending order of frame_num.
Specifically, in the sliding window memory control process, I (Intra) pictures, P (Predictive) pictures, and Bs pictures, which are B (Bi-directional Predictive) pictures that can be referenced, are stored in the DPB as short-term reference pictures.
Once the DPB holds as many reference pictures as it can store, the earliest (oldest) of the short-term reference pictures stored in the DPB is released.
When long-term reference pictures are stored in the DPB, the sliding window memory control process does not affect them; that is, the sliding window memory control process manages only the short-term reference pictures in the FIFO manner.
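The sliding window behavior described above can be sketched as a small Python model. This is a simplification, assuming frame coding only, that pictures are identified by frame_num, that frame_num never wraps around, and that `capacity` stands for the maximum number of reference frames; the class `SlidingWindowDPB` and the helper `simulate` are hypothetical names, not part of any real decoder API.

```python
from __future__ import annotations

from collections import deque


class SlidingWindowDPB:
    """Simplified sliding window marking of reference frames.

    Assumptions: frame coding only, pictures identified by frame_num,
    no frame_num wrap-around, `capacity` = maximum number of reference frames.
    """

    def __init__(self, capacity: int, long_term: set[int] | None = None) -> None:
        self.capacity = capacity
        self.short_term: deque[int] = deque()             # oldest frame_num on the left
        self.long_term: set[int] = set(long_term or ())   # never touched by the window

    def add_short_term(self, frame_num: int) -> int | None:
        """Store a new I, P, or Bs picture as a short-term reference picture.

        Returns the frame_num that was released (became a non-reference
        picture), or None if nothing had to be released.
        """
        released = None
        full = len(self.short_term) + len(self.long_term) >= self.capacity
        if full and self.short_term:
            # FIFO: release the oldest short-term picture; long-term ones stay.
            released = self.short_term.popleft()
        self.short_term.append(frame_num)
        return released


def simulate(capacity: int, frame_nums, long_term=()) -> list[int]:
    """Feed frame_nums in decoding order and return the surviving short-term refs."""
    dpb = SlidingWindowDPB(capacity, set(long_term))
    for fn in frame_nums:
        dpb.add_short_term(fn)
    return list(dpb.short_term)
```

For example, `simulate(3, range(5))` leaves `[2, 3, 4]`, and `simulate(3, range(3), long_term=[9])` leaves `[1, 2]`: the long-term picture occupies one slot but is never released by the window.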
In the adaptive memory control process, the pictures stored in the DPB are managed by using a command which is called an MMCO (Memory management control operation).
MMCO commands can mark short-term reference pictures stored in the DPB as non-reference pictures, or assign to a short-term reference picture a long-term frame index for managing long-term reference pictures, thereby turning it into a long-term reference picture. MMCO commands can also set the maximum value of the long-term frame index, or mark all reference pictures as non-reference pictures.
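The marking operations listed above can be modeled with the following Python sketch. The comments map each method roughly onto the AVC memory_management_control_operation values 1 to 5, but the model is a simplification: it assumes frame coding only and addresses pictures by a hypothetical integer picture id rather than by the picture-number differences a real bitstream signals. The class `AdaptiveDPB` and the helper `demo` are hypothetical.

```python
from __future__ import annotations


class AdaptiveDPB:
    """Simplified MMCO-style marking of reference frames (frame coding only).

    Assumption: pictures are addressed by a hypothetical integer picture id.
    """

    def __init__(self) -> None:
        self.short_term: set[int] = set()
        self.long_term: dict[int, int] = {}  # long-term frame index -> picture id
        self.max_long_term_frame_idx = -1    # -1: no long-term frame index allowed

    def add_short_term(self, pic: int) -> None:
        self.short_term.add(pic)

    def mark_short_term_unused(self, pic: int) -> None:
        # Roughly MMCO 1: a short-term reference picture becomes non-reference.
        self.short_term.discard(pic)

    def mark_long_term_unused(self, long_term_frame_idx: int) -> None:
        # Roughly MMCO 2: a long-term reference picture becomes non-reference.
        self.long_term.pop(long_term_frame_idx, None)

    def short_to_long(self, pic: int, long_term_frame_idx: int) -> None:
        # Roughly MMCO 3: assign a long-term frame index to a short-term picture.
        if long_term_frame_idx > self.max_long_term_frame_idx:
            raise ValueError("index exceeds the maximum long-term frame index")
        self.short_term.remove(pic)
        # A picture already holding this index is replaced (becomes non-reference).
        self.long_term[long_term_frame_idx] = pic

    def set_max_long_term_frame_idx(self, max_idx: int) -> None:
        # Roughly MMCO 4: long-term pictures above the new maximum become non-reference.
        self.max_long_term_frame_idx = max_idx
        self.long_term = {i: p for i, p in self.long_term.items() if i <= max_idx}

    def mark_all_unused(self) -> None:
        # Roughly MMCO 5: every reference picture becomes non-reference.
        self.short_term.clear()
        self.long_term.clear()
        self.max_long_term_frame_idx = -1


def demo() -> tuple[set[int], dict[int, int]]:
    dpb = AdaptiveDPB()
    for pic in (1, 2, 3):
        dpb.add_short_term(pic)
    dpb.set_max_long_term_frame_idx(1)  # allow long-term indices 0 and 1
    dpb.short_to_long(2, 0)             # picture 2 becomes long-term index 0
    dpb.mark_short_term_unused(1)       # picture 1 is no longer referenced
    return dpb.short_term, dpb.long_term
```

Running `demo()` leaves picture 3 as the only short-term reference picture and picture 2 as the long-term reference picture with index 0.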
In AVC, inter prediction creates a predicted image by performing motion compensation on the reference pictures stored in the DPB. Inter prediction of a B picture (including a Bs picture) can use a maximum of two reference pictures, and the predictions using these two reference pictures are called L0 (List 0) prediction and L1 (List 1) prediction, respectively.
For a B picture (including a Bs picture), inter prediction uses L0 prediction, L1 prediction, or both. For a P picture, inter prediction uses only L0 prediction.
In the inter prediction, the reference pictures which are referenced in creating a predicted image are managed by a reference picture list.
In the reference picture list, the reference picture number (Reference Index) for designating a reference picture which is referenced in creating a predicted image is assigned to each of the reference pictures stored in the DPB.
When the current picture in decoding process, that is, the picture to be decoded (which also corresponds to the current picture in encoding process), is a P picture, only L0 prediction is used for inter prediction, as described above, so reference picture numbers are assigned only for L0 prediction.
When the current picture in decoding process is a B picture (including a Bs picture), both L0 prediction and L1 prediction can be used for inter prediction, as described above, so reference picture numbers are assigned for both L0 prediction and L1 prediction.
The reference picture number for L0 prediction is also referred to as an L0 index, and the reference picture number for L1 prediction is also referred to as an L1 index.
When the current picture in decoding process is a P picture, by default in AVC (that is, with the preset values), smaller reference picture numbers (L0 indices) are assigned to those reference pictures stored in the DPB that come later in decoding order.
The reference picture number is an integer equal to or greater than 0, so its minimum value is 0. Thus, when the current picture in decoding process is a P picture, L0 index 0 is assigned to the reference picture decoded most recently before the current picture in decoding process.
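The default L0 assignment for a P picture amounts to sorting the short-term reference pictures by decoding order, newest first. The Python sketch below assumes that each reference picture carries a hypothetical `decode_order` field standing in for its picture number, and it covers short-term reference pictures only; `RefPic` and `default_p_l0_indices` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RefPic:
    name: str
    decode_order: int  # hypothetical stand-in for the picture number


def default_p_l0_indices(short_term_refs: list[RefPic]) -> dict[str, int]:
    """Default L0 index assignment for a P picture (short-term refs only).

    The most recently decoded reference picture gets L0 index 0, the one
    decoded before it gets 1, and so on.
    """
    ordered = sorted(short_term_refs, key=lambda p: p.decode_order, reverse=True)
    return {pic.name: idx for idx, pic in enumerate(ordered)}
```

For references I0, P1, and P2 decoded in that order, the sketch assigns L0 index 0 to P2, 1 to P1, and 2 to I0.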
When the current picture in decoding process is a B picture (including a Bs picture), by default in AVC, reference picture numbers (L0 indices and L1 indices) are assigned to the reference pictures stored in the DPB in POC (Picture Order Count) order, that is, according to display order.
Specifically, for L0 prediction, L0 indices are first assigned to the reference pictures that precede the current picture in decoding process in display order, with smaller values going to reference pictures closer to the current picture in decoding process; L0 indices are then assigned, in the same closest-first manner, to the reference pictures that follow the current picture in decoding process in display order.
For L1 prediction, the order is reversed: L1 indices are first assigned to the reference pictures that follow the current picture in decoding process in display order, with smaller values going to reference pictures closer to the current picture in decoding process, and then, in the same closest-first manner, to the reference pictures that precede it in display order.
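The closest-first ordering for a B picture can be sketched in Python by splitting the short-term reference pictures around the current POC. This is a simplification that identifies pictures by POC, handles short-term reference pictures only, and omits other details of AVC list initialization, such as the rule that swaps the first two L1 entries when L1 would otherwise be identical to L0 and has more than one entry. The function name `default_b_lists` is hypothetical.

```python
from __future__ import annotations


def default_b_lists(current_poc: int, ref_pocs: list[int]) -> tuple[list[int], list[int]]:
    """Default L0/L1 order for a B (or Bs) picture, short-term refs only.

    Pictures are identified by POC. A picture's position in each returned
    list is its reference picture number (L0 index or L1 index).
    """
    # Pictures displayed before the current one, closest (largest POC) first.
    earlier = sorted((p for p in ref_pocs if p < current_poc), reverse=True)
    # Pictures displayed after the current one, closest (smallest POC) first.
    later = sorted(p for p in ref_pocs if p > current_poc)
    l0 = earlier + later
    l1 = later + earlier
    return l0, l1
```

For a current picture with POC 4 and reference pictures with POCs 0, 2, 6, and 8, the sketch yields L0 = [2, 0, 6, 8] and L1 = [6, 8, 2, 0].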
By default in AVC, reference picture numbers (L0 indices and L1 indices) are first assigned to the short-term reference pictures, and only afterward to the long-term reference pictures.
Thus, by default, every long-term reference picture receives a larger reference picture number than any short-term reference picture.
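Appending the long-term reference pictures after the short-term ones can be sketched as follows. The sketch assumes the short-term part of the list has already been ordered by the default rules, and that long-term reference pictures are ordered among themselves by ascending long-term frame index (frame coding only); the function name `default_list_with_long_term` is hypothetical.

```python
from __future__ import annotations


def default_list_with_long_term(short_term_order: list[str],
                                long_term_by_idx: dict[int, str]) -> list[str]:
    """Place long-term refs after all short-term refs, by ascending long-term index.

    A picture's position in the returned list is its reference picture number,
    so every long-term picture gets a larger number than any short-term picture.
    """
    long_term_order = [long_term_by_idx[i] for i in sorted(long_term_by_idx)]
    return short_term_order + long_term_order
```

For example, with short-term order ["P3", "P2"] and long-term pictures {0: "P1", 1: "I0"}, the result is ["P3", "P2", "P1", "I0"], so the long-term picture I0 ends up with reference picture number 3.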
In AVC, in addition to the default method described above, reference picture numbers may also be assigned freely using a command called Reference Picture List Reordering (hereinafter also referred to as the RPLR command).
If, after reference picture numbers have been assigned using the RPLR command, any reference picture remains without a reference picture number, a reference picture number is assigned to it by the default method.
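The combined effect of RPLR commands and the default method can be sketched in Python by modeling only the resulting list order: the pictures named by the reordering take reference picture numbers 0, 1, 2, and so on, and every other reference picture follows in its default relative order. In the standard the reordered pictures are signaled through syntax such as `abs_diff_pic_num_minus1` and `long_term_pic_num`; this sketch deliberately abstracts that signaling away, and the function name `apply_reordering` is hypothetical.

```python
from __future__ import annotations


def apply_reordering(default_list: list[str], moved_to_front: list[str]) -> list[str]:
    """Model only the resulting order of RPLR-style reordering.

    Pictures in moved_to_front receive reference picture numbers 0, 1, ...
    in the order given; the remaining pictures keep their default relative
    order after them.
    """
    remaining = [p for p in default_list if p not in moved_to_front]
    return list(moved_to_front) + remaining
```

For instance, moving the long-term picture I0 to the front of the default list ["P3", "P2", "P1", "I0"] yields ["I0", "P3", "P2", "P1"], giving I0 reference picture number 0 so that it can be signaled with the shortest index.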