Field of the Invention
The present invention relates to an image processing apparatus, a method of controlling an image processing apparatus, and a recording medium.
Description of the Related Art
Recent years have seen an increase in the number of devices that can shoot moving images using intra-frame encoding for easy editing after shooting. Unlike the case of inter-frame encoding, intra-frame encoded images do not need so-called reference images. Accordingly, decoding requires only the data for one encoded image, and it is easy to perform an editing task such as clipping a still image from a moving image.
Although this is convenient, there is also the disadvantage that the encoding efficiency is poor compared to the case of using inter-frame encoding. The encoding rate is generally higher in the case of MotionJPEG, which uses only intra-frame encoding, than in the case of MPEG, in which a moving image is encoded using inter-frame encoding. MPEG has therefore often been used for moving image encoding even though editing tasks require more effort.
Meanwhile, due to the prevalence of the H.264 system that enables more efficient image encoding than JPEG and MPEG, attention has once again been given to the intra-frame encoding of moving images. H.264 employs many tools for raising efficiency even simply with respect to intra-frame encoding, such as intra-frame prediction and the ability to adaptively change the prediction block size.
H.264 uses a special intra-frame encoded picture called an IDR (Instantaneous Decoding Refresh) picture. When an IDR picture is inserted, all of the states necessary for stream decoding are reset, and for this reason the head of an H.264 stream always needs to be an IDR picture. There are no other restrictions with respect to IDR pictures, and IDR pictures may be inserted at any timing after the head. In other words, all of the frames may be IDR pictures, or none of the frames after the head frame may be IDR pictures.
Besides the IDR picture, there is also the normal I picture, which is another type of intra-frame encoded frame. This I picture is the same as the I picture employed in MPEG, and can be understood as being the head of a so-called GOP. In the image data portion of a stream, an IDR picture and an I picture do not differ in terms of encoding efficiency or the like.
Accordingly, when creating a moving image using an image encoding system that employs only intra-frame encoding (referred to hereinafter as an ALL-I picture system), two methods can be used, namely using all IDR pictures, or also including I pictures (or using all I pictures except for the head, for example). However, a different problem arises in the case where, for example, a stream within a moving image that includes an IDR picture is deleted, and the streams before and after the deleted stream are joined.
Each IDR picture has a parameter called a picture identifier (idr_pic_id), and this needs to be written in the header. In H.264, the picture identifier is encoded along with the picture image in an IDR picture using a variable-length encoding system called Golomb coding.
The following describes Golomb coding with reference to the table in FIG. 8, as one example of a technique for variable-length encoding of the picture identifier. The column on the left in the table in FIG. 8 shows Golomb codes (binary numbers), and the column on the right shows the ranges of data values (decimal numbers) that can be expressed. The “1” portion in the center of each Golomb code in the table in FIG. 8 is called the separator, the run of 0s on the left side of the separator is called the prefix, and the portion on the right side of the separator is called the suffix. In the table in FIG. 8, “x” is used to express a suffix bit, and each “x” can take the value of either 0 or 1. The number of 0s in the prefix is equal to the bit length of the suffix, and so the bit length is the same on the left and right of the separator.
For example, if the prefix consists of two 0s, the suffix also has a bit length of 2.
In the table in FIG. 8, the four values 3 to 6 can be expressed in the third row from the top. This is shown specifically below.
Golomb code: 00100=data value: 3
Golomb code: 00101=data value: 4
Golomb code: 00110=data value: 5
Golomb code: 00111=data value: 6
As an exception, only when the data value is 0, there is no prefix or suffix, and the code consists of the separator alone. Assigning shorter codes to smaller values in this way improves the encoding efficiency.
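The mapping described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names ue_encode and ue_decode are hypothetical, and the code assumes that the table in FIG. 8 corresponds to the unsigned Exp-Golomb mapping, which is consistent with the code/value pairs listed above.

```python
def ue_encode(value: int) -> str:
    """Return the Golomb code (prefix + separator + suffix) as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    code_num = value + 1
    # The suffix length equals the number of 0s in the prefix.
    suffix_len = code_num.bit_length() - 1
    # The binary form of code_num starts with the separator "1",
    # followed by the suffix bits.
    return "0" * suffix_len + bin(code_num)[2:]


def ue_decode(bits: str) -> int:
    """Decode one Golomb code located at the start of a bit string."""
    prefix_len = bits.index("1")  # count the 0s before the separator
    code = bits[prefix_len:2 * prefix_len + 1]  # separator + suffix
    if len(code) != prefix_len + 1:
        raise ValueError("truncated code")
    return int(code, 2) - 1


for v in (0, 1, 2, 3, 6):
    print(v, ue_encode(v))
```

Under these assumptions, the values 3 to 6 all produce five-bit codes, and the value 0 produces the one-bit code consisting of the separator alone.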
According to H.264 recommendations, the picture identifier may have any value from 0 to 65535, but there is the stipulation that the same picture identifier must not be used consecutively in the case where IDR pictures are adjacent to each other. The same picture identifier may be used multiple times in a moving image as long as the same picture identifier is not assigned consecutively to adjacent IDR pictures.
However, when an editing task is performed, there is the possibility of the same picture identifier being used consecutively after editing. If the same picture identifier is used consecutively after editing, it is sufficient to alter either one of the picture identifiers, but since the picture identifier is encoded in a variable-length manner (Golomb coding) as described above, there are cases where it is not possible for just that value to be edited.
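As a rough illustration of how an edit can produce consecutive identical picture identifiers, the following sketch models a stream as a plain list of idr_pic_id values. It assumes that every frame is an IDR picture, so that neighboring list entries are adjacent IDR pictures; the function names join_after_cut and adjacent_duplicates are hypothetical and are not taken from any H.264 library.

```python
from typing import List


def join_after_cut(ids: List[int], cut_start: int, cut_end: int) -> List[int]:
    """Delete the pictures ids[cut_start:cut_end] and join the remainder."""
    return ids[:cut_start] + ids[cut_end:]


def adjacent_duplicates(ids: List[int]) -> List[int]:
    """Return each index i at which ids[i] equals ids[i + 1]."""
    return [i for i in range(len(ids) - 1) if ids[i] == ids[i + 1]]


original = [0, 1, 0, 1, 0]            # alternating identifiers, no violation
edited = join_after_cut(original, 2, 3)  # delete the third picture
print(edited, adjacent_duplicates(edited))
```

In this example the original sequence satisfies the stipulation, but deleting a single picture leaves two adjacent IDR pictures that both have the picture identifier 1.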
For example, if the picture identifiers of two consecutive IDR pictures both have the value of 0, each Golomb code will be expressed by the one-bit value of 1. According to the stipulation that the same picture identifier must not be used consecutively, one of the picture identifiers needs to be changed, but a one-bit Golomb code cannot have any value other than 1.
A problem also arises in the case where the picture identifier for two consecutive IDR pictures is 1, and furthermore the preceding and succeeding picture identifiers are 2. According to the table in FIG. 8, the Golomb code is “010” when the picture identifier is 1. Besides 1, the other value that can be expressed with this bit length is the value of 2, which is represented by the Golomb code “011”. When consecutive IDR pictures have the same picture identifier, either one of the picture identifiers needs to be replaced with a number other than 1, but in this case, the only other value that can be selected is 2. However, if one of the picture identifiers is changed to 2, it will be the same as the picture identifier of the other adjacent IDR picture, thus further requiring another modification.
In this way, it becomes necessary to search for and modify the picture identifiers of a large number of IDR pictures. Depending on the case, a situation can occur in which it is impossible to avoid duplicate values. In such a case, it is necessary to either perform re-encoding or change the editing location to a location other than the desired location.
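The two cases above can be checked with a small sketch that lists the replacement values available without changing the bit length of the code. It assumes the same Golomb mapping as described for FIG. 8 and the value range of 0 to 65535; the function names same_length_range and in_place_candidates are hypothetical.

```python
from typing import List, Optional


def same_length_range(value: int) -> range:
    """Values whose Golomb code has the same bit length as that of value."""
    k = (value + 1).bit_length() - 1  # suffix length of the code
    return range(2 ** k - 1, 2 ** (k + 1) - 1)


def in_place_candidates(value: int,
                        prev_id: Optional[int],
                        next_id: Optional[int]) -> List[int]:
    """Same-length identifiers that differ from both adjacent IDR pictures."""
    return [v for v in same_length_range(value)
            if v <= 65535 and v not in (prev_id, next_id)]


# Two consecutive IDR pictures with identifier 0: no alternative exists.
print(in_place_candidates(0, 0, None))
# Identifiers 2, 1, 1, 2: the first "1" cannot become 2 either.
print(in_place_candidates(1, 2, 1))
# A five-bit code leaves more room.
print(in_place_candidates(3, 3, 7))
```

An empty result means that the picture identifier cannot be corrected at that position without changing the bit length or modifying a neighboring picture as well.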
As a technique for resolving the above-described problem, there is a method in which, if it is not possible for just the picture identifier to be edited, padding bits or the like are deleted, and the bit length allocated to the picture identifier is increased (Japanese Patent No. 4757840).
The following describes an embodiment proposed in Japanese Patent No. 4757840 with reference to FIG. 9. Before the picture identifier is edited, an IDR picture includes the following.
Picture identifier 901 in the header (idr_pic_id)
Padding bits 902 (cabac_alignment_one_bit) inserted to adjust the bit length of the stream data
Padding bits 905 (trailing_zero_8bits) inserted at the end of the data
After the picture identifier is edited, the IDR picture includes the following.
Picture identifier 903 extended by N bits to make the picture identifier 901 changeable
Padding bits 904 (cabac_alignment_one_bit) that are N bits shorter than the padding bits 902
Padding bits 906 (trailing_zero_8bits) inserted at the end of the data to adjust the bit length
Here, if the picture identifier alone cannot be edited, the bit length of the picture identifier 901 is increased by N bits to obtain the picture identifier 903, which makes it possible to change the Golomb code. However, if nothing else is done, the overall bit length of the picture will increase by N bits, and the stream may no longer be a proper H.264 stream. In view of this, in order to adjust the number of bits, the cabac_alignment_one_bit padding 902 is reduced by N bits, which is the amount of the increase, thus obtaining the padding bits 904 and preventing an increase in the overall number of bits. However, cabac_alignment_one_bit 902 is image-dependent data, and there may not be enough leeway to reduce it by N bits. In this case, the trailing_zero_8bits 905 added to the end of the stream data are reduced by the amount of the increase, thus obtaining the padding bits 906. Since trailing_zero_8bits is also image-dependent, there are cases where this data does not exist. In such a case, balance is achieved using the next slice data. With this kind of method, editing is relatively easy if there is a parameter that enables the bit length to be adjusted in order to edit the picture identifier at the head of the data, but if no such parameter exists, there is the risk of processing becoming complex due to the need to analyze the stream and adjust padding bits.
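The order of fallbacks described above can be summarized in the following sketch. It is a deliberately simplified model and not the method of Japanese Patent No. 4757840 itself: the available padding is treated as a plain bit count, the function name choose_bit_source and its parameters are hypothetical, and no actual stream parsing is performed.

```python
def choose_bit_source(n_bits: int,
                      alignment_bits: int,
                      trailing_bits: int) -> str:
    """Pick where the N bits added to the picture identifier are recovered.

    alignment_bits: removable cabac_alignment_one_bit padding (assumed count)
    trailing_bits:  removable trailing_zero_8bits padding (assumed count)
    """
    if n_bits <= 0:
        return "no adjustment needed"
    if alignment_bits >= n_bits:
        return "cabac_alignment_one_bit"
    if trailing_bits >= n_bits:
        return "trailing_zero_8bits"
    # Neither kind of padding has enough leeway.
    return "next slice data"


print(choose_bit_source(2, 5, 0))
print(choose_bit_source(2, 1, 8))
print(choose_bit_source(2, 0, 0))
```

The last case is the one in which the stream must be analyzed beyond the picture being edited, which is where the processing tends to become complex.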