In recent years, there has been a demand for encoding technology that compresses image data at a high compression rate while maintaining high image quality, so that moving image information can be handled as digital data for storage and transmission. To compress image information, methods such as MPEG have been proposed and have become widespread. These methods compression-encode image information by exploiting redundancy unique to image information, using an orthogonal transformation such as the discrete cosine transformation together with motion prediction/motion compensation.
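As a concrete illustration of orthogonal transformation, the following minimal Python sketch computes an orthonormal one-dimensional DCT-II of a row of samples and its inverse. It is not the transform of any particular standard: real codecs apply two-dimensional transforms to blocks (H.264 uses an integer approximation of the DCT), and the function names and sample values here are hypothetical.

```python
import math
from typing import List


def _scale(k: int, n: int) -> float:
    """Orthonormal scaling factor for coefficient k of an n-point DCT."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct2(x: List[float]) -> List[float]:
    """Orthonormal 1-D DCT-II: samples -> frequency coefficients."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        for k in range(n)
    ]


def idct2(c: List[float]) -> List[float]:
    """Inverse of dct2 (orthonormal DCT-III): coefficients -> samples."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi / n * (i + 0.5) * k) for k in range(n))
        for i in range(n)
    ]


if __name__ == "__main__":
    row = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical 8-sample luma row
    coeffs = dct2(row)
    print([round(v, 1) for v in coeffs])
    print([round(v, 6) for v in idct2(coeffs)])  # recovers the original row
```

For a smooth row like this one, nearly all of the signal energy ends up in the first (DC) coefficient, which is what lets subsequent quantization and entropy coding spend few bits on the remaining coefficients.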
More recently, H.264 (also called MPEG-4 Part 10: AVC) has become available as an encoding method that aims at a higher compression rate and higher image quality. Compared with conventional encoding methods such as MPEG2 and MPEG4, H.264 requires a larger amount of computation for encoding and decoding but can attain higher encoding efficiency (for details of the H.264 standard, see ISO/IEC 14496-10 (MPEG-4 Part 10)).
Such encoding methods reduce the amount of information by removing temporally redundant information. This is done by detecting motion information for each block and generating a prediction image with reference to temporally preceding and succeeding pictures, calculating the difference between the obtained prediction image and the current frame image, and encoding this difference.
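The per-block steps just described (motion detection, prediction, difference) can be sketched in Python as follows. This is a minimal illustration assuming integer luma samples stored as nested lists and full-search block matching with the sum of absolute differences (SAD) as the matching cost; the function names and search range are hypothetical, and the transform, quantization, and entropy coding that a real encoder applies to the residual are omitted.

```python
from typing import List, Tuple

Block = List[List[int]]
Frame = List[List[int]]  # hypothetical toy representation: rows of luma samples


def get_block(frame: Frame, y: int, x: int, size: int) -> Block:
    """Cut a size x size block whose top-left corner is (y, x)."""
    return [row[x:x + size] for row in frame[y:y + size]]


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def motion_search(ref: Frame, cur: Frame, y: int, x: int,
                  size: int, search_range: int) -> Tuple[int, int]:
    """Full search: displacement (dy, dx) into ref whose block best matches cur's block."""
    target = get_block(cur, y, x, size)
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + size > h or rx + size > w:
                continue  # candidate block would fall outside the reference picture
            cost = sad(get_block(ref, ry, rx, size), target)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    if best is None:
        raise ValueError("no candidate block inside the reference frame")
    return best[1]


def encode_block(ref: Frame, cur: Frame, y: int, x: int, size: int,
                 search_range: int = 2) -> Tuple[Tuple[int, int], Block]:
    """Return the motion vector and the residual (current block minus prediction)."""
    mv = motion_search(ref, cur, y, x, size, search_range)
    pred = get_block(ref, y + mv[0], x + mv[1], size)
    target = get_block(cur, y, x, size)
    residual = [[t - p for t, p in zip(rt, rp)] for rt, rp in zip(target, pred)]
    return mv, residual


def decode_block(ref: Frame, y: int, x: int, size: int,
                 mv: Tuple[int, int], residual: Block) -> Block:
    """Rebuild the block as prediction plus residual; needs the same reference picture."""
    pred = get_block(ref, y + mv[0], x + mv[1], size)
    return [[p + r for p, r in zip(rp, rr)] for rp, rr in zip(pred, residual)]


if __name__ == "__main__":
    ref = [[r * 8 + c for c in range(8)] for r in range(8)]
    cur = [[(r - 1) * 8 + (c - 1) for c in range(8)] for r in range(8)]  # ref shifted by (1, 1)
    mv, res = encode_block(ref, cur, 4, 4, 2)
    print(mv, res)  # -> (-1, -1) [[0, 0], [0, 0]]
```

Note that `decode_block` can rebuild a block only if the decoder already holds the same reference picture; that dependency is the source of the playback problems described later in this section.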
Note that “picture” is a term representing one screen: it means a frame in a progressive image, and a frame or a field in an interlaced image.
FIG. 10 shows the types of pictures and their reference relationships in H.264. In FIG. 10, the symbols I, P, and B represent the picture types, i.e., I (intra encoding) picture, P (forward prediction encoding) picture, and B (bidirectional prediction encoding) picture. The numerals that follow the symbols are picture numbers; a smaller picture number indicates a picture that is played back earlier in time. For example, in FIG. 10, P5 picture is played back after B1 picture. P5 picture is connected to B1 picture by an arrow, and such an arrow represents a reference relationship between pictures. Therefore, P5 picture refers to B1 picture, and the difference between P5 and B1 pictures is encoded for P5 picture.
In FIG. 10, I2 and I17 are I pictures. I pictures are encoded using only information within the picture itself and do not refer to other pictures. P pictures such as P5, P8, P11, and P14 refer only to pictures that precede the picture of interest in time, and the differences between the pictures are encoded. B0, B1, B3, B4, B6, B7, B9, B10, B12, B13, B15, and B16 are B pictures. Each B picture refers to two pictures regardless of whether they precede or follow it in time, and the differences between the pictures are encoded. Note that P and B pictures may also include blocks that do not refer to other pictures and are intra-encoded.
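The picture-level rules stated above (I pictures refer to no other picture, P pictures refer only to earlier pictures, B pictures refer to two pictures) can be checked mechanically. The following Python sketch is a simplified model under stated assumptions: labels have the form `P5`, where the number is the playback order; the only FIG. 10 reference used is P5 referring to B1, as given in the text; and all function names are hypothetical.

```python
from typing import Dict, List

# Picture labels listed for FIG. 10 in the text; the number is the playback order.
FIG10_PICTURES = [
    "I2", "I17", "P5", "P8", "P11", "P14",
    "B0", "B1", "B3", "B4", "B6", "B7", "B9", "B10", "B12", "B13", "B15", "B16",
]


def picture_type(label: str) -> str:
    """'P5' -> 'P'."""
    return label[0]


def picture_number(label: str) -> int:
    """'P5' -> 5 (a smaller number is played back earlier)."""
    return int(label[1:])


def playback_order(labels: List[str]) -> List[str]:
    """Sort picture labels into playback order."""
    return sorted(labels, key=picture_number)


def check_references(refs: Dict[str, List[str]]) -> List[str]:
    """Report violations of the simplified picture-level reference rules."""
    problems = []
    for pic, targets in refs.items():
        kind, num = picture_type(pic), picture_number(pic)
        if kind == "I" and targets:
            problems.append(f"{pic}: I picture must not refer to other pictures")
        elif kind == "P":
            problems += [f"{pic}: P picture refers to later picture {t}"
                         for t in targets if picture_number(t) >= num]
        elif kind == "B" and len(targets) != 2:
            problems.append(f"{pic}: B picture should refer to two pictures")
    return problems


if __name__ == "__main__":
    print(playback_order(FIG10_PICTURES)[:4])           # -> ['B0', 'B1', 'I2', 'B3']
    print(check_references({"I2": [], "P5": ["B1"]}))   # -> []
```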
In H.264, a reference destination is designated for each block, a small region within a picture, so blocks in the same picture can refer to blocks in different pictures. FIG. 11 shows this reference relationship. In FIG. 11, P5 picture, a P picture, includes P5(a) block and P5(b) block. P5(a) block refers to B1(a) block in B1 picture, while P5(b) block refers to I2(b) block in I2 picture.
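Because the reference destination is chosen per block, the set of pictures that one picture depends on is the union of the pictures referenced by its blocks. A minimal Python sketch using only the two FIG. 11 references given in the text; the `(picture, block)` tuple layout and the function name are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

BlockId = Tuple[str, str]  # hypothetical layout: (picture label, block label)

# The two block-level references described for FIG. 11 (None would mark an intra block).
FIG11_REFS: Dict[BlockId, Optional[BlockId]] = {
    ("P5", "a"): ("B1", "a"),
    ("P5", "b"): ("I2", "b"),
}


def pictures_referenced(refs: Dict[BlockId, Optional[BlockId]], picture: str) -> Set[str]:
    """Union of the pictures referenced by the blocks of `picture`, ignoring intra blocks."""
    return {
        target[0]
        for (pic, _), target in refs.items()
        if pic == picture and target is not None
    }


if __name__ == "__main__":
    print(sorted(pictures_referenced(FIG11_REFS, "P5")))  # -> ['B1', 'I2']
```

So although P5 is a P picture, decoding all of it requires both I2 and B1.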
In H.264, the type (I, P, or B) can also be designated for each slice, a unit smaller than a picture. For simplicity, however, the following explanation assumes that all slices in one picture are of the same type (I, P, or B).
To decode data encoded using inter-picture differences, the picture to be referred to must already have been decoded. Consider search playback in which only the I and P pictures are extracted from encoded data having the reference relationship shown in FIG. 11. I2 picture can be decoded without problems since it is intra-encoded. Next, P5(b) block in P5 picture can be decoded because it refers to I2(b) block of I2 picture, which has already been decoded. However, P5(a) block refers to B1(a) block of B1 picture, which has not been decoded, so P5(a) block cannot be decoded as is. Thus, when only I and P pictures are decoded, the I pictures can be decoded, but only some blocks of the P pictures can be decoded.
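The decodability argument above amounts to a fixed-point computation: a block can be decoded if its picture is among the extracted pictures and it is either intra-encoded or refers to a block that can itself be decoded. A minimal Python sketch using the FIG. 11 references from the text; the data layout and function name are hypothetical, and each block is assumed to refer to at most one other block (a B-picture block referring to two blocks would need both to be decodable).

```python
from typing import Dict, Optional, Set, Tuple

BlockId = Tuple[str, str]  # hypothetical layout: (picture label, block label)


def decodable_blocks(refs: Dict[BlockId, Optional[BlockId]],
                     extracted: Set[str]) -> Set[BlockId]:
    """Blocks that can be decoded when only the pictures in `extracted` are decoded.

    `refs` maps each block to the block it refers to, or None if it is intra-encoded.
    Repeat passes until no new block becomes decodable (handles chains of references).
    """
    decodable: Set[BlockId] = set()
    changed = True
    while changed:
        changed = False
        for blk, target in refs.items():
            if blk in decodable or blk[0] not in extracted:
                continue  # already known, or its picture is not decoded at all
            if target is None or target in decodable:
                decodable.add(blk)
                changed = True
    return decodable


if __name__ == "__main__":
    # FIG. 11 as described in the text; I2(b) is intra-encoded because I2 is an I picture.
    refs = {
        ("I2", "b"): None,
        ("P5", "a"): ("B1", "a"),
        ("P5", "b"): ("I2", "b"),
    }
    print(sorted(decodable_blocks(refs, {"I2", "P5"})))  # -> [('I2', 'b'), ('P5', 'b')]
```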
Alternatively, only I pictures can be extracted and decoded for playback. However, even when the same picture configuration as in MPEG2 is adopted, only one I picture is included per, e.g., 15 pictures, so displaying each extracted I picture in turn advances playback by 15 pictures at a time. Therefore, playback that extracts only I pictures cannot provide a low-speed search (e.g., a triple-speed search).
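The limitation follows from simple arithmetic. A minimal Python sketch, assuming that I pictures occur at a fixed interval starting with the first picture and that the search display shows one picture per frame period; the function names are hypothetical.

```python
def i_only_supports(speed: int, i_interval: int) -> bool:
    """True if an n-times search can be built from I pictures alone, one per frame period.

    At `speed`, the pictures shown are those at positions 0, speed, 2*speed, ...;
    every one of them is an I picture only if `speed` is a multiple of the I interval.
    """
    return speed % i_interval == 0


def hold_periods(speed: int, i_interval: int) -> float:
    """Frame periods each I picture must be held to average `speed` using I pictures only."""
    return i_interval / speed


if __name__ == "__main__":
    print(i_only_supports(15, 15), i_only_supports(3, 15), hold_periods(3, 15))  # -> True False 5.0
```

With an I-picture interval of 15, a triple-speed search would need a decodable picture every 3 pictures; with I pictures alone, each picture would instead have to stay on screen for 5 frame periods.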
Even outside search playback, when playback is started from the middle of an image stream (e.g., from a B picture), a picture to be referred to often has not been decoded. In such a case, decoding must start from an I picture (IDR picture) serving as a reference point, either by going back to a preceding one or by skipping ahead to a following one, which makes it difficult to decode and play back an image immediately.
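The two options (going back or skipping ahead to an IDR picture) can be sketched as a search over picture types in decoding order. A minimal Python sketch; the stream below is hypothetical, the function name is hypothetical, and the model is simplified in that going back is assumed to require decoding every picture between the IDR picture and the requested position.

```python
from typing import List, Optional, Tuple


def random_access_points(types: List[str], start: int) -> Tuple[Optional[int], Optional[int]]:
    """Return (nearest IDR at or before `start`, nearest IDR at or after `start`).

    `types` lists the picture types in decoding order; None means no such IDR exists.
    """
    back = next((i for i in range(start, -1, -1) if types[i] == "IDR"), None)
    ahead = next((i for i in range(start, len(types)) if types[i] == "IDR"), None)
    return back, ahead


if __name__ == "__main__":
    # Hypothetical stream in decoding order; playback is requested at index 5 (a B picture).
    stream = ["IDR", "P", "B", "B", "P", "B", "B", "IDR", "P", "B"]
    back, ahead = random_access_points(stream, 5)
    print(back, ahead)  # -> 0 7
    # Going back: pictures 0..4 are decoded first (5 extra pictures) in this simplified model.
    # Skipping ahead: playback starts at index 7 instead of the requested position.
```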