1. Field of the Invention
The present invention relates to a playback apparatus, playback method, and playback program that allow playback of pictures with an improved picture quality at a variable playback speed in the forward direction or reverse direction.
2. Description of the Related Art
Data recording and playback apparatuses that record digital video signals and digital audio signals onto a recording medium, or play them back from it, are known. As a recording medium for recording digital video signals and digital audio signals, a serial-access recording medium, such as a magnetic tape, has often been used hitherto. Recently, random-access media, such as optical discs, hard discs, and semiconductor memories, are coming to be used more often for recording and playback of digital video signals and digital audio signals.
Because of their large data volume, digital video signals are usually encoded for compression according to a predetermined scheme before being recorded on a recording medium. In recent years, MPEG2 (Moving Picture Experts Group 2) has been used as a standard scheme of encoding for compression. According to MPEG2, digital video signals are encoded for compression through DCT (discrete cosine transform) and motion compensation, and the rate of data compression is enhanced using variable-length codes.
Now, an overview of the structure of an MPEG2 data stream will be described. MPEG2 is based on a combination of predictive coding with motion compensation and encoding for compression by DCT. The data structure according to MPEG2 forms a hierarchy including a block layer, a macro-block layer, a slice layer, a picture layer, a GOP layer, and a sequence layer, in that order from the bottom. The block layer is composed of a DCT block as a unit of executing DCT. The macro-block layer is composed of a plurality of DCT blocks. The slice layer is composed of a header and one or more macro blocks. The picture layer is composed of a header and one or more slices. A picture corresponds to one screen.
The GOP layer is composed of a header, an intra-coded picture (I picture), which is a picture based on intra-frame coding, and a predictive coded picture (P picture) and a bi-directionally predictive coded picture (B picture), which are pictures based on predictive coding. The I picture can be decoded with its own information alone. The P and B pictures cannot be decoded alone; a preceding picture, or preceding and succeeding pictures, are used as reference pictures for decoding. For example, the P picture is decoded using a temporally preceding I picture or P picture as a reference picture. The B picture is decoded using two I or P pictures, one preceding and one succeeding, as reference pictures. A group that includes at least one I picture and that does not depend on any picture outside the group is referred to as a GOP (group of pictures), which constitutes a minimum unit of independent access in an MPEG stream.
A GOP is composed of one or more pictures. In the following description, a GOP composed of only one I picture is referred to as a single GOP, and a GOP composed of a plurality of pictures including an I picture and P and/or B pictures is referred to as a long GOP. In the case of a single GOP, the GOP is composed of only one I picture, so that editing on a frame-by-frame basis is facilitated. Furthermore, since inter-frame predictive coding is not executed, an improved picture quality can be achieved. In contrast, in the case of a long GOP, inter-frame predictive coding is executed, so that the efficiency of compression is high.
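The layers from the block layer up to the GOP layer, together with the distinction between a single GOP and a long GOP, can be pictured as nested records. The following Python sketch is illustrative only: the class names, field names, and the `kind` method are hypothetical and are not part of MPEG2 or of any apparatus described herein, and the sequence layer is omitted because its composition is not detailed above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DctBlock:
    """Block layer: one DCT block, the unit on which DCT is executed."""
    coefficients: List[int] = field(default_factory=list)


@dataclass
class MacroBlock:
    """Macro-block layer: a plurality of DCT blocks."""
    blocks: List[DctBlock] = field(default_factory=list)


@dataclass
class Slice:
    """Slice layer: a header and one or more macro blocks."""
    header: dict = field(default_factory=dict)
    macro_blocks: List[MacroBlock] = field(default_factory=list)


@dataclass
class Picture:
    """Picture layer: a header and one or more slices (one screen)."""
    picture_type: str  # "I", "P", or "B"
    header: dict = field(default_factory=dict)
    slices: List[Slice] = field(default_factory=list)


@dataclass
class Gop:
    """GOP layer: a header and one or more pictures."""
    header: dict = field(default_factory=dict)
    pictures: List[Picture] = field(default_factory=list)

    def kind(self) -> str:
        """Classify the GOP using the terms defined in the description."""
        types = [p.picture_type for p in self.pictures]
        if types == ["I"]:
            return "single GOP"
        if "I" in types and any(t in ("P", "B") for t in types):
            return "long GOP"
        # Other combinations are not covered by the description.
        raise ValueError("not a single GOP or a long GOP as defined here")
```

For example, `Gop(pictures=[Picture("I")]).kind()` yields "single GOP", whereas a GOP built from one I picture followed by P and B pictures yields "long GOP".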
Long GOPs can be classified into two types, namely, a closed GOP having a closed structure so that complete decoding is allowed within the GOP, and an open GOP in which information of an immediately preceding GOP in order of encoding can be used at the time of decoding. As compared with closed GOPs, open GOPs can be decoded using more information, so that high picture quality can be achieved, and are commonly used. Hereinafter, a “GOP” refers to an open GOP unless otherwise described.
The SD (standard definition) format at a bitrate of 25 Mbps (megabits per second) has been known as a format of video signals. Particularly, in video apparatuses used in broadcasting stations or the like, video signals in the SD format are used in single GOPs described above so that high picture quality and an environment that allows precise editing can be achieved. The video signals in the SD format have a fixed bitrate, i.e., the bitrates of individual frames are the same.
Recently, as technologies such as digital high-definition broadcasting come into practice, the HD (high definition) format having a resolution higher than the resolution of the SD format is coming to be used. The HD format has a higher bitrate in accordance with the high resolution, so that recording for a long period on a recording medium is not allowed in the case of single GOPs. Thus, video signals in the HD format are used in long GOPs described above. In the case of long GOPs, since inter-frame compression based on predictive coding is executed, the bitrate is variable, i.e., the bitrates vary among individual frames.
When video signals are edited, in order to define editing points, such as IN points and OUT points, searching of individual frames is executed. For this purpose, variable-speed playback within a normal speed in the forward direction and the reverse direction should be allowed. When single GOPs are used as in the case of the SD format, it is possible to decode individual frames individually, so that problems do not particularly arise regarding the variable-speed playback within the normal speed. That is, in the case of single GOPs, it suffices to decode at least frames that are to be displayed.
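To illustrate why single GOPs pose no particular problem, the following Python sketch lists which frames are displayed, and hence which frames need to be decoded, during playback at a speed within the normal speed. The function names, the use of one output tick per frame period, and the rounding of the playback position by `math.floor` are assumptions made for illustration and are not taken from any particular apparatus.

```python
import math


def displayed_frames(start, speed, ticks):
    """Frame index shown at each output tick.

    `speed` is a multiple of the normal speed; negative values mean the
    reverse direction. Only speeds within the normal speed are accepted.
    """
    if not -1.0 <= speed <= 1.0:
        raise ValueError("speed must be within the normal speed")
    # Assumption: the playback position advances by `speed` frames per tick
    # and is rounded down to select the frame to display.
    return [math.floor(start + speed * t) for t in range(ticks)]


def frames_to_decode(start, speed, ticks):
    """With single GOPs, only the frames actually displayed are decoded."""
    return sorted(set(displayed_frames(start, speed, ticks)))
```

Under these assumptions, playback from frame 10 at half speed in the reverse direction for six ticks displays frames 10, 9, 9, 8, 8, and 7, so only the four frames 7 to 10 are decoded, each independently of the others.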
On the other hand, when long GOPs are used as in the case of the HD format, in contrast to the case of the SD format described above, it is not possible to decode individual frames independently. Now, decoding in the case of a long GOP will be described with reference to FIGS. 16A to 16C. It is assumed herein that a GOP is composed of 15 pictures in total, namely, one I picture, four P pictures, and ten B pictures. The order of display of the I, P, and B pictures in the GOP is, for example, “B0B1I2B3B4P5B6B7P8B9B10P11B12B13P14”, as shown in FIG. 16A. The indices represent orders of display.
In this example, the first two pictures, namely, the B0 picture and the B1 picture, are pictures predicted and decoded using the last P14 picture in the immediately preceding GOP and the I2 picture in the current GOP. The first P5 picture in the current GOP is a picture predicted and decoded using the I2 picture. The other P pictures, namely, the P8 picture, the P11 picture, and the P14 picture, are pictures predicted and decoded using their respective immediately preceding P pictures. Each of the B pictures subsequent to the I picture is a picture predicted and decoded using preceding and succeeding I and/or P pictures.
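The reference relationships in this example can be tabulated mechanically from the order of display. The following Python sketch is a simplified model under the assumption that each P picture refers to the immediately preceding I or P picture and each B picture refers to the nearest preceding and succeeding I or P pictures. The function names and the label "P14(prev GOP)" are hypothetical.

```python
import re

DISPLAY_ORDER = "B0B1I2B3B4P5B6B7P8B9B10P11B12B13P14"


def references(display, prev_anchor="P14(prev GOP)"):
    """Map each picture name to the pictures it is predicted from."""
    pics = re.findall(r"[IPB]\d+", display)
    refs = {}
    last_anchor = prev_anchor  # last I or P picture seen so far
    for i, name in enumerate(pics):
        if name[0] == "I":
            refs[name] = []
            last_anchor = name
        elif name[0] == "P":
            refs[name] = [last_anchor]
            last_anchor = name
        else:
            # B picture: nearest succeeding I or P picture in display order.
            nxt = next((p for p in pics[i + 1:] if p[0] in "IP"), None)
            refs[name] = [last_anchor] + ([nxt] if nxt else [])
    return refs


def decode_chain(refs, name):
    """All pictures that must be decoded before `name`, transitively."""
    seen = []
    stack = list(refs.get(name, []))
    while stack:
        ref = stack.pop()
        if ref not in seen:
            seen.append(ref)
            stack.extend(refs.get(ref, []))
    return seen
```

Under this model, displaying the B13 picture alone requires the I2, P5, P8, P11, and P14 pictures to be decoded first, which illustrates why frames of a long GOP cannot be decoded independently.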
Since B pictures are predicted and decoded using temporally preceding and succeeding I or P pictures, the order of I, P, and B pictures in a stream or on a recording medium should be determined in consideration of an order of decoding by a decoder. That is, the order should be such that I and/or P pictures for decoding a B picture are decoded before the B picture.
In the example described above, the order of pictures in a stream or on a recording medium is “I2B0B1P5B3B4P8B6B7P11B9B10P14B12B13”, as in an example shown in FIG. 16B, and the pictures are input to a decoder in this order. The indices represent orders of display, correspondingly to those in FIG. 16A.
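The rearrangement from the order of display to the order in the stream can be sketched as follows. This Python sketch assumes the simple rule that each I or P picture is moved ahead of the B pictures that precede it in order of display; the function name is hypothetical, and an actual encoder may arrange pictures differently.

```python
import re


def to_stream_order(display):
    """Reorder pictures so that each B picture follows both of its references."""
    pics = re.findall(r"[IPB]\d+", display)
    out, pending_b = [], []
    for pic in pics:
        if pic[0] == "B":
            # Hold B pictures until the succeeding I or P picture is emitted.
            pending_b.append(pic)
        else:
            out.append(pic)
            out.extend(pending_b)
            pending_b = []
    # Trailing B pictures would depend on the next GOP; appended unchanged.
    out.extend(pending_b)
    return "".join(out)
```

Applied to the order of display "B0B1I2B3B4P5B6B7P8B9B10P11B12B13P14", this rule yields "I2B0B1P5B3B4P8B6B7P11B9B10P14B12B13".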
In the decoding by the decoder, as shown in FIG. 16C, the I2 picture is first decoded. Then, the B0 picture and the B1 picture are predicted and decoded using the decoded I2 picture and the last P14 picture (in order of display) in the immediately preceding GOP. The B0 picture and the B1 picture are output from the decoder sequentially in order of decoding, and then the I2 picture is output. After the B1 picture has been output, the P5 picture is predicted and decoded using the I2 picture. Then, the B3 picture and the B4 picture are predicted and decoded using the I2 picture and the P5 picture. Then, the B3 picture and the B4 picture are output from the decoder sequentially in order of decoding, and then the P5 picture is output.
Subsequently, P or I pictures used for predicting a B picture are decoded before decoding the B picture, the B picture is predicted and decoded using the decoded P or I pictures, the decoded B picture is output, and then, the P or I pictures used for decoding the B picture are output. This processing is repeated. The arrangement of pictures on a recording medium or in a stream, shown in FIG. 16B, is often used, in which a frame memory having a size corresponding to four frames is used for decoding.
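The repeated processing described above restores the order of display from the order in the stream. The following Python sketch models only the order of output, not the timing of decoding or the use of the frame memory. It assumes that each decoded B picture is output immediately and that each decoded I or P picture is held until the next I or P picture arrives; the function name is hypothetical.

```python
import re


def output_order(stream):
    """Order in which decoded pictures leave the decoder."""
    pics = re.findall(r"[IPB]\d+", stream)
    held = None  # most recently decoded I or P picture, not yet output
    out = []
    for pic in pics:
        if pic[0] == "B":
            # B pictures are output in order of decoding.
            out.append(pic)
        else:
            # A new I or P picture releases the one held before it.
            if held is not None:
                out.append(held)
            held = pic
    if held is not None:
        out.append(held)  # flush the last held picture
    return out
```

For the stream "I2B0B1P5B3B4P8B6B7P11B9B10P14B12B13", the pictures are output in the order B0, B1, I2, B3, B4, P5, and so on up to P14, which is the order of display shown in FIG. 16A.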
Normal-speed playback in the forward direction using a long GOP for video signals can be achieved using a decoder (normal-speed decoder) that is capable of obtaining results of decoding of a picture of one frame in a time corresponding to one frame.