1. Field of the Invention
The present invention relates to a reproduction apparatus and a reproduction program that allow a video signal, compression-encoded by an inter-frame compression encoding method using predictive coding, to be reproduced continuously across clips.
2. Description of the Related Art
A data recording and reproduction apparatus that records a digital video signal and a digital audio signal on a recording medium and reproduces these signals from it is known. Serially accessed recording media such as magnetic tape have been widely used for recording digital video and audio signals. In recent years, randomly accessible recording media such as optical discs, hard disks, and semiconductor memories have also come into wide use for recording and reproducing digital video and audio signals.
Because a digital video signal carries a huge amount of data, it is normally compression-encoded according to a predetermined system before being recorded on a recording medium. The MPEG2 (Moving Picture Experts Group 2) system is a well-known, typical compression-encoding system. In MPEG2, a digital video signal is compression-encoded based on the DCT (Discrete Cosine Transform) and motion compensation, and variable-length coding further improves the compression ratio.
Next, the structure of an MPEG2 data stream will be briefly described. MPEG2 combines predictive encoding with motion compensation and compression encoding based on the DCT. MPEG2 data are hierarchically structured, from the bottom layer to the top, as a block layer, a macro block layer, a slice layer, a picture layer, a GOP layer, and a sequence layer. The block layer is composed of DCT blocks, each of which is the unit of DCT processing. The macro block layer is composed of a plurality of DCT blocks. The slice layer is composed of a header portion and at least one macro block. The picture layer is composed of a header portion and at least one slice. One picture corresponds to one screen.
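The layer hierarchy above can be pictured as nested containers. The following Python sketch is illustrative only: the class names, the `header` dictionaries, the 64-coefficient (8×8) block, and the six DCT blocks per macro block used in the usage lines are assumptions, not the MPEG2 bitstream syntax.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical containers for the MPEG2 layers, bottom to top.
# Header contents are left as plain dicts; real syntax is not modeled.
@dataclass
class DCTBlock:
    # Assumption: one 8x8 block of DCT coefficients (64 values).
    coefficients: List[int] = field(default_factory=lambda: [0] * 64)


@dataclass
class MacroBlock:
    blocks: List[DCTBlock]  # a plurality of DCT blocks


@dataclass
class Slice:
    header: dict
    macro_blocks: List[MacroBlock]  # at least one macro block


@dataclass
class Picture:
    header: dict
    slices: List[Slice]  # at least one slice; one picture = one screen


@dataclass
class GOP:
    header: dict
    pictures: List[Picture]


@dataclass
class Sequence:
    gops: List[GOP]


def count_dct_blocks(seq: Sequence) -> int:
    """Walk every layer and count the DCT blocks at the bottom."""
    return sum(
        len(mb.blocks)
        for gop in seq.gops
        for pic in gop.pictures
        for sl in pic.slices
        for mb in sl.macro_blocks
    )


# Usage: one GOP, one picture, one slice with two macro blocks of
# six DCT blocks each (six per macro block is an assumption here).
mb = MacroBlock(blocks=[DCTBlock() for _ in range(6)])
pic = Picture(header={}, slices=[Slice(header={}, macro_blocks=[mb, mb])])
seq = Sequence(gops=[GOP(header={}, pictures=[pic])])
print(count_dct_blocks(seq))  # 12
```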
The GOP layer is composed of a header portion, an I (Intra-coded) picture based on intra-frame encoding, a P (Predictive-coded) picture based on predictive encoding, and a B (Bi-directionally predictive-coded) picture. An I picture can be decoded using only its own information. A P picture requires a picture earlier than itself as a reference picture. A B picture requires both an earlier and a later picture as reference pictures. Thus, neither a P picture nor a B picture can be decoded on its own. For example, a P picture is decoded using an earlier I picture or P picture as its reference picture, and a B picture is decoded using two reference pictures, each an I picture or a P picture, one earlier and one later than the B picture. A self-contained group that includes at least one I picture is referred to as a GOP (Group Of Pictures). A GOP is the minimum unit of access in an MPEG stream.
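The reference requirements of the three picture types can be captured in a small lookup table. This is a minimal sketch; the names `PictureType`, `REFERENCES_NEEDED`, `decodable_alone`, and `contains_i_picture` are hypothetical.

```python
from enum import Enum


class PictureType(Enum):
    I = "I"  # intra-coded
    P = "P"  # predictive-coded (forward)
    B = "B"  # bi-directionally predictive-coded


# Number of reference pictures each type needs, per the rules above:
# I needs none, P needs one earlier picture, B needs one earlier and one later.
REFERENCES_NEEDED = {PictureType.I: 0, PictureType.P: 1, PictureType.B: 2}


def decodable_alone(ptype: PictureType) -> bool:
    """Only a picture that needs no reference picture decodes on its own."""
    return REFERENCES_NEEDED[ptype] == 0


def contains_i_picture(types) -> bool:
    """A GOP must include at least one I picture."""
    return PictureType.I in types


print([t.name for t in PictureType if decodable_alone(t)])  # ['I']
```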
A GOP is composed of one or more pictures. In the following description, for convenience, a GOP composed of only one I picture is referred to as a single GOP, whereas a GOP composed of a plurality of pictures, namely an I picture together with P pictures and/or B pictures, is referred to as a long GOP. Since a single GOP consists of only one I picture, data can easily be edited frame by frame. In addition, since no inter-frame predictive encoding is performed for a single GOP, data can be decoded with higher picture quality than with a long GOP. In contrast, since inter-frame predictive encoding is performed for a long GOP, data can be encoded with high compression efficiency.
There are two types of long GOP: a closed GOP, which has a closed structure and can be decoded completely on its own, and an open GOP, whose decoding also uses information from the GOP immediately preceding it in decoding order. Since an open GOP is decoded with more information than a closed GOP, its picture quality is higher, and open GOPs are therefore generally used. In the following description, "GOP" refers to an open GOP unless otherwise specified.
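Whether a long GOP is open or closed depends only on whether any of its pictures references the preceding GOP. The sketch below assumes a hypothetical representation in which each picture name maps to the display-order indices of its reference pictures, with a negative index standing for a picture in the preceding GOP; the function name `is_open_gop` and the example GOPs are illustrative, not taken from the figures.

```python
from typing import Dict, List


def is_open_gop(references: Dict[str, List[int]]) -> bool:
    """Return True if any picture references the preceding GOP.

    Hypothetical representation: each picture maps to the display-order
    indices of its reference pictures; a negative index denotes a
    picture in the preceding GOP.
    """
    return any(idx < 0 for refs in references.values() for idx in refs)


# Leading B pictures use the previous GOP's last picture (index -1): open.
open_example = {"B0": [-1, 2], "B1": [-1, 2], "I2": [], "P5": [2]}
# Every reference stays inside the GOP: closed.
closed_example = {"I0": [], "P3": [0], "B1": [0, 3], "B2": [0, 3]}

print(is_open_gop(open_example), is_open_gop(closed_example))  # True False
```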
As a video signal format, the SD (Standard Definition) format with a bit rate of 25 Mbps (megabits per second) is known. In particular, video devices for broadcasting stations have achieved an editing environment with high picture quality and high precision by using video signals in the SD format with single GOPs. A video signal in the SD format has a constant bit rate; that is, the amount of data per frame is fixed.
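Because the SD format uses a constant bit rate, every frame receives the same data budget, namely the bit rate divided by the frame rate. The frame rates below (30000/1001 and 25 frames per second) are assumptions chosen for illustration; the text above specifies only the 25 Mbps bit rate.

```python
from fractions import Fraction


def bits_per_frame(bit_rate_bps: int, frame_rate: Fraction) -> Fraction:
    """Constant per-frame budget of a fixed-bit-rate stream."""
    return Fraction(bit_rate_bps) / frame_rate


SD_BIT_RATE = 25_000_000  # 25 Mbps, as stated above

# Assumed frame rates for illustration only.
for label, rate in [("30000/1001 fps", Fraction(30000, 1001)), ("25 fps", Fraction(25))]:
    bits = bits_per_frame(SD_BIT_RATE, rate)
    print(f"{label}: {float(bits):.2f} bits/frame")
```

Under these assumptions the budget is about 834,167 bits per frame at 30000/1001 fps and exactly 1,000,000 bits per frame at 25 fps. Exact fractions are used so that the non-integer NTSC-style rate introduces no rounding until printing.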
On the other hand, with the recent start of digital high-definition (Hi-Vision) broadcasting and the like, the HD (High Definition) format, which has a higher resolution than the SD format, has come into use. Because of its higher resolution, a video signal in the HD format has a higher bit rate than one in the SD format. Consequently, with single GOPs it is difficult to record an HD video signal on a recording medium for a long time. To solve this problem, a video signal in the HD format is recorded with long GOPs. Since inter-frame compression with predictive coding is performed in a long GOP, a video signal encoded with long GOPs has a variable bit rate, in which the amount of data varies from frame to frame.
Next, with reference to FIG. 1A to FIG. 1C, a decode process for a long GOP will be described. In this example, it is assumed that one GOP is composed of a total of 15 pictures: one I picture, four P pictures, and 10 B pictures. As exemplified in FIG. 1A, the display order of the I, P, and B pictures of the GOP is "B0 B1 I2 B3 B4 P5 B6 B7 P8 B9 B10 P11 B12 B13 P14", where subscripts represent display order numbers.
In this example, the first two pictures, B0 picture and B1 picture, are predicted and decoded using the last picture of the immediately preceding GOP, P14 picture, and I2 picture of the current GOP. The first P picture of the current GOP, P5 picture, is predicted and decoded using I2 picture. The other P pictures, P8 picture, P11 picture, and P14 picture, are each predicted and decoded using the immediately preceding P picture. The B pictures that follow the I picture are each predicted and decoded using the nearest I picture or P picture before them and the nearest I picture or P picture after them.
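The prediction rules for the example GOP can be expressed as a function that finds, for each picture in display order, the nearest I or P picture on each side. This is a sketch under stated assumptions: picture labels follow the FIG. 1A naming, the function name `references` and the label `P14(prev)` for the last picture of the preceding GOP are hypothetical, and every B picture is assumed to have a later I or P picture within the same GOP (true for this example, which ends with P14).

```python
def references(display_order, prev_gop_last="P14(prev)"):
    """Map each picture to the I/P pictures it is predicted from.

    display_order: labels such as ["B0", "B1", "I2", ...].
    prev_gop_last: hypothetical label for the last anchor of the
    preceding GOP, used by the leading B pictures of an open GOP.
    Assumes every B picture has a later I/P picture in this GOP.
    """
    anchors = [i for i, p in enumerate(display_order) if p[0] in "IP"]
    refs = {}
    for i, pic in enumerate(display_order):
        earlier = [display_order[a] for a in anchors if a < i]
        later = [display_order[a] for a in anchors if a > i]
        if pic[0] == "I":
            refs[pic] = []  # intra-coded: no reference
        elif pic[0] == "P":
            refs[pic] = [earlier[-1]]  # nearest earlier I/P picture
        else:
            # B picture: nearest anchor on each side; leading B pictures
            # fall back to the preceding GOP's last picture.
            forward = earlier[-1] if earlier else prev_gop_last
            refs[pic] = [forward, later[0]]
    return refs


GOP = "B0 B1 I2 B3 B4 P5 B6 B7 P8 B9 B10 P11 B12 B13 P14".split()
for pic, refs in references(GOP).items():
    print(pic, refs)
```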
In addition, because a B picture is predicted and decoded using the nearest earlier I picture or P picture and the nearest later I picture or P picture, the order of the I, P, and B pictures in a stream or on a recording medium must be determined taking into account the order in which the decoder decodes them. In other words, the I picture(s) and/or P picture(s) from which a B picture is decoded must already have been decoded before the B picture itself is decoded.
In the foregoing example, as shown in FIG. 1B, the individual pictures are arranged in a stream or on a recording medium as "I2 B0 B1 P5 B3 B4 P8 B6 B7 P11 B9 B10 P14 B12 B13", and they are input to the decoder in this order. Here, the subscripts represent the display order numbers shown in FIG. 1A.
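The rearrangement from display order (FIG. 1A) to stream order (FIG. 1B) follows from the rule that every I or P picture must precede the B pictures that use it as a backward reference. A minimal Python sketch, assuming the same picture labels; the function name `coded_order` is hypothetical.

```python
def coded_order(display_order):
    """Reorder pictures from display order to stream (coded) order.

    Each anchor (I or P picture) is moved ahead of the B pictures that
    precede it in display order, because those B pictures need it as a
    backward reference.
    """
    out, pending_b = [], []
    for pic in display_order:
        if pic[0] == "B":
            pending_b.append(pic)  # wait for the next anchor
        else:
            out.append(pic)  # anchor first
            out.extend(pending_b)  # then the B pictures that depend on it
            pending_b = []
    # Assumption for this sketch: trailing B pictures (which would need an
    # anchor from the next GOP) are simply appended; none occur here.
    out.extend(pending_b)
    return out


DISPLAY = "B0 B1 I2 B3 B4 P5 B6 B7 P8 B9 B10 P11 B12 B13 P14".split()
print(" ".join(coded_order(DISPLAY)))
# I2 B0 B1 P5 B3 B4 P8 B6 B7 P11 B9 B10 P14 B12 B13
```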
As shown in FIG. 1C, in the decode process of the decoder, I2 picture is decoded first. B0 picture and B1 picture are then predicted and decoded using the decoded I2 picture and P14 picture of the immediately preceding GOP, and are output from the decoder in the order in which they were decoded. Thereafter, I2 picture is output. Once B1 picture has been output, P5 picture is predicted and decoded using I2 picture. Thereafter, B3 picture and B4 picture are predicted and decoded using I2 picture and P5 picture, and are output from the decoder in the order in which they were decoded. Thereafter, P5 picture is output.
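The output side of FIG. 1C can be simulated in the same way: a B picture is output as soon as it is decoded, while each I or P picture is held until the next I or P picture has been decoded. This sketch (the function name `output_order` is hypothetical) models only the ordering of pictures, not decode timing or frame-memory management.

```python
def output_order(coded):
    """Simulate decoder output order from stream (coded) order.

    B pictures are output immediately after decoding; each I/P picture
    is held until the next I/P picture is decoded, then output.
    """
    output, held = [], None
    for pic in coded:
        if pic[0] == "B":
            output.append(pic)  # B pictures are never used as references
        else:
            if held is not None:
                output.append(held)  # release the previous anchor
            held = pic  # keep this anchor for later B pictures
    if held is not None:
        output.append(held)  # flush the last anchor at end of stream
    return output


CODED = "I2 B0 B1 P5 B3 B4 P8 B6 B7 P11 B9 B10 P14 B12 B13".split()
print(" ".join(output_order(CODED)))
# B0 B1 I2 B3 B4 P5 B6 B7 P8 B9 B10 P11 B12 B13 P14
```

The output matches the display order of FIG. 1A, which is the point of the reordering.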
Thereafter, in the same manner, the P picture(s) and/or I picture(s) used to predict a B picture are decoded before the B picture, and the B picture is then decoded using them. The process of outputting a decoded B picture and then outputting the P picture(s) and/or I picture(s) used to decode it is repeated. The picture arrangement on a recording medium or in a stream shown in FIG. 1B is commonly used. Decoding these pictures requires a frame memory with a capacity of four frames. A method of decoding an MPEG2 elementary stream is described in "Key Point Explanation, Latest MPEG Textbook (translated title)", Hiroshi Fujiwara, First Edition, ASCII Company, Aug. 1, 1994, p. 106 (hereinafter this document may be referred to as non-patent document 1).
Forward reproduction at 1× speed of a video signal encoded with long GOPs can be performed by a decoder capable of obtaining the decoded result of one frame of picture within one frame period (hereinafter, this decoder is referred to as the 1× speed decoder).