In recent years, with the advance of an information-oriented society, there has been growing demand for transmitting moving pictures without the barriers of time and distance. Digital techniques have become practicable in earnest, making it possible to record and reproduce moving pictures in a recording device and to transmit the moving pictures over a long distance via a communication network. In addition, not only in the communication field but also in the broadcasting field, transmission using digital techniques and adoption of a coding system have been realized.
Generally, moving pictures and audio signals in digital form involve a large amount of coded data. Therefore, in order to record and transmit these moving pictures and audio signals efficiently, it is necessary to utilize high-efficiency coding techniques, and various coding apparatus and decoding apparatus have already been manufactured on a trial basis.
As examples of application, there are a video CD (compact disk) in which digital moving pictures are recorded in a CD, and a DVD which records digital moving pictures at higher image quality and for a longer time than those of the video CD.
For a decoding apparatus that reproduces these video CD and DVD, trick play, such as fast forward play and fast reverse play, is indispensable. In order to realize the trick play, there is a method corresponding to one that is described in the international standard, called MPEG (moving picture experts group), "Information Technology--Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbits/s" (ISO/IEC11172-2). A description will be given of a conventional method for decoding digital moving pictures using the above-described method.
First, a method for coding digital moving pictures and a bitstream in the MPEG are described.
In the MPEG, assuming that digital moving pictures comprise a series of video frames 700 as shown in FIG. 5(a), a video frame group 500, which is called a sequence, is coded. This sequence is usually divided into a series of video frame groups 600 of about 0.5 seconds each, which are called GOPs (groups of pictures), to be coded.
The GOP comprises I picture, P pictures and B pictures, for example, as schematically shown in FIG. 5(b). The I picture is an Intra-Picture that is coded with only data in the video frame, the P picture is a Predictive Picture that is predicted and coded from data of the I picture or P picture of two frames before, and the B picture is a Bidirectionally Predictive-Picture that is predicted and coded from data of the I and P pictures or the P pictures before and after the B picture.
FIG. 6(a) shows a structure of each picture. The picture comprises one or more continuous belt-shaped regions on the picture plane, each of which is called a slice. The slice comprises one or more blocks 800 of 16 lines by 16 pixels, each of which is called a macroblock.
As shown in FIGS. 6(b) and 6(c), the macroblock comprises a plurality of regions of 8 lines by 8 pixels, which are called blocks. For example, the macroblock comprises four blocks of luminance signals and blocks of the respective chrominance signals of two systems (Cb, Cr).
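The macroblock and block arithmetic above can be illustrated with a minimal sketch. It assumes one 8x8 block each for Cb and Cr per macroblock, and the 352x240 picture size is only an illustrative assumption; the function names are hypothetical.

```python
# Sketch: count macroblocks and 8x8 blocks in one picture.
MB_SIZE = 16        # a macroblock is 16 lines by 16 pixels
BLOCKS_PER_MB = 6   # assumed: 4 luminance blocks + 1 Cb block + 1 Cr block


def macroblock_grid(width, height):
    """Return (columns, rows) of macroblocks, rounding partial macroblocks up."""
    mb_cols = (width + MB_SIZE - 1) // MB_SIZE
    mb_rows = (height + MB_SIZE - 1) // MB_SIZE
    return mb_cols, mb_rows


def block_count(width, height):
    """Return the total number of 8x8 blocks in the picture."""
    cols, rows = macroblock_grid(width, height)
    return cols * rows * BLOCKS_PER_MB


# 352x240 is an assumed example size, not a value taken from the text.
print(macroblock_grid(352, 240), block_count(352, 240))  # (22, 15) 1980
```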
As described above, the bitstream has a hierarchical structure. The sequence, the GOP, the picture and the slice, i.e., bitstreams constituting higher hierarchies of the hierarchical structure, include start codes for uniquely identifying these bitstreams on the bitstream, respectively. Further, the bitstreams have regions that retain hierarchical information, called headers and extensions, having coded information of the bitstreams constituting the respective hierarchies. For example, as shown in FIG. 7, these data are arranged to constitute a bitstream 1000.
In the hierarchy of the macroblocks and the lower hierarchy, there is information such as a macroblock address increment that shows how many macroblocks each macroblock is away from the macroblock coded immediately before it, a macroblock type that represents the predictive mode information of the macroblock selected at coding, a quantizer scale that represents a quantization step, a motion vector that is used for motion compensation, a coded block pattern that shows which blocks are coded and are present in the bitstream, and coded DCT (discrete cosine transform) coefficient information.
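As an illustration of how start codes identify the higher hierarchies, the following minimal sketch scans a byte string for the three-byte prefix 0x000001 and classifies the byte that follows it. The code values used (0xB3 sequence header, 0xB8 group start, 0x00 picture start, 0x01 to 0xAF slice start) are those of the MPEG video syntax; the function names and the sample data are hypothetical.

```python
# Sketch: locate start codes in a byte string and name the hierarchy they open.
START_CODE_PREFIX = b"\x00\x00\x01"


def classify(code):
    """Map the byte following the prefix to a hierarchy name."""
    if code == 0xB3:
        return "sequence_header"
    if code == 0xB8:
        return "group_start"
    if code == 0x00:
        return "picture_start"
    if 0x01 <= code <= 0xAF:
        return "slice_start"
    return "other"


def find_start_codes(data):
    """Return a list of (byte offset, hierarchy name) for each start code."""
    results = []
    pos = data.find(START_CODE_PREFIX)
    while pos != -1 and pos + 3 < len(data):
        results.append((pos, classify(data[pos + 3])))
        pos = data.find(START_CODE_PREFIX, pos + 3)
    return results


# Made-up sample data: the bytes between start codes are placeholders.
sample = (b"\x00\x00\x01\xb3AAAA" b"\x00\x00\x01\xb8BB"
          b"\x00\x00\x01\x00CC" b"\x00\x00\x01\x01DD")
print(find_start_codes(sample))
```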
At this time, variable length codes are used in coding the information of the hierarchy constituted by the macroblocks and the lower hierarchy. By allocating a shorter code to the information appearing more frequently, the information of the hierarchy constituted by the macroblocks and the lower hierarchy, which occupies the large portion of the bitstream, is coded efficiently.
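The effect of allocating shorter codes to more frequent information can be seen in a minimal sketch. The four-symbol code table below is hypothetical and is not one of the MPEG tables; it only shows the principle.

```python
# Sketch: a toy prefix-free variable length code (hypothetical table).
# The most frequent symbol "A" gets the shortest code.
VLC_TABLE = {"A": "1", "B": "01", "C": "001", "D": "0001"}
DECODE_TABLE = {code: sym for sym, code in VLC_TABLE.items()}


def vlc_encode(symbols):
    """Concatenate the variable length codes of the symbols into a bit string."""
    return "".join(VLC_TABLE[s] for s in symbols)


def vlc_decode(bits):
    """Decode a bit string by collating it with the code table bit by bit."""
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in DECODE_TABLE:
            symbols.append(DECODE_TABLE[current])
            current = ""
    return symbols


message = "AAABAACA"                    # "A" dominates, as frequent data would
coded = vlc_encode(message)
fixed_length_bits = 2 * len(message)    # 2 bits per symbol for 4 symbols
print(len(coded), fixed_length_bits)    # 11 16
```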
In reality, as shown in FIG. 8, the bitstream is divided into packets 40 of appropriate lengths. Similarly, audio signals that are coded separately from the moving pictures are divided into packets. In addition, a packet header 41 that retains identification information is added to the head of each packet 40, for identifying the information of the packet. These packets are multiplexed to form the bitstream 1000. The information in each packet 40 that was originally included in the bitstream 1000 is called a payload region 42.
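The packetizing and multiplexing described above can be sketched as follows. The three-byte header layout (one stream identifier byte plus a two-byte payload length) is a simplifying assumption for illustration and is not the actual packet header syntax; the function names are hypothetical.

```python
# Sketch: split two streams into packets, multiplex them, and recover one.
import struct
from itertools import zip_longest

VIDEO_ID, AUDIO_ID = 0xE0, 0xC0  # identifiers used here to tell streams apart


def packetize(stream_id, data, payload_size):
    """Cut data into payloads and prepend a simplified (assumed) header."""
    packets = []
    for i in range(0, len(data), payload_size):
        payload = data[i:i + payload_size]
        header = struct.pack(">BH", stream_id, len(payload))
        packets.append(header + payload)
    return packets


def multiplex(video_packets, audio_packets):
    """Interleave video and audio packets into one byte string."""
    out = []
    for v, a in zip_longest(video_packets, audio_packets):
        if v is not None:
            out.append(v)
        if a is not None:
            out.append(a)
    return b"".join(out)


def extract_payloads(muxed, stream_id):
    """Walk the packets and join the payloads of one stream."""
    payloads, pos = [], 0
    while pos < len(muxed):
        sid, length = struct.unpack_from(">BH", muxed, pos)
        pos += 3
        if sid == stream_id:
            payloads.append(muxed[pos:pos + length])
        pos += length
    return b"".join(payloads)


video, audio = bytes(range(10)), b"abcde"
muxed = multiplex(packetize(VIDEO_ID, video, 4), packetize(AUDIO_ID, audio, 4))
print(extract_payloads(muxed, VIDEO_ID) == video)
```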
A description is given of a fast forward play method and a fast reverse play method of the bitstream.
In the case of normal play, the entire bitstream is reproduced, and all the pictures are decoded and displayed. In the case of fast forward play, however, the entire bitstream is transmitted to a decoding apparatus, and the I pictures are selected from the bitstream, decoded in the decoding apparatus, and displayed. Alternatively, only the I pictures are selectively transmitted to the decoding apparatus, and these pictures are decoded in the decoding apparatus and displayed.
In the decoding apparatus, using a code table that shows relationships between codes and symbols, such as picture data, corresponding to the codes, decoding is performed while collating bitstreams that are successively input with the code table. When the decoding apparatus finds a symbol that is not defined in the code table, decoding of a picture with the symbol is stopped immediately, and the picture is subjected to error concealment processing as error processing.
As an example of error concealment processing, for a picture whose decoding has been stopped, the content of the previously decoded picture is copied to the portion of the picture that has not been completed, thereby completing a video frame, and this video frame is displayed in place of the picture whose decoding was stopped. Then, processing of skipping to the header of the next picture is performed.
As another example of error concealment processing, a picture whose decoding has been stopped is not displayed at all, and only the processing of skipping to the header of the next picture is performed.
In some cases one of these error concealment processings is always performed, and in other cases the processing is adaptively switched. Similarly, when the presence of the header and extension does not satisfy a specified syntactic rule, an error is considered to have occurred, and the same processing as described above is performed.
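The two error concealment processings can be sketched as follows. A picture is modeled simply as a list of decoded units, and a picture shorter than the full length stands for one whose decoding was stopped; this model and all names are assumptions made for illustration.

```python
# Sketch: two concealment strategies for pictures whose decoding stopped early.

def conceal_by_copy(partial, previous):
    """Fill the undecoded tail with the co-located part of the previous picture."""
    return partial + previous[len(partial):]


def conceal_by_skip(partial, previous):
    """Do not display the incomplete picture at all."""
    return None


def display_sequence(decoded_pictures, full_length, conceal):
    """Return the pictures that would actually be displayed."""
    shown, previous = [], None
    for picture in decoded_pictures:
        if len(picture) < full_length:      # decoding was stopped on an error
            if previous is None:
                continue                    # nothing to copy from yet
            picture = conceal(picture, previous)
            if picture is None:
                continue                    # skip to the next picture header
        shown.append(picture)
        previous = picture
    return shown


pictures = [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3]]   # second picture is incomplete
print(display_sequence(pictures, 4, conceal_by_copy))
print(display_sequence(pictures, 4, conceal_by_skip))
```

Switching the `conceal` argument per picture corresponds to the adaptive case mentioned above.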
However, when the above-described method is used in practice for fast forward play, the capability to parse the bitstream is insufficient, and the selection of the I pictures is complicated. Therefore, instead of transmitting all bitstream fragments constituting the entire bitstream 1000 to the decoding apparatus as shown in FIG. 9(a), only bitstream fragments 300, 310, 320, . . . comprising the packets 40, each fragment being considered to include a specified bitstream such as the I picture, are selectively transmitted to the decoding apparatus (FIG. 9(b)). The payload regions 42 of the packets constituting these bitstream fragments are rearranged to form an elementary bitstream 1001 (FIG. 9(c)), so that only the I pictures are successively reproduced along a time series. This reproduction is performed by decoding the bitstream into data for reproduction while successively collating the codes constituting the elementary bitstream 1001 with a code table. The code table defines in advance the relationships between the codes used for coding and the data for reproduction. In addition to the I pictures, the P pictures of the elementary bitstream 1001 may be reproduced.
Fast reverse play is realized by transmitting the bitstream fragments at intervals in the reverse order of the fast forward play, i.e., while going backward in time. That is, the bitstream fragments are transmitted to the decoding apparatus in the order of 320, 310, 300, . . . .
At this time, the packets including the heads of the I pictures in the GOP are often identified by using management information that is separately recorded on a disk or the like. Further, the bitstream is often divided into packets so that the heads of the I pictures are the heads of the packets.
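The selection of bitstream fragments for fast forward play and fast reverse play can be sketched as follows. Packets are modeled as payload byte strings only, and the list of packet indexes at which I pictures start stands in for separately recorded management information; the fixed fragment length and all names are assumptions.

```python
# Sketch: build the elementary bitstream for trick play from selected fragments.

def select_fragments(packets, i_picture_starts, fragment_len):
    """Take fragment_len packets starting at each I picture head."""
    return [packets[s:s + fragment_len] for s in i_picture_starts]


def elementary_stream(fragments):
    """Join the payloads of the fragments in the order given."""
    return b"".join(payload for frag in fragments for payload in frag)


def fast_forward(packets, i_picture_starts, fragment_len):
    frags = select_fragments(packets, i_picture_starts, fragment_len)
    return elementary_stream(frags)


def fast_reverse(packets, i_picture_starts, fragment_len):
    frags = select_fragments(packets, i_picture_starts, fragment_len)
    return elementary_stream(reversed(frags))   # fragments go backward in time


# Made-up payloads: "I"/"i" mark packets carrying parts of an I picture.
packets = [b"I0", b"i0", b"B0", b"P0", b"I1", b"i1", b"B1", b"P1", b"I2", b"i2"]
starts = [0, 4, 8]
print(fast_forward(packets, starts, 2))   # b'I0i0I1i1I2i2'
print(fast_reverse(packets, starts, 2))   # b'I2i2I1i1I0i0'
```

Note that only the order of the fragments is reversed; the packets inside each fragment keep their original order so that each picture can still be decoded.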
In the construction described above, in performing fast forward play and fast reverse play, bitstream fragments comprising packets taken at intervals are transmitted to the decoding apparatus. Therefore, as shown in FIG. 9(d), which is an enlargement of a portion 80 shown in FIG. 9(c), when a bitstream fragment does not include the bitstream of a picture through to its end, the code in the region before and after the connection point A with the following bitstream fragment differs from the code that should be decoded. Usually, the connection point A is detected as an error during decoding, decoding of the picture being decoded up to that point is stopped, and error processing is performed on the picture. Then, decoding resumes with the following picture. However, there are cases where the code in the region before and after the connection point A is syntactically correct, i.e., the code can be decoded according to the code table. In this case, no error is detected at the connection point A, the code in the region before and after the connection point A is regarded as normal, and decoding of the code proceeds. Subsequently, the codes of the following pictures may be decoded as codes different from the original codes. Accordingly, the following pictures are not decoded normally, and pictures different from the original pictures may be decoded, resulting in excessive disturbance of the pictures.
Further, in this case, error detection is not always performed at the region before and after the connection point A, and the elementary bitstream is parsed and decoded as far as the part at which a syntactic error clearly occurs. Therefore, the decoding apparatus wastes time, so that the time required for the subsequent normal decoding operation cannot be secured.
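The problem at the connection point A can be reproduced with a toy example. The three-entry code table below is hypothetical (the bit pattern "000" is deliberately left undefined), and the bit strings are made up; the sketch only shows that a splice may or may not produce an undefined code.

```python
# Sketch: a splice between fragments may or may not be detected as an error.
SPLICE_TABLE = {"1": "A", "01": "B", "001": "C"}   # "000" is undefined


def decode_spliced(bits):
    """Decode bit by bit; raise ValueError when no code can match."""
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in SPLICE_TABLE:
            symbols.append(SPLICE_TABLE[current])
            current = ""
        elif not any(code.startswith(current) for code in SPLICE_TABLE):
            raise ValueError("undefined code: " + current)
    return symbols


# Case 1: the fragment ends after "0100" (B, then a cut-off code) and the next
# fragment begins with "001". The splice yields "000", so an error is detected.
try:
    decode_spliced("0100" + "001")
except ValueError as err:
    print("detected:", err)

# Case 2: the fragment ends after "010" and the next fragment begins with
# "101" (A, B). The splice is syntactically valid, so no error is detected,
# yet the symbols after the connection point are B, B instead of A, B.
print(decode_spliced("010" + "101"))   # ['B', 'B', 'B']
```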