1. Field of the Invention
The present invention relates to a method and an apparatus for decoding a digital signal, and to an apparatus for reproducing a digital signal, with which video and audio signals can be satisfactorily recorded on a random-access recording medium, such as a magneto-optical disc, and the recorded signals can be satisfactorily reproduced from the medium and displayed on a display unit or the like.
2. Description of Prior Art
FIG. 1 shows the structure of a conventional system that decodes coded data recorded on a random-access storage medium and displays the decoded data.
Referring to FIG. 1, a storage medium 101 is an optical disc or the like that permits random access. Coded data recorded on the storage medium 101 is read by a reading unit 102 and, if necessary, temporarily stored in a track buffer 103. The data read from the storage medium 101 and stored in the track buffer 103 usually consists of coded video data and coded audio data that have been time-division-multiplexed into a single data stream.
The time-division-multiplexed coded video and audio data read from the track buffer 103 are separated into coded video data and coded audio data by a separator 104. The separated coded video data and coded audio data are supplied to the corresponding decoding buffers 105V and 105A. The coded data stored in the decoding buffers 105V and 105A is read at predetermined timings and transmitted to the corresponding image decoder 106V and audio decoder 106A for decoding. The video signal obtained by the image decoder 106V is supplied to an image display unit 107V and displayed, while the audio signal obtained by the audio decoder 106A is reproduced as sound by a sound generator 107A, such as a loudspeaker unit.
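The role of the separator 104 and the decoding buffers 105V and 105A can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each multiplexed unit is a hypothetical `Packet` tagged with the name of its stream; an actual MPEG system stream identifies elementary streams by stream IDs in packet headers, which this sketch does not model.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    # Hypothetical multiplexed unit: stream name ("video" or "audio") and payload.
    stream: str
    payload: bytes


def separate(multiplexed):
    """Split a time-division-multiplexed packet sequence into a video
    decoding buffer and an audio decoding buffer (modelled as FIFO queues)."""
    buffers = {"video": deque(), "audio": deque()}
    for packet in multiplexed:
        # Route each packet to its own stream's buffer, keeping the
        # arrival order within that stream.
        buffers[packet.stream].append(packet.payload)
    return buffers["video"], buffers["audio"]


if __name__ == "__main__":
    stream = [Packet("video", b"V0"), Packet("audio", b"A0"),
              Packet("video", b"V1"), Packet("audio", b"A1")]
    video_buf, audio_buf = separate(stream)
    print(list(video_buf), list(audio_buf))
```

Each decoder then consumes its own queue at its own timing, which is why the two buffers are kept independent.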
Systems of this type, in which video and audio signals are recorded on a recording medium such as an optical disc or a magnetic tape and then reproduced for display on a display unit, as well as systems such as video conference and picture phone systems, in which video and audio signals are transmitted from a transmitting side over a predetermined channel and displayed on the receiving side, have recently come to be arranged so that the video and audio signals are A/D (analog/digital)-converted and then coded by the so-called MPEG (Moving Picture Experts Group) method, which reduces the quantity of information by compression.
MPEG is the name of a working group, within ISO/IEC JTC1/SC29 (International Organization for Standardization/International Electrotechnical Commission, Joint Technical Committee 1/Sub Committee 29), that studies the coding of moving pictures for storage. MPEG1 is standardized as ISO 11172 and MPEG2 as ISO 13818. Within these international standards, ISO 11172-1 and ISO 13818-1 cover system multiplexing, ISO 11172-2 and ISO 13818-2 cover image coding, and ISO 11172-3 and ISO 13818-3 cover audio coding.
MPEG uses three picture coding types, the I picture, the P picture and the B picture, to code images efficiently while enabling random access. Here, a "picture" is the coded representation of one picture (a frame or a field) of a moving image.
An I picture is coded entirely within the picture itself, independently of any other picture. The I picture can therefore serve as an entry point for random access and for recovery from errors. However, raising the frequency of I pictures lowers the coding efficiency.
A P picture is coded by forward predictive coding, being predicted from a preceding I picture or P picture. A P picture can therefore be decoded only after the preceding I picture or P picture has been decoded. Using P pictures improves the coding efficiency compared with coding that uses I pictures alone.
A B picture, a development of the P picture, is coded by bidirectional predictive coding: preceding and following I pictures or P pictures are used for forward, backward or bidirectional prediction. A B picture can therefore be decoded only after both the preceding and the following I or P pictures have been decoded. Using B pictures significantly improves the coding efficiency.
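The dependency rules for the three picture types can be written down directly. The following Python sketch takes a group of pictures in display order, labelled in the style I0, B1, P2, ... (type letter followed by display position), and returns the pictures each one is predicted from, then follows those references transitively. It assumes that every P picture is predicted from the nearest preceding I or P picture and every B picture from the nearest preceding and following I or P pictures; this is a common arrangement, not the only one MPEG permits, and the function names are hypothetical.

```python
def direct_references(display_order, index):
    """Return the I or P pictures that picture `index` is predicted from,
    assuming nearest-anchor prediction."""
    name = display_order[index]
    anchors = [i for i, n in enumerate(display_order) if n[0] in "IP"]
    if name[0] == "I":
        return []  # intra-coded: no reference picture
    previous = [display_order[i] for i in anchors if i < index]
    following = [display_order[i] for i in anchors if i > index]
    if name[0] == "P":
        return previous[-1:]  # forward prediction from the preceding anchor
    if name[0] == "B":
        # bidirectional prediction from the preceding and following anchors
        return previous[-1:] + following[:1]
    raise ValueError(f"unknown picture type: {name!r}")


def required_pictures(display_order, index):
    """Every picture that must be decoded before picture `index`, found by
    following the prediction references transitively."""
    needed = set()
    stack = direct_references(display_order, index)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(direct_references(display_order, display_order.index(name)))
    return sorted(needed, key=display_order.index)


gop = ["I0", "B1", "P2", "B3", "P4", "B5", "P6"]  # display order
print(required_pictures(gop, gop.index("B5")))    # ['I0', 'P2', 'P4', 'P6']
```

The chain of P pictures is what makes a late B picture expensive to reach: its own two references pull in every earlier anchor back to the I picture.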
In general, random access is enabled and a satisfactory coding efficiency can be realized by combining I, B and P pictures.
FIG. 2A shows an example of such a combination, with the pictures arranged in display order. The arrows d in FIG. 2A indicate the directions of prediction. To decode and display a B picture, the I or P pictures displayed before and after it must be decoded first. Specifically, to decode picture B5 in the display order shown in FIG. 2A, at least pictures I0, P2, P4 and P6 must be decoded beforehand: picture P2 is predicted from picture I0, picture P4 from picture P2, picture P6 from picture P4, and picture B5 from pictures P4 and P6. Accordingly, the pictures are rearranged in the coded stream into the order I0, P2, B1, P4, B3, ..., as shown in FIG. 2B, and it is a coded stream in this order that is recorded on the recording medium. When the recording medium is reproduced for display, the pictures are decoded from the reproduced coded stream in the order shown in FIG. 2B and then rearranged into the display order shown in FIG. 2A.
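The rearrangement between the display order of FIG. 2A and the coding order of FIG. 2B can be sketched in Python as follows. The encoder side moves each I or P picture ahead of the B pictures that precede it in display order; the decoder side uses the conventional one-anchor delay (B pictures are shown as soon as they are decoded, while each I or P picture is held until the next one arrives). The sketch assumes a group that ends with an I or P picture, as in FIG. 2A; the labels and function names are hypothetical.

```python
def to_coding_order(display_order):
    """Encoder side: move each I or P picture ahead of the B pictures that
    precede it in display order, since they need it as a backward reference."""
    coded, waiting_b = [], []
    for name in display_order:
        if name[0] == "B":
            waiting_b.append(name)   # hold until the following anchor is emitted
        else:
            coded.append(name)       # emit the anchor first,
            coded.extend(waiting_b)  # then the B pictures that depend on it
            waiting_b = []
    return coded + waiting_b         # a group ending in B pictures is outside this sketch


def to_display_order(coding_order):
    """Decoder side: show B pictures as soon as they are decoded, and hold
    each I or P picture until the next I or P picture arrives."""
    shown, held = [], None
    for name in coding_order:
        if name[0] == "B":
            shown.append(name)
        else:
            if held is not None:
                shown.append(held)   # the previous anchor can now be shown
            held = name
    if held is not None:
        shown.append(held)
    return shown


gop = ["I0", "B1", "P2", "B3", "P4", "B5", "P6"]
stream = to_coding_order(gop)
print(stream)                           # ['I0', 'P2', 'B1', 'P4', 'B3', 'P6', 'B5']
print(to_display_order(stream) == gop)  # True
```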
Audio data is coded by compression methods such as MPEG and the so-called AC-3 (ATSC standard Doc. A/52, 20 Dec. 1995). These methods treat a predetermined number of sampled data items collectively as one coded unit, and decoding is likewise performed in these coded units.
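The grouping of samples into coded units can be sketched as below. The frame sizes used in the example are the well-known values of 1536 samples per AC-3 frame and 1152 samples per MPEG-1 Layer II frame; the 48 kHz sampling rate and the function names are assumptions made for illustration.

```python
def split_into_frames(samples, samples_per_frame):
    """Group PCM samples into fixed-size coded units (audio frames).
    Samples that do not fill a whole frame are returned separately."""
    n_full = len(samples) // samples_per_frame
    frames = [samples[i * samples_per_frame:(i + 1) * samples_per_frame]
              for i in range(n_full)]
    remainder = samples[n_full * samples_per_frame:]
    return frames, remainder


def frame_duration_ms(samples_per_frame, sample_rate_hz):
    """Playing time of one audio frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz


print(frame_duration_ms(1536, 48_000))  # 32.0 (AC-3 frame at 48 kHz)
print(frame_duration_ms(1152, 48_000))  # 24.0 (MPEG-1 Layer II frame at 48 kHz)
```

Because a frame can only be decoded as a whole, audio can be started or stopped only at these frame boundaries.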
In general, the decoding periods of the audio frames, which are the coded units for voice, do not coincide with the decoding periods of the pictures obtained by coding the video data. FIG. 3 shows these decoding periods along the time axis: FIG. 3A shows the component units (pictures) of the coded image data and the display start time T_P of each picture, and FIG. 3B shows the component units (audio frames) of the coded voice data and the start time T_A of each audio frame. As FIGS. 3A and 3B show, the decoding periods of the audio frames and those of the pictures do not coincide.
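How rarely the picture start times T_P and the audio frame start times T_A line up can be checked with integer arithmetic on the 90 kHz clock used for MPEG system time stamps. This sketch assumes, for illustration only, video at 25 pictures per second, AC-3 audio at 48 kHz, and both series starting together at time 0.

```python
from math import gcd

CLOCK_HZ = 90_000  # MPEG system time-stamp clock


def period_ticks(units, rate_hz):
    """Duration of `units` samples (or pictures) at `rate_hz`, in 90 kHz ticks."""
    ticks, remainder = divmod(units * CLOCK_HZ, rate_hz)
    if remainder:
        raise ValueError("period is not a whole number of ticks")
    return ticks


def coinciding_starts(picture_ticks, frame_ticks, horizon_ticks):
    """Times in [0, horizon_ticks) at which a picture display start T_P and
    an audio frame start T_A coincide, assuming both series start at 0."""
    step = picture_ticks * frame_ticks // gcd(picture_ticks, frame_ticks)
    return list(range(0, horizon_ticks, step))


picture = period_ticks(1, 25)       # 3600 ticks = 40 ms per picture
frame = period_ticks(1536, 48_000)  # 2880 ticks = 32 ms per AC-3 frame
print(coinciding_starts(picture, frame, CLOCK_HZ))
# [0, 14400, 28800, 43200, 57600, 72000, 86400]: once every 160 ms
```

Under these assumptions, one second contains 25 picture starts and 32 audio frame starts, but only 7 instants at which the two coincide.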
Video CD and DVD (Digital Video Disc), standardized and brought to market in recent years, adopt this scheme: images are coded by the MPEG method described above, voice is coded by MPEG or AC-3, and the resulting data is time-division-multiplexed in accordance with the MPEG system standard and recorded on the disc.
Consider a case in which video data coded by the MPEG method has been recorded on the storage medium 101 shown in FIG. 1. Taking coding efficiency and random access into account, assume that the data has been recorded with the picture structure shown in FIG. 4A.
Since the storage medium 101 permits random access, the stream shown in FIG. 4A can be reproduced, for example, up to the P picture at point S_A in FIG. 4A, after which reproduction of the following pictures is skipped and then restarted at the B picture (picture B3) at point S_B in FIG. 4A. Omitting the reproduction of pictures in this way is hereinafter called "skipping", and reproduction that continues after skipping from one picture to a distant picture is hereinafter called "skip reproduction". The position at which skipping begins is called the "skip start point", and the position at which skipping ends is called the "skip end point".
As described above, a B picture (picture B3 in FIG. 4) can be decoded only after the I or P pictures on which it depends (at least pictures I0, P2 and P4 in FIG. 4) have been decoded. Continuous reproduction of images is therefore interrupted while these pictures (I0, P2 and P4) are being decoded.
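The cost of restarting at a B picture can be estimated with a short Python sketch. The picture structure below is a hypothetical stand-in for FIG. 4A (one I picture followed by alternating B and P pictures, each P picture predicted from the preceding I or P picture), and the fixed per-picture decoding time is likewise an assumption; the sketch only counts the extra pictures that must be decoded but not displayed before the skip end point appears.

```python
def pictures_to_decode_before(display_order, target):
    """I and P pictures that must be decoded, but not displayed, before
    `target` can be shown when reproduction restarts at `target`.

    Hypothetical structure: one I picture opens the group and each P picture
    is predicted from the preceding I or P picture; a B picture additionally
    needs the following I or P picture.
    """
    idx = display_order.index(target)
    anchors = [i for i, n in enumerate(display_order) if n[0] in "IP"]
    if target[0] == "I":
        return []
    if target[0] == "P":
        last = idx - 1                              # every anchor before the P picture
    else:
        last = next(i for i in anchors if i > idx)  # up to the following anchor
    return [display_order[i] for i in anchors if i <= last]


def restart_delay_ms(display_order, target, decode_ms_per_picture):
    """Interruption before `target` appears, assuming the extra pictures are
    decoded one after another at a fixed, hypothetical cost per picture."""
    return len(pictures_to_decode_before(display_order, target)) * decode_ms_per_picture


gop = ["I0", "B1", "P2", "B3", "P4", "B5", "P6"]  # stand-in for FIG. 4A
print(pictures_to_decode_before(gop, "B3"))        # ['I0', 'P2', 'P4']
print(restart_delay_ms(gop, "B3", 40))             # 120
```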
The problem is not limited to video signals coded by the MPEG method: any predictive coding method that exploits the correlation between images by coding the difference between them suffers from discontinuous reproduction at the seam between pictures when skip reproduction is performed.
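Why a seam breaks any difference-based method can be shown with a toy model. The sketch below uses plain element-wise differencing between consecutive images (not MPEG's motion-compensated prediction) and represents each image as a short list of hypothetical pixel values: decoding from the beginning reproduces the images exactly, whereas starting mid-stream leaves the decoder without the previous reconstructed image.

```python
def encode_differences(frames):
    """Toy inter-image predictive coding: keep the first image as-is and
    code each later image as its element-wise difference from the previous one."""
    coded = [list(frames[0])]
    for previous, current in zip(frames, frames[1:]):
        coded.append([c - p for c, p in zip(current, previous)])
    return coded


def decode_differences(coded, start=0):
    """Decode from index `start`. A decoder that starts mid-stream has no
    reconstructed previous image, so it is forced here to use an all-zero
    reference, and its output is generally wrong."""
    reference = [0] * len(coded[0])
    decoded = []
    for i in range(start, len(coded)):
        if i == 0:
            reference = list(coded[0])  # first image is coded independently
        else:
            reference = [r + d for r, d in zip(reference, coded[i])]
        decoded.append(reference)
    return decoded


images = [[10, 10], [12, 11], [15, 11]]
coded = encode_differences(images)   # [[10, 10], [2, 1], [3, 0]]
print(decode_differences(coded))     # [[10, 10], [12, 11], [15, 11]]
print(decode_differences(coded, 2))  # [[3, 0]] instead of [[15, 11]]
```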
As described above, with voice coding methods such as MPEG and AC-3, which use a predetermined number of sampled data items as a coded unit, the decoding periods of the video signal and those of the audio signal do not necessarily coincide. Therefore, if continuous reproduction of the video signal is given priority when skip reproduction is performed, blank regions in which no audio signal is reproduced are generated, as described below. Such a blank region, in which no audio signal is reproduced during skip reproduction, is called an "audio gap".
FIGS. 5B and 5C show, in time order, the coded image data and the coded voice data recorded on a storage medium.
Consider a case in which the video signal shown in FIG. 5B is decoded and displayed up to a certain picture (the skip start point, which is picture V_A in FIG. 5D), as shown in FIG. 5D, after which, by skip reproduction performed at the timing shown in FIG. 5A, reproduction and decoding are restarted at another picture (the skip end point, which is picture V_C in FIG. 5F), as shown in FIG. 5F.
When the audio signal accompanying the skip-reproduced video signal is decoded, audio frames are decoded up to audio frame A_B, which corresponds to picture V_A, as shown in FIG. 5E. To follow the skip reproduction, reproduction and decoding are then restarted at audio frame A_D, which corresponds to picture V_C, as shown in FIG. 5G.
Since image and voice must be reproduced in synchronization, the phase difference of the audio signal with respect to the video signal, that is, the difference between the display start time of a picture and the start time of the corresponding audio frame, must be maintained during reproduction regardless of whether skip reproduction is performed.
When skip reproduction is performed with priority given to the image, that is, when picture V_A, which precedes the skip point, and picture V_C, which follows it, are displayed consecutively as shown in FIG. 5H, the audio data contains a time interval in which no audio data exists (audio gap AG), as shown in FIG. 5I. As a result, continuous audio reproduction cannot be performed.
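The size of the audio gap AG can be estimated with a simple timing model. The rules used to pick A_B and A_D below are assumptions chosen for illustration (A_B is the last audio frame that ends no later than V_A ends; A_D is the first audio frame that starts no earlier than V_C starts), as are the 25 pictures/s and 48 kHz AC-3 periods on a 90 kHz clock. Because audio frames are reproduced whole and the offset between V_C and A_D must be kept, the gap is the silence between the end of A_B and the splice point plus the preserved offset before A_D.

```python
PICTURE_TICKS = 3600  # one picture at 25 pictures/s, in 90 kHz ticks (assumed)
FRAME_TICKS = 2880    # one 1536-sample audio frame at 48 kHz (assumed)


def audio_gap_ticks(v_a, v_c, picture_ticks=PICTURE_TICKS, frame_ticks=FRAME_TICKS):
    """Length of the audio gap AG when picture v_c is displayed immediately
    after picture v_a (indices in display order).

    Hypothetical rules: pictures and audio frames both start at time 0 with
    fixed periods; A_B is the last audio frame ending no later than V_A ends;
    A_D is the first audio frame starting no earlier than V_C starts; audio
    frames are reproduced whole; the offset T_A - T_P between V_C and A_D is
    kept after the splice.
    """
    end_of_va = (v_a + 1) * picture_ticks
    start_of_vc = v_c * picture_ticks
    a_b = end_of_va // frame_ticks - 1    # index of A_B
    a_d = -(-start_of_vc // frame_ticks)  # index of A_D (ceiling division)
    silence_before_splice = end_of_va - (a_b + 1) * frame_ticks
    silence_after_splice = a_d * frame_ticks - start_of_vc
    return silence_before_splice + silence_after_splice


gap = audio_gap_ticks(2, 7)
print(gap, gap / 90)          # 2880 32.0  (ticks, milliseconds)
print(audio_gap_ticks(3, 8))  # 0: both splice points lie on common boundaries
```

Under this model the gap vanishes only when both the end of V_A and the start of V_C fall on instants where picture and audio frame boundaries coincide, which, as FIG. 3 indicates, is the exception rather than the rule.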
Although the above description concerns the discontinuity of decoded data in skip reproduction, the same problem arises, for the same reason, when two separately coded data streams are connected at an arbitrary point and reproduced.