With recent developments in multimedia technology, various devices for integrally handling multiple media such as digitized video, audio, and data, typified by a DVD player and a set top box for receiving digital TV broadcast, are becoming widespread.
Since digitized video data and audio data are enormous in volume, an efficient compressive coding technology for digital data is essential for efficient recording and transmission. Further, in order to apply the compressive coding technology to practical devices, a multimedia data multiplexing technology for integrating the compressively coded video data, audio data, and additional information data into a single data stream is also required. Various kinds of technologies for efficient compressive coding and multimedia data multiplexing have already been put to practical use. For example, as an efficient compressive coding technology for audio data, the AC-3 method of Dolby Laboratories Licensing Corp. is widely used. On the other hand, as an efficient compressive coding technology for video data and a multimedia data multiplexing technology, the MPEG standard established by the International Organization for Standardization (ISO) is widely used. This method and this standard are also employed in the DVD standard and, in particular, a program stream, which is one of the multiplexing methods defined by the MPEG standard, is employed as the data stream.
The DVD-Video Recording, which is one of the DVD standards and has most recently been standardized, defines editing of a program stream by an end user using a DVD-RAM disk or the like, and provides a new tool called an entry point. The entry point is defined by time. By defining an entry point, the user can start data reproduction from an arbitrary point (time). Therefore, the entry point can be interpreted as a reproduction start time. Hereinafter, a description will be given of a method for reproducing compressively coded data, when the reproduction is started from an entry point.
First of all, a data structure of a program stream defined by the MPEG standard will be described with reference to FIG. 4.
In FIG. 4, a program stream 301 is composed of a series of packs 302, and each pack 302 is composed of a pack header 303, a system header 304, and at least one packet 305.
The pack header 303 starts with a pack start code 307 (0x000001BA, where 0x indicates hexadecimal notation), and parameter data 308 of the pack, such as a reference clock value called SCR (System Clock Reference) and the like, are described just after the pack start code 307.
The system header 304 starts with a system header start code 309 (0x000001BB), and parameter data 310 of the entire program stream, such as the bit rate, the number of audio channels, the number of video channels, and the like, are described just after the system header start code 309.
The packet 305 starts with a packet start code 311, and parameter data 312 of the packet, such as a reproduction time called PTS (Presentation Time Stamp) and the like, are described just after the packet start code 311, and compressively coded data of video or audio, called an elementary stream 313, is described just after the parameter data 312. The parameter data 312 is information to be used when the elementary stream 313 is decoded.
The packet start code 311 is composed of a packet start prefix of three bytes (0x000001) and a stream ID of one byte. The stream ID denotes the type of the compressively coded data included in the packet. For example, 0xEx (the last x indicates an arbitrary value) denotes a video packet, and 0xDx denotes an audio packet.
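The start codes described above can be told apart by their last byte. The following is a minimal sketch in Python, assuming only the code values and stream ID ranges given in the text; the function name classify_start_code and the returned labels are hypothetical and are not part of the MPEG standard.

```python
START_CODE_PREFIX = bytes.fromhex("000001")            # three-byte prefix
PACK_START_CODE = bytes.fromhex("000001BA")            # pack start code 307
SYSTEM_HEADER_START_CODE = bytes.fromhex("000001BB")   # system header start code 309


def classify_start_code(code: bytes) -> str:
    """Return a label (hypothetical names) for a four-byte start code."""
    if len(code) != 4 or code[:3] != START_CODE_PREFIX:
        raise ValueError("expected a four-byte code starting with 0x000001")
    if code == PACK_START_CODE:
        return "pack_header"
    if code == SYSTEM_HEADER_START_CODE:
        return "system_header"
    stream_id = code[3]
    # Stream ID ranges exactly as stated in the text: 0xEx video, 0xDx audio.
    if stream_id & 0xF0 == 0xE0:
        return "video_packet"
    if stream_id & 0xF0 == 0xD0:
        return "audio_packet"
    return "other"


if __name__ == "__main__":
    for hex_code in ("000001BA", "000001BB", "000001E0", "000001D1"):
        print(hex_code, classify_start_code(bytes.fromhex(hex_code)))
```

Only the upper four bits of the stream ID are tested, because the text leaves the lower four bits arbitrary.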
Next, a data structure of a video elementary stream 401 compressively coded according to the MPEG standard, which is one of the compressively coded data described in the above-mentioned packets, will be described with reference to FIG. 5.
As shown in FIG. 5, the video elementary stream 401 has a hierarchical structure comprising the following six layers: a sequence layer 402, a group of pictures (hereinafter referred to as GOP) layer 403, a picture layer 404, a slice layer 405, a macroblock layer 406, and a block layer 407.
One sequence starts with a sequence header 408, followed by a series of GOPs 409, and ends with a sequence end 410. The sequence header 408 may be placed, not only at the head of the sequence, but also in an arbitrary position between adjacent GOPs as necessary.
The GOP 409 starts with a GOP header 411, and at least one picture 412 is described after the GOP header 411. The picture 412 is one video frame to be displayed on the screen, and there are three kinds of pictures: I picture, P picture, and B picture. The I picture is an intra-frame coded picture, obtained by compressive coding using only data of its own video frame. The P picture is a forward predictive coded picture, obtained by compressive coding with reference to a past video frame (I picture or P picture). The B picture is a bi-directional predictive coded picture, obtained by compressive coding with reference to two video frames (I pictures or P pictures), one in the past and one in the future. In order to keep the GOP 409 independent, it is defined that the picture 412 just after the GOP header 411 must be an I picture.
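The reference relations among the three picture types can be illustrated as follows. This is a sketch only, assuming that the pictures are listed in display order and that each P or B picture refers to the nearest I or P picture in the relevant direction; the function name reference_frames is hypothetical.

```python
from typing import List, Optional, Tuple


def reference_frames(display_order: str) -> List[Tuple[Optional[int], ...]]:
    """For each picture in a string such as "IBBP", return the indices of
    the pictures it refers to. None means no such picture is in the input."""
    # Only I and P pictures can serve as reference pictures.
    anchors = [i for i, kind in enumerate(display_order) if kind in "IP"]
    refs: List[Tuple[Optional[int], ...]] = []
    for i, kind in enumerate(display_order):
        past = max((a for a in anchors if a < i), default=None)
        future = min((a for a in anchors if a > i), default=None)
        if kind == "I":
            refs.append(())              # coded from its own frame only
        elif kind == "P":
            refs.append((past,))         # one past reference
        elif kind == "B":
            refs.append((past, future))  # one past and one future reference
        else:
            raise ValueError(f"unknown picture type: {kind!r}")
    return refs


if __name__ == "__main__":
    print(reference_frames("IBBP"))
```

Because the I picture has no references, decoding can begin at it without any earlier data, which is why it is required at the head of a GOP.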
Each of the sequence header 408 and the GOP header 411 starts with a start code and, as described above, each start code starts with a start code prefix “0x000001” (first three bytes), followed by the type of data (last one byte). The start code of the sequence header 408 is called a sequence start code (0x000001B3), and the start code of the GOP header 411 is called a group start code (0x000001B8).
The picture 412 starts with a picture header 413, followed by a slice layer 405, a macroblock layer 406, and a block layer 407. The picture header 413 starts with a picture start code 415 (0x00000100), and the picture start code 415 is followed by parameter data 416 of the picture, such as a number indicating the display order of the picture, which is called a temporal reference, and the like. One slice is composed of a series of macroblocks starting from the upper left corner of the video frame, and one macroblock is composed of six blocks, each of which is a fundamental processing unit.
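The sequence, GOP, and picture headers can be located in a video elementary stream by searching for the start code prefix and inspecting the byte that follows it. The sketch below assumes only the three start code values given in the text and ignores every other start code. The names find_video_headers and VIDEO_START_CODES are hypothetical, and the sample bytes are made up for illustration.

```python
from typing import List, Tuple

START_CODE_PREFIX = b"\x00\x00\x01"

# Last byte of each start code named in the text.
VIDEO_START_CODES = {
    0xB3: "sequence_header",  # sequence start code 0x000001B3
    0xB8: "gop_header",       # group start code 0x000001B8
    0x00: "picture_header",   # picture start code 0x00000100
}


def find_video_headers(es: bytes) -> List[Tuple[int, str]]:
    """Return (byte offset, header name) for each known start code in es."""
    headers: List[Tuple[int, str]] = []
    pos = es.find(START_CODE_PREFIX)
    while pos != -1 and pos + 3 < len(es):
        name = VIDEO_START_CODES.get(es[pos + 3])
        if name is not None:
            headers.append((pos, name))
        # Resume the search just after the prefix that was found.
        pos = es.find(START_CODE_PREFIX, pos + 3)
    return headers


if __name__ == "__main__":
    sample = (
        bytes.fromhex("000001B3") + b"\xAA\xBB"   # sequence header + dummy data
        + bytes.fromhex("000001B8") + b"\xCC"     # GOP header + dummy data
        + bytes.fromhex("00000100") + b"\xDD"     # picture header + dummy data
    )
    for offset, name in find_video_headers(sample):
        print(offset, name)
```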
By the way, in the DVD-Video Recording standard, as shown in FIG. 6(b), there is newly introduced a logical unit, that is, a VOBU 502 comprising a series of packs 503, 504, 505, . . . of video, audio, and the like. One VOBU 502 is defined as a minimum unit that assures synchronous reproduction of video and audio within a period of 0.4 to 1.0 sec. With reference to FIG. 6(d), the compressively coded video data in the VOBU 502 starts with a sequence header 506, and at least one GOP 507 is described after the sequence header 506. In some instances, a sequence end is described at the end of the VOBU 502. In the sequence header 506, parameter data common through the entire program, such as the video frame size, the aspect ratio, the frame rate, etc., are described.
Next, a description will be given of a method for reproducing compressively coded data, starting from the entry point described above. FIG. 7 is a block diagram illustrating the construction of a conventional apparatus for reproducing compressively coded data. With reference to FIG. 7, the apparatus is provided with a transmitter 610 for transmitting a stream; a system decoder 611 for extracting a required pack from the inputted stream; a video decoder 612 for decoding video data; an audio decoder 613 for decoding audio data; and a synchronous controller 614 for controlling the operation timings of the respective constituents of the apparatus. Hereinafter, a description will be given of the operation of the compressively coded data reproduction apparatus so constructed, when it starts data reproduction from an entry point.
As shown in FIG. 7, a VOBU 615 including an entry point is transmitted from the transmitter 610 to the system decoder 611. The system decoder 611 extracts a video pack and an audio pack from the inputted VOBU 615, and transmits a video elementary stream 616 and an audio elementary stream 617, which are obtained by removing packet start codes and parameter data from the video pack and the audio pack, to the video decoder 612 and the audio decoder 613, respectively. Further, the system decoder 611 transmits a PTS 618 included in the parameter data, to the synchronous controller 614. The video decoder 612 decodes video frames from the inputted video elementary stream 616. The audio decoder 613 decodes audio frames from the inputted audio elementary stream 617. The synchronous controller 614 controls the transmitter 610, the system decoder 611, the video decoder 612, and the audio decoder 613, thereby controlling synchronous output of a video frame 619 and an audio frame 620.
FIG. 8 is a flowchart for explaining the operation to start data reproduction according to the entry point, of the synchronous controller 614 as one of the constituents of the conventional compressively coded data reproduction apparatus. Hereinafter, the operation of the synchronous controller 614 will be described in detail with reference to the flowchart of FIG. 8.
Initially, when the operation is started (step 701), the synchronous controller 614 is notified, from the outside, that an entry point value is set and data reproduction is to be started from the entry point, and outputs a start request to the transmitter 610 and the respective decoders 611, 612, and 613 (step 702). On receipt of this request, the transmitter 610 and the respective decoders 611, 612, and 613 start to operate.
Next, in step 703, the synchronous controller 614 outputs a data supply request to the transmitter 610. On receipt of this request, the transmitter 610 performs data transmission, starting from the head of the VOBU 615 including the entry point. On receipt of the data from the transmitter 610, the system decoder 611 starts the above-described separation and extraction.
In step 704, the video decoder 612 performs decoding of video frames from the video elementary stream supplied from the system decoder 611, until the video frame PTS 618 supplied from the system decoder 611 matches the entry point within a predetermined threshold. In this step, the video decoder 612 performs only decoding, and stores the decoded video frames in a video frame buffer (not shown) in the video decoder 612. That is, the video decoder 612 does not output video data for display yet.
The audio decoder 613 does not perform decoding until it receives an audio frame synchronous output request in step 708. The audio decoder 613 performs only storage of the audio elementary stream 617 supplied from the system decoder 611 in an audio bit buffer (not shown) in the audio decoder 613. In this storage process, the audio decoder 613 also controls overflow of the audio bit buffer. To be specific, when overflow is likely to occur, the audio decoder 613 discards the audio elementary stream 617 already stored in the audio bit buffer, and stores the audio elementary stream 617 that is newly transmitted in the audio bit buffer.
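The overflow control described above can be sketched as follows. The class name AudioBitBuffer, the byte-based capacity, and the rule that "overflow is likely" means the new data would not fit are assumptions made for illustration; the text does not specify how the audio decoder 613 detects an imminent overflow.

```python
class AudioBitBuffer:
    """Hypothetical model of the audio bit buffer in the audio decoder."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data = bytearray()

    def store(self, chunk: bytes) -> bool:
        """Store newly transmitted audio elementary stream data.

        Returns True if previously stored data had to be discarded."""
        if len(chunk) > self.capacity:
            raise ValueError("chunk is larger than the whole buffer")
        discarded = False
        # Assumed overflow test: the new chunk would not fit.
        if len(self.data) + len(chunk) > self.capacity:
            self.data.clear()   # discard the data already stored
            discarded = True
        self.data.extend(chunk)  # store the newly transmitted data
        return discarded


if __name__ == "__main__":
    buf = AudioBitBuffer(capacity=8)
    for chunk in (b"abcd", b"efgh", b"ij"):
        print(chunk, "discarded older data:", buf.store(chunk))
```

Under this rule the buffer always holds the most recently transmitted data, at the cost of dropping audio that arrived earlier.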
Next, in step 704, when the video frame PTS 618 supplied from the system decoder 611 matches the entry point within a predetermined threshold, the synchronous controller 614 goes to step 705. In step 705, the synchronous controller 614 initializes the synchronous clock with the value of the video frame PTS 618.
Next, in step 706, the synchronous controller 614 outputs a video frame synchronous output request to the video decoder 612. On receipt of this request, the video decoder 612 performs decoding of the video frame whose PTS 618 supplied from the system decoder 611 matches the entry point within the predetermined threshold and, simultaneously, outputs the video frame for display. In this step, output of a video frame for display is performed for the first time and, thereafter, the video decoder 612 performs decoding and output for display, on the subsequent video frames from the video elementary stream supplied from the system decoder 611, under synchronous control by the synchronous controller 614 using the synchronous clock and the video frame PTS 618 supplied from the system decoder 611.
Next, in step 707, the synchronous controller 614 continues monitoring until the audio frame PTS 618 supplied from the system decoder 611 matches the synchronous clock within a predetermined threshold. During the monitoring, the audio decoder 613 continues only the storage of the audio elementary stream 617 in the audio bit buffer.
When the audio frame PTS 618 supplied from the system decoder 611 matches the synchronous clock within the predetermined threshold in step 707, the synchronous controller 614 proceeds to step 708, and outputs an audio frame synchronous output request to the audio decoder 613.
On receipt of this request, the audio decoder 613 performs decoding of the audio frame whose PTS 618 supplied from the system decoder 611 matches the synchronous clock within the predetermined threshold and, simultaneously, performs audio output. In this step, output of an audio frame is performed for the first time and, thereafter, the audio decoder 613 performs decoding and audio output on the subsequent audio frames from the audio elementary stream supplied from the system decoder 611, under synchronous control by the synchronous controller 614 using the synchronous clock and the audio frame PTS 618 supplied from the system decoder 611.
In the conventional method for reproducing compressively coded data, however, since the output of audio frames is started in step 708 after the output of video frames for display is started in step 706, it is apparent that the output of audio frames lags behind the output of video frames for display.
Further, in the above-described method, when no coded video data exists in the program stream, the condition in step 704 that the video frame PTS 618 supplied from the system decoder 611 matches the entry point within the predetermined threshold is never satisfied, and therefore the synchronous controller 614 cannot proceed to step 705 and the subsequent steps. In this case, even when a coded audio frame corresponding to the entry point exists in the data stream, the audio decoder 613 cannot start output of audio frames.
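The start-up sequence of steps 704 to 708 and the two problems noted above can be reproduced in a small simulation. The sketch below is a simplified model, not the actual controller: each PTS is assumed to arrive as an event with an arrival time, the synchronous clock is assumed to advance at the same rate as the arrival time once initialized, and all numbers are made up. The name conventional_startup is hypothetical.

```python
from typing import Iterable, Optional, Tuple

# (arrival time, "video" or "audio", PTS); units are arbitrary but consistent.
Event = Tuple[int, str, int]


def conventional_startup(
    events: Iterable[Event], entry_point: int, threshold: int
) -> Tuple[Optional[int], Optional[int]]:
    """Return (video start time, audio start time); None if never started."""
    clock_offset: Optional[int] = None  # clock = arrival time + clock_offset
    video_start: Optional[int] = None
    audio_start: Optional[int] = None
    for now, kind, pts in events:
        if clock_offset is None:
            # Step 704: only a video frame PTS is compared with the entry point.
            if kind == "video" and abs(pts - entry_point) <= threshold:
                clock_offset = pts - now   # step 705: initialize the clock
                video_start = now          # step 706: first video output
        elif kind == "audio":
            # Step 707: compare the audio frame PTS with the synchronous clock.
            if abs(pts - (now + clock_offset)) <= threshold:
                audio_start = now          # step 708: first audio output
                break
    return video_start, audio_start


if __name__ == "__main__":
    mixed = [(0, "audio", 8000), (100, "video", 9000),
             (200, "audio", 10880), (2000, "audio", 10880)]
    audio_only = [(0, "audio", 9000), (2000, "audio", 11880)]
    print(conventional_startup(mixed, entry_point=9000, threshold=1500))
    print(conventional_startup(audio_only, entry_point=9000, threshold=1500))
```

In the first run the audio start time is later than the video start time, because step 707 is reached only after step 706. In the second run neither output starts, because no video frame PTS ever satisfies the condition of step 704.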
Furthermore, in the above-described method, when a video frame PTS 618 is not assigned to every video frame in the program stream, the threshold must be sufficiently large in order to satisfy the condition that the video frame PTS 618 supplied from the system decoder 611 matches the entry point within the predetermined threshold. To be specific, although the DVD-Video Recording standard defines that a video frame PTS should be assigned to each I picture, there is no such definition for other pictures. Further, there is no special definition concerning I pictures except that an I picture should be placed at the head of a VOBU. In an actual program stream, however, for the sake of compressive coding efficiency, an I picture often exists only at the head of a VOBU, and a video frame PTS is assigned only to that I picture. Taking this into consideration, a threshold equivalent to one VOBU must be set, whereby the unit of synchronous control becomes, not a video frame, but a VOBU.
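The effect of the threshold can be checked with made-up numbers. The sketch below assumes PTS values in 90 kHz units, VOBUs of 0.5 sec, and a video frame PTS only on the I picture at the head of each VOBU; the function name first_matching_pts and all values are hypothetical.

```python
from typing import Iterable, Optional


def first_matching_pts(
    pts_values: Iterable[int], entry_point: int, threshold: int
) -> Optional[int]:
    """Return the first PTS within the threshold of the entry point, or None."""
    for pts in pts_values:
        if abs(pts - entry_point) <= threshold:
            return pts
    return None


if __name__ == "__main__":
    vobu_head_pts = [0, 45000, 90000]   # one PTS per 0.5 sec VOBU (assumed)
    entry_point = 60000                 # falls in the middle of the second VOBU
    frame_threshold = 1500              # about half a frame period (assumed)
    vobu_threshold = 45000              # equivalent to one VOBU (assumed)
    print(first_matching_pts(vobu_head_pts, entry_point, frame_threshold))
    print(first_matching_pts(vobu_head_pts, entry_point, vobu_threshold))
```

With the frame-level threshold no PTS matches, so reproduction would never start. With the VOBU-level threshold the match occurs at the head of the VOBU rather than at the requested frame.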