Methods that packetize and encode (multiplex) video and audio streams include an MPEG (Moving Picture Experts Group) transport stream technique (hereinafter referred to as the MPEG2-TS technique).
FIG. 1 is an illustration of the restrictions that apply when a transmitter 1 uses the MPEG2-TS technique to perform encoding. The transmitter 1 uses the MPEG2-TS technique to encode video and audio streams and transmits the coded streams. In this case, by assuming a virtual receiver 2, the transmitter 1 determines the timing with which it packetizes the video and audio streams into MPEG2-TS packets so that a virtual decoder 3 in the virtual receiver 2 can decode the MPEG2-TS packets transmitted by the transmitter 1. Here, the virtual receiver 2 includes the virtual decoder 3, which is the T-STD (Transport Stream System Target Decoder) defined in, for example, ISO/IEC 13818-1 MPEG2 systems.
FIG. 2 shows an example of the configuration of the virtual decoder 3 in FIG. 1. In other words, the virtual decoder 3 in FIG. 2 is a model of the T-STD defined in the MPEG2 systems.
The MPEG2 systems standard imposes restrictions on the use of the MPEG2-TS technique to encode video streams defined in MPEG standards, such as MPEG1 video, MPEG2 video, and MPEG4 AVC, and audio streams defined in MPEG standards, such as MPEG1 audio and MPEG2 AAC audio. Specifically, the transmitter 1 encodes video and audio streams so that the coded video and audio can be decoded by the virtual decoder 3. In other words, the transmitter 1 encodes and packetizes video and audio streams so that the obtained video and audio fall within the restrictions of the model of the virtual decoder 3 in FIG. 2.
An MPEG2-TS transmitted to the virtual receiver 2 (FIG. 1) is supplied to the virtual decoder 3. As shown in FIG. 2, a filter 4 filters the MPEG2-TS supplied to the virtual decoder 3 by packet type.
Specifically, the MPEG2-TS includes a plurality of packets, each packet bearing a PID (packet identifier) for identifying the packet. Based on the PIDs borne by the packets included in the MPEG2-TS, the filter 4 supplies video-stream-forming TS packets to a video data decoding section 5 for processing a video stream, supplies audio-stream-forming TS packets to an audio data decoding section 6 for processing an audio stream, and supplies system-related TS packets to a system decoding section 7 for processing system data.
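The PID-based routing performed by the filter 4 can be sketched as follows. This is a minimal illustration, not the filter defined in the standard: the PID values VIDEO_PID and AUDIO_PID and the helper names are hypothetical, and every packet that is neither video nor audio is simply treated as system-related. The 188-byte packet length, the 0x47 sync byte, and the 13-bit PID position are taken from the MPEG2 systems packet layout.

```python
from collections import defaultdict

TS_PACKET_SIZE = 188   # fixed MPEG2-TS packet length in bytes
SYNC_BYTE = 0x47       # first byte of every TS packet

# Hypothetical PID assignments, chosen only for this example.
VIDEO_PID = 0x100
AUDIO_PID = 0x101


def packet_pid(packet: bytes) -> int:
    """Return the 13-bit PID carried in bytes 1 and 2 of a TS packet header."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid TS packet")
    return ((packet[1] & 0x1F) << 8) | packet[2]


def route(packets):
    """Sort packets into video, audio, and system groups by PID."""
    out = defaultdict(list)
    for pkt in packets:
        pid = packet_pid(pkt)
        if pid == VIDEO_PID:
            out["video"].append(pkt)    # to video data decoding section 5
        elif pid == AUDIO_PID:
            out["audio"].append(pkt)    # to audio data decoding section 6
        else:
            out["system"].append(pkt)   # simplification: all other PIDs
    return out


def make_packet(pid: int) -> bytes:
    """Build a dummy TS packet with the given PID and a zero-filled payload."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes(TS_PACKET_SIZE - len(header))
```

For example, route([make_packet(VIDEO_PID), make_packet(AUDIO_PID)]) places one packet in the "video" group and one in the "audio" group.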
The video data decoding section 5 includes a transport buffer (indicated by TBv in FIG. 2) 11, a multiplex buffer (indicated by MBv in FIG. 2) 12, an elementary buffer (indicated by EBv in FIG. 2) 13, a video decoder (indicated by Dv in FIG. 2) 14, and an output ordering buffer (indicated by Ov in FIG. 2) 15.
When video-stream-forming TS packets are supplied to the video data decoding section 5 through the filter 4, the video-stream-forming TS packets are stored in the transport buffer 11. The TS packets stored in the transport buffer 11 are supplied to the multiplex buffer 12 with predetermined timing and are smoothed. The smoothed packets are supplied to the elementary buffer 13. The video decoder 14 extracts video access units from the elementary buffer 13 with predetermined timing, decodes the video access units, and outputs the decoded video access units. Part of the decoded data is output from a terminal 16 through the output ordering buffer 15, and the other part of the decoded data is output from a terminal 17 and is played back.
The audio data decoding section 6 includes a transport buffer (indicated by TBn in FIG. 2) 18, an elementary buffer (indicated by Bn in FIG. 2) 19, and an audio decoder (indicated by Dn in FIG. 2) 20.
When audio-stream-forming TS packets are supplied to the audio data decoding section 6 through the filter 4, the audio-stream-forming TS packets are stored in the transport buffer 18. The size (capacity) of the transport buffer 18 is 512 bytes. The size of the elementary buffer 19 differs depending on the audio encoding type, such as MPEG1 audio or MPEG2 AAC audio. In the audio data decoding section 6, Rxn represents the leak rate from the transport buffer 18. When the transport buffer 18 stores data, the data is supplied from the transport buffer 18 to the elementary buffer 19 at the rate (speed) Rxn. When the transport buffer 18 stores no data, no data is supplied from the transport buffer 18 to the elementary buffer 19 (i.e., Rxn=0).
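The behavior of the transport buffer 18 described above resembles a leaky bucket, which can be sketched as follows. This is only an illustration under stated assumptions: the leak rate passed as rxn_bytes_per_s is a caller-supplied value, not one taken from this description, and each TS packet is assumed to enter the buffer all at once. Only the 512-byte capacity comes from the text.

```python
TB_SIZE = 512          # transport buffer 18 capacity in bytes (from the text)
TS_PACKET_SIZE = 188   # bytes per TS packet


def simulate_transport_buffer(arrival_times, rxn_bytes_per_s):
    """Return True if TBn never overflows for the given packet arrival times.

    arrival_times: ascending times in seconds at which a whole TS packet
    enters the buffer (a simplification: real input arrives byte by byte).
    rxn_bytes_per_s: assumed leak rate Rxn, in bytes per second.
    """
    fullness = 0.0
    last_t = None
    for t in arrival_times:
        if last_t is not None:
            # Data leaves at Rxn while the buffer holds data; clamping at 0
            # models the rate dropping to 0 once the buffer is empty.
            fullness = max(0.0, fullness - rxn_bytes_per_s * (t - last_t))
        fullness += TS_PACKET_SIZE
        if fullness > TB_SIZE:
            return False   # overflow: the packet schedule is not allowed
        last_t = t
    return True
```

With an assumed leak rate of 250,000 bytes per second, packets spaced 1 ms apart never accumulate, whereas three packets arriving at the same instant would need 564 bytes and overflow the 512-byte buffer.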
The audio decoder 20 extracts audio access units stored in the elementary buffer 19 with predetermined timing, decodes the audio access units, and outputs the decoded audio access units for playback. Specifically, when the presentation time stamp (PTS) of an audio access unit is equal to the time of the system time clock of the T-STD, the audio decoder 20 extracts the audio access unit from the elementary buffer 19. Audio access units are the encoding units forming an audio stream, and are also used as decoding units.
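The PTS-driven extraction by the audio decoder 20 can be sketched as follows. The AccessUnit type, the run_decoder function, and the tick values used with them are hypothetical and serve only to illustrate the rule that an access unit leaves the buffer 19 when its PTS equals the system time clock.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class AccessUnit:
    pts: int      # presentation time stamp, in system-clock ticks
    size: int     # coded size in bytes


def run_decoder(buffered, stc_ticks):
    """Pop each access unit from the buffer when the clock reaches its PTS.

    buffered: access units in presentation order, already in the buffer.
    stc_ticks: iterable of successive system time clock values.
    Returns a list of (stc, pts) pairs, one per extracted access unit.
    """
    queue = deque(buffered)
    decoded = []
    for stc in stc_ticks:
        # Extraction is instantaneous in this model: the unit leaves the
        # buffer at the tick where its PTS equals the system time clock.
        if queue and queue[0].pts == stc:
            au = queue.popleft()
            decoded.append((stc, au.pts))
    return decoded
```

The equality test mirrors the wording of the description; a clock that steps over a PTS value without ever equaling it would leave that access unit in the buffer in this sketch.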
The system decoding section 7 includes a transport buffer (indicated by TBsys in FIG. 2) 22, an elementary buffer (indicated by Bsys in FIG. 2) 23, and a system decoder (indicated by Dsys in FIG. 2) 24.
When system-related TS packets are supplied to the system decoding section 7 through the filter 4, the system-related TS packets are stored as data in the transport buffer 22. The data stored in the transport buffer 22 is supplied to the elementary buffer 23. The system decoder 24 extracts system access units stored in the elementary buffer 23 with predetermined timing, decodes the system access units, and outputs the decoded system access units through a terminal 25.
The transmitter 1 in FIG. 1 needs to packetize the video and audio streams, determine the transmission timing, and perform encoding so that the transmitted data can be correctly decoded by the virtual receiver 2 including the virtual decoder 3.
In other words, the transmitter 1 needs to determine the timing for packetizing the audio stream and perform encoding so that, in terms of the audio decoder model in the virtual decoder 3 (T-STD) in FIG. 2, the transport buffer 18 does not overflow and the elementary buffer 19 neither overflows nor underflows.
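A transmitter could test a candidate packet schedule against these three restrictions with a check along the following lines. This is a rough sketch, not the T-STD procedure itself: the function name, the 1-microsecond time step, and the treatment of every byte leaving the transport buffer 18 as entering the buffer 19 (Bn) are simplifying assumptions, and the leak rate and Bn size must be supplied by the caller.

```python
from collections import Counter


def check_audio_schedule(packet_times_us, access_units, rxn_bytes_per_us,
                         bn_size, tb_size=512, packet_size=188):
    """Step a simplified audio buffer model in 1-microsecond ticks.

    packet_times_us: integer arrival times of whole TS packets at TBn.
    access_units: (pts_us, size_bytes) pairs removed from Bn at their PTS.
    Returns "ok" or the name of the first violated restriction.
    """
    arrivals = Counter(packet_times_us)
    removals = {}
    for pts, size in access_units:
        removals[pts] = removals.get(pts, 0) + size
    if not arrivals and not removals:
        return "ok"
    end = max(list(arrivals) + list(removals)) + 1
    tb = bn = 0.0
    for t in range(end):
        moved = min(tb, rxn_bytes_per_us)   # nothing moves when TBn is empty
        tb -= moved
        bn += moved                         # simplification: every byte reaches Bn
        tb += packet_size * arrivals.get(t, 0)
        if tb > tb_size:
            return "TBn overflow"
        if bn > bn_size:
            return "Bn overflow"
        if t in removals:
            if bn + 1e-9 < removals[t]:
                return "Bn underflow"       # access unit not fully buffered yet
            bn -= removals[t]
    return "ok"
```

A schedule that returns anything other than "ok" would have to be re-timed, for example by sending packets earlier to avoid an underflow of Bn or spacing them further apart to avoid an overflow of TBn.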
Regarding receivers (playback apparatuses), a receiver that processes and plays back a base stream and an extension stream having extensibility for the base stream, as shown in Patent Document 1, has been proposed.
An MPEG2 audio stream has backward compatibility so that it can be played back even by an MPEG1 audio decoder. In other words, the MPEG2 audio stream has a structure including an MPEG1 audio stream portion as a base portion and an MPEG2 audio portion as an extension portion thereof.
In the DVD (Digital Versatile Disc) video format, a technology that multiplexes an MPEG2 audio stream to generate a program stream is disclosed (e.g., Non-Patent Document 1). FIG. 3 is an illustration of the structure of a program stream in the DVD video format. The program stream 30 in FIG. 3 includes a video pack 31, an MPEG2 audio pack 32, and a plurality of packs 33-1 to 33-j (j represents an arbitrary natural number).
The MPEG2 audio pack 32 includes a pack header 34, a PES (Packetized Elementary Stream) packet header 35, MPEG1 audio data (Base) 36, a PES packet header 37, and MPEG2 audio data (Extension) 38. In addition, a payload of the MPEG2 audio pack 32 includes an MPEG1 audio PES packet including the PES packet header 35 and the MPEG1 audio data 36, and an MPEG2 audio extension PES packet including the PES packet header 37 and the MPEG2 audio data 38.
When the MPEG2 audio pack 32 is played back, a playback apparatus that can decode only an MPEG1 audio stream (a playback apparatus only for MPEG1) separates and plays back only the PES packet header 35 and the MPEG1 audio data 36 as an MPEG1 audio stream portion. A playback apparatus that can perform playback up to the MPEG2 audio stream (a playback apparatus capable of playback up to an extension audio stream) separates and plays back both the base and extension audio streams. Specifically, the latter playback apparatus plays back, in addition to the PES packet header 35 and the MPEG1 audio data 36, the PES packet header 37 and the MPEG2 audio data 38 as an MPEG2 audio stream.
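The layout of the MPEG2 audio pack 32 and the two playback behaviors can be sketched as follows. The class names, the kind field, and the packets_to_play function are hypothetical; in particular, how a real playback apparatus tells the base PES packet from the extension PES packet is not stated in this description, so a plain label is used here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PesPacket:
    kind: str        # "base" (MPEG1 audio) or "extension" (MPEG2 audio)
    header: bytes    # PES packet header
    payload: bytes   # audio data


@dataclass
class Mpeg2AudioPack:
    pack_header: bytes
    pes_packets: List[PesPacket]


def packets_to_play(pack: Mpeg2AudioPack, supports_extension: bool):
    """Return the PES packets a playback apparatus would decode."""
    if supports_extension:
        # Apparatus capable of playback up to the extension audio stream.
        return list(pack.pes_packets)
    # Apparatus only for MPEG1: the extension PES packet is skipped.
    return [p for p in pack.pes_packets if p.kind == "base"]
```

Given a pack holding one base and one extension PES packet, packets_to_play(pack, False) returns only the base packet, while packets_to_play(pack, True) returns both.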
[Patent Document 1] Japanese Unexamined Patent Application Publication No. 11-31362
[Non-Patent Document 1] DVD Specifications for Read-Only Disc Part 3; Version 1.1