1. Field of the Invention
The present invention relates to a multiplexer and a multiplexing method, which are suitable for generating a transport stream used for multiplexing and transferring a video or audio elementary stream.
2. Description of Related Art
FIG. 1 shows the structures of an ES (Elementary Stream), a PES (Packetized Elementary Stream), and a TS (Transport Stream). As shown in FIG. 1, under the MPEG (Moving Picture Experts Group) transmission standards, video elementary data (ES) including encoded image data are packetized into packets of an appropriate size to generate a packetized ES (PES).
FIG. 2 shows the data structure of a PES packet header. As shown in FIG. 2, header information called a PES header is added to a PES packet. The PES header includes a start code prefix that stores a start code, a stream ID (stream_id) that stores a code for identifying each stream, a PES packet length that stores the length of the PES packet, and optional field data. The optional field carries, among other optional items, a PES priority that indicates the priority given to the packet and stuffing bytes for adjusting the packet length. The optional field also carries a decoding time and a presentation time on a 90 kHz basis, called a DTS (Decoding Time Stamp) and a PTS (Presentation Time Stamp). The DTS and the PTS are time information for synchronous reproduction.
The MPEG2 system format has two stream types: a transport stream (TS) and a program stream (PS). The PES is the basic element of both streams and serves as an intermediate form that enables mutual conversion. For a transport stream, as shown in FIG. 1, a PES is divided into packets having a fixed length of 188 bytes, each called a “transport packet” (TS packet). The transport packets are multiplexed with audio data, stream data, or the like into a transport stream, which is then output. A TS packet is composed of a packet header having a fixed length of 4 bytes, an adaptation field having a variable length, and a payload. The packet header (TS header) carries a PID (packet identifier) and various flags. The type of a TS packet is identified based on the PID.
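The 4-byte TS header described above can be illustrated with a minimal parsing sketch. The sync byte value 0x47 and the bit positions of the fields follow the ISO/IEC 13818-1 packet layout and are not stated in this text; the function name `parse_ts_header`, the dictionary keys, and the example PID 0x0100 are hypothetical.

```python
TS_PACKET_SIZE = 188  # fixed TS packet length in bytes
TS_HEADER_SIZE = 4    # fixed TS header length in bytes
SYNC_BYTE = 0x47      # first byte of every TS packet (per ISO/IEC 13818-1)


def parse_ts_header(packet: bytes) -> dict:
    """Decode the fixed 4-byte header of one TS packet (illustrative only)."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a TS packet must be exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "transport_error": bool(b1 & 0x80),
        "payload_unit_start": bool(b1 & 0x40),
        "transport_priority": bool(b1 & 0x20),
        "pid": ((b1 & 0x1F) << 8) | b2,            # 13-bit packet identifier
        "scrambling_control": (b3 >> 6) & 0x03,
        "adaptation_field_control": (b3 >> 4) & 0x03,
        "continuity_counter": b3 & 0x0F,
    }


# Hypothetical packet: PID 0x0100, payload only, start of a PES packet.
example = bytes([0x47, 0x41, 0x00, 0x10]) + bytes(TS_PACKET_SIZE - TS_HEADER_SIZE)
header = parse_ts_header(example)
```

A demultiplexer would typically route each packet to the video, audio, or system path by looking up `header["pid"]`.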
The transport stream TS is generated by supplying a video ES and an audio ES to a multiplexer, and packetizing and multiplexing the ESs with the multiplexer. The multiplexer should multiplex a transport stream without causing overflow or underflow of an audio buffer and video buffer at a decoder. For example, a data packet transmitting device as disclosed in Japanese Unexamined Patent Application Publication No. 11-341054 (Takasaka) includes a flow controller that divides contents data into TS packets in accordance with each system format and outputs the TS packets and a packetizer that multiplexes the TS packet into a multiplexed TS packet at a predetermined transport bit rate to supply the packet to a multiplexer. The multiplexer multiplexes and outputs the multiplexed TS packet as well as a video TS packet and an audio TS packet. Here, the flow controller can appropriately insert a null packet based on a ratio between the entire output bit rate and a bit rate of individual TS packets to thereby send TS packets at a predetermined bit rate.
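The idea of inserting null packets according to the ratio between the output bit rate and the content bit rate can be sketched as follows. This is a generic credit-based scheduler written for illustration, not the method of the cited publication, whose details are not given here. The function name `schedule_slots` and the example rates are assumptions. In an actual stream, null packets carry PID 0x1FFF under ISO/IEC 13818-1.

```python
def schedule_slots(content_bps: int, output_bps: int, n_slots: int) -> list:
    """Decide, per output packet slot, whether to send content or a null packet.

    Integer credit accumulation keeps the long-run share of content packets
    equal to content_bps / output_bps without any rounding error.
    """
    if not 0 <= content_bps <= output_bps:
        raise ValueError("content rate must not exceed the output rate")
    slots = []
    credit = 0
    for _ in range(n_slots):
        credit += content_bps
        if credit >= output_bps:
            credit -= output_bps
            slots.append("data")  # send a content TS packet in this slot
        else:
            slots.append("null")  # pad this slot with a null packet
    return slots


# Hypothetical rates: 6 Mbps of content carried in an 18 Mbps transport stream.
pattern = schedule_slots(6_000_000, 18_000_000, 9)
```

With these example rates, one slot in three carries content and the other two are padded.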
Incidentally, the multiplexer is set to perform virtual buffer simulation with a virtual decoder (transport stream system target decoder (T-STD)) based on the MPEG standards (ISO/IEC13818-1) as described below to generate a transport stream conforming to the MPEG transmission standards.
FIG. 3 is a block diagram of a transport stream system target decoder T-STD. The T-STD is a conceptual model of decoding processing that a multiplexer uses when generating and verifying a transport stream.
Reference numeral 302 denotes a demultiplexer; 311, a video transport buffer [TBvid]; 321, an audio transport buffer [TBaud]; 331, a system transport buffer [TBsys]; 312, a video main buffer [MB]; 313, a video elementary buffer [EB]; 322, an audio main buffer [Baud]; 332, a system main buffer [Bsys]; 314, a video decoder [Dvid]; 323, an audio decoder [Daud]; and 333, a system decoder [Dsys].
The input transport stream TS (301) is divided into video, audio, or system TS packets with the demultiplexer 302, and the packets are input to the video transport buffer 311, the audio transport buffer 321, and the system transport buffer 331. A TS header is removed from the video TS packet input to the video transport buffer 311, and the packet is transferred to the video main buffer 312 at a leak rate (Rxvid) 341. The leak rate 341 is an amount of data to be retrieved from the video transport buffer per unit time.
The packetized video ES data (video PES data) stored in the video main buffer 312 is transferred to the video elementary buffer 313 at the leak rate (Rbx) 342 if there is a free space in the video elementary buffer 313. In the case of transferring the video PES data from the video main buffer 312 to the video elementary buffer 313, all PES headers just preceding the video PES data in the video main buffer 312 are instantly removed. As for video elementary data stored in the video elementary buffer 313, data corresponding to 1 frame are removed from the video elementary buffer 313 instantly at the decoding time (DTS), and the obtained data is decoded with the video decoder 314.
As for the audio data, data are removed from the audio main buffer 322 similar to video data, and the obtained data is decoded with the audio decoder 323. Further, as for the system data, data are removed from the system main buffer 332 at a standardized rate, and the obtained data is decoded with the system decoder 333.
The multiplexer performs T-STD buffer simulation with a transport stream system target decoder under the MPEG transmission standards. This simulation is referred to as “T-STD virtual buffer simulation”. The multiplexer selectively outputs video, audio, and system TS packets based on the simulation result to generate a transport stream conforming to the ISO/IEC13818-1 standards. The buffer simulation is generally performed in units of 90 kHz, like the decoding time DTS and the presentation time PTS, or in units of 27 MHz, which corresponds to the MPEG2 system clock.
Incidentally, the MPEG-2 system is compatible with various moving picture compression functions and various image sizes, so these are classified by level and profile to distinguish them clearly from one another and to avoid confusion in mutual transmission. The profile defines whether spatial, temporal, and SNR scalability are supported, which correspond respectively to a resolution, a frame rate, and a function of concurrently processing plural images of different image qualities. The level defines a resolution and a frame rate (the number of frames per second).
As an example of a specification under which a desired MPEG2 function can be used in accordance with the application, there is Main Profile at Main Level (Main Profile@Main Level: MP@ML). MP@ML can process images of 720×480 pixels at 30 frames/sec, like current television images, and can use a bidirectional predictive system that takes both the previous and subsequent images of the image being compressed as targets for motion detection. MP@ML defines the video buffer sizes and leak rates in a transport stream system target decoder (T-STD) as follows.
    TB size=512 (byte)
    MB size=10,000 (byte)
    EB size=229,376 (byte)
    Leak rate Rxvid from video transport buffer TBvid to main buffer MB=18,000,000 (bps)
    Leak rate Rbx from main buffer MB to elementary buffer EB=15,000,000 (bps)
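The video path of the T-STD with the parameters listed above can be sketched as a simple tick-based model. This is a simplified illustration under stated assumptions, not the simulation of any particular multiplexer: it steps in 90 kHz ticks, treats pushed data as payload with the TS header already removed, ignores PES header removal, and uses exact rational arithmetic (`fractions.Fraction`) purely to keep the example free of rounding effects. The class and method names are hypothetical.

```python
from fractions import Fraction

TB_SIZE = 512            # video transport buffer size (bytes)
MB_SIZE = 10_000         # video main buffer size (bytes)
EB_SIZE = 229_376        # video elementary buffer size (bytes)
RX_VID_BPS = 18_000_000  # leak rate Rxvid, TB -> MB (bits per second)
RBX_BPS = 15_000_000     # leak rate Rbx, MB -> EB (bits per second)
TICK_HZ = 90_000         # simulation step (assumption: 90 kHz units)

RX_PER_TICK = Fraction(RX_VID_BPS, 8 * TICK_HZ)  # 25 bytes per tick
RBX_PER_TICK = Fraction(RBX_BPS, 8 * TICK_HZ)    # 125/6 bytes per tick


class VideoTStd:
    """Hypothetical model of the TB -> MB -> EB chain for video."""

    def __init__(self):
        self.tb = Fraction(0)
        self.mb = Fraction(0)
        self.eb = Fraction(0)

    def push_payload(self, nbytes: int) -> None:
        """Deliver TS payload bytes to the transport buffer."""
        self.tb += nbytes
        if self.tb > TB_SIZE:
            raise OverflowError("transport buffer overflow")

    def tick(self) -> None:
        """Advance the model by one 90 kHz tick."""
        moved = min(self.tb, RX_PER_TICK)  # TB leaks at Rxvid
        self.tb -= moved
        self.mb += moved
        if self.mb > MB_SIZE:
            raise OverflowError("main buffer overflow")
        # MB leaks at Rbx, but only into free space in EB.
        moved = min(self.mb, RBX_PER_TICK, EB_SIZE - self.eb)
        self.mb -= moved
        self.eb += moved

    def decode_frame(self, frame_bytes: int) -> None:
        """Remove one frame from EB instantly; call at the frame's DTS."""
        if self.eb < frame_bytes:
            raise RuntimeError("elementary buffer underflow")
        self.eb -= frame_bytes


# Hypothetical usage: one 184-byte payload drains through the chain.
sim = VideoTStd()
sim.push_payload(184)
for _ in range(10):
    sim.tick()
```

A multiplexer built along these lines would refuse to emit a video TS packet whenever `push_payload` or `tick` would raise, and choose an audio, system, or null packet for that slot instead.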
However, as shown below, the leak rate Rbx from the main buffer MB (312) to the elementary buffer EB (313) is evenly divisible by neither 90 kHz nor 27 MHz.
    15,000,000 (bit/s) = 1,875,000 (byte/s)
    1,875,000 ÷ 90,000 = 20.833 . . . (byte: 90 kHz)
    1,875,000 ÷ 27,000,000 = 0.06944 . . . (byte: 27 MHz)
Hence, the amount of data leaked from the main buffer 312 to the elementary buffer 313 is calculated as 20.833 . . . × s90 (in units of 90 kHz) or 0.06944 . . . × s27 (in units of 27 MHz), where s90 and s27 denote the time interval in the respective units. However, 20.833 . . . and 0.06944 . . . are both non-terminating decimals. Therefore, the calculation result involves an accumulative error even if floating-point calculation is carried out.
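The arithmetic above can be checked with a short sketch that accumulates the per-tick leak amount over one second of 90 kHz ticks in three ways: exactly, in floating point, and truncated to a whole number of bytes. The variable names are hypothetical. The size of the floating-point deviation depends on the platform and summation order, so it is left uncomputed here.

```python
from fractions import Fraction

RBX_BPS = 15_000_000  # leak rate Rbx in bits per second
TICK_HZ = 90_000      # number of 90 kHz ticks in one second

# Exact leak per tick: 1,875,000 / 90,000 = 125/6 = 20.833... bytes.
bytes_per_tick_exact = Fraction(RBX_BPS, 8 * TICK_HZ)
bytes_per_tick_float = RBX_BPS / 8 / TICK_HZ

# Accumulate tick by tick, as a buffer simulation would.
exact_total = Fraction(0)
float_total = 0.0
for _ in range(TICK_HZ):
    exact_total += bytes_per_tick_exact
    float_total += bytes_per_tick_float

# Truncating to 20 whole bytes per tick loses 75,000 bytes every second.
truncated_total = int(bytes_per_tick_exact) * TICK_HZ

float_error = float_total - 1_875_000  # small; typically nonzero
```

The exact total is 1,875,000 bytes, while the truncated total is 1,800,000 bytes. The floating-point total stays close to the exact value over one second, but any residual keeps accumulating as the simulation runs longer.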
Incidentally, Japanese Unexamined Patent Application Publication No. 9-284732 (Miyazawa et al.) describes a technique for correcting an accumulative error involved in integer arithmetic, although not in buffer simulation. MPEG1 Layer 2 audio defines a sampling frequency of 44.1 kHz, so the increment (ΔPTS) of the presentation time for each frame in units of 90 kHz does not take an integer value, as follows.
    ΔPTS = 1152 × 90 (kHz) / 44.1 (kHz) = 2351.0204 . . .
If calculation is performed with the value of ΔPTS approximated to 2351 or 2352 to realize integer arithmetic, video data and audio data are out of sync upon compressing/decompressing the data due to an accumulative error. To avoid the accumulative error, the audio/video data generating device of Miyazawa et al. approximates ΔPTS to 2351 and corrects an audio PTS value per second.
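The drift caused by rounding ΔPTS to an integer can be quantified with a short sketch. It only measures the accumulated error; it does not reproduce the correction technique of Miyazawa et al., whose details are not given here. The function name `drift_ticks` is hypothetical.

```python
from fractions import Fraction

SAMPLES_PER_FRAME = 1152  # samples per MPEG1 Layer 2 audio frame
PTS_CLOCK_HZ = 90_000     # PTS resolution
SAMPLE_RATE_HZ = 44_100   # audio sampling frequency

# Exact increment per frame: 115200/49 = 2351 + 1/49 ticks.
delta_pts_exact = Fraction(SAMPLES_PER_FRAME * PTS_CLOCK_HZ, SAMPLE_RATE_HZ)


def drift_ticks(n_frames: int, delta_pts_approx: int) -> Fraction:
    """Accumulated PTS error (exact minus approximated) after n_frames, in 90 kHz ticks."""
    return n_frames * (delta_pts_exact - delta_pts_approx)


drift_low = drift_ticks(49, 2351)   # approximating downward
drift_high = drift_ticks(49, 2352)  # approximating upward
```

With ΔPTS approximated to 2351, the audio PTS falls behind by one tick every 49 frames, which is about 0.78 ticks per second of audio. With 2352, it runs ahead by 48 ticks over the same 49 frames.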
As described above, the MP@ML specifications define the leak rate Rbx as 15,000,000 bps. If a rate of TS, that is, a rate of the entire stream is 15,000,000 bps or less, the leak rate Rbx does not exceed 15,000,000 bps. Hence, a conventional multiplexer of Takasaka et al. has only to execute simulation of the elementary buffer 313 for video data without a need to execute simulation of the main buffer 312.
However, in order to deal with large-capacity media in the future, it is necessary to support 18,000,000 bps, which is the maximum TS rate of the MP@ML specifications. In this case, there is a possibility that the leak rate Rbx exceeds 15,000,000 bps, so it is necessary to perform buffer simulation of the main buffer 312. However, in the buffer simulation of the main buffer 312, the leak amount from the main buffer 312 is a non-terminating decimal, resulting in a problem that an accumulative error is involved even with floating-point calculation.
In addition, the technique of Miyazawa et al. realizes integer arithmetic for the audio PTS by correcting, once per second, the error of the audio PTS calculated by integer arithmetic. However, in encoding processing it is preferable to perform calculation on the basis of the system clock in consideration of other processing. Buffer simulation also has another problem in that the error becomes too large for an appropriate simulation if correction is executed at a time interval as long as 1 second.