1. Field of the Invention
The present invention relates to a data multiplexing apparatus and a data multiplexing method for multiplexing a plurality of data streams.
2. Description of Related Art
The Moving Picture Experts Group (MPEG) standard and the DVD-Video standard are known as international standards for encoding video data and audio data and for multiplexing these data. Since video data and audio data are basically data strings that are accessed sequentially, they are encoded as a video stream and an audio stream, respectively. The video data and audio data are encoded and multiplexed based on the above standards.
FIG. 8 is a block diagram that shows the schematic configuration of a data multiplexing apparatus based on the above standards. FIG. 8 also shows a part of the configuration of a data separating apparatus. A conventional data multiplexing apparatus includes a video encoder 101, an encoding video buffer 102, an audio encoder 103, an encoding audio buffer 104, a multiplexer 105, and a multiplexing controller 106.
The video encoder 101 encodes an input digital video signal and generates a video stream that is a data sequence of video data. The encoding video buffer 102 temporarily stores the video stream.
The audio encoder 103 encodes an input audio PCM signal and generates an audio stream that is a data sequence of audio data. The encoding audio buffer 104 temporarily stores the audio stream.
The multiplexer 105 multiplexes the video stream output from the encoding video buffer 102 and the audio stream output from the encoding audio buffer 104 to create a system stream. The multiplexing controller 106 controls the amount of video and audio streams and determines a timing of multiplexing or the like.
According to the above standards, a video stream and an audio stream are multiplexed in units of packs. The system stream is thus made up of a plurality of video packs and a plurality of audio packs.
To decode the system stream created in the above multiplexing apparatus, the separating apparatus 107 separates the system stream into a video stream and an audio stream, temporarily stores them in, for example, a decoding video buffer 108 and a decoding audio buffer 109, respectively, and then decodes them.
Therefore, the multiplexing apparatus that encodes data needs to control the timing of multiplexing so that the decoding video buffer 108 and the decoding audio buffer 109 in the separating apparatus do not overflow or underflow. To implement this control, the multiplexing controller 106 has a virtual decoding video buffer and a virtual decoding audio buffer and calculates their occupancies.
FIG. 9 is a block diagram that shows the detailed configuration of the multiplexing controller 106. The multiplexing controller 106 includes an encoding video stream amount storage 201, a virtual decoding video buffer occupancy calculator 202, a virtual decoding video buffer 203, an encoding audio stream amount storage 204, a virtual decoding audio buffer occupancy calculator 205, a virtual decoding audio buffer 206, a control signal generator 207, and a multiplexing data determinater 208.
The encoding video stream amount storage 201 stores the amount of video streams that are stored in the encoding video buffer 102. The virtual decoding video buffer occupancy calculator 202 virtually calculates the amount of video streams that are accumulated in the decoding video buffer 108 and stores the result into the virtual decoding video buffer 203. The virtual decoding video buffer 203 is a register or the like for storing the calculation results and it stores a virtual value of the occupancy amount of the decoding video buffer 108 in the decoder.
The encoding audio stream amount storage 204 stores the amount of audio streams that are stored in the encoding audio buffer 104. The virtual decoding audio buffer occupancy calculator 205 virtually calculates the amount of audio streams that are accumulated in the decoding audio buffer 109 and stores the result into the virtual decoding audio buffer 206. The virtual decoding audio buffer 206 is a register or the like for storing the calculation results and it stores a virtual value of the occupancy amount of the decoding audio buffer 109 in the decoder.
The multiplexing data determinater 208 determines the type of pack to be multiplexed, the length of a stream and the timing of multiplexing based on the amount of video streams stored in the encoding video buffer 102, the amount of audio streams stored in the encoding audio buffer 104, the virtual occupancy amount of the decoding video buffer stored in the virtual decoding video buffer 203, and the virtual occupancy amount of the decoding audio buffer stored in the virtual decoding audio buffer 206. The control signal generator 207 generates a signal for informing the multiplexer 105 of what is determined by the multiplexing data determinater 208.
The operation by which the multiplexing apparatus multiplexes data is described below with reference to FIG. 10. The following description focuses mainly on an audio stream. Standards such as MPEG define data in a hierarchy. The audio stream is composed of a plurality of successive audio frames. The audio frame is the minimum unit of audio data, and its length (frame length; LF) is fixed. When multiplexing the audio stream into a system stream, it is necessary to arrange the audio stream, which consists of a plurality of audio frames, into audio packs.
FIGS. 10A and 10B show this operation schematically. FIG. 10A shows an entire audio stream, which is made up of six audio frames A1 to A6. A random access point (RAP), which is described later, is set after the audio frame A6. FIG. 10B shows a fixed-length audio pack AP. The audio pack includes a header, and the maximum stream length LP that can be included in the audio pack is the pack length minus the header length.
The case of arranging the six audio frames A1 to A6 shown in FIG. 10A into four audio packs AP1 to AP4 shown in FIG. 10B is described here. The maximum stream length LP of an audio pack differs from the frame length LF of an audio frame. Therefore, when the audio frames A1 and A2 shown in FIG. 10A are arranged into one audio pack, the audio pack AP1 normally holds the stream only up to the middle of the audio frame A2. Thus, the audio pack AP1 shown in FIG. 10B includes the audio frame A1 and a part of the audio frame A2. The following audio pack AP2 includes the remaining part of the audio frame A2, the audio frame A3, and a part of the audio frame A4. The audio pack AP3 includes the remaining part of the audio frame A4, the audio frame A5, and a part of the audio frame A6. The audio pack AP4 includes the remaining part of the audio frame A6 and invalid data (a padding packet). The padding packet is detailed later.
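The arrangement of FIG. 10A into FIG. 10B can be illustrated with a minimal sketch. The function name pack_audio_frames and the byte values LF = 100 and LP = 180 are hypothetical and chosen only so that six frames fall into four packs as described above; they are not taken from any standard.

```python
def pack_audio_frames(frame_lengths, max_stream_len):
    """Split successive audio frames across fixed-capacity audio packs.

    Each pack is a list of (frame_number, byte_count) pieces. If the last
    pack is not full, a ("padding", byte_count) entry fills the remainder.
    All names and units here are illustrative assumptions.
    """
    packs, current, room = [], [], max_stream_len
    for number, length in enumerate(frame_lengths, start=1):
        remaining = length
        while remaining > 0:
            take = min(remaining, room)      # as much of the frame as fits
            current.append((number, take))
            remaining -= take
            room -= take
            if room == 0:                    # pack is full, start a new one
                packs.append(current)
                current, room = [], max_stream_len
    if current:                              # partially filled final pack
        current.append(("padding", room))
        packs.append(current)
    return packs


# Six frames of hypothetical length LF = 100, pack capacity LP = 180.
for pack in pack_audio_frames([100] * 6, 180):
    print(pack)
```

With these assumed values the fourth pack carries the last 60 bytes of frame 6 followed by 120 bytes of padding, which corresponds to the audio pack AP4.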
The audio packs AP, into which the plurality of audio frames have been arranged, are then multiplexed with video packs VP in the multiplexer 105, thereby creating a system stream. FIG. 10C shows the system stream.
FIG. 10D is a view showing a buffer occupancy in a virtual decoding audio buffer when arranging the plurality of audio frames into audio packs. This is calculated by the virtual decoding audio buffer occupancy calculator 205. In FIG. 10D, BfMax indicated by a dotted line corresponds to an upper limit of the virtual decoding audio buffer. The upper limit is determined by the above standards or the like.
A normal operation is as follows. At time t1, since the free space of the virtual decoding audio buffer is not smaller than the maximum stream length LP, it is determined to multiplex an audio pack corresponding to a maximum stream length LP. Thus, the stream that includes the audio frame A1 and a part of the audio frame A2 is arranged into the audio pack AP1. The occupancy of the virtual decoding audio buffer thereby increases by the amount corresponding to the maximum stream length LP as shown from the time t1 to t2 in FIG. 10D. The gradient of the occupancy of the virtual decoding audio buffer at this time is based on a bit rate. Then, a video pack is multiplexed therewith from t2 to t3 of FIG. 10D. Since the audio pack is not multiplexed at this time, no change occurs in the occupancy of the virtual decoding audio buffer.
Then, at t3 in FIG. 10D, since the free space of the virtual decoding audio buffer is still not smaller than the maximum stream length LP, it is determined again to multiplex an audio pack corresponding to a maximum stream length LP. Thus, from t3 to t4, the occupancy of the virtual decoding audio buffer increases at the gradient based on the bit rate. From t4 to t5, since the free space of the virtual decoding audio buffer is smaller than the maximum stream length LP, no audio pack is multiplexed into the system stream. Therefore, the multiplexer either multiplexes a video pack or waits without performing any multiplexing.
On the other hand, the occupancy of the virtual decoding audio buffer decreases at every predetermined time, as at t5 and t6 in FIG. 10D, by the amount corresponding to the audio frame length LF that is virtually assumed to have been decoded. Unlike the increase in occupancy, the decrease does not follow the gradient based on the bit rate; under the above standards it occurs all at once. Therefore, after a certain period of time, the free space of the virtual decoding audio buffer becomes equal to or larger than the maximum stream length LP. At that time, audio packs are again multiplexed into the system stream, as at t7 in FIG. 10D. Then, the occupancy of the virtual decoding audio buffer increases at the gradient based on the bit rate, as shown from t7 to t8 in FIG. 10D.
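The occupancy bookkeeping described for FIG. 10D can be sketched as follows. The class VirtualAudioBuffer, its method names, and the values BfMax = 400, LP = 180, and LF = 100 are assumptions made for illustration. As a simplification, the sketch adds a whole pack to the occupancy in one step, whereas the increase in FIG. 10D proceeds gradually at the bit rate.

```python
class VirtualAudioBuffer:
    """Illustrative model of the virtual decoding audio buffer occupancy."""

    def __init__(self, bf_max, occupancy=0):
        self.bf_max = bf_max          # upper limit (BfMax in FIG. 10D)
        self.occupancy = occupancy    # current virtual occupancy

    @property
    def free_space(self):
        return self.bf_max - self.occupancy

    def try_multiplex(self, stream_len):
        """Add a pack's stream if the free space is not smaller than it."""
        if self.free_space < stream_len:
            return False              # multiplex a video pack or wait instead
        self.occupancy += stream_len
        return True

    def decode_frame(self, frame_len):
        """Remove one frame at once, as at each predetermined decode time."""
        if self.occupancy < frame_len:
            raise ValueError("virtual decoding audio buffer underflow")
        self.occupancy -= frame_len


buf = VirtualAudioBuffer(bf_max=400)   # hypothetical BfMax
LP, LF = 180, 100                      # hypothetical lengths
print(buf.try_multiplex(LP), buf.free_space)   # first audio pack fits
print(buf.try_multiplex(LP), buf.free_space)   # second audio pack fits
print(buf.try_multiplex(LP), buf.free_space)   # free space < LP: rejected
buf.decode_frame(LF)
buf.decode_frame(LF)
print(buf.try_multiplex(LP), buf.free_space)   # room again after two decodes
```

Returning False from try_multiplex leaves the choice between a video pack and a wait to the caller, which mirrors the division of roles between the occupancy calculator and the multiplexing data determinater.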
The creation of an audio pack immediately before the time point called a random access point (hereinafter referred to as RAP) is described below. Since the standards such as MPEG do not allow reproduction of successive streams from an arbitrary point, a point (RAP) from which reproduction can be started is set at each predetermined time. It is necessary for the stream immediately after RAP to begin with a video stream with an image that is not associated with another image, if it is a video stream. If it is an audio stream, the stream immediately after RAP preferably begins with the head of an audio frame, which is a minimum unit.
Therefore, immediately before RAP, the audio pack is multiplexed even if the free space of the virtual decoding audio buffer is smaller than the maximum stream length LP. A change in occupancy in this case occurs from t10 to t11 in FIG. 10D.
Immediately before the RAP boundary, the audio packs are multiplexed if the free space of the virtual decoding audio buffer is equal to or larger than the rest of the stream of the audio frame. The audio pack corresponds to the audio pack AP4 in FIG. 10B and the rest of the audio stream of the audio frame A6 is arranged into the audio pack.
In this case, however, since the audio pack has a fixed length, invalid data, which is the padding packet described earlier, is added to the rest of the stream of the audio frame A6 so as to reach the fixed audio pack length, thus forming an audio pack. The occupancy of the virtual decoding audio buffer increases by the amount of the rest of the stream of the audio frame A6, which is valid data, in the audio pack AP4.
If, on the other hand, the free space of the virtual decoding audio buffer is smaller than the rest of the audio frame, the audio pack cannot be multiplexed. Therefore, until the multiplexing of the audio pack becomes possible, an invalid pack that is entirely composed of invalid data is inserted, as shown from t8 to t9 in FIG. 10D.
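The conventional decision made immediately before the RAP can be summarized in a short sketch. The function name choose_pack_before_rap, the tuple layout it returns, and the numeric values are hypothetical; the sketch also assumes that the remaining part of the audio frame is not longer than the maximum stream length LP.

```python
def choose_pack_before_rap(free_space, remaining_frame_len, max_stream_len):
    """Return (pack_kind, valid_bytes, invalid_bytes) for the pack before a RAP.

    Assumes remaining_frame_len <= max_stream_len. The names and the return
    format are illustrative and not part of any standard.
    """
    if free_space >= remaining_frame_len:
        # The rest of the frame fits: send it and pad to the fixed length.
        padding = max_stream_len - remaining_frame_len
        return ("audio", remaining_frame_len, padding)
    # Not enough room yet: emit a pack made up entirely of invalid data.
    return ("invalid", 0, max_stream_len)


print(choose_pack_before_rap(free_space=100, remaining_frame_len=60, max_stream_len=180))
print(choose_pack_before_rap(free_space=40, remaining_frame_len=60, max_stream_len=180))
```

In both branches part or all of the fixed pack length carries invalid data, which is the overhead addressed in the following paragraph.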
As described above, the present invention has recognized that conventional multiplexing apparatuses insert invalid packs and write invalid data according to the state of the virtual decoding audio buffer, which results in a decrease in the bit rate of the entire apparatus.