1. Field of the Invention
This invention relates to a method and apparatus for multiplexing moving picture signals and acoustic signals on, for example, a magneto-optical disc or a magnetic tape, reproducing the recorded signals and displaying the reproduced signals on a display, and for transmitting the moving picture or acoustic signals of a teleconferencing system, television telephone system or broadcasting equipment over a transmission channel from a transmitter to a receiver for reception and display on the receiver. The invention also relates to a recording medium for recording the multiplexed signals.
2. Description of the Related Art
It has been practiced to compress picture or acoustic signals in accordance with the compression system of a pre-set standard to generate digital bitstreams of various signals, multiplex the bitstreams to form a multiplexed bitstream and to separate the multiplexed bitstream into stream data of the respective signals.
An illustrative example of the standard is MPEG (Moving Picture Experts Group). MPEG is the abbreviated name of a working group for coded storage of moving pictures under ISO/IEC JTC1/SC29 (International Organization for Standardization/International Electrotechnical Commission, Joint Technical Committee 1/Subcommittee 29). The MPEG 1 and MPEG 2 standards are ISO11172 and ISO13818, respectively. Among these international standards, there are items ISO11172-1 and ISO13818-1 for multi-media multiplexing, items ISO11172-2 and ISO13818-2 for pictures and items ISO11172-3 and ISO13818-3 for speech.
FIG.1 shows a schematic structure of an apparatus for compressing picture signals using the international standards ISO11172-2 or ISO13818-2 or compressing acoustic signals using the international standards ISO11172-3 or ISO13818-3 for generating digital stream data, multiplexing these data and demultiplexing the multiplexed data into respective stream data.
Referring to FIG.1, video data 100, audio data 101 and other data 102 are supplied to associated encoders, namely a video encoder 103, an audio encoder 104 and an other encoder 105, respectively, for producing encoded stream data, herein termed elementary streams 106, 107 and 108. The multiplexer (MUX) 109 multiplexes these elementary streams for generating unified stream data, herein termed a multiplexed stream 110. In a method specified by MPEG1 or MPEG2, the synchronization information of the picture or acoustic signals with respect to the time axis is simultaneously recorded as subsidiary information in the multiplexed stream. This multiplexed stream 110 is sent over a recording medium 111 or a transmission medium to the receiver.
In the receiver, the multiplexed stream 112 enters a demultiplexer (DEMUX) 113. The demultiplexer separates the elementary streams from one another while the relation of synchronization is kept. The elementary streams, thus separated, enter the associated decoders, namely the video decoder 117, audio decoder 118 and the other decoder 119, to generate a video signal 120, an audio signal 121 and an other signal 122 for output by a display device, such as a monitor or a speaker.
Referring to FIG. 2, the demultiplexing method specified in the International Standards ISO11172-1 or 13818-1 for multimedia multiplexing, such as MPEG1 or MPEG2, is explained. The demultiplexing method shown in FIG. 2 uses an idealized decoder and is termed a system target decoder (STD) model.
The multiplexed stream supplied to an input terminal 131 of the model shown in FIG. 2 is obtained by time-divisionally multiplexing the elementary streams supplied to the decoders. The time-divisionally multiplexed data are separated by a changeover switch 132 of FIG. 2, corresponding to the above-described separator, and are sent to associated buffers, that is a video buffer 141, audio buffers 142, 143, . . . and an other buffer 144 of a video decoder 135, several audio decoders 136, 137, . . . and another decoder 138, respectively. The data read out from the video buffer 141 are sent to a video decoder 145, while data read out from the audio buffers 142, 143, . . . are sent to audio decoders 146, 147, . . . and data read out from the other buffer 144 are sent to the other decoder 148. If the data from the video decoder 145 is an I-picture (intra-frame coded picture) or a P-picture (forward predictively coded picture), it is sent via a re-arraying buffer 149 to one of fixed terminals of a changeover switch 150. If the data from the video decoder 145 is a B-picture (bidirectionally predictive-coded picture), it is sent to the other fixed terminal of the changeover switch 150. An output of the changeover switch 150 is taken out at a terminal 151 while outputs of the audio decoders 136, 137 are taken out at terminals 152, 153, . . . , respectively.
The rate of delivery of the multiplexed stream to the changeover switch 132 as the separator is termed MUX_rate. The value of MUX_rate depends on the recording medium or the form of transmission and is stated in the multiplexed stream. Since the stream is multiplexed, data delivery from the separator to the buffers 141 to 144 associated with the respective elementary streams is intermittent and burst-like. In the present model, data delivery from the buffers 141 to 144 to the decoders 145 to 148 occurs instantaneously in terms of an ideal data unit. That is, the delay time caused by transfer is zero. The decoding in each decoder also occurs instantaneously for this data unit, with the delay time for decoding being also zero. This ideal data unit is termed an accessing unit. It is an encoded picture or frame for video and an audio frame for audio.
The multiplexed streams prescribed in the above international standards ISO11172-1 or 13818-1 are generated by controlling the multiplexer by the encoder so as to prevent overflow or underflow in the buffers 141 to 144 in the present model, as will be explained subsequently. Specifically, the data bandwidth of data transmission, indicated by MUX_rate, is shared by plural streams and utilized in a time-sharing fashion. The data are ultimately supplied via the buffers to the decoders 145 to 148 for display.
Also, since the subsidiary information called time stamp for maintaining the relation of synchronization for display is available from the multiplexed stream, the decoder uses this data for synchronized reproduction. For example, the international standards ISO11172-1 or 13818-1 for multimedia multiplexing prescribe time stamps called SCR (system clock reference), DTS (decoding time stamp) and PTS (presentation time stamp). The SCR is stated in the pack header, as later explained, while the DTS and the PTS are stated in the packet header, also as later explained. The SCR is the reference value owned by the STD model. This value is acted upon for reading the multiplexed stream in the decoder, or controlling the data input to each buffer. The DTS and the PTS, which can be stated from one accessing unit to another, denote the time of extraction and decoding of the associated accessing unit from the respective buffer in the STD model and the time of its presentation, respectively. In the international standards ISO13818-3 and 11172-3 for speech signal compression, for example, since the PTS and the DTS are concurrent, only the value of PTS is stated on the multiplexed stream. In the international standards ISO13818-2 and 11172-2 for picture signal compression, for example, since it becomes necessary, after decoding by the decoder, to delay a picture depending on the arraying of pictures, such as an I-picture preceding a B-picture, the DTS may differ from the PTS. In such case, both time stamps are stated on the multiplexed stream.
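The relation between the DTS and the PTS described above can be illustrated by a minimal sketch in Python. It is not taken from the standards: the function name `assign_timestamps`, the decoding of one picture per frame period and the rule that an I- or P-picture is held back until the next I- or P-picture is decoded are simplifying assumptions made for illustration. Time stamps are counted in ticks of a 90 kHz clock, so that one frame period at 29.97 Hz corresponds to 3003 ticks.

```python
def assign_timestamps(coded_types, period, start=0):
    """Return (type, DTS, PTS) for pictures given in coded (decoding) order.

    Assumptions of this sketch (not prescribed by the source text):
    - one picture is decoded per frame period;
    - a B-picture is presented as soon as it is decoded (PTS == DTS);
    - an I- or P-picture waits in the re-arraying buffer until the next
      I- or P-picture is decoded, or until the end of the sequence.
    """
    n = len(coded_types)
    stamps = []
    for i, ptype in enumerate(coded_types):
        dts = start + i * period
        if ptype == "B":
            pts = dts
        else:
            # Index of the next I- or P-picture in coded order, if any.
            nxt = next((j for j in range(i + 1, n) if coded_types[j] != "B"), n)
            pts = start + nxt * period
        stamps.append((ptype, dts, pts))
    return stamps


if __name__ == "__main__":
    for ptype, dts, pts in assign_timestamps("IPBB", 3003):
        print(ptype, "DTS =", dts, "PTS =", pts)
```

For the coded order I, P, B, B the sketch gives equal DTS and PTS for the two B-pictures and differing values for the I- and P-pictures, so that the pictures are presented in the order I, B, B, P.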
FIG. 3 shows the structure of a multiplexed stream prescribed in the above international standards ISO13818-1 and 11172-1. The multiplexed stream is comprised of time-divisionally multiplexed elementary streams and is made up of plural packets PT. Each packet PT is composed of data of a single elementary stream, so that data of plural elementary streams never co-exist in one packet. A packet header H.sub.PT is appended to each packet PT for specifying the information showing the contents of the packet and the aforementioned PTS and DTS. The size of the packet PT is usually variable and is stated in the packet header H.sub.PT. The packet composed of video data and the packet composed of audio data are termed a video packet and an audio packet, respectively.
A number of the packets PT, collected together, are termed a pack PK. Each pack PK has the aforementioned pack header H.sub.PK, which specifies information such as the SCR. As an example, the size of the pack PK is set depending on characteristics of the transmission medium. In a video CD, for example, each pack corresponds to a sector and is of a fixed length.
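The pack and packet structure described above can be sketched as a simple data structure in Python. The class names `Packet` and `Pack`, the field names and the function `demultiplex` are hypothetical and chosen only for illustration; the actual header syntax of ISO11172-1 and ISO13818-1 is considerably more detailed and is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Packet:
    """One packet PT: data of a single elementary stream plus its header fields."""
    stream_id: str              # which elementary stream the payload belongs to
    payload: bytes
    pts: Optional[int] = None   # presentation time stamp, if stated
    dts: Optional[int] = None   # decoding time stamp, if stated


@dataclass
class Pack:
    """One pack PK: a pack header carrying the SCR, followed by packets."""
    scr: int
    packets: List[Packet] = field(default_factory=list)


def demultiplex(packs):
    """Separate a multiplexed stream into its elementary streams."""
    streams = {}
    for pack in packs:
        for packet in pack.packets:
            streams.setdefault(packet.stream_id, bytearray()).extend(packet.payload)
    return {stream_id: bytes(data) for stream_id, data in streams.items()}


if __name__ == "__main__":
    stream = [
        Pack(scr=0, packets=[Packet("video", b"V1", pts=9009, dts=0)]),
        Pack(scr=10, packets=[Packet("video", b"V2", pts=3003)]),
        Pack(scr=13, packets=[Packet("audio", b"A1", pts=3003)]),
    ]
    print(demultiplex(stream))
```

Because every packet carries data of exactly one elementary stream, the separation needs nothing more than the stream identifier in each packet header.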
Referring to FIGS. 4 and 5, the method of controlling the data delivery to each buffer using the time stamp in the multiplexed stream and data outputting from each buffer from one accessing unit to another (data inputting to the decoder) is explained. This control method is merely an example in the ideal STD model and actual decoders do not necessarily execute the same operation.
FIG. 4 shows an example of a multiplexed stream made up of a video elementary stream and an audio elementary stream multiplexed together. For simplicity in explanation, it is assumed that a pack is formed for each packet. The video stream conforms to, for example, the above-mentioned international standard ISO13818-2 or ISO11172-2 and is encoded in the sequence of an I-picture (intra-coded picture), a B-picture (bidirectionally predictive-coded picture), a B-picture, and so forth. The respective pictures are termed a picture VF1, a picture VF2 and a picture VF3, and so forth. The sizes of the pictures are set to S1, S2, S3 and so forth. The audio stream shown conforms to, for example, the international standards ISO13818-3 or 11172-3 and is made up of plural audio frames (audio frame AF1, audio frame AF2 and so forth).
The multiplexed stream shown conforms to, for example, the multiplexed stream prescribed in the international standards ISO13818-1 or 11172-1. The pack header H.sub.PK in each pack PK states the above SCRs (SCR1, SCR2, SCR3, SCR4 and so forth, where SCR1<SCR2<SCR3<SCR4). The packet header H.sub.PT1 of the packet PT1 states the PTS and the DTS (PTS1 and DTS1) associated with the picture VF1 (I-picture), the packet header H.sub.PT2 of the packet PT2 states the PTS (PTS2) associated with the picture VF2 (B-picture) and the packet header H.sub.PT4 of the packet PT4 states the PTS (PTS4) associated with the picture VF3 (B-picture). The packet header H.sub.PT3 of the packet PT3 states the PTS (PTS3) associated with the audio frame AF1.
FIG. 5 shows changes in the amount of buffer occupation in the STD model for the bitstream of FIG. 4. The amount of change is controlled by the above-mentioned time stamp. In FIG. 5, H1, H2, H3, H4, W1, W2 and A1 denote the time widths during which data delivery to the video buffer is stopped, so that there is no change in the amount of occupation, as will be explained in detail. Meanwhile, the description of the buffer for the audio stream shown in FIG. 4 is omitted.
At an arbitrary time point, the multiplexed stream starts to be read into the decoder at a transfer rate of MUX_rate. H1 shows the time during which each header of the pack PK1 and the packet PT1 of the multiplexed stream is read. During this time, data delivery to the video buffer is stopped. When the first SCR (SCR1) stated in the pack header H.sub.PK1 is read from the multiplexed stream, the reference clock in the STD model (termed STC) is reset to the value of SCR1. The STC is then counted up at a pre-set period. Subsequently, the remaining data in the pack header H.sub.PK downstream of SCR1 and subsequent packet headers H.sub.PT are read at the same transfer rate MUX_rate. Directly after reading the packet header H.sub.PT, data input to the video buffer is started at the same transfer rate MUX_rate. The rightwardly rising straight line denotes the state in which data is being supplied to the video buffer, with the gradient depicting the transfer rate (MUX_rate). Data transfer to the video buffer is continued until the entire video data in the packet has been read out. Readout of the pack header H.sub.PK2 in the pack PK2 is then started. This readout is continued until the value of the SCR2 stated in the header is read out.
When the SCR2 is read, it is compared to the counted-up value of the STC, and the readout of the multiplexed stream is stopped until the STC value becomes equal to the SCR2 value. W1 denotes the time period during which the data delivery is stopped. When the STC subsequently becomes equal to the value of SCR2, readout of the remaining data downstream of SCR2 and the following packet headers in the H2 period is performed at the MUX_rate transfer rate. Directly after the packet header data is read, data delivery to the video buffer is re-started at the MUX_rate transfer rate. Subsequently, the same sequence of operations is repeated.
The data delivery to the buffer for each elementary stream is controlled in accordance with the SCR stated on the multiplexed stream, as described above. That is, data delivery is not made to any buffer until the value of the reference clock STC becomes equal to the value of the current SCR. From the side of the multiplexer, this may be summarized such that, if there is no necessity of data delivery, it suffices to insert a pack header and to state the time of re-starting data delivery as the SCR value.
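The SCR-controlled delivery described above can be sketched as a small simulation in Python. The function name `schedule_delivery` and the tuple layout are hypothetical. As a simplification, the SCR is assumed to sit at the very beginning of each pack header, whereas in the description above the waiting period begins once the SCR inside the header has been read. Sizes are in bytes, times in clock ticks and the transfer rate in bytes per tick; exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def schedule_delivery(packs, mux_rate):
    """Return (stream, start, end) delivery intervals for each pack.

    packs: list of (scr, header_bytes, stream, payload_bytes) in stream order.
    Assumption of this sketch: the SCR is the first field of the pack header,
    and each pack holds a single packet, as in the example of FIG. 4.
    """
    if not packs:
        return []
    stc = Fraction(packs[0][0])        # the STC is set to the first SCR
    intervals = []
    for scr, header_bytes, stream, payload_bytes in packs:
        if stc > scr:
            # The preceding data could not be read before this SCR.
            raise ValueError("SCR %s cannot be met at this transfer rate" % scr)
        stc = Fraction(scr)            # reading pauses until STC equals the SCR
        stc += Fraction(header_bytes, mux_rate)   # headers: no buffer input
        start = stc
        stc += Fraction(payload_bytes, mux_rate)  # payload enters the buffer
        intervals.append((stream, start, stc))
    return intervals


if __name__ == "__main__":
    packs = [(0, 100, "video", 400), (10, 100, "video", 200), (13, 100, "audio", 300)]
    for stream, start, end in schedule_delivery(packs, 100):
        print(stream, "from", start, "to", end)
```

In this example the second pack states an SCR of 10 although reading of the first pack ends at tick 5, so that delivery pauses for five ticks, which corresponds to a waiting period such as W1.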
The time width A1 in FIG. 5 denotes the time period during which the packet PT3 (audio packet) in the pack PK3 is read. During this time, data transfer to the audio buffer occurs at the same transfer rate MUX_rate. Thus, there is no data delivery to the video buffer during the time period A1, such that no change is caused in the amount of occupation. Meanwhile, when the STC becomes equal to the value of DTS1, the accessing unit of the size S1 associated with DTS1 (picture VF1) is instantly extracted from the video buffer and transferred to the decoder. FIG. 5 shows an instance for SCR4<DTS1. The pictures VF2 and VF3 are also instantly extracted from the video buffer at time points PTS2 and PTS4 by similar control. Although input/output control of the video buffer in the STD model has been described above, each buffer associated with audio or multiplexed other stream data is controlled in a similar manner.
As described above, the time stamp is the crucial information used for input/output control in each buffer. Thus, if the time stamps are not stated with proper values on the multiplexed stream on each stream data time-shared to a suitable length, buffer management cannot be done correctly. An instance in which buffer management becomes broken due to inappropriate time stamps is hereinafter explained. In FIG. 5, the time DTS1 denotes the time of extraction of the picture VF1 of size S1. If, on the multiplexed stream, the time DTS1 is smaller than the value of SCR2, the picture VF1 cannot be extracted at time DTS1, since an amount of data sufficient for extraction (S1) is not supplied to the video buffer until time DTS1. This state of the buffer is termed underflow in the buffer.
On the other hand, if the value of the time DTS1 is sufficiently larger than the value of SCR4, the picture VF1 is not extracted at a proper time and, moreover, data is supplied during such time to the buffer, so that, at a certain time point, the amount of occupation of the buffer exceeds the allowable amount of the buffer (Buffer_Size in the drawing). This state of the buffer is termed overflow in the buffer.
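The underflow and overflow conditions described above can be checked with a short sketch in Python. The function names `delivered_by` and `check_buffer`, the interval representation and all numerical values are assumptions made for illustration. Data enters the buffer at a constant rate during each delivery interval, and each accessing unit is removed instantly at its extraction time, as in the STD model.

```python
def delivered_by(t, deliveries, rate):
    """Total data supplied to the buffer up to time t."""
    total = 0
    for start, end in deliveries:
        if t > start:
            total += (min(t, end) - start) * rate
    return total


def check_buffer(deliveries, extractions, rate, buffer_size):
    """Return "ok", "underflow" or "overflow" for one STD buffer.

    deliveries:  list of (start, end) intervals of data input at `rate`.
    extractions: list of (time, size) of instantly removed accessing units,
                 with distinct times.
    The occupation only rises between extractions, so it is sufficient to
    inspect it at the end of each delivery and just before each extraction.
    """
    checkpoints = sorted({end for _, end in deliveries} |
                         {time for time, _ in extractions})
    for t in checkpoints:
        removed = sum(size for time, size in extractions if time < t)
        occupancy = delivered_by(t, deliveries, rate) - removed
        if occupancy > buffer_size:
            return "overflow"
        for time, size in extractions:
            if time == t and occupancy < size:
                return "underflow"
    return "ok"


if __name__ == "__main__":
    video_input = [(1, 5), (11, 13)]      # two video packets at 100 bytes/tick
    print(check_buffer(video_input, [(8, 500)], 100, 600))    # extraction too early
    print(check_buffer(video_input, [(12, 500)], 100, 600))   # properly timed
    print(check_buffer(video_input, [(20, 500)], 100, 550))   # extraction too late
```

In the first call the 500-byte picture is to be extracted before the second packet has started to arrive, which corresponds to the case in which DTS1 is smaller than SCR2. In the third call the extraction is so late that the accumulated data exceeds the buffer size.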
With the multiplexer, each elementary stream needs to be time-divisionally multiplexed with a suitable length, while proper time stamp values need to be set on the multiplexed stream, for possibly preventing breakage in the buffer in the decoder, such as overflow or underflow.
In general, video, audio or other elementary streams to be multiplexed are independently encoded prior to multiplexing. For example, in video encoding according to the above-mentioned international standards ISO13818-2 or 11172-2, a buffer model different from the STD model is prescribed, and a video stream according to this model can be decoded and displayed at a proper time interval in a sole video decoder. The scheduling for multiplexing is governed by these buffer models in the decoder of the elementary streams. That is, the multiplexer is required to perform multiplexing in such a manner as to maintain proper matching to the buffer models of the individual elementary streams and synchronism of the elementary streams.
Referring to FIG. 6, the amounts of occupation of the buffer in the buffer model of the elementary stream and the above-mentioned STD model are explained. In the present instance, multiplexing of a sole video stream and a sole audio stream is explained. It is assumed that the video and audio streams have been encoded in accordance with, for example, the aforementioned international standards ISO13818-2 or 11172-2 and ISO13818-3 or 11172-3, respectively. In the following description, a buffer in the STD model is termed an STD buffer for avoiding confusion. Also, in the present instance, the time required for transmission of data on the multiplexed streams, such as pack headers and packet headers, is disregarded and not shown. FIGS. 6A and 6B show instances of an audio STD buffer model and a video STD buffer model, respectively.
First, a serrated curve (a) in FIG. 6B shows the state of occupation of a buffer in the video buffer verifier (VBV) prescribed in the above-mentioned international standards ISO13818-2 and 11172-2, that is a buffer in an ideal sole video decoder (termed a VBV buffer). In this figure, a rightwardly rising curve denotes data delivery to the VBV buffer. The rate of change is fixed, so that the inclination of change is constant. At an arbitrary time point, encoded video data starts to be supplied to the VBV buffer. After lapse of a time of VBV_delay shown in FIG. 6, encoded video data of the first display unit (picture I2) is instantly extracted from the VBV buffer and decoded. That is, there is zero delay in picture extraction and decoding. The above operation is repeated for the values VBV_delay for respective pictures. The amount of occupation in each VBV buffer delineates a serrated line as shown at (a) in FIG. 6B. In the above referenced international standards ISO13818-2 and ISO11172-2, the values of VBV_delay are set for individual pictures and stated in each encoded picture.
Meanwhile, in the STD buffer associated with each elementary stream, since the multiplexed data is time-divisionally multiplexed, as above explained, data delivery occurs in a burst fashion. The trajectory of the video STD buffer is shown by a polygonal line or curve (b) in FIG. 6B. The data delivery to the video STD buffer is indicated by a rightwardly rising curve, with the rate (inclination) being MUX_rate. If data is supplied to an STD buffer for an elementary stream other than the video elementary stream, data delivery to the video STD buffer ceases, so that the inclination becomes flat.
As shown above, the trajectory of the occupied amount of the video STD buffer need not be coincident with that of the VBV buffer. However, since the time interval in extraction and decoding of each picture is pre-set by, for example, the value of VBV_delay, the time stamp needs to be correspondingly set in the video STD model. Moreover, since the size of the extracted picture in the VBV model is pre-set, multiplexing needs to be scheduled in the video STD model so that at least an amount of data corresponding to that size will have been supplied to the video STD buffer by the time of extraction of each accessing unit (picture). Consequently, the trajectory of the amount of occupation of the video STD buffer lies above the trajectory of the amount of occupation of the VBV buffer.
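The serrated trajectory of the VBV buffer can be reproduced by a minimal sketch in Python. The function name `vbv_trajectory` and the numerical values are hypothetical. As a simplification, only the delay of the first picture is given and the subsequent pictures are assumed to be extracted at a constant frame period, with data entering the buffer at a constant rate.

```python
def vbv_trajectory(rate, vbv_delay, period, picture_sizes):
    """Return (time, occupation before, occupation after) for each extraction.

    Assumptions of this sketch: constant input rate, first picture extracted
    vbv_delay after the start of delivery, later pictures one frame period
    apart, and instantaneous extraction and decoding.
    """
    points = []
    occupancy = 0
    previous = 0
    for i, size in enumerate(picture_sizes):
        t = vbv_delay + i * period
        occupancy += rate * (t - previous)   # constant-rate filling
        if occupancy < size:
            raise ValueError("picture %d is not fully in the buffer" % i)
        before = occupancy
        occupancy -= size                    # instantaneous extraction
        points.append((t, before, occupancy))
        previous = t
    return points


if __name__ == "__main__":
    for t, before, after in vbv_trajectory(100, 4, 2, [300, 150, 250]):
        print("t =", t, "occupation", before, "->", after)
```

Each tuple marks one tooth of the serrated line: the occupation rises linearly up to the value before extraction and drops by the picture size at the extraction time.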
The time of cessation of data delivery to the video STD buffer means the time of data delivery to the audio STD buffer. FIG. 6A shows the amount of occupation of the audio STD buffer.
For example, in the audio encoding system of the above-mentioned international standards ISO13818-3 and 11172-3, buffer models, such as VBV models, are not prescribed. It is assumed here that the audio stream is supplied at a constant transfer rate to a buffer owned by the sole audio decoder, termed an A-buffer, so as to be instantly extracted at a constant time interval on the audio frame basis and instantly decoded. In this case, the capacity of the A-buffer is set so as to be at least larger than the length of each audio frame. A polygonal line or a curve (c) in the graph of FIG. 6A shows changes in the occupied amount in the A-buffer. Since the audio stream is extracted at a pre-set time interval on the audio frame basis and decoded, the occupied amount delineates a serrated curve.
On the other hand, data delivery to the audio STD buffer is as shown by the polygonal line or a curve (d) in the graph of FIG. 6A. The data transfer rate to the audio STD buffer is MUX_rate. The accessing unit (audio frame) is removed by instant extraction from the audio STD buffer and decoded instantly. In the present instance, the audio accessing unit is assumed to be sufficiently small and the accessing unit is assumed to be removed from the STD buffer at a constant rate. For the same reason as that for the video STD buffer, the occupied amount of the audio STD buffer delineates a trajectory that is not coincident with the encoding model of the sole audio decoder (model of the A-buffer).
In FIG. 6, domains (e), (f) and (g) denote data delivery domains to the audio STD buffer, data delivery domains to the video STD buffer and data delivery stop domains for both the audio STD buffer and the video STD buffer, respectively.
The above instance is directed to multiplexing of one video stream and one audio stream. However, plural audio streams can be handled in actual application. In such case, data delivery scheduling in multiplexing, taking into account the encoding model for respective elementary streams, becomes more complex.
Multiplexing means time-sharing multiplexing of plural elementary streams into a unified data stream. However, the above-mentioned STD model becomes broken under certain scheduling, as will be explained by referring to FIG. 7.
FIG. 7 shows multiplexing of a sole video stream V and two audio streams (a first audio stream A1 and a second audio stream A2). Each graph in FIG. 7 denotes time changes of the occupied amounts of the STD buffers for the respective elementary streams. Specifically, FIGS. 7A, 7B and 7C show the occupied amounts of the STD buffers for the first audio stream A1, second audio stream A2 and the video stream V, respectively.
In FIG. 7, T.sub.n-2, T.sub.n-1 and T.sub.n denote the extraction time points of (n-2)th, (n-1)th and nth accessing units or pictures A.sub.n-2, A.sub.n-1 and A.sub.n, respectively. In the instant case, the extraction time intervals (W.sub.n-1, W.sub.n, and so forth) of the video accessing units are data delivery unit time intervals. That is, the data delivery scheduling is determined from one unit time interval to another. Specifically, one or more of STD buffers, to which data should be supplied during the time W.sub.n-1 from time T.sub.n-1 until time T.sub.n, is selected at a time T.sub.n-1, taking into account the data outputting state from all of the STD buffers that is likely to occur from time T.sub.n-1 until time T.sub.n, and the amounts of occupation are adjusted accordingly. This unit time corresponds to 1/29.97 second and to 1/25 second in NTSC and in PAL, respectively. FIG. 7 shows a case in which the video STD buffer has become broken at time T.sub.n as a result of the above scheduling, as now explained in detail.
At time T.sub.n-1 in FIG. 7, data delivery to the video STD buffer is started, taking into account the extraction of the video accessing unit at the next time point T.sub.n. However, since the STD buffers for the two audio streams are likely to underflow substantially simultaneously at points (a) and (b) in FIG. 7, data delivery is switched to that for the audio STD buffers. During this time interval, data delivery to the video STD buffer ceases. That is, the first audio stream A1 and the second audio stream A2 are supplied at the domains (c) and (d), respectively. After sufficient amounts of data have been supplied to the audio STD buffers, data delivery is again made to the video STD buffer. However, since sufficient data is not supplied up to time T.sub.n, buffer underflow occurs at the time instant the next video accessing unit A.sub.n is extracted, as shown at point (e) in FIG. 7. Also, since the video buffer is ruptured in this manner, buffer underflow occurs in portions (f) and (g) in FIG. 7 for the first and second audio streams A1 and A2, respectively.
In the instant case, data delivery scheduling to the respective STD buffers is set based on the extraction time points of the respective video accessing units. That is, a pre-readable portion (h) of an accessing unit is used as one video accessing unit. In the instant case, the video STD buffer is ruptured because the sum of data supply rates required for the STD buffers in this unit time has exceeded the total data delivery rate, or MUX_rate. The reason is that, because of the shorter unit time in scheduling, the transmission band cannot be allocated appropriately should data delivery to many STD buffers be requested during such time.
FIGS. 8A to 8C show an instance in which the scheduling unit time in the data delivery is set so as to be longer than in the above case. In the case of FIG. 8, data delivery to the STD buffers is based on the extraction time of two video accessing units. That is, the pre-readable portion (a) of the accessing unit corresponds to two video accessing units. Specifically, one or more of STD buffers, to which data should be supplied during the time W.sub.x from time T.sub.n-2 until time T.sub.n, is selected at a time T.sub.n-2, taking into account the data outputting state from all of the STD buffers that is likely to occur from time T.sub.n-2 until time T.sub.n, and the amounts of occupation are adjusted accordingly. As compared to the case of FIG. 7, the data delivery timing to the STD buffer of the second audio stream A2 is intentionally moved back, that is, set to a temporally earlier time point, for thereby evading rupture of each STD buffer. The domains (b) and (c) in FIG. 8 denote the data delivery domains to the second audio STD buffer A2 and to the first audio STD buffer A1, respectively.
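The effect of a longer scheduling unit time can be illustrated by a minimal sketch in Python. The function name `feasible`, its parameters and the numerical values are hypothetical, and the capacity of the STD buffers is disregarded for simplicity. Each entry of `window_demands` is the total amount of data that all STD buffers together must have received by the end of the corresponding unit time interval. When several intervals are scheduled together, data for a later interval may be delivered earlier within the same group.

```python
def feasible(window_demands, mux_rate, window, unit):
    """Check whether all demands can be met at the given MUX rate.

    window_demands: data required by the end of each unit time interval.
    window:         duration of one unit time interval.
    unit:           number of intervals scheduled together (look-ahead).
    Assumption of this sketch: buffer sizes are not limiting, so the only
    constraint is that the data due by the end of the k-th interval of a
    group fits into the capacity of the first k intervals of that group.
    """
    capacity = mux_rate * window
    for start in range(0, len(window_demands), unit):
        cumulative = 0
        group = window_demands[start:start + unit]
        for k, demand in enumerate(group, start=1):
            cumulative += demand
            if cumulative > k * capacity:
                return False
    return True


if __name__ == "__main__":
    demands = [600, 1300]   # the second interval asks for more than one interval can carry
    print(feasible(demands, mux_rate=1000, window=1, unit=1))
    print(feasible(demands, mux_rate=1000, window=1, unit=2))
```

With one interval as the scheduling unit, the second interval alone exceeds the available capacity. With two intervals scheduled together, part of the data due in the second interval can be delivered during the first, in the manner of the earlier delivery to the second audio STD buffer in FIG. 8. A demand that is too large in the first interval of a group cannot be rescued in this way.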
As discussed in the foregoing, rupture of the STD buffers can be evaded to some extent by setting the processing unit time in the multiplexing data delivery scheduling so as to be longer for taking into account the phenomenon occurring temporally subsequently to the current time. However, the above-mentioned international standards ISO13818-1 and 11172-1 allow for multiplexing of up to 32 audio streams at the maximum. In such case, it is extremely difficult to schedule STD model multiplexing for evading the buffer rupture under any circumstances. That is, in the above-described method in which the processing unit time is finite, there perpetually exists the possibility of rupturing the STD model in deciding the data delivery schedule, such that it is impossible to assure completely safe multiplexing.