Recently, the MPEG (Moving Picture Experts Group) technology standardized as ISO/IEC 13818 has come into common use for compressing and encoding video data at broadcasting stations that produce and broadcast television programs. MPEG is becoming the de facto standard, especially for recording video data generated by video cameras or the like on tape, disks, or other recording media that can be accessed randomly, and for transmitting video programs produced at broadcasting stations via cables or satellites.
The MPEG technology is an encoding technology that can improve compression efficiency by means of predictive coding of pictures. More particularly, the MPEG standard employs a plurality of predictive coding systems that combine intra-frame prediction and inter-frame prediction, and each picture is encoded as one of the following picture types according to the prediction system: I-picture (Intra Picture), P-picture (Predictive Picture), and B-picture (Bidirectionally Predictive Picture). The I-picture, which is not predicted from other pictures, is a picture encoded within the frame. The P-picture is a picture subjected to inter-frame forward predictive coding from a preceding (past) I-picture or P-picture. The B-picture is a picture subjected to bidirectionally predictive coding from both a preceding (past) I-picture or P-picture and a following (future) I-picture or P-picture.
A multiplexing system for multiplexing a plurality of video programs produced at broadcasting stations will be described first with reference to FIG. 1.
The MPEG encoders 11 to 19 create encoded streams by encoding received source video programs V1 to V9, respectively, according to the MPEG standard described above. Such encoded streams are also known as elementary streams.
The packetizers 21 to 29 receive the elementary streams output from the MPEG encoders 11 to 19, respectively, and packetize them to create packetized elementary streams (PES). The packetizer process will be described in detail later.
Each of the transport stream generation circuits (TS Gen) 31 to 39 creates a transport stream consisting of 188-byte transport stream packets from the packetized elementary streams output from the respective packetizers 21 to 29.
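The relationship between a PES packet and the 188-byte transport stream packets that carry it can be illustrated by the following minimal sketch. It assumes a fixed 4-byte transport packet header and assumes that no adaptation field other than stuffing in the final packet is inserted; the function name ts_packets_needed is hypothetical and is used only for illustration.

```python
TS_PACKET_SIZE = 188   # bytes per transport stream packet, as stated in the text
TS_HEADER_SIZE = 4     # bytes; assumed fixed header with no extra adaptation field
TS_PAYLOAD_SIZE = TS_PACKET_SIZE - TS_HEADER_SIZE  # 184 bytes of payload


def ts_packets_needed(pes_packet_bytes):
    """Return how many transport packets are needed to carry one PES packet."""
    if pes_packet_bytes <= 0:
        raise ValueError("PES packet length must be positive")
    # Ceiling division: the last packet is padded if it is only partly filled.
    return -(-pes_packet_bytes // TS_PAYLOAD_SIZE)


print(ts_packets_needed(184))     # 1
print(ts_packets_needed(185))     # 2
print(ts_packets_needed(10_000))  # 55
```

In practice some packets also carry timing information in an adaptation field, which reduces the payload available in those packets, so the count above should be read as a lower bound.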
The system target decoder buffers (STD buffers) 41 to 44 receive and buffer the transport streams output from the transport stream generation circuits. The STD buffers, which are fixed-capacity buffers specified by the MPEG standard, are provided for the purpose of simulation, to prevent the receive buffer on the MPEG decoder side from overflowing or underflowing.
The multiplexing circuit 40 receives a transport stream from each of the system target decoder buffers 41 to 44 and multiplexes the transport streams according to a schedule.
Now the packetization by the packetizers 21 to 29 of the multiplexing system shown in FIG. 1, as well as the delays produced during the packetization, will be described in detail with reference to FIG. 2.
FIG. 2A shows the order of pictures in source video data supplied to the MPEG encoders. This is a typical example in which source video data is encoded as a GOP structure in the form of I, B, B, P, B, B, P, and so on.
FIG. 2B shows the order of pictures in an encoded stream (elementary stream) encoded by an MPEG encoder. Since B-pictures B2 and B3 are predictive-coded from both I-picture I1 and P-picture P4 as described above, the order of pictures in the encoded stream is I, P, B, B, P, B, B, P, and so on.
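The reordering from the display order of FIG. 2A to the coded order of FIG. 2B can be sketched as follows. This is a simplified illustration, not an encoder implementation: pictures are represented only by labels such as "I1" or "B2", and the function name display_to_coded_order is hypothetical.

```python
def display_to_coded_order(display):
    """Reorder picture labels from display order to coded (stream) order.

    Each B-picture is held back until the following I- or P-picture,
    which serves as its future reference, has been placed in the stream.
    """
    coded = []
    pending_b = []  # B-pictures waiting for their future reference picture
    for pic in display:
        if pic.startswith("B"):
            pending_b.append(pic)
        else:
            coded.append(pic)        # reference picture goes first
            coded.extend(pending_b)  # then the B-pictures that depend on it
            pending_b = []
    coded.extend(pending_b)  # trailing B-pictures with no later reference
    return coded


display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
print(display_to_coded_order(display))
# ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6']
```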
FIG. 2C shows a packetized elementary stream (PES) generated by a packetizer. Since a packetizer is the circuit that packetizes the encoded streams output from an encoder and adds a PES header to the packets, the order of pictures in the packetized elementary stream is the same as the order of pictures in the encoded stream output from the encoder.
The packetization carried out by packetizers does not take much time. As can be seen by comparing FIG. 2B and FIG. 2C, however, the packetized elementary stream lags behind the elementary stream by four frames. The reason for this delay will be described in detail below.
The MPEG standard described above defines the decoding timing of each picture for an MPEG decoder by the data called a decoding time stamp (DTS), and the display timing of decoded video data by the data called a presentation time stamp (PTS). Therefore, MPEG decoders must decode each picture in an encoded stream with the timing based on the DTS and output the decoded video data with the timing based on the PTS.
To enable such decoding, the MPEG standard requires the PTS and DTS to be specified for each picture when encoded streams are transmitted or multiplexed. Furthermore, the MPEG standard provides that the PTS and DTS information should be described in the PES header. In other words, the packetizer that generates packetized elementary streams must generate the PTS and DTS.
Now the determination of the PTS by the packetizer after the packetizer receives the elementary stream shown in FIG. 2B from an MPEG encoder will be described.
It is easy to determine a PTS for picture I1, which is received first, because it is an I-picture and is to be presented first. Assume that it is assigned a PTS of “1.”
The second picture received is a P-picture, P4. As can be seen from the order of pictures in the source video shown in FIG. 2A, P-picture P4 must be displayed after a plurality of B-pictures that follow it. At the time (t5) when the packetizer receives picture P4, however, it does not know how many B-pictures will be transmitted successively after picture P4. Therefore, it is not possible to determine the PTS of picture P4 at the time (t5) when it is received. Thus, the packetizer buffers the first picture I1 and second picture P4. This buffering must be continued until the PTS of picture P4 is determined.
The third and fourth pictures, B2 and B3, are B-pictures, so their PTSs can be determined immediately. That is, the PTS of picture B2 is “2” and the PTS of picture B3 is “3.”
The fifth picture, P7, is a P-picture. Only after receiving this P-picture (at t8) does the packetizer know that the second picture, P4, was followed by two successive B-pictures, and only then can it assign the PTS of “4” to picture P4. In other words, only after receiving P-picture P7 (at t8) does the packetizer know that the GOP structure (I, P, B, B, P, and so on) of the elementary stream consists of two B-pictures sandwiched between an I-picture and P-pictures, so that it can decide the PTSs for all the pictures.
In order to determine PTSs as described above, the packetizer must buffer the elementary stream received at t4 until t8. In other words, there is a delay of four frames in the process of determining PTSs.
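The PTS determination described above can be traced with the following minimal sketch. It assumes one picture arrives per frame period starting at t4, that PTS values are simple frame counts starting at “1,” and that the first picture of the stream is presented first. The function name assign_pts and its parameters are hypothetical and chosen only for illustration.

```python
def assign_pts(coded, first_arrival=4):
    """Map each picture label to (pts, time at which the PTS became known).

    `coded` lists picture labels in coded order, one arriving per frame time.
    """
    resolved = {}
    last_ref_pts = 0    # PTS of the latest I/P-picture whose PTS is known
    pending_ref = None  # I/P-picture still waiting for its PTS
    b_count = 0         # B-pictures received since pending_ref arrived
    t = first_arrival
    for offset, pic in enumerate(coded):
        t = first_arrival + offset
        if pic.startswith("B"):
            # A B-picture is displayed right after the previous reference.
            b_count += 1
            resolved[pic] = (last_ref_pts + b_count, t)
            continue
        if offset == 0:
            # Assumption: the first picture is presented first, so PTS = 1.
            last_ref_pts = 1
            resolved[pic] = (1, t)
            continue
        if pending_ref is not None:
            # The next reference picture reveals how many B-pictures followed.
            last_ref_pts += b_count + 1
            resolved[pending_ref] = (last_ref_pts, t)
        pending_ref, b_count = pic, 0
    if pending_ref is not None:
        # End of stream: no further B-pictures can follow the pending picture.
        resolved[pending_ref] = (last_ref_pts + b_count + 1, t)
    return resolved


for pic, (pts, t) in assign_pts(["I1", "P4", "B2", "B3", "P7"]).items():
    print(f"{pic}: PTS={pts}, known at t{t}")
```

In this trace the PTS of P4 becomes known only at t8, four frame periods after I1 arrived at t4, which corresponds to the four-frame delay described above.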
In the case of a GOP structure with two B-pictures between an I-picture and P-pictures as shown in FIG. 2, there is a four-frame delay as described above. In the case of a GOP structure with four B-pictures between an I-picture and P-pictures, there is a six-frame delay. Thus, if the number of B-pictures existing between an I-picture and P-pictures is denoted as N, there is a delay of (N+2) frames in the PTS determination process.
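The (N+2) relationship can be checked by counting positions in the coded stream, as in the following minimal sketch. It assumes one picture arrives per frame period and that the stream begins I, P, followed by N B-pictures and then the next P-picture; the function name pts_delay_frames is hypothetical.

```python
def pts_delay_frames(num_b):
    """Frames from arrival of the first I-picture until the second P-picture
    arrives, which is when the PTS of the first P-picture becomes known."""
    if num_b < 0:
        raise ValueError("number of B-pictures must be non-negative")
    # Coded order at the start of the stream: I, P, B x N, P
    coded = ["I", "P"] + ["B"] * num_b + ["P"]
    # The I-picture arrives at position 0; the second P-picture is last.
    return len(coded) - 1


for n in (2, 4, 5):
    print(f"N={n}: delay of {pts_delay_frames(n)} frames")
# N=2: delay of 4 frames
# N=4: delay of 6 frames
# N=5: delay of 7 frames
```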
There are also problems in designing packetizers. For example, to produce a delay of four frames, four frame buffer memories are sufficient. However, since streams with various GOP structures may be supplied to packetizers as shown in FIG. 1, the number of frame memories must be designed assuming the maximum number of B-pictures that can exist between an I-picture and P-pictures, so that an encoded stream of any GOP structure can be accommodated. As an example, if the maximum number of B-pictures is assumed to be “5” as a reasonable number, a multiplexing system for multiplexing nine video programs needs nine packetizers as shown in FIG. 1. This means that a total of 45 frame memories must be provided. Consequently, the problem with implementing such a multiplexing system is the high cost of equipment.
Furthermore, as shown in FIG. 3, the transmission of the video data prepared at a reporting site to individual households involves transmission from the reporting site to the main broadcasting station, transmission within the main broadcasting station, transmission from the main broadcasting station to local stations, transmission from the local stations to the households, etc. All these transmission processes involve generating packetized elementary streams. Consequently, delays are produced in the generation of the packetized elementary streams in the individual transmission processes and accumulated into a large delay.