It is important in many video processing applications to accurately splice together bit streams from different video sources. The advent of digital video compression standards such as MPEG-2 has substantially increased the complexity of video splicing. Issues which arise when performing splicing operations on compressed digital streams include time-base synchronization, buffer management, audio/video format changes and audio/video synchronization. Failure to properly address these issues in the splicing operation can produce undesirable results at a decoder which receives the composite stream. The undesirable results include audio and video artifacts, decoder buffer overflows and underflows, and unsynchronized audio and video.
The MPEG-2 standard is used in digital multimedia applications which require efficient compression and transmission of video, audio and other information. The MPEG-2 standard was developed by the International Organization for Standardization (ISO) Moving Picture Experts Group (MPEG) and is documented in ISO/IEC IS 13818-1, "Information Technology-Generic Coding of Moving Pictures and Associated Audio Information: Systems," ISO/IEC IS 13818-2, "Information Technology-Generic Coding of Moving Pictures and Associated Audio Information: Video" and ISO/IEC IS 13818-3, "Information Technology-Generic Coding of Moving Pictures and Associated Audio Information: Audio." The above-cited ISO documents are incorporated herein by reference. The systems specification of MPEG-2 provides a multi-layer hierarchical organization for multiplexing and transmission of audio-video data streams, and is described in greater detail in A. Wasilewski, "MPEG-2 Systems Specification: Blueprint for Network Interoperability," Communications Technology, February 1994, which is incorporated by reference herein. The MPEG-2 video and audio specifications provide compression and encoding of video and audio data streams. MPEG video compression is described in greater detail in D. LeGall, "MPEG: A Video Compression Standard for Multimedia Applications," Communications of the ACM, Vol. 34, No. 4, pp. 46-58, April 1991, which is incorporated by reference herein.
The systems aspects of the MPEG-2 standard generally involve multiplexing several elementary streams from one or more programs to form a higher level packet-based stream. A given program may correspond to one or more television or motion picture signals and may include multiple elementary streams in the form of separately-encoded compressed video and audio data streams, as well as other program data streams such as closed caption text. The higher level packet-based stream in accordance with the MPEG-2 standard may be either a program stream or a transport stream. An MPEG-2 program stream generally carries a single program such that all elementary streams in the program stream share a common time base, while an MPEG-2 transport stream can carry elementary streams from multiple programs with different time bases.
The program and transport streams associate related elementary data streams for a given program or programs such that the elementary streams can be extracted, decoded and presented together in a coherent fashion. The program and transport streams may be recorded on or played back from a digital video disc (DVD), video tape, magnetic or optical disk drive or other suitable storage device. It should be noted that both program streams and transport streams may be considered part of a transport layer in accordance with the ISO network reference model as set forth in the ISO 7498 standard. The term "transport stream" as used herein is therefore intended to include both MPEG-2 program streams and transport streams as well as other packet-based data streams formed in accordance with standards other than MPEG-2.
The elementary stream data is separated in accordance with the MPEG-2 standard into a sequence of variable-length packetized elementary stream (PES) packets. The PES packet structure separates the relatively long elementary streams into more manageable units, and permits the attachment of timing, identification and control information to particular portions of an elementary stream. A given PES packet has a PES header and a quantity of elementary stream data. The PES header may include timing information such as presentation time stamps (PTSs) and decoding time stamps (DTSs). Each PES packet is packetized into a plurality of fixed-length 188-byte transport packets which form an MPEG-2 transport stream.
FIG. 1A shows a portion of an MPEG-2 transport stream. Each transport packet includes a fixed-length four-byte packet header followed by a payload. The packet header includes an 8-bit sync pattern which identifies the beginning of the transport packet, and a 13-bit packet identifier (PID) which identifies the data being carried by the transport packet. All PES-bearing transport packets with a given PID carry elementary stream data for only a single elementary stream and no other. The exemplary transport packet shown also includes a variable-length adaptation field which will be described in greater detail below. The transport packet header also includes two adaptation field control bits which indicate whether the corresponding transport packet includes a payload with no adaptation field, an adaptation field with no payload, or both an adaptation field and a payload as in FIG. 1A. The header also includes a transport error indicator bit, a payload unit start indicator bit, a transport priority bit, two transport scrambling control bits and a four-bit continuity counter. The payload portion of the transport packet will include elementary stream data from a corresponding PES packet if the transport packet is of the PES-bearing type. The transport packet may also be of the program specific information (PSI) type or the private data type.
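The header fields listed above can be illustrated with a short parsing sketch. This is a minimal example and not a complete demultiplexer. The names `TsHeader` and `parse_ts_header` are hypothetical, and the sync pattern value 0x47 and the bit ordering within the header bytes are taken from the MPEG-2 systems specification rather than from the description above.

```python
from typing import NamedTuple

TS_PACKET_SIZE = 188   # fixed transport packet length in bytes
SYNC_BYTE = 0x47       # 8-bit sync pattern (value assumed from ISO/IEC 13818-1)


class TsHeader(NamedTuple):
    """Hypothetical container for the fixed transport packet header fields."""
    transport_error: bool
    payload_unit_start: bool
    transport_priority: bool
    pid: int                       # 13-bit packet identifier
    scrambling_control: int        # 2 bits
    adaptation_field_control: int  # 2 bits
    continuity_counter: int        # 4 bits


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the header fields from the first four bytes of one packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a transport packet must be exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("sync pattern not found at start of packet")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return TsHeader(
        transport_error=bool(b1 & 0x80),
        payload_unit_start=bool(b1 & 0x40),
        transport_priority=bool(b1 & 0x20),
        pid=((b1 & 0x1F) << 8) | b2,
        scrambling_control=(b3 >> 6) & 0x03,
        adaptation_field_control=(b3 >> 4) & 0x03,
        continuity_counter=b3 & 0x0F,
    )


# Example packet: PID 0x100, adaptation field and payload both present,
# continuity counter 7, followed by 184 placeholder bytes.
example = bytes([0x47, 0x41, 0x00, 0x37]) + bytes(184)
header = parse_ts_header(example)
```

In this sketch an `adaptation_field_control` value of 3 is taken to mean that both an adaptation field and a payload are present, as in the packet of FIG. 1A.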
FIG. 1B shows the format of an exemplary adaptation field in an MPEG-2 transport packet. The adaptation field may include a 42-bit program clock reference (PCR) which represents the value of the system time clock (STC) for a given program at the time when the PCR bits were inserted into the transport stream. Each program may have a different STC and therefore transport packets carrying elementary streams from different programs will generally have asynchronous PCRs. The PCR bits include thirty-three bits of PCR-BASE and nine bits of PCR-EXT. The PCR-EXT is a modulo-300 counter incremented at a clock rate of 27 MHz. The PCR-BASE is incremented after every 300 increments of the 27 MHz clock and thus represents a thirty-three bit counter operating at 90 kHz, that is, 27 MHz divided by 300. The PCR information is inserted into a transport packet during an encoding or multiplexing operation and is utilized in transport packet decoding to initialize and maintain the decoder system clock. Synchronization of audio, video and data streams within a given program is provided using the PCR information as well as the PTSs and DTSs which may be placed in the PES packet header.
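The relationship between PCR-BASE, PCR-EXT and the 27 MHz system clock can be expressed as simple arithmetic. The following sketch is illustrative only, and the function names are hypothetical.

```python
PCR_EXT_MODULUS = 300          # PCR-EXT counts 0..299 at 27 MHz
PCR_BASE_MODULUS = 1 << 33     # PCR-BASE is a thirty-three bit counter
SYSTEM_CLOCK_HZ = 27_000_000   # 27 MHz system clock


def pcr_to_ticks(pcr_base: int, pcr_ext: int) -> int:
    """Combine PCR-BASE and PCR-EXT into a count of 27 MHz clock ticks."""
    if not 0 <= pcr_base < PCR_BASE_MODULUS:
        raise ValueError("PCR-BASE must fit in 33 bits")
    if not 0 <= pcr_ext < PCR_EXT_MODULUS:
        raise ValueError("PCR-EXT must be in the range 0..299")
    return pcr_base * PCR_EXT_MODULUS + pcr_ext


def ticks_to_pcr(ticks: int) -> tuple[int, int]:
    """Split a 27 MHz tick count into (PCR-BASE, PCR-EXT), wrapping the base."""
    base, ext = divmod(ticks, PCR_EXT_MODULUS)
    return base % PCR_BASE_MODULUS, ext


def pcr_to_seconds(pcr_base: int, pcr_ext: int) -> float:
    """Express a PCR sample as seconds of system time clock."""
    return pcr_to_ticks(pcr_base, pcr_ext) / SYSTEM_CLOCK_HZ
```

For example, a PCR-BASE of 90000 with a PCR-EXT of 0 corresponds to 27,000,000 ticks, or one second of the STC.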
Other elements of the exemplary adaptation field of FIG. 1B particularly relevant to splicing operations will now be described. A discontinuity indicator bit in the adaptation field is used for identifying time base discontinuities and continuity counter discontinuities. A time base discontinuity can occur in a packet with a PID designated as a PCR_PID and indicates that the next program clock reference (PCR) in a transport packet with the same PID represents a sample of a new STC for the associated program. It should be noted that portions of PES packets with PTS or DTS values referring to the new time base may not occur in the transport stream prior to the transport packet with the first PCR of the new time base. Also, once the transport packet with the first PCR of the new time base has occurred, portions of PES packets with PTS or DTS values that refer to the old time base may not occur. A continuity counter discontinuity indicates that the continuity counter field for a given transport packet may be discontinuous with respect to the previous transport packet having the same PID. If this PID is also the PCR_PID, then the continuity counter may only be discontinuous in the packet that contains the time base discontinuity.
An optional 8-bit splice countdown field serves to provide advance warning of a splice point by indicating the number of packets with the same PID which will arrive prior to the splice point. The countdown will reach zero in the packet immediately prior to the splice point. The packet in which the splice countdown is zero can thus be used to identify a splice point in a given transport stream. Following the splice point, the splice countdown field may take on negative values indicating that the splice point occurred a given number of packets previously. A splicing point flag is used to indicate whether or not the splice countdown field is present.
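The interpretation of the splice countdown field can be sketched as follows. The sketch assumes that the 8-bit field is a two's-complement signed value, which is how a single byte can carry the negative values mentioned above. The function names and the returned labels are hypothetical.

```python
def decode_splice_countdown(raw: int) -> int:
    """Interpret the raw 8-bit field as a signed value (assumed two's complement)."""
    if not 0 <= raw <= 0xFF:
        raise ValueError("splice countdown occupies a single byte")
    return raw - 0x100 if raw & 0x80 else raw


def position_relative_to_splice(countdown: int) -> str:
    """Classify a packet by its decoded countdown value."""
    if countdown > 0:
        return "before"              # more packets with this PID precede the splice point
    if countdown == 0:
        return "last_before_splice"  # the splice point follows this packet
    return "after"                   # the splice point has already passed
```

A splicer scanning packets of a single PID could use the "last_before_splice" result to locate the point at which the insertion stream may begin.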
For video transport packets, an optional 4-bit splice type field specifies one of several different splice types depending on the video profile and level. The 4-bit value serves as an index to a table which specifies a splice decoding delay and a maximum splice rate, both of which represent restrictions on construction of a seamless splice. The splice decoding delay essentially specifies the buffer fullness at the splice point, while the maximum splice rate gives the maximum supported rate of the new stream such that decoder underflow or overflow will not occur. For audio transport packets, the splice type is set to "0000."
A 33-bit DTS_next_au field indicates the decoding time of the first access unit following the splice point in the event that no splice operation is performed. An access unit is a coded picture or audio frame and thus part of an elementary stream. A seamless splice flag is used to indicate the presence of splice type and DTS_next_au fields. This flag is not set to "1" unless the splicing point flag is also set to "1", and once set to "1" it remains set as long as the splice countdown field is positive.
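The 4-bit splice type and the 33-bit DTS_next_au value can be carried together in a 40-bit group. The sketch below assumes the layout used in the MPEG-2 systems specification, in which the 33-bit value is split into pieces of 3, 15 and 15 bits, each followed by a single marker bit. That layout is not described above and should be checked against the standard before use. The function names are hypothetical.

```python
def pack_seamless_splice(splice_type: int, dts_next_au: int) -> bytes:
    """Pack splice type and DTS_next_au into 5 bytes (assumed layout)."""
    if not 0 <= splice_type < (1 << 4):
        raise ValueError("splice type is a 4-bit value")
    if not 0 <= dts_next_au < (1 << 33):
        raise ValueError("DTS_next_au is a 33-bit value")
    value = splice_type << 36
    value |= ((dts_next_au >> 30) & 0x7) << 33      # bits 32..30
    value |= 1 << 32                                # marker bit
    value |= ((dts_next_au >> 15) & 0x7FFF) << 17   # bits 29..15
    value |= 1 << 16                                # marker bit
    value |= (dts_next_au & 0x7FFF) << 1            # bits 14..0
    value |= 1                                      # marker bit
    return value.to_bytes(5, "big")


def unpack_seamless_splice(field: bytes) -> tuple[int, int]:
    """Return (splice_type, dts_next_au) from the 5-byte group."""
    if len(field) != 5:
        raise ValueError("expected exactly 5 bytes")
    value = int.from_bytes(field, "big")
    splice_type = (value >> 36) & 0xF
    dts_next_au = (
        (((value >> 33) & 0x7) << 30)
        | (((value >> 17) & 0x7FFF) << 15)
        | ((value >> 1) & 0x7FFF)
    )
    return splice_type, dts_next_au
```

The marker bits are ignored on unpacking in this sketch. A stricter parser might reject a group in which any marker bit is zero.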
FIGS. 2A-2C illustrate the importance of decoder buffer management in an exemplary transport stream splicing operation. If the splicing operation does not take into account the buffer trajectories of the two streams that are being spliced, there will be a high probability of either an underflow or an overflow of the decoder buffer. FIGS. 2A-2C show buffer trajectories for sequences of intra-coded (I), predictive-coded (P) and bidirectionally predictive-coded (B) MPEG-2 frames making up different transport streams. The buffer trajectories are typically provided by a video buffering verifier (VBV) in an encoder which generates the corresponding streams. FIG. 2A shows the decoder buffer trajectory for a first transport stream in which the P-frame designated by the arrow is the first frame to be replaced by another stream in a splicing operation. The first transport stream is also referred to herein as an input transport stream. FIG. 2B shows the decoder buffer trajectory for a second transport stream to be spliced into the first stream beginning with the I-frame designated by the arrow. The second transport stream is also referred to herein as an insertion stream. It can be seen from FIGS. 2A and 2B that both the first and second streams result in a decoder trajectory which remains below a maximum decoder buffer capacity. FIG. 2C shows the stream resulting from splicing the second stream beginning with the designated I-frame in place of a portion of the first stream beginning with the designated P-frame. The splice point is indicated by a solid arrow. The splice results in a decoder buffer overflow condition as indicated by a dashed arrow.
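The effect shown in FIGS. 2A-2C can be reproduced with a toy buffer model. The sketch below assumes a constant fill rate, instantaneous removal of each frame at its decode time, and arbitrary units. The frame sizes, capacity and initial fullness values are invented for illustration and do not correspond to the figures. The names `simulate_buffer` and `BufferResult` are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class BufferResult(NamedTuple):
    status: str                  # "ok", "overflow" or "underflow"
    frame_index: Optional[int]   # frame at which the problem occurred, if any
    trajectory: list             # fullness after each successful decode


def simulate_buffer(frame_sizes: Sequence[int], fill_per_interval: int,
                    capacity: int, initial_fullness: int) -> BufferResult:
    """Track decoder buffer fullness across a sequence of frames."""
    fullness = initial_fullness
    trajectory = []
    for index, size in enumerate(frame_sizes):
        fullness += fill_per_interval       # bits arriving during one frame interval
        if fullness > capacity:
            return BufferResult("overflow", index, trajectory)
        if size > fullness:                 # frame not fully received at decode time
            return BufferResult("underflow", index, trajectory)
        fullness -= size                    # decoder removes the frame
        trajectory.append(fullness)
    return BufferResult("ok", None, trajectory)


FILL, CAPACITY = 20, 100
input_frames = [35, 25, 10, 10]            # I, P, B, B (illustrative sizes)
insertion_frames = [40, 5, 5, 5, 5, 60]    # begins with an I-frame

input_alone = simulate_buffer(input_frames, FILL, CAPACITY, initial_fullness=60)
insertion_alone = simulate_buffer(insertion_frames, FILL, CAPACITY, initial_fullness=30)

# Replace the input stream from its P-frame onward with the insertion stream.
spliced = simulate_buffer(input_frames[:1] + insertion_frames, FILL, CAPACITY,
                          initial_fullness=60)
```

With these values each stream stays within the buffer capacity on its own, while the spliced sequence overflows several frames after the splice point. The overflow occurs because the insertion stream was encoded for a lower starting fullness than the input stream has at the splice point.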
A number of other problems can arise in transport stream splicing. The problem of time base synchronization arises because the STCs used to generate the PCRs of the transport streams to be spliced will generally be asynchronous. The resulting time base discontinuity must be taken into account to avoid decoder buffer overflow and underflow. For example, if a time base discontinuity occurs, and a sequence of frames from the first stream remains in the decoder buffer, the frames may not be decoded properly. Because the decoding and presentation timestamps of the first stream reference a time base unrelated to that of the second stream, the decoder may incorrectly process those frames, resulting in a decoder buffer overflow. Other problems include the different durations of video and audio access units which prevent simultaneous seamless splicing of video and audio, the need to use closed prediction on the splice boundaries, the possibility of splicing from MPEG-1 to MPEG-2 video, and the possibility that the resolution or profile may change for video or that the sample rate or coding layer may change for audio.
The MPEG-2 standard supports both seamless and non-seamless splices. Seamless splices generally do not result in decoding discontinuities. The decoding time of the first access unit from the insertion stream is thus consistent with the decoding time of the access units of the input stream in which it is inserted. In other words, the first access unit from the insertion stream will be decoded at the same time that the first post-splice access unit of the input stream would have been decoded if the splice had not occurred. This decoding time is referred to as the seamless decoding time. It should be noted that the lack of decoding time discontinuities generally does not ensure acceptable decoder buffer behavior. In order to prevent decoder buffer overflow or underflow, certain conventions must be followed in the encoding of the streams that are spliced.
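The notion of a seamless decoding time can be made concrete with a small comparison. The sketch below assumes that both decoding times are 33-bit counts of a 90 kHz clock and that they have already been expressed in the same time base. Mapping between the time bases of the two streams is outside its scope. The function names are hypothetical.

```python
DTS_MODULUS = 1 << 33   # decoding time stamps are assumed to wrap at 33 bits


def decoding_discontinuity(seamless_decoding_time: int,
                           actual_decoding_time: int) -> int:
    """Signed difference in clock ticks, taking counter wraparound into account."""
    diff = (actual_decoding_time - seamless_decoding_time) % DTS_MODULUS
    if diff >= DTS_MODULUS // 2:
        diff -= DTS_MODULUS
    return diff


def is_seamless(seamless_decoding_time: int, actual_decoding_time: int) -> bool:
    """True when the first inserted access unit decodes at the seamless decoding time."""
    return decoding_discontinuity(seamless_decoding_time, actual_decoding_time) == 0
```

A result of zero indicates only that there is no decoding time discontinuity. As noted above, it does not by itself ensure acceptable decoder buffer behavior.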
Non-seamless splices can result in decoding discontinuities. This means that the decoding time for the first access unit from the insertion stream generally does not equal the seamless decoding time. However, it is possible to create non-seamless splices that produce no unacceptable artifacts and therefore appear seamless to the viewer. In terms of application, seamless splices are generally more likely to be used in the studio environment while non-seamless splices are more likely to be used in broadcast applications such as ad insertion.
One of the challenges of both seamless and non-seamless splicing of MPEG-2 transport streams is the fact that the MPEG-2 standard does not specify the display process. In particular, the display behavior of a decoder after an end_of_sequence code and at a time base discontinuity is unspecified.
Prior art stream insertion techniques generally utilize digital techniques for storage and retrieval of the stream to be inserted and analog techniques for the actual insertion operation. Exemplary analog ad insertion systems are described in greater detail in C. Brechin, "Advantages of digital ad insertion," Communications Technology, pp. 50-54, May 1995 and T. Walsh, "Ad insertion system architecture," Communications Technology, pp. 56-66, May 1995. The analog ad insertion systems typically detect a cue tone in a broadcast program to determine when and where a commercial or other advertisement is to be inserted within the program. The stream to be inserted is retrieved and decoded to generate an analog signal which is applied to a switch or router for insertion into the broadcast program. Although analog systems produce acceptable results in many applications, such systems fail to take advantage of the flexibility offered by digital video standards such as MPEG-2. Other problems with the prior art analog insertion systems include the excessive equipment costs resulting from the use of multiple digital-to-analog decoders to convert multiple streams to analog format prior to insertion. A number of proposals for facilitating digital splicing of MPEG-2 bit streams are described in S. M. Weiss, "Switching Facilities in MPEG-2: Necessary But Not Sufficient," SMPTE Journal, pp. 788-802, December 1995. However, the proposals fail to provide a simple and efficient technique for non-seamless digital splicing of MPEG-2 transport streams. Also, prior art digital seamless splicing techniques may require the streams to be demultiplexed into their respective elementary streams prior to the splicing operation and then remultiplexed after splicing. These demultiplexing and remultiplexing operations can unduly increase the complexity and processing overhead associated with digital seamless splicing.
As is apparent from the above, there is a need for an improved technique for non-seamless splicing of transport streams which utilizes digital stream insertion and avoids the above-noted problems of the prior art.