1. Field of the Invention
The present invention relates to processing of compressed audio/visual data, and more particularly to splicing of streams of audio/visual data.
2. Background Art
It has become common practice to compress audio/visual data in order to reduce the capacity and bandwidth requirements for storage and transmission. One of the most popular audio/video compression techniques is MPEG. MPEG is an acronym for the Moving Picture Experts Group, which was set up by the International Organization for Standardization (ISO) to work on compression. MPEG provides a number of different variations (MPEG-1, MPEG-2, etc.) to suit different bandwidth and quality constraints. MPEG-2, for example, is especially suited to the storage and transmission of broadcast quality television programs.
For the video data, MPEG provides a high degree of compression (up to 200:1) by encoding 8×8 blocks of pixels into a set of discrete cosine transform (DCT) coefficients, quantizing and encoding the coefficients, and using motion compensation techniques to encode most video frames as predictions from or between other frames. In particular, the encoded MPEG video stream consists of a series of groups of pictures (GOPs), and each GOP begins with an independently encoded (intra) I frame and may include one or more following P frames and B frames. Each I frame can be decoded without information from any preceding and/or following frame. Decoding of a P frame requires information from a preceding frame in the GOP. Decoding of a B frame requires information from a preceding and a following frame in the GOP. To minimize decoder buffer requirements, each B frame is transmitted out of its presentation order, after the following frame that it references, so that all the information of the other frames required for decoding the B frame will arrive at the decoder before the B frame.
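The reordering of B frames described above can be illustrated with a minimal Python sketch. The frame labels and the function name `presentation_to_coded_order` are hypothetical and chosen only for illustration; the sketch assumes that every B frame references the nearest following I or P frame.

```python
def presentation_to_coded_order(frames):
    """Reorder frame labels from presentation order to transmission order.

    Assumption for illustration: a label starts with its frame type
    ("I", "P" or "B"), and each B frame depends on the next I or P frame.
    """
    coded = []
    pending_b = []  # B frames waiting for the reference frame that follows them
    for frame in frames:
        if frame.startswith("B"):
            pending_b.append(frame)
        else:
            # Send the reference frame first, then the B frames that need it.
            coded.append(frame)
            coded.extend(pending_b)
            pending_b = []
    coded.extend(pending_b)  # trailing B frames with no following reference
    return coded


display_order = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(presentation_to_coded_order(display_order))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

In this example P3 is sent before B1 and B2, so the decoder already holds both reference frames when the B frames arrive.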
In addition to the motion compensation techniques for video compression, the MPEG standard provides a generic framework for combining one or more elementary streams of digital video and audio, as well as system data, into single or multiple program transport streams (TS) which are suitable for storage or transmission. The system data includes information about synchronization, random access, management of buffers to prevent overflow and underflow, and time stamps for video frames and audio packetized elementary stream packets. The standard specifies the organization of the elementary streams and the transport streams, and imposes constraints to enable synchronized decoding from the audio and video decoding buffers under various conditions.
The MPEG-2 standard is documented in ISO/IEC International Standard (IS) 13818-1, “Information Technology-Generic Coding of Moving Pictures and Associated Audio Information: Systems,” ISO/IEC IS 13818-2, “Information Technology-Generic Coding of Moving Pictures and Associated Audio Information: Video,” and ISO/IEC IS 13818-3, “Information Technology-Generic Coding of Moving Pictures and Associated Audio Information: Audio,” incorporated herein by reference. A concise introduction to MPEG is given in “A guide to MPEG Fundamentals and Protocol Analysis (Including DVB and ATSC),” Tektronix Inc., 1997, incorporated herein by reference.
Splicing of audio/visual programs is a common operation performed, for example, whenever one encoded television program is switched to another. Splicing may be done for commercial insertion, studio routing, camera switching, and program editing. The splicing of MPEG encoded audio/visual streams, however, is considerably more difficult than splicing of the uncompressed audio and video. The P and B frames cannot be decoded without a preceding I frame, so that cutting into a stream after an I frame renders the P and B frames meaningless. The P and B frames are considerably smaller than the I frames, so that the frame boundaries are not evenly spaced and must be dynamically synchronized between the two streams at the time of the splice. Moreover, because a video decoder buffer is required to compensate for the uneven spacing of the frame boundaries in the encoded streams, splicing may cause underflow or overflow of the video decoder buffer.
The problems of splicing MPEG encoded audio/visual streams are addressed to some extent in Appendix K, entitled “Splicing Transport Streams,” to the MPEG-2 standard ISO/IEC 13818-1 1996. Appendix K recognizes that a splice can be “seamless” when it does not result in a decoding discontinuity, or a splice can be “non-seamless” when it results in a decoding discontinuity. In either case, however, it is possible that the spliced stream will cause buffer overflow.
The Society of Motion Picture and Television Engineers (SMPTE) apparently thought that the ISO MPEG-2 standard was inadequate with respect to splicing. They promulgated their own SMPTE Standard 312M, entitled “Splice Points for MPEG-2 Transport Streams,” incorporated herein by reference. The SMPTE standard defines constraints on the encoding of and syntax for MPEG-2 transport streams such that they may be spliced without modifying the packetized elementary stream (PES) packet payload. The SMPTE standard includes some constraints applicable to both seamless and non-seamless splicing, and other constraints that are applicable only to seamless splicing. For example, for seamless and non-seamless splicing, a splice occurs from an Out Point on a first stream to an In Point on a second stream. The Out Point is immediately after an I frame or P frame (in presentation order). The In Point is just before a sequence header and I frame in a “closed” GOP (i.e., no prediction is allowed back before the In Point).
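The Out Point and In Point conditions just described can be expressed as two simple predicates. The following Python sketch is illustrative only: the `Frame` record and its field names are hypothetical, and the sketch checks only the frame-type conditions stated above, not the full set of constraints in the SMPTE standard.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical per-frame record, listed in presentation order."""
    kind: str                          # "I", "P" or "B"
    has_sequence_header: bool = False  # a sequence header precedes this frame
    closed_gop: bool = False           # frame starts a GOP with no prediction across its start


def is_out_point(frames, i):
    """A splice may leave the first stream immediately after frames[i]."""
    return frames[i].kind in ("I", "P")


def is_in_point(frames, i):
    """A splice may enter the second stream just before frames[i]."""
    f = frames[i]
    return f.kind == "I" and f.has_sequence_header and f.closed_gop


stream = [
    Frame("I", has_sequence_header=True, closed_gop=True),
    Frame("B"),
    Frame("P"),
]
print([is_out_point(stream, i) for i in range(len(stream))])  # [True, False, True]
print([is_in_point(stream, i) for i in range(len(stream))])   # [True, False, False]
```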
As further discussed in Norm Hurst and Katie Cornog, “MPEG Splicing: A New Standard for Television - SMPTE 312M,” SMPTE Journal, November 1998, there are two buffering constraints for seamless splicing. The startup delay at the In Point must be a particular value, and the ending delay at the Out Point must be one frame less than that. Also, the old stream must be constructed so that the video decoder buffer (VBV buffer) would not overflow if the bit rate were suddenly increased to a maximum splice rate for a period of a splice decoding delay before each Out Point.
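The two delay constraints can be checked with simple arithmetic. The Python sketch below is a hedged illustration, not a statement of the standard: the function name is hypothetical, delays are assumed to be expressed in 90 kHz clock ticks, the frame period of 3003 ticks corresponds to 29.97 frames per second, and the delay value used in the example is arbitrary.

```python
FRAME_PERIOD = 3003  # 90 kHz ticks per frame at 29.97 frames/s (90000 * 1001 / 30000)


def delays_allow_seamless_splice(out_point_delay, in_point_delay,
                                 required_delay, frame_period=FRAME_PERIOD):
    """Check the two delay constraints described above.

    required_delay is the particular startup delay demanded at the In Point;
    the ending delay at the Out Point must be one frame period less.
    All values are in 90 kHz ticks (an assumption of this sketch).
    """
    return (in_point_delay == required_delay
            and out_point_delay == required_delay - frame_period)


required = 22500  # arbitrary illustrative value (250 ms at 90 kHz)
print(delays_allow_seamless_splice(required - FRAME_PERIOD, required, required))  # True
print(delays_allow_seamless_splice(required, required, required))                 # False
```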
In accordance with a first aspect, the invention provides a method of seamless splicing of a first transport stream to a second transport stream to produce a spliced transport stream. The first transport stream includes video access units encoding video presentation units representing video frames, and audio access units encoding audio presentation units representing segments of a first audio signal. The second transport stream includes video access units encoding video presentation units representing video frames, and audio access units encoding audio presentation units representing segments of a second audio signal. The first transport stream has a last video frame to be included in the spliced transport stream, and the second transport stream has a first video frame to be included in the spliced transport stream. The method includes finding, in the first transport stream, an audio access unit that is best aligned with the last video frame from the first transport stream to be included in the spliced transport stream, and removing audio access units from the first transport stream that are subsequent to the audio access unit that is best aligned with the last video frame from the first transport stream. The method also includes finding, in the second transport stream, an audio access unit that is best aligned with the first video frame from the second transport stream to be included in the spliced transport stream, and removing audio access units from the second transport stream that are prior to the audio access unit that is best aligned with the first video frame from the second transport stream. The method further includes concatenating a portion of the first transport stream up to and including the last video frame to a portion of the second transport stream including and subsequent to the first video frame.
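The audio trimming of this aspect can be sketched in Python as follows. The sketch rests on assumptions that the text above does not fix: "best aligned" is taken to mean the smallest absolute difference between the end of the audio presentation unit and the end of the last video frame (for the first stream), or between the start of the audio presentation unit and the start of the first video frame (for the second stream). All function names are hypothetical, and times are in 90 kHz ticks.

```python
def best_aligned_end(audio_pts, audio_duration, video_end):
    """Index of the audio unit whose end time is closest to video_end."""
    return min(range(len(audio_pts)),
               key=lambda i: abs(audio_pts[i] + audio_duration - video_end))


def best_aligned_start(audio_pts, video_start):
    """Index of the audio unit whose start time is closest to video_start."""
    return min(range(len(audio_pts)),
               key=lambda i: abs(audio_pts[i] - video_start))


def trim_audio(first_audio, second_audio, audio_duration,
               last_video_end, first_video_start):
    """Drop audio after the best-aligned unit of the first stream and
    audio before the best-aligned unit of the second stream."""
    i = best_aligned_end(first_audio, audio_duration, last_video_end)
    j = best_aligned_start(second_audio, first_video_start)
    return first_audio[:i + 1], second_audio[j:]


# Example: 24 ms audio units (2160 ticks) and 3003-tick video frames.
kept_first, kept_second = trim_audio(
    first_audio=[0, 2160, 4320, 6480, 8640],
    second_audio=[0, 2160, 4320, 6480],
    audio_duration=2160,
    last_video_end=6006,      # end of the second video frame of the first stream
    first_video_start=3003,   # start of the first included frame of the second stream
)
print(kept_first)   # [0, 2160, 4320]
print(kept_second)  # [2160, 4320, 6480]
```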
In accordance with another aspect, the invention provides a method of seamless splicing of a first transport stream to a second transport stream to produce a spliced transport stream. The first transport stream includes video access units encoding video presentation units representing video frames, and audio access units encoding audio presentation units representing segments of a first audio signal. The second transport stream includes video access units encoding video presentation units representing video frames, and audio access units encoding audio presentation units representing segments of a second audio signal. The first transport stream has a last video frame to be included in the spliced transport stream, and the second transport stream has a first video frame to be included in the spliced transport stream. The method includes computing differences between presentation times and corresponding extrapolated program clock reference times for the audio access units of the second transport stream in order to estimate the mean audio buffer level that would result when decoding the second transport stream. The method further includes concatenating a portion of the first transport stream up to and including the last video frame to a portion of the second transport stream including and subsequent to the first video frame, wherein presentation times for audio access units from the second transport stream are skewed in the spliced transport stream with respect to presentation times for video access units from the second transport stream in order to adjust the estimated mean audio buffer level toward a half-full audio buffer level when decoding the spliced transport stream.
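One way to picture the buffer-level estimate is sketched below in Python. The conversion from delay to bytes is an assumption of this sketch rather than something stated above: it treats the difference between a presentation time and its extrapolated program clock reference time as the time the access unit waits in the buffer, and multiplies the mean wait by a constant audio bit rate. The function names, the sign convention of the skew, and the numeric values are hypothetical.

```python
def mean_buffer_delay(pts, pcr_e):
    """Mean of (presentation time - extrapolated PCR), in 90 kHz ticks."""
    diffs = [p - c for p, c in zip(pts, pcr_e)]
    return sum(diffs) / len(diffs)


def audio_skew_ticks(pts, pcr_e, bit_rate, buffer_bytes, clock_hz=90_000):
    """Skew, in ticks, that would move the estimated mean level to half full.

    Assumptions of this sketch: constant audio bit rate (bits/s), and a
    positive result meaning audio presentation is delayed relative to video.
    """
    mean_level = mean_buffer_delay(pts, pcr_e) / clock_hz * bit_rate / 8  # bytes
    target_level = buffer_bytes / 2
    return (target_level - mean_level) * 8 / bit_rate * clock_hz


pts = [9000, 11160, 13320]    # presentation times of three audio access units
pcr_e = [4500, 6660, 8820]    # extrapolated PCR at arrival of each unit
skew = audio_skew_ticks(pts, pcr_e, bit_rate=192_000, buffer_bytes=3584)
print(round(skew))  # 2220
```

In the example each unit waits 4500 ticks (50 ms), which at 192 kbit/s corresponds to about 1200 bytes; reaching the half-full level of 1792 bytes requires roughly 2220 ticks of additional delay.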
In accordance with yet another aspect, the invention provides a method of seamless splicing of a first transport stream to a second transport stream to produce a spliced transport stream. The first transport stream includes video access units encoding video presentation units representing video frames. The video access units of the first transport stream encode the video presentation units using a data compression technique, and contain a variable amount of compressed video data. The second transport stream includes video access units encoding video presentation units representing video frames. The video access units of the second transport stream encode video presentation units using a data compression technique, and contain a variable amount of compressed video data. The first transport stream has a last video frame to be included in the spliced transport stream, and the second transport stream has a first video frame to be included in the spliced transport stream. Each of the video access units has a time at which the video access unit is to be received in a video decoder buffer and a time at which the video access unit is to be removed from the video decoder buffer. The method includes setting the time at which the video access unit for the first video frame of the second transport stream is to be removed from the video decoder buffer to a time following in a decoding sequence next after the time at which the last video access unit for the last frame of the first transport stream is to be removed from the video decoder buffer. The method also includes adjusting content of the first transport stream so that the beginning of the video access unit for the first video frame of the second transport stream will be received in the video decoder buffer immediately after the end of the video access unit for the last video frame of the first transport stream is received in the video decoder buffer. 
The method further includes concatenating a portion of the first transport stream up to and including the last video frame to a portion of the second transport stream including and subsequent to the first video frame.
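The setting of the removal time for the first video access unit of the second stream can be sketched as a timestamp offset. The Python fragment below is illustrative only: the function name is hypothetical, removal times are assumed to be decode time stamps in 90 kHz ticks that wrap at 33 bits, and "next in the decoding sequence" is assumed to mean exactly one frame period later. It does not attempt the adjustment of stream content that aligns the arrival times.

```python
TIMESTAMP_MODULUS = 1 << 33  # decode and presentation time stamps are 33-bit counters


def restamp_second_stream(last_dts_first, second_dts, frame_period=3003):
    """Shift the second stream's removal times so that its first access unit
    is removed one frame period after the last unit of the first stream."""
    offset = last_dts_first + frame_period - second_dts[0]
    return [(dts + offset) % TIMESTAMP_MODULUS for dts in second_dts]


print(restamp_second_stream(900_000, [1000, 4003, 7006]))
# [903003, 906006, 909009]
```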
In yet another aspect, the invention provides a method of seamless splicing of a first transport stream to a second transport stream to produce a spliced transport stream. The first transport stream includes video access units encoding video presentation units representing video frames, and audio packets including data of audio access units encoding audio presentation units representing segments of a first audio signal. The second transport stream includes video access units encoding video presentation units representing video frames, and audio packets including data of audio access units encoding audio presentation units representing segments of a second audio signal. The first transport stream has a last video frame to be included in the spliced transport stream, and the second transport stream has a first video frame to be included in the spliced transport stream. The method includes finding a plurality of (j) non-obsolete audio packets in the first transport stream following the video access unit of the last video frame in the first transport stream and to be included in the spliced transport stream, and finding a total of (k) null packets and obsolete audio packets in the second transport stream following the video access unit of the first video frame of the second transport stream, where j is greater than k. The method also includes replacing the null packets and obsolete audio packets with (k) of the non-obsolete audio packets. The method further includes concatenating a portion of the first transport stream up to and including the last video frame to a portion of the second transport stream including and subsequent to the first video frame to form the spliced transport stream, wherein the remaining (j − k) audio packets are inserted in the spliced transport stream before the video access unit of the first video frame from the second transport stream.
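The packet replacement of this aspect can be sketched in Python. Packets are modeled as plain strings, and the function name and the labels "NULL" and "OBSOLETE" are hypothetical. The text above does not say which k of the j packets fill the freed slots; this sketch assumes the earliest (j − k) packets go before the video access unit and the last k fill the slots, so that the audio packets keep their original order.

```python
def splice_audio_tail(tail_audio, second_packets, replaceable=("NULL", "OBSOLETE")):
    """Merge j trailing audio packets of the first stream into the second stream.

    tail_audio: the j non-obsolete audio packets from the first stream.
    second_packets: packets of the second stream, starting at the video
    access unit of its first included video frame.
    """
    # Positions of null and obsolete audio packets that can be overwritten.
    slots = [i for i, p in enumerate(second_packets) if p in replaceable]
    k = min(len(slots), len(tail_audio))
    split = len(tail_audio) - k
    before, into_slots = tail_audio[:split], tail_audio[split:]
    out = list(second_packets)
    for slot, packet in zip(slots, into_slots):
        out[slot] = packet
    # The remaining (j - k) packets precede the first video access unit.
    return before + out


print(splice_audio_tail(["a1", "a2", "a3"], ["V0", "NULL", "V1", "OBSOLETE", "V2"]))
# ['a1', 'V0', 'a2', 'V1', 'a3', 'V2']
```

Here j is 3 and k is 2, so one audio packet is placed ahead of the first video access unit and the other two overwrite the null and obsolete packets.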