1. Field of the Invention
The present invention relates to processing and storage of compressed visual data, and in particular the on-line encoding of MPEG data for storage, splicing, or other processing in a video server.
2. Background Art
It has become common practice to compress audio/visual data in order to reduce the capacity and bandwidth requirements for storage and transmission. One of the most popular audio/video compression techniques is MPEG. MPEG is an acronym for the Moving Picture Experts Group, which was set up by the International Standards Organization (ISO) to work on compression. MPEG provides a number of different variations (MPEG-1, MPEG-2, etc.) to suit different bandwidth and quality constraints. MPEG-2, for example, is especially suited to the storage and transmission of broadcast quality television programs.
For the video data, MPEG provides a high degree of compression (up to 200:1) by encoding 8×8 blocks of pixels into a set of discrete cosine transform (DCT) coefficients, quantizing and encoding the coefficients, and using motion compensation techniques to encode most video frames as predictions from or between other frames. In particular, the encoded MPEG video stream is comprised of a series of groups of pictures (GOPs), and each GOP begins with an independently encoded (intra) I frame and may include one or more following P-frames and B-frames. Each I frame can be decoded without information from any preceding and/or following frame. Decoding of a P frame requires information from a preceding frame in the GOP. Decoding of a B frame requires information from a preceding and following frame in the GOP. To minimize decoder buffer requirements, each B frame is transmitted in reverse of its presentation order, so that all the information of the other frames required for decoding the B frame will arrive at the decoder before the B frame.
In addition to the motion compensation techniques for video compression, the MPEG standard provides a generic framework for combining one or more elementary streams of digital video and audio, as well as system data, into single or multiple program transport streams (TS) which are suitable for storage or transmission. The system data includes information about synchronization, random access, management of buffers to prevent overflow and underflow, and time stamps for video frames and audio packetized elementary stream packets. The standard specifies the organization of the elementary streams and the transport streams, and imposes constraints to enable synchronized decoding from the audio and video decoding buffers under various conditions.
The MPEG-2 standard is documented in ISO/IEC International Standard (IS) 13818-1, “Information Technology-Generic Coding of Moving Pictures and Associated Audio Information: Systems,” ISO/IEC IS 13818-2, “Information Technology-Generic Coding of Moving Pictures and Associated Information: Video,” and ISO/IEC IS 13818-3, “Information Technology-Generic Coding of Moving Pictures and Associated Audio Information: Audio,” incorporated herein by reference. A concise introduction to MPEG is given in “A guide to MPEG Fundamentals and Protocol Analysis (Including DVB and ATSC),” Tektronix Inc., 1997, incorporated herein by reference.
Splicing of audio/visual programs is a common operation performed, for example, whenever one encoded television program is switched to another. Splicing may be done for commercial insertion, studio routing, camera switching, and program editing. The splicing of MPEG encoded audio/visual streams, however, is considerably more difficult than splicing of the uncompressed audio and video. The P and B frames cannot be decoded without a preceding I frame, so that cutting into a stream after an I frame renders the P and B frames meaningless. The P and B frames are considerably smaller than the I frames, so that the frame boundaries are not evenly spaced and must be dynamically synchronized between the two streams at the time of the splice. Moreover, because a video decoder buffer is required to compensate for the uneven spacing of the frame boundaries in the encoded streams, splicing may cause underflow or overflow of the video decoder buffer.
The problems of splicing MPEG encoded audio/visual streams are addressed to some extent in Appendix K, entitled “Splicing Transport Streams,” to the MPEG-2 standard ISO/IEC 13818-1 1996. Appendix K recognizes that a splice can be “seamless” when it does not result in a decoding discontinuity, or a splice can be “non-seamless” when it results in a decoding discontinuity. In either case, however, it is possible that the spliced stream will cause buffer overflow.
The Society of Motion Picture and Television Engineers (SMPTE) apparently thought that the ISO MPEG-2 standard was inadequate with respect to splicing. They promulgated their own SMPTE Standard 312M, entitled “Splice Points for MPEG-2 Transport Streams,” incorporated herein by reference. The SMPTE standard defines constraints on the encoding of and syntax for MPEG-2 transport streams such that they may be spliced without modifying the packetized elementary stream (PES) packet payload. The SMPTE standard includes some constraints applicable to both seamless and non-seamless splicing, and other constraints that are applicable only to seamless splicing. For example, for seamless and non-seamless splicing, a splice occurs from an Out Point on a first stream to an In Point on a second stream. The Out Point is immediately after an I frame or P frame (in presentation order). The In Point is just before a sequence header and I frame in a “closed” GOP (i.e., no prediction is allowed back before the In Point).
As further discussed in Norm Hurst and Katie Cornog, “MPEG Splicing: A New Standard for Television—SMPTE 312M,” SMPTE Journal, November 1998, there are two buffering constraints for seamless splicing. The startup delay at the In Point must be a particular value, and the ending delay at the Out Point must be one frame less than that. Also, the old stream must be constructed so that the video decoder buffer (VBV buffer) would not overflow if the bit rate were suddenly increased to a maximum splice rate for a period of a splice decoding delay before each Out Point.
In the broadcast environment, frame accuracy is an important consideration whenever audio or digital video streams are spliced. If frame accuracy is not ensured, then desired frames will be missing from the spliced video stream, and undesired frames will appear in the spliced video stream. If frame inaccuracy accumulates, there could be serious schedule problems. The loss or addition of one or more frames is especially troublesome when commercials are inserted into program streams. Each commercial is a very short clip and the loss or addition of just a few frames can have a noticeable effect on the content of the commercial. More importantly, the loss or addition of just a few frames may result in a substantial loss of income from advertisers, because advertisers are charged a high price for each second of on-air commercial time.
In order to ensure frame accuracy in the broadcast environment, it is common practice to include a vertical interval time code (VITC) in the analog video waveform to identify each video field and frame or to use an external LTC (Longitudinal Time Code) synchronized to a house clock. The VITC occurs on a scan line during each vertical blanking interval. For digital video, each VITC can be digitized to provide a digital vertical interval time code (DVITC) for each video field and frame. The VITC and DVITC are used when the video source is a VTR. LTC is used when the video source is a satellite feed. For example, for a 525 line video system, each VITC can be digitized to an eight-bit value in accordance with SMPTE standard 266M-1994. Splicing operations can be triggered upon the occurrence of a specified VITC or DVITC value in an analog or digital video stream or from an LTC input.
Video streams are often encoded in the MPEG-2 format for storage in video server. In such a system, there are two encoder types that can be used: off-line and on-line. Off-line encoders are frame accurate and generate accurate files but they are controlled by external operators and not by the server. On the other hand, on-line encoders encode all the time and there is no external control of the location of an I frame. The I frames occur at fairly regular intervals, depending on the particular encoding procedure followed by the encoder. If the encoded MPEG stream is to be subdivided into clips, then the server must record complete GOPs. In other words, each clip must begin with the I frame of a GOP, and end with the last frame of a GOP. However, if a clip is to be used in a splicing operation and the In-point for the clip is not an I-frame in the clip or the Out-point is not the last frame of a GOP, then the splicing operation may require additional processing or result in undesirable visual artifacts or have the effect of introducing frame inaccuracy.
The encoded MPEG-2 clip could be decoded and re-encoded off-line so that the desired In-point and Out-point are valid and desirable splice points for seamless splicing, but such decoding and re-encoding requires significant processing time and resources. Seamless splicing techniques have been devised for splicing MPEG-2 clips without decoding and re-encoding, but these techniques have the effect of introducing some frame inaccuracy when delays are introduced to avoid video buffer (VBV) underflow or overflow. For example, with these seamless splicing techniques, if the In-point is a predicted frame instead of an I-frame then some delay may be introduced in the presentation time of the In-point in order to load the video buffer at least with the I frame upon which predicted frame is based. Moreover, if the Out-point is not the last frame of a GOP, then some delay may be introduced in the presentation time of the following frame in the spliced stream. These seamless splicing techniques are further disclosed in Daniel Gardere et al. U.S. Provisional Application Ser. No. 60/174,260, filed Jan. 4, 2000, entitled “Seamless Splicing of Encoded MPEG Video and Audio,” pending as U.S. Non-Provisional Application Serial No. 09/540,347 filed Mar. 31, 2000, and to be published as European Patent Application No. 00204717.3 filed 22 Dec. 2000. Since on-line encoders are being used more frequently in the broadcast environment, there is a need for ensuring better frame accuracy when MPEG-2 coded video from an on-line encoder is stored as a clip or otherwise prepared or used for splicing in a video server.