1. Field of the Invention
The present invention relates to data transmission, and, more particularly, to transmission of compressed video data.
2. Description of the Related Art
Video data consists of video data signals which represent individual video pictures or frames. Each frame may be a still image, but is usually part of a plurality of successive frames of video signal data that represent a motion video sequence.
In various video processing and broadcasting facilities, such as a TV station or studio, there is a need to transmit video signals from one part of the facility to another, to route video signals from selected video sources to selected video destinations or sinks. Video signals may also be routed from one studio to another. There may be hundreds of video sources (such as the output of a video camera or satellite feed), and hundreds of video sinks (such as a video processor, monitor, video cassette recorder (VCR), or broadcasting unit) that receive video signals from one or more of the video sources. Such facilities also distribute other data signals such as audio signals.
Such systems often operate at a system frame rate, which is typically approximately 30 frames per second (fps). The NTSC standard, for example, operates at (30*1000/1001)≈29.97 fps (referred to subsequently herein for simplicity as 30 fps). Each frame is typically composed of an even field interlaced or interleaved with an odd field. Accordingly, NTSC cameras output 60 fields of analog video signals per second, which includes 30 even fields interlaced with 30 odd fields, to provide video at 30 fps. In such a system, the time required to carry one frame across a given communication path or channel of the system is constant and is the reciprocal of the frame rate: 1/30 sec.
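The frame-rate arithmetic above can be checked with a short calculation (illustrative Python only, not part of the disclosure; the constants are the NTSC figures cited in the text):

```python
# Illustrative arithmetic: the exact NTSC frame rate and the constant
# per-frame transmission time it implies in an uncompressed system.
NTSC_FRAME_RATE = 30 * 1000 / 1001      # ~29.97 frames per second
FIELD_RATE = 2 * NTSC_FRAME_RATE        # two interlaced fields per frame

frame_time = 1 / NTSC_FRAME_RATE        # constant time to carry one frame

print(f"frame rate: {NTSC_FRAME_RATE:.2f} fps")
print(f"field rate: {FIELD_RATE:.2f} fields/s")
print(f"frame time: {frame_time * 1000:.3f} ms")
```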
There is often a need to switch video signal sources when engaged in operations such as commercial insertion, promo insertion, studio routing, camera switching, tape editing, and the like. In an NTSC system as described above, which does not employ compressed video data, it is relatively simple to switch the input of a given video sink from a first video signal source to a second video signal source. Typically, this is done by switching from one source to another and having the resulting image make a clean cut from one stream to the next. Such switching typically takes place in the vertical interval of the video signal. The associated audio signal is usually also switched simultaneously.
In some newer video facilities (e.g., TV stations), digital video signals are used to represent video images. The digital video data consists of a compressed or encoded digital bitstream, various segments of which represent a given frame or field. Each frame segment or portion of the compressed video bitstream thus contains compressed data bits that represent the frame. The compressed video bitstream itself represents a sequence of frames (images). In the International Standards Organization (ISO) ISO/IEC 11172 Moving Pictures Experts Group-1 standard (MPEG-1), for example, the display rate and average transmission rate are 30 pictures/second. For the ISO/IEC 13818 (MPEG-2) standard, the display rate and average transmission rate can be 30 frames/second. The MPEG standards support other frame rates as well, including 29.97 and the PAL standard of Europe. In the MPEG standards, the term "picture" refers to a bitstream of data which can represent either a frame of data (i.e., both fields), or a single field of data. In the present application, the general term "frame" will be used for simplicity of explanation.
In such systems, there may be different picture or frame types in the compressed digital stream, such as I frames, P frames, and B frames. I frames, or intra-frames, are self-contained, that is, they are not based on information from previously transmitted and decoded frames. Video frames which are encoded with motion compensation techniques are referred to as predicted frames, or P frames, since their content is predicted from the content of previous I or P frames. P frames may also be utilized as a base for a subsequent P frame. I and P frames are both "anchor" frames, since they may be used as a basis for other frames, such as B or P frames which are predicted based on anchor frames. A "bidirectional" or B frame is predicted from the two anchor frames transmitted most recently relative to the transmission of the B frame. However, because the B frames are typically sent out of order (late), one of the two anchor frames used by a B frame is after the B frame in display order, although it must of course be received by the decoder before the B frame is reconstructed.
I frames typically are the largest in terms of number of encoded bits per frame, while B frames are the smallest, and P frames are somewhere in between. I frames may take many frame times to send (at standard bit rates), while P and B frames often take only a fraction of a frame time. I, P, and B frames are utilized in coding standards such as MPEG-1, while other standards, such as H.261 (P×64), developed by the International Telecommunication Union (ITU), utilize only I and P frames.
An encoder at the source end receives unencoded video frame data and compresses the data to provide the compressed digital bitstream. A decoder at the receiving end receives and decompresses (decodes) the data, so that it is put into more useful form, for example for display on a monitor. Referring now to FIG. 1, there is shown a prior art compressed video data transmission system, which includes encoder 110, transmission channel 120, and decoder 130. Encoder 110 receives unencoded video frames from a video frame source and is itself a video data source since it provides a compressed video bitstream. Encoder 110 includes a video buffering verifier (VBV) 111. Decoder 130, a video sink, comprises buffer 131 and is coupled to a display device such as monitor 132. The capacity or bandwidth of channel 120 is sufficient to transmit 30 fps on average.
Such a system may be referred to as a constant bit rate (CBR) system. A group of pictures (GOP), which includes at least one I frame, and optionally, a number of B and P frames, is typically transmitted as part of a compressed bitstream by encoder 110 across channel 120 as illustrated in FIG. 2. As can be seen in video data transmission sequence 200 of FIG. 2, in a CBR system such as system 100, although the size of the bitstream for each consecutive frame may vary, the average size of the frames is equal to the maximum average frame size achievable via channel 120, so that the entire channel capacity is used to maximize picture quality and to avoid over- or underflowing buffer 131. Overflowing or underflowing buffer 131 is referred to as a "buffer exception." Overflow is usually worse than underflow, because underflow requires the decoder to wait for more data to arrive, which usually means a momentary freeze frame and thus some slight temporal distortion, while overflow causes data to be lost, which may result in the loss of all subsequent pictures in the GOP until the next I frame arrives.
Unlike systems in which the time required to transmit a frame from a given source to a given destination is constant (e.g., 1/30 sec), in a compressed data system such as system 100 the time required to carry the frame or bitstream segment for each frame across channel 120 can vary greatly from one frame to the next, because the size of the encoded bitstream can vary with the type of frame and also depending on the complexity of the scene information in that frame, how successful the motion compensation block matching was for a given predicted frame, and so on. Unencoded video frame data is provided on input line 115 to encoder 110 at a constant time per frame (e.g., one frame each 1/30 second), and decoded video frames are provided as video out on output line 135 to monitor 132 at the same constant frame rate. However, the time per frame for the compressed or encoded bitstream via channel 120 varies, with an average of 30 fps. This means that the frame boundaries in the compressed bitstream occur at irregular intervals, although these intervals occur 30 times per second on average. The bitrate or bandwidth of channel 120 is sufficient to support the output of encoder 110. Decoder 130 must display a new frame on monitor 132 every 1/30 second, despite the fact that the frame data is arriving via channel 120 at varying intervals.
Referring once more to FIG. 2, video data transmission sequence 200 illustrates the order of transmission of an exemplary GOP used in video system 100. Sequence 210 shows the transmission timing of frames in a GOP of the compressed bitstream transmitted on channel 120. Sequence 220 shows the display timing of the received and decoded frames, which are displayed on monitor 132. Each frame is labeled I, B, or P according to its picture type, with a subscript indicating the display order. Frames in display sequence 220 are displayed at regular intervals, e.g. one frame for each 1/30 second interval. The frame boundaries of transmission sequence 210 occur at irregular intervals.
Each P frame is predicted based on the most recent, previous anchor (I or P) frame. Each B frame is predicted based on the most recent two anchor frames transmitted. Thus, for example, frame I2 is transmitted first, since it is used for predicting frames B0, B1, and P5, but is displayed after decoded frames B0, B1. B frames are typically displayed immediately after receipt. Frames B0, B1 are each predicted based on previously-transmitted (but subsequently displayed) frame I2, as well as the last P frame of the previous GOP. Thus, the GOP of sequence 200 is an "open" GOP. "Closed GOPs" may also be used, which are self-contained GOPs, in which none of the B frames needs an anchor frame from another GOP for prediction purposes.
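The transmission reordering described above can be sketched as follows (a hypothetical Python helper, not taken from any cited standard; frame labels follow the type-letter-plus-display-index convention of FIG. 2):

```python
def transmission_order(display_order):
    """Reorder a GOP from display order to transmission order.

    Hypothetical sketch: each anchor frame (I or P) is sent ahead of the
    B frames that precede it in display order, since those B frames are
    predicted from the not-yet-displayed anchor.
    Frames are strings like "I2" or "B0" (picture type + display index).
    """
    sent, pending_b = [], []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)   # hold B frames until their anchor is sent
        else:
            sent.append(frame)        # send the anchor (I or P) first...
            sent.extend(pending_b)    # ...then the B frames that display before it
            pending_b = []
    sent.extend(pending_b)            # any trailing B frames of an open GOP
    return sent

# The exemplary GOP of FIG. 2, listed in display order:
gop = ["B0", "B1", "I2", "B3", "B4", "P5"]
print(transmission_order(gop))  # → ['I2', 'B0', 'B1', 'P5', 'B3', 'B4']
```

Note that B0 and B1 display first but are sent after I2, matching the out-of-order (late) transmission of B frames described in the text.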
There is a delay between the beginning of transmission of the bitstream portion corresponding to a given frame and the beginning of its display by decoder 130. Various delays are indicated by the horizontal portions of lines D0, D1, etc. in FIG. 2. The various delays vary from frame to frame, because the frame boundaries of the compressed bitstream (transmission sequence 210) vary. Thus, for example, delay D3 may be shorter than delay D4, because transmitted frame B3 is shorter in bit size or length than frame B4, while the display interval for frames B3 and B4 does not vary. Delay D5, the delay for P frame P5, may be longer or shorter than the delay for other P frames.
Thus, decoder bitstream buffer 131 is employed to reconcile the difference arising from the frame boundaries being unevenly spaced at the input to decoder 130 but evenly spaced at its output. Buffer 131 can therefore hold enough frame data so that at regular intervals decoder 130 has available to it all the data it needs to create the next image to display. Encoder 110 itself constructs its output bitstream carefully to avoid overflowing or underflowing this buffer. This is done by employing a model of a theoretical decoder's buffer, called the VBV. The VBV is a bitstream constraint, not a decoder specification. Encoder 110 encodes data so as not to over- or underflow the VBV buffer. Decoder designers depend on the fact that the bitstreams their decoders receive will be thus constrained, which allows them to design a practical decoder buffer such as buffer 131 of decoder 130. If decoder 130 meets the constraints built into the compressed bitstream, then just as VBV 111 is not overflowed or underflowed, the actual buffer 131 of decoder 130 will not be overflowed or underflowed.
At any given time, the hypothetical buffer of VBV 111 has a certain VBV "occupancy" or fullness. Using this information, encoder 110 can vary the encoding parameters (such as degree of compression or quantization, for example) to increase or decrease the size of the encoded bitstream for the current or subsequent frames to ensure that the hypothetical VBV buffer will not overflow or underflow. Because decoder 130 is designed with the VBV constraints in mind, it also will not over- or underflow so long as the hypothetical buffer of VBV 111 does not. In this manner, the VBV fullness or occupancy will correspond to the past encoding history of the bitstream, which is also being received by actual buffer 131, which thus has its own fullness that corresponds, with some delay, to that of the VBV.
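The occupancy bookkeeping described above can be illustrated with a simplified simulation (an illustrative Python sketch under stated simplifying assumptions: the decoder removes each frame's bits instantaneously, and the buffer starts empty rather than with the initial pre-fill delay a real VBV model would use; all sizes are hypothetical):

```python
def simulate_decoder_buffer(frame_bits, bits_per_interval, buffer_size):
    """Simplified sketch of decoder-buffer occupancy, not the MPEG-specified
    VBV: each frame interval the channel delivers a fixed number of bits,
    then the decoder instantaneously removes one frame's worth of bits.

    Returns the occupancy after each frame, or raises on a buffer exception.
    """
    occupancy = 0
    history = []
    for bits in frame_bits:
        occupancy += bits_per_interval
        if occupancy > buffer_size:
            raise OverflowError("buffer overflow: data would be lost")
        if bits > occupancy:
            raise RuntimeError("buffer underflow: frame has not fully arrived")
        occupancy -= bits          # decoder pulls the whole frame to display it
        history.append(occupancy)
    return history

# Channel delivers 100,000 bits per 1/30 s interval; frame sizes vary but
# average out to the channel rate, so occupancy stays bounded.
print(simulate_decoder_buffer([50_000, 150_000, 100_000],
                              bits_per_interval=100_000,
                              buffer_size=400_000))  # → [50000, 0, 0]
```

A frame larger than the bits accumulated so far models underflow; bits arriving faster than they are consumed models overflow, the two buffer exceptions named in the text.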
As in uncompressed video systems, there is a need to switch compressed video bitstream sources applied to a given video data sink. Thus, the encoded or compressed bitstream coming from a first encoder or other video source should be stopped and switched to that coming from a second encoder or video source, and applied to the given decoder or video sink. This process is sometimes referred to, e.g. in the MPEG standards, as "splicing" the two video bitstreams. The point at which the first or old input bitstream is switched over to the new or second input bitstream may be referred to as a "splice point." Bitstream splicing is described in further detail in Norm Hurst and Katie Cornog, "MPEG Splicing: A New Standard for Television," SMPTE Journal, Vol. 107, No. 11, November 1998, p. 978, the entirety of which is incorporated herein by reference.
However, it can be difficult to splice two compressed video bitstreams, such as MPEG bitstreams, for several reasons. First, P and B frames cannot be reconstructed by a decoder such as decoder 130 without its having received and decoded the preceding I or P frame. Thus, cutting "in" to a bitstream after an I frame renders the subsequent P and B frames meaningless. Additionally, the B frames are sent out of order, which makes splicing even more difficult. Cutting in at the wrong point can "cut off" some B frames, for example, from the anchor frame needed to reconstruct them, and thus create "holes" in the frame sequence displayed on monitor 132.
Second, unlike uncompressed video, frame boundaries in the bitstream are not evenly spaced. Synchronizing frame boundaries to splice two streams is a problem that must be solved dynamically at the time of the splice. This can be very difficult or impossible in some cases. This problem can also lead to buffer overflow or underflow problems. Because frame boundaries are unevenly spaced at the input to a decoder, as described above with reference to FIG. 2, bitstream buffer 131 and VBV 111 are employed at the decoder and encoder ends, respectively. Although each compressed bitstream is carefully constructed to avoid overflowing or underflowing the VBV, switching to another stream could easily cause one of the buffer exceptions of overflow or underflow of buffer 131 to occur. The reason for this potential "buffer management" problem is that the status of decoder buffer 131 at any given time is a function of the entire history of the bitstream since decoder 130 began decoding that stream. The bitstream up until the splice point is generated by the first encoder 110, which keeps track of the bitstream status with VBV 111.
When decoder 130's input is switched from one video bitstream source (encoder 110) to another, the second bitstream is being generated by an encoder that does not have the same history as the bitstream from encoder 110. Thus the second encoder's own VBV will have a different fullness, for example, than that of VBV 111 of encoder 110. The subsequent encoding of the bitstream, after the splice point, will not be based on the bitstream history being received by decoder 130 up until that point. In effect, the bitstream constraints that normally govern the characteristics of the compressed bitstream, on which decoder 130 is relying, are violated when the second bitstream is simply substituted for the first.
If, for example, the second encoder's VBV indicates that the hypothetical decoder buffer is more full than it really is (i.e., if the second encoder's VBV buffer fullness is higher at the splice point than VBV 111 was at that point, due to their different bitstream histories), then at some later time the second encoder may allocate more bits to a frame than are really available in the hypothetical decoder buffer (and also actual decoder buffer 131), causing underflow of decoder buffer 131 and thus causing a momentary freeze frame. The reason for this is that if the decoder buffer is believed to be more full, then an encoder erroneously assumes it can transmit more bits per frame since these can be emptied from the decoder buffer without causing underflow.
Similarly, if the second encoder's VBV indicates decoder buffer 131 is emptier than it really is, then the second encoder may allocate fewer bits to a frame than it could. This can cause too many bits to accumulate in decoder buffer 131 on average, which can cause buffer 131 to overflow because decoder 130 does not draw down or empty the buffer quickly enough when each frame is too small in size. Overflow can cause data to be lost, as described above, which can have catastrophic results. Underflow or overflow may thus cause some frames to be skipped, or may produce temporal distortion or other undesirable results, such as diminished picture quality. Thus, bitstream splicing can give rise to various artifacts and other undesirable effects.
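The buffer-mismatch failure described in the preceding paragraphs can be put in concrete numbers (all bit counts below are hypothetical, chosen only to illustrate the underflow case):

```python
# Illustrative numbers only: why a VBV mismatch at a splice point causes a
# buffer exception. The second encoder's VBV believes the decoder buffer is
# fuller than it actually is, due to the two streams' different histories.
bits_per_interval = 100_000           # channel delivery per frame interval

actual_occupancy = 120_000            # real fullness of decoder buffer 131
assumed_occupancy = 250_000           # second encoder's VBV at the splice

# The second encoder believes it may send a frame this large and still
# find it fully buffered at decode time:
frame_bits = assumed_occupancy + bits_per_interval    # 350,000 bits

available = actual_occupancy + bits_per_interval      # only 220,000 arrived
print("underflow" if frame_bits > available else "ok")  # → underflow
```

The symmetric case (assumed occupancy lower than actual) makes frames too small, so the CBR channel fills buffer 131 faster than the decoder drains it, producing overflow instead.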
One way to splice bitstreams to avoid these problems is to carefully control the two bitstreams and their video content so that at the time that a splice is desired, the two encoders have "synchronized" bitstreams (the same VBV occupancy). This approach is not always feasible or available, however. For example, there may be a need to switch to a second video source which is not already synchronized with the existing encoder and decoder.
In a data transmission system, an encoder provides a bitstream having a sequence of frame segments, wherein each frame segment has a bit size which is limited by the encoder to a maximum bit size. A transmission channel is provided which has a channel rate sufficient to transmit a frame segment of maximum bit size within a frame interval. A burst transmitter receives the bitstream from the encoder and transmits each consecutive frame segment, at regular frame intervals, in a burst at the channel rate via the transmission channel, to provide a bursty bitstream over the transmission channel.
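The channel-rate constraint stated above can be sketched numerically (illustrative Python; the frame-segment cap and frame rate are hypothetical values, not taken from the disclosure):

```python
# Sketch of the burst-timing constraint: the channel rate must be high
# enough to carry a maximum-size frame segment within one frame interval,
# so that each burst completes inside its regular slot.
max_frame_bits = 400_000          # encoder-enforced cap per frame segment
frame_interval = 1 / 30           # seconds between bursts

# Minimum channel rate that satisfies the constraint:
channel_rate = max_frame_bits / frame_interval        # bits per second

burst_duration = max_frame_bits / channel_rate        # worst-case burst
assert burst_duration <= frame_interval               # burst fits its slot

print(f"channel rate: {channel_rate:,.0f} bit/s")
print(f"worst-case burst: {burst_duration * 1000:.1f} ms "
      f"of a {frame_interval * 1000:.1f} ms interval")
```

With any higher channel rate, a smaller-than-maximum segment simply produces a shorter burst followed by idle time, which is what makes the bitstream "bursty" at regular frame intervals.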