In a broadcast environment, it is sometimes desirable to encode (compress) video signals of multiple video programs in real time and then multiplex or combine the encoded video signals together. The combined encoded video signals are then broadcast to one or more receivers which are capable of demultiplexing out a desired one of the video programs, including the desired encoded video signal. The receiver then decodes the video signal (and possibly associated audio signal(s), an associated closed captioned text signal, a private data signal, etc.) and presents (displays) the decoded video signal.
Video signals are preferably encoded using an encoding technique such as MPEG-1 or MPEG-2. Such encoding techniques produce a variable amount of encoded data for each picture (frame or field) of the video signal. The amount of encoded data produced for each picture depends on a number of factors including the amount of motion between the to-be-encoded picture and other pictures used as references for generating predictions therefor. For example, a video signal depicting a football game tends to have high motion pictures and a video signal depicting a talk show tends to have low motion pictures. Accordingly, the average amount of data produced for each picture of the football game video signal tends to be higher than the average amount of data produced for each picture of comparable quality of the talk show. The allocation of bits from picture to picture or even within a picture may also be controlled to generate a certain amount of data for that picture. Consider that the amount of data for each picture may vary. However, the buffer at the decoder has a finite storage capacity. When encoding a video signal, a dynamically adjusted bit budget may be set for each picture to prevent overflow and underflow at the decoder buffer given the transmission bit rate, the storage capacity of the decoder buffer and the fullness of the decoder buffer over time. Note that varying the number of bits that can be allocated to a picture impacts the quality of the pictures of the video signal upon decoding.
In general, the transmission medium over which the multiplexed encoded video signals are transmitted has a finite transmission bit rate. It is desirable to share this transmission bit rate amongst the different video signals that are multiplexed together. One manner of doing so is to simply allocate fixed sized fractions of the total transmission capacity to each video signal. However, as noted above, the amount of data produced for each picture of each video signal tends to vary depending on the content thereof and from moment to moment. This would tend to produce low motion video signals with unnecessarily high quality and high motion video signals with poor quality.
A preferred real-time video encoding system 10 is shown in FIG. 1. This video encoding system 10 is described in greater detail in U.S. patent application Ser. No. 08/775,313. As shown, digital video signals are produced from k>1 sources 12-1, 12-2, . . . , 12-k. The video sources 12-1 to 12-k can be video tape recorders, magnetic or optical discs, cameras or the like. Each digital video signal is received at a respective encoder 14-1, 14-2, . . . , 14-k. Each encoder 14-1 to 14-k encodes the video signal received thereat and outputs an encoded video signal to the multiplexer 16. The multiplexer 16 multiplexes the encoded video signals together to produce an output signal.
In the encoding of the video signals, each encoder 14-1 to 14-k can generate statistical data regarding the complexity of encoding its respective video signal. Such complexity statistics can be a priori (pre-encoding) statistics and/or a posteriori (or post encoding) statistics. Examples of such statistics include measures of inter-pixel differences or the actual number of bits needed to encode a picture.
These statistics are outputted from each encoder 14-1 to 14-k to a statistics computer 18. The statistics computer 18 uses the measure of encoding complexity of each encoder 14-1 to 14-k as a basis to allocate a fraction of the transmission bit rate of the transmission channel to each encoder 14-1 to 14-k, e.g., so as to equalize the picture quality over all of the encoders 14-1 to 14-k. Thus, an encoder 14-1 which encodes a video signal with a high encoding complexity can be allocated a higher bit rate than an encoder 14-2 which encodes a video signal with a low encoding complexity. This tends to equalize the quality of all of the encoded video signals that are multiplexed together. To allocate the bit rates, the statistics computer 18 can transfer an indication of a bit rate to each encoder 14-1 to 14-k. Each encoder 14-1 to 14-k responds to an indication of an allocated bit rate by accordingly adjusting the number of bits produced for each picture in an effort to meet the allocated bit rate. Preferably, statistics are provided periodically from the encoders 14-1 to 14-k to the statistics computer 18 and indications of periodically allocated bit rates are transferred periodically from the statistics computer 18 to the encoders 14-1 to 14-k.
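The allocation step described above can be pictured as a split of the channel bit rate according to the reported complexity measures. The following is a minimal sketch only: the proportional rule, the function name allocate_bit_rates and the sample numbers are assumptions made for illustration and are not taken from the statistics computer 18 itself.

```python
def allocate_bit_rates(complexities, channel_rate):
    """Split channel_rate (bits/s) among encoders in proportion to complexity.

    Assumption: a simple proportional rule. The text only states that the
    allocation is based on the complexity measures, e.g., to equalize quality.
    """
    if not complexities:
        return []
    total = sum(complexities)
    if total <= 0:
        # No usable complexity information: fall back to an equal split.
        return [channel_rate / len(complexities)] * len(complexities)
    return [channel_rate * c / total for c in complexities]


# Hypothetical example: one high-complexity program and two low-complexity ones
# sharing a 20 Mbit/s channel.
rates = allocate_bit_rates([6.0, 2.0, 2.0], 20_000_000)
print(rates)
```

With such a rule, the allocated rates always sum to the channel rate, so a rate increase for one encoder necessarily comes out of the share of the others.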
As noted above, each encoder 14-1 to 14-k encodes each picture in order to generate a certain number of bits for that picture according to a bit budget for that picture. Furthermore, the bit budget is set to prevent a decoder buffer underflow or overflow given a certain transmission channel bit rate. In order to prevent decoder buffer underflow and overflow, the encoder models the decoder buffer in order to determine the fullness of the decoder's buffer from time to time. The behavior of the decoder buffer is now considered in greater detail.
FIG. 2 illustrates a model of a decoder buffer for a sequence of pictures. A sequence of pictures is assigned a picture type, namely, intracoded or I, predictively coded or P or bidirectionally predictively encoded or B. I pictures are spatially only encoded. P pictures are temporally encoded and spatially encoded wherein predictions for encoding P pictures originate from only previous P or I reference pictures. B pictures are temporally and spatially encoded wherein predictions for B pictures may originate from previous and/or subsequent I or P reference pictures. Predictions must be obtained from decoded, reconstructed versions of the reference I or P pictures according to the MPEG-2 standard. (This ensures that the encoder uses the same prediction as is available to the decoder.) As such, the encoding of each B picture, e.g., pictures B0 and B1, is delayed until the subsequent reference picture, namely, I2, is encoded, even though such a reference picture is presented (displayed) later. Pictures are decoded in the same order that they are encoded.
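The reordering from presentation order to encoding order described above can be sketched as follows. This is an illustrative sketch, assuming frames are named by a type letter followed by a presentation index (e.g., "B0", "I2"); the function name coding_order and the handling of trailing B frames are assumptions.

```python
def coding_order(display_frames):
    """Reorder frames from presentation order to encoding (and decoding) order.

    Each B frame is held back until the next reference frame (I or P) in
    presentation order has been emitted, since that reference is needed to
    form the B frame's predictions.
    """
    ordered, pending_b = [], []
    for frame in display_frames:
        if frame.startswith("B"):
            pending_b.append(frame)
        else:
            ordered.append(frame)       # reference frame goes first
            ordered.extend(pending_b)   # then the B frames that precede it
            pending_b = []
    # Simplification: trailing B frames with no later reference are appended.
    ordered.extend(pending_b)
    return ordered


print(coding_order(["B0", "B1", "I2", "B3", "B4", "P5", "B6", "B7", "P8"]))
```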
In modeling the decoder buffer, the encoder determines the buffer fullness of the decoder buffer. The encoder can know how many bits are present in the decoder buffer given the allocated transmission channel bit rate at which such pictures are transmitted to the decoder buffer, the delay between encoding a picture at the encoder and decoding a picture at the decoder, and the knowledge that the decoder buffer is assumed to remove the next to-be-decoded picture instantaneously at prescribed picture intervals. For example, as depicted, at time interval A, the allocated bit rate is R1 bits/second, at time interval B, the allocated bit rate is R2 bits/second and at time interval C, the allocated bit rate is R3 bits/second. The numbers of bits produced for pictures I2, B0, B1, P5, B3, B4, P8, B5, B6, P11, B9, B10 and I14 are b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12 and b13, respectively. The encoder attempts to determine each maximum and minimum of the decoder buffer's fullness, which correspond to the number of bits in the buffer immediately before the decoder removes a picture and the number of bits in the buffer immediately after the decoder removes a picture, respectively. Given such information, the encoder can determine the number of bits to allocate to successive pictures to prevent decoder buffer underflows (the decoder buffer does not have all of the bits of a picture in time for the decoder to decode them at a predefined decode time) or overflows (the decoder buffer fullness exceeds the maximum decoder buffer storage capacity of B_max bits).
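The buffer model described above can be sketched as a loop that adds the bits arriving during each picture interval and then removes one picture instantaneously. This is a simplified sketch under stated assumptions: the function name, the representation of the fill as a number of bits per picture interval (allocated rate times interval length) and the sample values are all hypothetical.

```python
def simulate_decoder_buffer(picture_bits, fill_per_interval, initial_fullness, b_max):
    """Track the maxima and minima of a simplified decoder buffer model.

    picture_bits[i]      -- bits removed instantaneously when picture i is decoded
    fill_per_interval[i] -- bits arriving during the picture interval that ends
                            at that decode time (allocated rate * interval)
    Returns (maxima, minima, ok); ok is False on any overflow or underflow.
    """
    fullness = initial_fullness
    maxima, minima = [], []
    ok = True
    for bits, fill in zip(picture_bits, fill_per_interval):
        fullness += fill            # buffer fills at the allocated rate
        maxima.append(fullness)     # fullness just before the picture is removed
        if fullness > b_max:
            ok = False              # overflow: capacity exceeded
        if bits > fullness:
            ok = False              # underflow: picture not fully received
        fullness -= bits            # instantaneous removal of the picture
        minima.append(fullness)     # fullness just after removal
    return maxima, minima, ok


# Hypothetical numbers: 100,000 bits arrive per picture interval.
maxima, minima, ok = simulate_decoder_buffer(
    [400_000, 100_000, 100_000], [100_000] * 3, 500_000, 1_000_000)
print(maxima, minima, ok)
```

A change of allocated bit rate appears in this sketch simply as a different value in fill_per_interval from some picture interval onward.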
As shown in FIG. 2, the encoder typically further restricts the number of bits produced during encoding to prevent the decoder buffer fullness from falling below a threshold b_lo or exceeding a threshold b_hi. The reason for this pertains to inaccuracies in the encoder's model of the decoder's buffer fullness, for example, as caused by a variation in the delay between encoding each picture and decoding each picture. Such variations can occur when the original source video signal contains repeat fields, as occurs when the video signal is produced from film using the 3:2 pull-down technique. Specifically, to match the film rate of 24 frames per second to the NTSC video signal rate of 60 fields per second, some (approximately every other) film frame is converted to three fields instead of two, where the third field is a duplicate or repeat of the first field of that film frame.
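The arithmetic of the 3:2 pull-down cadence can be checked with a short sketch. The choice of which film frames receive three fields (here, the even-numbered ones) and the function name are assumptions for illustration; field parity is ignored.

```python
def pulldown_fields(num_film_frames):
    """Expand film frames into video fields using a 3:2 pull-down cadence.

    Returns a list of (film_frame, field_in_frame, is_repeat) tuples.
    Assumption: even-numbered film frames get three fields, the third being
    a repeat of the first; odd-numbered film frames get two fields.
    """
    fields = []
    for frame in range(num_film_frames):
        count = 3 if frame % 2 == 0 else 2
        for k in range(count):
            fields.append((frame, k, k == 2))
    return fields


fields = pulldown_fields(24)              # one second of film
print(len(fields))                        # number of video fields produced
print(sum(1 for f in fields if f[2]))     # number of repeat fields among them
```

Twelve of the 24 film frames contribute three fields and twelve contribute two, giving 12 x 3 + 12 x 2 = 60 fields, of which 12 are repeats.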
According to MPEG-2, repeated fields can be entirely eliminated from the encoded video signal and substituted with a flag (called the "repeat_first_field" flag) which causes the decoder to repeat a designated field of the decoded, reconstructed video signal. FIG. 3 shows an illustrative encoder 14 for encoding a video signal that can include repeat fields. A video signal outputted from a video source 12 is processed by an inverse teleciner 21 to detect and discard repeat fields. Next, a frame organizer and type selector 23 determines whether each frame is an I frame, P frame or B frame, aggregates adjacent non-repeated fields into frames, and reorders the frames according to the appropriate encoding order. Finally, a compressor 25 compresses the video signals according to the selected order. Illustratively, the inverse teleciner 21, frame organizer and type selector 23 and compressor 25 are implemented using one or more processors, such as the DV Expert™ encoder distributed by C-Cube Microsystems, Inc.™, a company located in Milpitas, Calif. Such a processor actually includes multiple processing sections, such as a RISC processor, a motion estimator, and a video digital signal processor, on a single integrated circuit. A single such integrated circuit, or multiple integrated circuits of this type working in concert, may be used to perform such processing.
FIG. 4 illustrates a sample timing relationship between capture (i.e., input) of the unencoded digital video signal at the encoder 14 (more specifically to the inverse teleciner 21), repeat field detection by the inverse teleciner 21 and encoding by the compressor 25. As shown, a sequence of 40 fields is outputted from the video source 12 labeled 0 to 39. Using one of a number of well known techniques, the captured fields are processed to identify repeat fields. As indicated by letters "N", fields 2, 4, 6, 8, 10, 15, 20, 25, 30, 35 and 40 are not detected as repeat fields. As indicated by letters "Y", fields 12, 17, 22, 27, 32 and 37 are detected as repeat fields. Adjacent pairs of fields are combined into frames as indicated, except in the case that a repeat field is detected. In such a case, the repeat field is discarded, i.e., not encoded.
The discarding of repeat fields allows the encoder to increase the number of bits available for allocation to the remaining pictures (or allows reducing the bit rate allocated to the encoded video signal for a given quality). In place of the discarded repeat field, the encoder sets the repeat_first_field flag. The decoder decodes the encoded frames from the encoded video signal and, in response to detecting the set repeat_first_field flag, simply repeats display of an appropriate one of the fields of the previously decoded and reconstructed frames.
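The grouping of captured fields into frames, with repeat fields discarded and replaced by a flag, can be sketched as follows. This sketch assumes that repeat field detection has already been performed and that every repeat field immediately follows a complete two-field frame; the function name and the dictionary layout are hypothetical.

```python
def inverse_telecine(num_fields, repeat_fields):
    """Group captured fields into frames, discarding detected repeat fields.

    repeat_fields is the set of field numbers already identified as repeats.
    Each frame is a dict with its two field numbers and a repeat_first_field
    flag, which is set when the field following the frame was discarded.
    """
    frames, current = [], []
    for field in range(num_fields):
        if field in repeat_fields:
            if current or not frames:
                raise ValueError("repeat field must follow a complete frame")
            frames[-1]["repeat_first_field"] = True   # flag replaces the field
            continue                                  # the repeat is not encoded
        current.append(field)
        if len(current) == 2:
            frames.append({"fields": tuple(current), "repeat_first_field": False})
            current = []
    return frames


# Repeat-field pattern of the 40-field example (fields 12, 17, 22, 27, 32, 37).
frames = inverse_telecine(40, {12, 17, 22, 27, 32, 37})
print(len(frames))
print(frames[5])
```

In this example, 34 fields remain after discarding the six repeats, and the frame formed from fields 10 and 11 carries the flag that stands in for discarded field 12.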
The encoder must pause for one field time for every discarded repeat field so that the encoder does not run out of pictures to encode. MPEG-2 does not specify precisely when pausing should occur and conventional encoders tend to pause at different times. According to the technique shown in FIG. 4, as soon as the encoder detects that the next to-be-encoded frame precedes a repeat field, the encoder encodes the non-repeated fields of the frame and then pauses encoding for one field time. For example, as shown in FIG. 4, frame I2 is encoded, followed immediately by encoding frames B0, B1 and P5. However, because the field immediately following frame P5 is a repeat field (and therefore is discarded), the encoder pauses for one field time before resuming encoding of frame B3. Likewise, after encoding frame B3, the encoder immediately encodes frame B4. However, because a repeat field is detected immediately following frame B7 while encoding frame B4, the encoder pauses for one field time after encoding frame B4. As shown, encoding pauses after each of frames P5, B4, B6, P11, B10, and B12. This manner of pausing the encoding operation is referred to herein as the immediate stall technique. The encoder in FIG. 4 has a single frame pipeline because only a single frame time is needed for a frame to complete processing in the compressor 25. Thus, this encoder is more precisely referred to as an immediate stall/single stage pipeline encoder.
FIG. 5 illustrates the timing associated with capture, repeat field detection and encoding for a three frame pipeline encoder. In this encoder, two successive motion estimation search stages or steps ME1 and ME2 are performed successively on each frame, followed by a final encoding stage. Each of the motion estimation search stages ME1 and ME2 (nominally) requires one frame time to complete for each frame, and the final encoding stage requires one frame time. As such, each frame requires three frame times to complete processing in the compressor 25 portion of the encoder. Each stage ME1, ME2 and the final encoding stage simultaneously pause encoding for one field time immediately upon detecting a repeat field. However, this corresponds to different frames at each stage. For example, upon detecting a repeat field following the frame P5, the stage ME1 immediately pauses for one field time. The stage ME2 also pauses at the same time. However, the stage ME2 pauses after processing the immediately preceding frame B1. Likewise, the final encoding stage also pauses at the same time as the stages ME1 and ME2. However, this corresponds to the time immediately following the processing of the frame B0 in the final encoding stage. As such, the one field pauses are shifted back in the encoded sequence of frames by one frame time for each additional stage (or a total of two frame times) in comparison to the encoding pauses shown in FIG. 4. Thus, using the same repeat field detection pattern, the encoding pauses after frames B0, P5, B4, B6, P11 and B10 for the immediate stall/three frame pipeline encoder.
FIG. 6 illustrates the capture, repeat field detection and encoding timing relationship for a single frame pipeline encoder employing a delayed stall manner of encoding. In this encoder, encoding does not pause immediately upon detecting a repeat field but rather is delayed. Specifically, upon detecting a repeat field, encoding of frames continues until the next to-be-encoded reference frame (P frame or I frame). As may be appreciated, this corresponds to the moment in time at which the encoder exhausts all to-be-encoded frames that have completed inverse telecine processing. The encoding then pauses one field time for each repeat field detected between reference frames. For example, using the same repeat field sequence as in FIGS. 4 and 5, a repeat field is detected following frame P5. However, encoding does not pause. Rather, previously inverse telecine processed, reordered B frames B3 and B4 are encoded. Note that while encoding frame B4, yet another repeat field is detected following frame B7. As such, immediately before encoding frame P8, encoding pauses for two field times, i.e., one field time for each of the two detected repeat fields following frame P5 and frame B7. Such a pausing is needed to complete inverse telecine processing of fields 18 and 19 of frame P8. Encoding then continues for frames P8, B6 and B7. Note that while encoding frame B6, another repeat field is detected following frame B9. Nevertheless, encoding continues and does not pause until immediately before encoding frame P11. Again, the pausing is furthermore needed to complete inverse telecine processing of field 26 of frame P11 so that frame P11 is available for encoding.
The behavior of the delayed stall encoder can be analyzed as follows. Each frame is encoded as soon as possible. Any discarded repeat fields that delay capture of a reference frame delays encoding of such a reference frame. The encoding of B frames, on the other hand, is delayed only as is necessary to encode the subsequent reference frame.
FIG. 7 illustrates the capture, repeat field detection and encoding timing of a delayed stall/three frame pipeline encoder. As in the delayed stall/single frame pipeline encoder (the behavior of which is described in FIG. 6), when a repeat field is detected, encoding does not pause immediately. Rather, any available frames are encoded. Pausing occurs only inasmuch as is needed to obtain the data of the next reference frame. This same behavior occurs at each stage. That is, upon detecting the first repeat field following frame P5, the ME1 stage continues to process available frames. Nor does detecting a repeat field after frame B7 pause processing at the ME1 stage. Rather, processing continues in the ME1 stage until after the frame B4 at which point the ME1 stage pauses until the fields 18 and 19 of the next to-be-encoded reference frame, namely, the reference frame P8, have completed inverse telecine processing. This requires two field times as shown. The same behavior is performed by the ME2 search stage. Specifically, processing does not pause immediately upon detecting repeat fields following frames P5 or B7 but rather continues until the stage ME2 must wait for data to be available, i.e., when the ME1 stage has completed processing the frame P8. As noted, the ME1 stage pauses (in this case, for two field times) prior to processing the frame P8 which in turn causes the ME2 stage to pause, albeit, at a different point in time than the ME1 stage, until the frame P8 is available for processing. The same is true for the final encoding stage. As such, encoding pauses at the same pictures and for the same durations in the delayed stall/three frame pipeline encoder as in the delayed stall/single frame pipeline encoder.
FIG. 8 illustrates the timing associated with decoding and presentation of pictures at a decoder. As shown, the frames are decoded in the order I2, B0, B1, P5, B3, B4, . . . etc. A real-time decoder is capable of decoding each frame in one frame time. To reduce memory requirements, and to also enable separate display of each field of each frame, the decoder preferably begins display of a B frame about halfway through decoding of the B frame. On the other hand, reference frames, namely P and I frames, are not displayed until about half of the very next to-be-decoded reference frame is decoded. When displaying a repeat field, the decoder will pause decoding.
This behavior is demonstrated in FIG. 8. First, frame I2 is decoded. Next, frames B0 and B1 are decoded using I2 as a reference picture. Presentation of frame B0 begins when about half of the frame B0 is decoded. Likewise presentation of frame B1 begins when about half of frame B1 is decoded.
Next, frame P5 is decoded. At the time that presentation of frame B1 is complete, half of frame P5 is decoded. Thus, presentation of frame I2 can begin. After this, frames B3 and B4 are decoded using frames I2 and P5 as references. As above, presentation of the frame B3 begins when half of frame B3 is decoded and presentation of frame B4 begins when about half of frame B4 is decoded.
Next, frame P8 is decoded. At the completion of presentation of frame B4, about half of frame P8 has been decoded. As such, presentation of frame P5 begins. Frame P5 includes a set repeat_first_field flag for causing the repeated display of field 10 as field 12. When field 10 is displayed during the field time for field 12, decoding pauses until the display of field 10 in the field time of field 12 is complete. Decoding then resumes with frames B6 and B7 using frames P5 and P8 as references. Frames B6 and B7 are presented, wherein frame B7 has a set repeat_first_field flag causing field 15 of frame B7 to be displayed a second time during the field time for field 17. Again this causes the decoder to pause decoding for one field time, namely, during the field time for field 17.
The net result is that seamless presentation of decoded, reconstructed video frames and fields is achieved. In this example, decoding pauses after each of frames P8, B7, B9, P14 and B13 for one field time.
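The decoder pause points listed above follow from two rules in the preceding description: a B frame is displayed while it is being decoded, and a reference frame is displayed only while the next reference frame is being decoded. The sketch below applies those rules; it is an interpretation for illustration, the function name is hypothetical, and the set of flagged frames other than P5 and B7 is inferred from the repeat field pattern of the example.

```python
def decoder_pause_points(decode_order, flagged):
    """Return the frames after whose decoding the decoder pauses one field time.

    flagged -- frames carrying a set repeat first field flag.
    A flagged B frame causes a pause after that B frame is decoded; a flagged
    reference frame causes a pause after the NEXT reference frame is decoded,
    because that is when the flagged reference frame is displayed.
    """
    pauses = []
    previous_reference = None
    for frame in decode_order:
        if frame.startswith("B"):
            if frame in flagged:
                pauses.append(frame)
        else:
            if previous_reference in flagged:
                pauses.append(frame)
            previous_reference = frame
    return pauses


order = "I2 B0 B1 P5 B3 B4 P8 B6 B7 P11 B9 B10 P14 B12 B13".split()
print(decoder_pause_points(order, {"P5", "B7", "B9", "P11", "B13"}))
```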
Compare now the encoding timing of the encoders shown in FIGS. 4-7 with the decoding timing shown in FIG. 8. None of the conventional encoders always pauses its encoding in between precisely the same frames as does the decoder.
It is not a requirement of MPEG-2, but nevertheless desirable for sake of modeling the decoder buffer, for the delay between encoding and decoding to be constant. (Note that even when the transmission rate is constant, the number of bits in each picture will vary. As such, the number of pictures buffered at the encoder will vary over time as will the number of pictures buffered at the decoder.) However, since conventional encoders do not pause encoding when repeat fields are detected in between the same frames as the decoders pause decoding while repeating corresponding fields, the delay between encoding and decoding individual frames varies. Note that the delay between encoding and decoding will remain constant if repeat fields are never detected.
For example, FIG. 9 shows the encoding and decoding timing relationship assuming that the video frames are encoded using the immediate stall/three frame pipeline encoder of FIG. 5. Suppose that the delay between encoding a picture and decoding that same picture will be n field times (n being a real number >0) if repeat fields are never detected. Because no repeat fields are detected through the encoding of picture I2, the delay between the encoding and decoding of frame I2 is n field times. The same is true for the frame B0. However, there is a one field delay between encoding frame B0 and encoding frame B1 but no delay between decoding these two frames. As such, the delay between encoding frame B1 and decoding frame B1 is n-1 fields. The encoding to decoding delay for frame P5 is also n-1 fields. The encoder pauses again for one field time between encoding frame P5 and encoding frame B3. However, the decoder does not pause at this same point in the sequence of frames. Thus, the encoding to decoding delay for the frame B3 is n-2 fields. The encoding to decoding delay for frame B4 is also n-2 fields. After encoding frame B4, encoding pauses for another field time. Again, decoding does not pause between decoding frames B4 and P8 and thus the encoding to decoding delay for frame P8 is n-3 fields. Finally, the decoder pauses between decoding frame P8 and decoding frame B6. There are no pauses in encoding between these frames. Thus, the encoding to decoding delay for frame B6 is only n-2 fields. In short, the encoding to decoding delay using the aforementioned immediate stall/three frame pipeline encoder varies between n and n-3 fields.
More generally stated, if the spacing between reference pictures is M pictures, and the number of stages in the encoder pipeline is S, then the encoding to decoding delay variation is n to n-r(M+S-1), where r(y) is the maximum number of times the encoder will set the repeat_first_field flag in y consecutively captured frame pictures. Although the MPEG-2 standard allows for the repeat_first_field flag to be set every frame (r(y)=y), a typical encoder will not set the repeat_first_field flag in any two consecutively captured frames. This is because the conventional 3:2 pull-down process adds one repeat field every other frame. In this latter case, the variation in delay will be between n and n-⌈(M+S-1)/2⌉ fields (where "⌈x⌉" denotes the "ceiling of x," i.e., x if x is an integer and the integer portion of x+1 otherwise). In the above example, M=3 and S=3 and thus the encoder to decoder delay is n to n-3 fields. However, in an encoder that can produce an arbitrary repeat_first_field pattern, the variation may be as many as M+S-1 fields, namely, 5 fields for M=S=3.
FIG. 10 shows the encoding and decoding timing relationship for the delayed stall/single frame pipeline encoder or delayed stall/three frame pipeline encoder shown in FIGS. 6-7. The derivation of the encode to decode delays is only briefly described here. Specifically, encoding pauses for two field times between frames B4 and P8 but decoding does not pause until after decoding frame P8 (and then pauses for only one field time). Thus, while the encoding to decoding delays of frames I2, B0, B1, P5, B3 and B4 are each n fields, the encoding to decoding delay for the frame P8 is n-2 fields. Decoding pauses before frame B6 for one field time but encoding does not pause until frame P11. Thus, the encoding to decoding delay for frames B6 and B7 is n-1 fields, and so on. In short, the encoding to decoding delay over the sequence of pictures previously described for the delayed stall pipeline encoder is between n and n-2 fields. More generally stated, the encoding to decoding delay varies between n and n minus the maximum number of repeat fields in a sequence of M pictures (where M is the picture spacing between reference frames). If the encoder does not set the repeat_first_field flag in two consecutively captured frames, the variation in delay will be between n and n-⌈M/2⌉ fields. However, for an encoder that can produce an arbitrary repeat_first_field pattern, the variation will be between n and n-M fields.
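The delay variation bounds stated for the immediate stall and delayed stall encoders can be evaluated with a small sketch. The function names and the alternate_only parameter (True when the repeat first field flag is never set in two consecutively captured frames) are hypothetical names introduced for illustration.

```python
import math


def max_repeats(y, alternate_only=True):
    """r(y): the most repeat flags that can be set in y consecutive frames."""
    return math.ceil(y / 2) if alternate_only else y


def immediate_stall_variation(m, s, alternate_only=True):
    """Maximum reduction (in fields) of the encode-to-decode delay for an
    immediate stall encoder with reference spacing m and s pipeline stages."""
    return max_repeats(m + s - 1, alternate_only)


def delayed_stall_variation(m, alternate_only=True):
    """Maximum reduction (in fields) for a delayed stall encoder, which does
    not depend on the number of pipeline stages."""
    return max_repeats(m, alternate_only)


print(immediate_stall_variation(3, 3))         # 3:2 pull-down pattern, M=S=3
print(immediate_stall_variation(3, 3, False))  # arbitrary flag pattern
print(delayed_stall_variation(3))              # 3:2 pull-down pattern, M=3
print(delayed_stall_variation(3, False))       # arbitrary flag pattern
```

For M=S=3 these evaluate to 3, 5, 2 and 3 fields, respectively, matching the figures given in the examples above. The delayed stall bound has the same form as the immediate stall bound with S=1.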
Consider that encoded frame data is preferably transmitted as a frame-wise contiguous stream, irrespective of any encoding or decoding pauses. In the decoder buffer model, the decoder buffer is envisioned as filling at a piece-wise constant bit rate (namely, the bit rate allocated to a respective portion of the encoded video signal). The decoding of a picture by the decoder is delayed from the encoding of the same picture by the above noted encoding-to-decoding delay time, which can vary depending on the detection of repeat fields and the encoding pausing policy of the encoder. However, prior to encoding a given picture, an encoder must be able to deduce (from its model of the decoder buffer) the fullness of the decoder buffer prior to decoding the same picture (in order to determine the bit budget for that picture). Therefore, the statistics computer 18 (FIG. 1) will allocate the bit rates r1, r2, . . . rk to the encoders 14-1 to 14-k, and the encoders 14-1 to 14-k will update their decoder buffer models with such allocated bit rates after a delay of d field times, where d is a non-negative real number. Relative to the encoder's model of the decoder buffer (which, in the absence of encoding and decoding pauses, is presumed in the conventional encoders to decode each picture n field times after the encoder encodes it), the encoder implements the bit rate after a delay of n+d field times. See M. Perkins & D. Arnstein, Statistical Multiplexing of Multiple MPEG-2 Video Programs in a Single Channel, SMPTE J., vol. 104, no. 9, p. 569-599, September, 1995. If an encoder behaves in such a manner but the actual encode to decode delay is not n, then the encoder's model of the decoder buffer will not be accurate.
To illustrate this, consider as an example a case where d=0 and the statistics computer 18 allocates a new bit rate R1 to an encoder 14-2 representing a bit rate at which the decoder buffer fills just after frame B4 is decoded (the bit rate previously having been R0) and then allocates a new bit rate R2 to the encoder 14-2 representing a bit rate at which the decoder buffer fills just after frame B6 is decoded. Assume that the encoder 14-2 is a delayed stall type of encoder (the behavior of which is illustrated in FIGS. 6 and 7). FIG. 18 is a timing chart illustrating the curve C1 of the fullness of the encoder's model of the decoder's buffer superimposed on the curve C2 of the actual fullness of the decoder's buffer. The first bit rate change is received at the encoder approximately n field times before frame B4 is decoded, i.e., approximately when frame B4 is encoded. As shown, the encoder correctly changes its model of the decoder buffer to use the bit rate R1 after frame B4 is removed from the decoder buffer. The second bit rate change is received four field times later, i.e., n field times before frame B6 is decoded. As noted above, the encoder delays encoding the frame P8 until four field times later as a result of two repeat field triggered pauses. Accordingly, the encoder changes the bit rate at which its model of the decoder buffer fills to R2 after picture P8 is removed. In contrast, the decoder decodes the frame P8 only two field times after the frame B6 is decoded. As such, the decoder changes its bit rate to R2 after the frame B6 is removed. The net effect is that the fullness of the encoder's model of the decoder buffer diverges from the actual decoder buffer fullness after frame P8 is removed from the decoder buffer.
Conventional encoders behave as depicted in one of the FIGS. 4-7, i.e., with variable encode to decode delay. As noted above, variations in encode to decode delay cause the encoder's model of the decoder buffer fullness to diverge from the actual buffer fullness. Left unchecked, this divergence will cause the decoder buffer to overflow or underflow. To keep the decoder buffer from underflowing, a conventional encoder will normally delay updating its model of the decoder buffer with each rate increase allocated by the statistics computer by an amount of time equal to at least the maximum possible variation in encode-to-decode delay. As can be appreciated, such an approach would have prevented an encoder from modeling the decoder buffer fullness higher than the actual buffer fullness in, for example, the illustration of FIG. 18. However, such an approach generally causes the encoder's model of the decoder's buffer to be less full than the actual decoder buffer fullness. For example, when a rate increase is allocated to the encoder and the encode to decode delay is not decreasing (i.e., the encode to decode delay is constant or is increasing), or when a rate decrease is allocated to the encoder and the encode to decode delay decreases, the encoder's model of the decoder's buffer will be less full than the actual fullness of the decoder buffer. This inaccuracy will lead the encoder to use fewer bits than possible: an underestimate of the decoder buffer fullness by x bits will cause x bits to be wasted. In a conventional encoder, decoder buffer underflows are avoided by monitoring the encoder buffer fullness (which in a sense mirrors the decoder buffer fullness) and by substituting transmission of null data instead of useful data (e.g., compressed picture data or header/control data) when the encoder buffer is too empty. (Null data is typically transmitted as null transport packets, which are discarded before entering the decoder's compressed video data buffer.)
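The conservative update policy described above can be sketched as a rule for when a newly allocated rate is applied to the encoder's model of the decoder buffer. This is an illustrative sketch only: the function name and arguments are hypothetical, times are expressed in field times, and the immediate application of rate decreases is an assumption, since the description above only addresses rate increases.

```python
def model_update_time(receive_time, current_rate, new_rate, max_delay_variation):
    """Return the time (in field times) at which the encoder applies new_rate
    to its model of the decoder buffer.

    Rate increases are held back by the maximum possible encode-to-decode
    delay variation so that the model never fills faster than the actual
    buffer. Assumption: rate decreases are applied without extra delay.
    """
    if new_rate > current_rate:
        return receive_time + max_delay_variation
    return receive_time


# Hypothetical numbers: an increase received at field time 100, with a
# maximum delay variation of 3 fields.
print(model_update_time(100, 3_000_000, 4_000_000, 3))
print(model_update_time(100, 4_000_000, 3_000_000, 3))
```

During the hold-back period the model fills at the old, lower rate, which is the source of the underestimate (and hence the wasted bits) noted above whenever the encode to decode delay does not actually shrink.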
With these methods used by conventional encoders to ensure buffer compliance in variable bit rate (e.g., statistical multiplexing) situations, the encoder periodically encodes pictures using bit allocations that are calculated assuming a lower transmission bit rate than will actually be used, and a considerable fraction of the transmitted data will be null data. Because fewer bits are spent to represent the video signal, the quality of the video (after decoding) is reduced.
Moreover, a conventional encoder may model the real-time behavior of the decoder buffer fullness in part by measuring the fullness of an output buffer at the encoder which temporarily stores encoded pictures pending transmission. (This may even be done in a constant bit rate system, e.g., where statistical multiplexing is not used, because of the drift between the synchronization of the video picture timing and the channel transmission. That is, a decoder buffer model based solely on the number of bits used in each picture, the number of fields produced per second and the number of bits transmitted per second will be inaccurate considering that the synchronization of the occurrence of the fields is drifting relative to the channel slots in which bits are transmitted.) However, the encoder buffer fullness only provides an accurate mirror image of the decoder buffer fullness when the encoding to decoding delay is constant. Specifically, in the encoder buffer model, the bits of each encoded picture are presumed to be inserted into the encoder buffer instantly upon completion of encoding and are removed gradually over time at the allocated fraction of the transmission channel bit rate allocated to the encoded video signal at that moment in time. However, as noted above, the decoder buffer removes pictures at different times for decoding. As a result, the times at which the encoder inserts a picture into the decoder buffer do not necessarily correspond to a fixed delay preceding the times at which the decoder removes such pictures from the decoder buffer. 
To prevent decoder buffer underflow and overflow given this lack of precise correlation, such encoders further constrain the allocation of bits to each picture to ensure that the encoder's model of the decoder's buffer fullness never exceeds some threshold b_hi or falls below some threshold b_lo, where the high threshold b_hi is somewhat below the maximum decoder buffer fullness B_max and the low threshold b_lo is somewhat above 0. Such headroom reduces the encoder's flexibility to use bits in pictures. Specifically, the encoder must use too many bits for low complexity pictures if the fullness of the encoder's model of the decoder's buffer is close to b_hi (because a risk of a decoder buffer overflow is presumed) and too few bits for high complexity pictures if the fullness of the encoder's model of the decoder's buffer is too close to b_lo (because a risk of a decoder buffer underflow is presumed).
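The effect of the low and high thresholds on a picture's bit budget can be sketched as a pair of bounds. This is a simplified sketch under stated assumptions: the buffer is modeled as in the preceding description (constant fill over a picture interval, instantaneous removal), and the function names, argument names and sample values are hypothetical.

```python
def picture_bit_bounds(fullness_before_decode, fill_next_interval, b_lo, b_hi):
    """Return (min_bits, max_bits) allowed for the next picture.

    max_bits keeps the modeled fullness just after removal at or above b_lo.
    min_bits keeps the modeled fullness just before the following removal
    (after one more interval of fill) at or below b_hi.
    """
    max_bits = fullness_before_decode - b_lo
    min_bits = max(0, fullness_before_decode + fill_next_interval - b_hi)
    return min_bits, max_bits


def clamp_budget(desired_bits, min_bits, max_bits):
    """Force a desired bit budget into the allowed range."""
    if min_bits > max_bits:
        raise ValueError("thresholds leave no feasible bit budget")
    return min(max(desired_bits, min_bits), max_bits)


# Hypothetical numbers: the model is close to the high threshold, so a low
# complexity picture that only needs 20,000 bits is forced to use more.
lo, hi = picture_bit_bounds(1_500_000, 100_000, 100_000, 1_550_000)
print(lo, hi)
print(clamp_budget(20_000, lo, hi))
```

Raising the low threshold or lowering the high threshold narrows the range between the two bounds, which is the loss of flexibility described above.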
It is an object of the present invention to overcome these disadvantages.