1. Field of the Invention
The present invention relates generally to image and video signals. More particularly, the present invention relates to coding or compressing image and video signals.
2. Related Art
In video coding standards, such as MPEG-1, MPEG-2, MPEG-4, H.261, H.263 and the new H.264/MPEG-4 Part 10, a bitstream is determined to be conformant if the bitstream adheres to the syntactical and semantic rules embodied in the standard. One such set of rules takes the form of successful flow of the bitstream through a mathematical or hypothetical model of a decoder, which receives the bitstream from an encoder. Such a model decoder is referred to as the hypothetical reference decoder (“HRD”) in some standards or the video buffer verifier (“VBV”) in other standards. In other words, the HRD specifies rules that bitstreams generated by a video signal encoder must adhere to for such encoder to be considered conformant under a given standard. Stated differently, an HRD is a normative means according to which encoders must create bitstreams that adhere to certain rules and constraints, so that real decoders can assume that such rules have been followed and such constraints have been met.
The HRD serves to place constraints on the variations in bit rate over time in a compressed bitstream. The HRD may also serve as a timing-and-buffering model for a real decoder implementation or for a multiplexor. Associated with the HRD are syntax elements defined in the standard, and algorithms embodied in software or hardware in various products such as encoders, multiplexors, conformance analyzers, and so on.
The HRD represents a means to communicate how the bit rate is controlled in the process of compression. The HRD may be designed for variable or constant bit rate operation, and for low-delay or delay-tolerant behavior. As shown in FIG. 1, HRD 100 includes pre-decoder buffer 110 (or VBV buffer) through which compressed bitstream 105 flows with a precisely specified arrival and removal timing. Compressed bitstream 105 contains a sequence of coded pictures 115 and associated ancillary messages, which flow into pre-decoder buffer 110 according to a specified arrival schedule. All compressed bits associated with a given coded picture 115 are removed from pre-decoder buffer 110 by instantaneous decoder 120 at the specified removal time of the picture. Pre-decoder buffer 110 overflows if the buffer becomes full and more bits are arriving. Pre-decoder buffer 110 underflows if the removal time for a picture occurs before all compressed bits representing the picture have arrived. Typically, HRDs differ in the means to specify the arrival schedule and removal times, and the rules regarding overflow and underflow of pre-decoder buffer 110.
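The overflow and underflow conditions described above can be illustrated with a minimal sketch. It is not taken from any standard: the `Picture` record, the linear arrival of each picture's bits between `arrival_start` and `arrival_end`, and the function names are all assumptions made for illustration, and pictures are assumed to be listed in decoding order.

```python
from dataclasses import dataclass


@dataclass
class Picture:
    bits: int             # compressed size of the coded picture
    arrival_start: float  # time the first bit enters the buffer (assumed)
    arrival_end: float    # time the last bit enters the buffer (assumed)
    removal_time: float   # time the decoder removes the whole picture


def bits_arrived(pictures, t):
    """Total bits that have entered the buffer by time t (linear arrival assumed)."""
    total = 0.0
    for p in pictures:
        if t >= p.arrival_end:
            total += p.bits
        elif t > p.arrival_start:
            frac = (t - p.arrival_start) / (p.arrival_end - p.arrival_start)
            total += p.bits * frac
    return total


def check_buffer(pictures, buffer_bits):
    """Return 'ok', or a message naming the first underflow or overflow found."""
    removed = 0
    for i, p in enumerate(pictures):
        # Underflow: the removal time occurs before all bits of the picture arrived.
        if p.removal_time < p.arrival_end:
            return f"underflow at picture {i}"
        # Fullness only grows between removals, so it peaks just before each removal.
        fullness = bits_arrived(pictures, p.removal_time) - removed
        if fullness > buffer_bits:
            return f"overflow before picture {i} is removed"
        removed += p.bits
    return "ok"
```

Because removal is instantaneous in this model, checking the fullness immediately before each removal time is sufficient to detect any overflow under the stated assumptions.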
HRDs in accordance with some existing standards such as H.263 and H.261 have been designed for low-delay operation. In short, such HRDs operate by removal of all bits associated with a picture the first time the buffer is examined, rather than at a time transmitted in the bitstream. Such HRDs do not specify when the bitstream arrives in the pre-decoder buffer. Therefore, such HRDs do not allow for precisely timed removal of bits from the pre-decoder buffer and create a difficulty for systems designed to display pictures with precise timing.
Other HRDs, such as that of MPEG-2, can operate in variable bit rate or constant bit rate mode and also have a low-delay mode. The MPEG-2 HRD (known as the VBV) has two modes of operation based on whether a picture removal delay is transmitted in the bitstream or not. In the first mode of operation or mode A, when the removal delay is transmitted, the rate of arrival into the VBV buffer of each picture is computed based on picture sizes, the removal delay and additional removal time increments. Mode A can be used by the encoder to create both variable and constant bit rate streams. However, mode A suffers from a shortcoming that the entire bitstream must be scanned in order to make a determination as to whether a given bitstream has a constant bit rate. Mode A further suffers from an ambiguity at the beginning of the sequence that prevents the initial bit rate from being determined. Therefore, technically, mode A does not allow for a determination as to whether the bitstream is a constant bit rate (“CBR”) bitstream.
In its second mode of operation or mode B (which is also referred to as a leaky bucket), unlike mode A, the encoder does not transmit the removal delays. In mode B, the arrival rate is constant unless the pre-decoder buffer is full, under which condition no bits arrive. Thus, mode B, having a constant arrival rate, does not introduce an ambiguity regarding the initial arrival rate. However, mode B has an arrival schedule that may not be constrained to model the real production of bits. This unconstrained aspect of mode B can result in very large delays through a real decoder and limits its use as guidance for real-time multiplexors. In mode B, compressed data arrives in the VBV buffer at the peak rate until the buffer is full, at which point data stops arriving. The initial removal time is the exact point in time when the buffer becomes full. Subsequent removal times are delayed by fixed frame or field periods with respect to the first.
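The mode B behavior described above may be sketched as a simple leaky-bucket simulation. This is an illustrative model only, not the normative MPEG-2 procedure: the function name, its parameters, and the requirement that the stream hold at least one buffer's worth of bits are assumptions, and a single fixed frame period is used for all removals.

```python
def simulate_mode_b(picture_bits, buffer_bits, peak_rate, frame_period):
    """Return a list of (removal_time, fullness_after_removal) for each picture.

    Bits arrive at peak_rate, stop while the buffer is full, and stop when the
    stream is exhausted. Raises ValueError on buffer underflow.
    """
    total = sum(picture_bits)
    if total < buffer_bits:
        # Assumption of this sketch: the buffer must be able to fill once.
        raise ValueError("stream too short to fill the buffer")

    t = buffer_bits / peak_rate  # initial removal time: buffer has just become full
    arrived = buffer_bits        # cumulative bits that have entered the buffer
    removed = 0                  # cumulative bits taken out by the decoder
    trace = []
    for i, bits in enumerate(picture_bits):
        # Underflow: the picture is due for removal but has not fully arrived.
        if removed + bits > arrived:
            raise ValueError(f"underflow at picture {i}")
        removed += bits
        trace.append((t, arrived - removed))
        # Arrival until the next removal is capped by the free room in the
        # buffer, so overflow cannot occur in this mode.
        room = buffer_bits - (arrived - removed)
        arrived += min(peak_rate * frame_period, room, total - arrived)
        t += frame_period
    return trace
```

In this sketch the cap on arrivals by the free room in the buffer is what makes overflow impossible, so only underflow needs to be reported.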
As a hypothetical example of the long delays that mode B may introduce, consider an encoder that produces a long sequence of very small pictures (i.e., few bits are used to compress each picture) at the start of the sequence. For the purpose of providing an example, consider that 1,000 small pictures that can all fit in the VBV buffer are produced. All 1,000 pictures would enter the buffer in a time less than the time-equivalent of the buffer size, which is typically less than one second. The last of such pictures would then remain in the buffer for 999 picture periods longer than the first picture, or roughly 30 seconds. This requires that the encoder create a delay of that same amount of time before transmitting the first picture. However, in real-time broadcast applications, it is not generally possible to insert a thirty-second delay at the encoder. Rather, an encoder can only transmit the bits associated with the small pictures after they have been produced. In terms of a VBV model, this would introduce a series of time intervals during which the VBV buffer is not full, but no bits are entering. Therefore, a real-time encoder cannot imitate the buffer arrival timing of mode B of VBV.
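The arithmetic behind this example can be checked with a short calculation. The frame rate of 30 pictures per second, the buffer size, and the peak rate used below are hypothetical values chosen only for illustration; the text above does not specify them.

```python
def extra_residency_seconds(num_small_pictures, frame_rate):
    """Extra time the last small picture waits in the buffer relative to the first.

    Removal times are spaced one picture period apart, so the last of N pictures
    is removed (N - 1) periods after the first.
    """
    return (num_small_pictures - 1) / frame_rate


def buffer_fill_seconds(buffer_bits, peak_rate):
    """Time-equivalent of the buffer size: time to fill it at the peak rate."""
    return buffer_bits / peak_rate


# Hypothetical values, not taken from the text or from any standard.
delay = extra_residency_seconds(1000, 30.0)       # 999 picture periods
fill = buffer_fill_seconds(1_800_000, 4_000_000)  # well under one second
```

With these assumed values, the 999 picture periods amount to about 33 seconds, on the order of the thirty seconds cited above, while the buffer itself fills in under half a second.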
In both modes A and B, the removal times are based on a fixed frame rate. Neither of these MPEG-2 VBV modes can handle variable frame rate, except for the one special case of film content captured as video. In this special case, the removal time of certain pictures is delayed by one field period, based on the value of a bit field in the picture header of that picture or a previous picture.
Unlike mode A of the VBV, in which the encoder must prevent both buffer overflow and underflow, in mode B it is impossible for the buffer to overflow, as data stops entering when the buffer becomes full. However, in mode B, the encoder must still prevent buffer underflow.
The MPEG-2 VBV also includes a separately specified low-delay mode. In the low-delay mode, the pre-decoder buffer may underflow occasionally and there are precise rules, involving skipping pictures, which define how the VBV is to recover. Because of the number of modes of operation, and the arcane method of handling the one special case of variable frame rate, the MPEG-2 VBV is overly complex. It also suffers from the initial rate ambiguity of mode A and the non-causality of mode B.
A need exists for an improved hypothetical reference decoder that addresses the problems and deficiencies associated with the existing HRDs.