In video compression, rate control refers to technologies that tune bitstream parameters, most commonly the Quantization Parameter (QP), according to a known bit budget. Rate control schemes are known that adjust the QP (and/or other bitstream parameters) at the level of various units, for example, individual macroblocks, slices, individual pictures, or groups of pictures (GOPs). Many papers have been published on rate control concepts optimized to address the tuning of the QP in one or more of the aforementioned units.
To apply a rate control mechanism successfully, the mechanism needs to know the target number of bits (a bit allocation) for the unit it is designed to operate on (e.g., the macroblock, slice, or picture). Spatial enhancement layers, Signal-to-Noise Ratio (SNR) enhancement layers, or enhancement layers of other types can also be present in the bitstream, and, in some configurations, a spatial/SNR/other type enhancement layer can be used as a temporal base layer. Therefore, pictures in a temporal base layer can refer, for prediction, to pictures in spatial, SNR, or other types of non-temporal base or enhancement layers. A layered bitstream can also have one specific base layer that is distinguished by its pictures having a prediction relationship only to other pictures in this layer, and not to any pictures in any of the enhancement layers. This layer is, henceforth, referred to as the “fundamental base layer”. Further, the temporal base layer is henceforth simply referred to as the “base layer”.
Temporal scalability has been known for some time—at least since 1992—and relates to the use of one or more temporal enhancement layers that enhance the frame rate, after decoding, of a base layer.
FIG. 1 depicts a prior art example. Pictures (102) and (103) are part of the base layer (101), denoted as TL0. The base layer (101) is independently decodable and requires that all the coded pictures of the base layer have dependencies only to each other (be it through forward, backward, bi- or multi-picture prediction) (104), (105), and not to pictures in the enhancement layers. With continuing reference to FIG. 1, the frame rate of the base layer is 7.5 Hz; therefore, the interval between two adjacent pictures of TL0 is approximately 133 ms. A first temporal enhancement layer (106), denoted as TL2, contains pictures (107) and (108). These pictures may be predicted from the base layer pictures (109), (110), as well as from other pictures of TL2 (111). Therefore, to successfully decode TL2, TL0 and TL2 pictures need to be available. As the TL2 pictures are sampled approximately 66 ms later than the pictures of TL0, the frame rate after decoding TL0 and TL2 in combination is 15 Hz. Decoding TL0 and TL2 in combination results in a visually more pleasing experience due to the higher frame rate, but also requires encoding, transmission, and decoding of both TL0 and TL2, requiring more computational and network bandwidth resources. A second temporal enhancement layer, TL3 (112), includes pictures (113), (114), (115), and (116). Pictures of the second temporal enhancement layer may be dependent on both TL0 (101) and TL2 (106) as well as other pictures of TL3, and, therefore, both TL0 and TL2 pictures may be required to successfully decode TL3. For clarity, the TL3 dependency relationship is not shown in FIG. 1. The frame rate, after decoding, of TL0, TL2, and TL3 is 30 Hz, with a picture interval of approximately 33 ms.
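The frame rates and picture intervals given for FIG. 1 can be reproduced with a short calculation. The following is a minimal sketch, not part of the disclosure: the function names are hypothetical, each enhancement layer is assumed to double the frame rate, and the mapping from picture index to layer assumes a regular dyadic pattern in which every fourth picture of the 30 Hz sequence belongs to TL0.

```python
# Illustrative sketch only; all names below are hypothetical.
# Layer labels TL0/TL2/TL3 and the 7.5 Hz base rate follow FIG. 1.
BASE_RATE_HZ = 7.5
LAYERS = ["TL0", "TL2", "TL3"]


def cumulative_rates(base_rate_hz, num_layers):
    """Frame rate after decoding the base layer plus the first k
    enhancement layers, assuming each layer doubles the frame rate."""
    return [base_rate_hz * (2 ** k) for k in range(num_layers)]


def picture_interval_ms(rate_hz):
    """Interval between adjacent decoded pictures, in milliseconds."""
    return 1000.0 / rate_hz


def layer_of(picture_index):
    """Temporal layer of a picture in the full 30 Hz sequence.
    Assumption: index 0 is a TL0 picture and the pattern is dyadic."""
    if picture_index % 4 == 0:
        return "TL0"
    if picture_index % 2 == 0:
        return "TL2"
    return "TL3"


rates = cumulative_rates(BASE_RATE_HZ, len(LAYERS))
intervals = [picture_interval_ms(r) for r in rates]

for name, rate, interval in zip(LAYERS, rates, intervals):
    print(f"up to {name}: {rate} Hz, about {interval:.1f} ms per picture")
```

Under these assumptions, the cumulative rates are 7.5 Hz, 15 Hz, and 30 Hz, which matches the intervals of roughly 133 ms, 66 ms, and 33 ms stated above.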
In many modern video compression standards, the GOP concept is similar, but often, the definition of an anchor picture is somewhat softened. Still referring to FIG. 1, a GOP refers to a first anchor picture and all pictures in temporal order up to, but excluding, the next anchor picture. In this disclosure, an anchor picture is defined as any picture in TL0 (101); in other words, any base layer picture. One GOP includes picture (102), belonging to TL0 and serving as the first anchor picture; picture (107), belonging to TL2; and pictures (113) and (114), both belonging to TL3.
Temporal scalability can be practiced using ITU-T Recommendation H.264 baseline profile (among many other profiles including Annex G). ITU-T Recommendation H.264 is informally known as Advanced Video Coding (AVC), and its scalable extension (Annex G) is informally known as Scalable Video Coding (SVC). Both are available in the same standards document known to those skilled in the art, which is available, e.g., from http://www.itu.int/rec/T-REC-H.264-200903-I or from the International Telecommunication Union, Place des Nations, 1211 Geneva 20, Switzerland. Many other standardized or non-standardized forms of temporal scalability are also known.
Many publications that address the allocation of bits to individual layers (temporal scalable or other enhancement layers) by a layered encoder in a layered bitstream stop short of disclosing techniques for determining that allocation, often claiming that the rate is determined by external factors, such as available network bandwidth.
Explanations along the following exemplary lines are common. The base layer, e.g., TL0, can be optimized for consumption by a mobile device with an access link speed of 64 kbit/s. A first temporal enhancement layer, e.g., TL2, can be optimized for a user on a two-B-channel ISDN connection with a bandwidth of 128 kbit/s. As TL0 already requires 64 kbit/s, TL2 has a budget of 128 kbit/s − 64 kbit/s = 64 kbit/s. A second temporal enhancement layer, e.g., TL3, can be optimized for a fractional T1 connection with 384 kbit/s total connectivity, resulting in 384 kbit/s − 128 kbit/s = 256 kbit/s for TL3 (following the above rationale).
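The arithmetic in this rationale amounts to taking differences between successive cumulative link rates. A minimal sketch, assuming the link rates are listed in increasing layer order (the function name `layer_budgets` is hypothetical and introduced here only for illustration):

```python
# Illustrative sketch only; layer_budgets is a hypothetical helper.
def layer_budgets(cumulative_kbps):
    """Derive per-layer bit rates from cumulative link rates.

    cumulative_kbps[i] is the total rate available when decoding the
    base layer plus the first i enhancement layers, in kbit/s.
    """
    budgets = []
    previous = 0
    for total in cumulative_kbps:
        if total < previous:
            raise ValueError("cumulative rates must not decrease")
        # Each layer gets whatever the lower layers have not used.
        budgets.append(total - previous)
        previous = total
    return budgets


# Link rates from the example: mobile link, two-B-channel ISDN,
# fractional T1.
budgets = layer_budgets([64, 128, 384])
print(dict(zip(["TL0", "TL2", "TL3"], budgets)))
```

With the rates from the example, this yields 64, 64, and 256 kbit/s for TL0, TL2, and TL3, respectively. Note that this only divides an externally given rate among layers; it says nothing about how bits are distributed within a layer.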
The concept of a Group of Pictures, GOP, was introduced before 1992. In the MPEG standards arena, a GOP refers to an anchor picture and all the pictures up to the next anchor picture. Anchor pictures were traditionally intra coded pictures, also known as I pictures. In most modern standards, the GOP concept is kept, but, often, the definition of an anchor picture is somewhat softened. In this disclosure, a GOP refers to a first anchor picture and all pictures in temporal order up to, but excluding, the next anchor picture. In this disclosure, an anchor picture is defined as any picture in TL0; in other words, any base layer picture. Still referring to FIG. 1, one GOP consists of picture (102), belonging to TL0 and serving as the anchor picture; picture (107), belonging to TL2; and pictures (113) and (114), both belonging to TL3.
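The GOP definition used in this disclosure can be illustrated by partitioning a sequence of pictures at each TL0 picture. The following is a minimal sketch under stated assumptions: pictures are represented only by their layer label, in temporal order, and the helper name `split_into_gops` is hypothetical.

```python
# Illustrative sketch only; split_into_gops is a hypothetical helper.
def split_into_gops(layers, anchor_layer="TL0"):
    """Group pictures (given as layer labels in temporal order) into
    GOPs. Each GOP starts at an anchor picture and runs up to, but
    excludes, the next anchor picture."""
    gops = []
    for layer in layers:
        if layer == anchor_layer:
            # An anchor picture opens a new GOP.
            gops.append([layer])
        elif gops:
            gops[-1].append(layer)
        # Pictures preceding the first anchor belong to no GOP here.
    return gops


# Assumed temporal order for the structure of FIG. 1:
# TL0, TL3, TL2, TL3, repeated.
sequence = ["TL0", "TL3", "TL2", "TL3", "TL0", "TL3", "TL2", "TL3"]
gops = split_into_gops(sequence)
print(gops)
```

For the assumed sequence, each GOP contains one TL0 picture, one TL2 picture, and two TL3 pictures, consistent with the GOP described with reference to FIG. 1.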
Implementing a video encoder, regardless of whether it uses a non-scalable or scalable approach, may be realized, for example, using a software implementation on a sufficiently powerful general purpose processor, dedicated hardware circuitry, a Digital Signal Processor (DSP), or any combination thereof.