This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC). In addition, there are currently efforts underway with regard to the development of new video coding standards. One such standard under development is the scalable video coding (SVC) standard, which is expected to become the scalable extension to the H.264/AVC standard as Annex G (previously Annex F). (It should also be understood that SVC may ultimately appear in a different Annex of the final standard.) Another such effort involves the development of China video coding standards.
Annex G introduces a feature known as extended spatial scalability, which includes cases where the edge alignment of a base layer macroblock and an enhancement layer macroblock is not maintained. A spatial scaling ratio of 1 or 2 with aligned macroblock edges across different layers is considered a special case of spatial scalability.
As used herein, the term “enhancement layer” refers to a layer that is coded differentially compared to some lower quality reconstruction. The purpose of the enhancement layer is that, when added to the lower quality reconstruction, signal quality should improve, or be “enhanced.” Further, the term “base layer” applies to both a non-scalable base layer encoded using an existing video coding algorithm, and to a reconstructed enhancement layer relative to which a subsequent enhancement layer is coded.
In SVC, a video sequence can be coded in multiple layers, and each layer is one representation of the video sequence at a certain spatial resolution, temporal resolution, or quality level, or at some combination of the three. A portion of a scalable video bitstream can be extracted and decoded at a desired spatial resolution, temporal resolution, or quality level, or at some combination of these. A scalable video bitstream contains a non-scalable base layer and one or more enhancement layers.
In SVC coding, as the codec is based on a layer approach to enable spatial scalability, the encoder provides a down-sampling filter stage that generates the lower resolution signal for each spatial layer. In its basic version, the down-sampling ratio is equal to 2 (the base layer resolution is half the spatial resolution of its spatial enhancement layer). Extended Spatial Scalability (ESS) generalizes this concept by enabling the base layer to be a cropped version of its enhancement layer with a down-sampling ratio different from 1 or 2, thus enabling a generalized relation between successive spatial layers. A picture of a lower spatial layer may represent a cropped area of the higher resolution picture and the relation between successive spatial layers does not need to be dyadic. Geometrical parameters defining the cropping window and the down-sampling ratio can either be defined at the sequence level, or evolve at the picture level.
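The geometric relation between a cropped base layer and its enhancement layer can be illustrated with a minimal sketch. The function name `enh_to_base` and the parameters `crop` and `base_size` are hypothetical and chosen only for this illustration; the mapping is a simplified linear relation under the assumption of a rectangular cropping window, not the normative derivation of the draft standard.

```python
from fractions import Fraction

def enh_to_base(ex, ey, crop, base_size):
    """Map an enhancement layer sample position to base layer coordinates.

    crop      -- (x, y, width, height) of the cropping window, expressed in
                 enhancement layer samples (hypothetical parameter layout)
    base_size -- (width, height) of the base layer picture
    Returns None when the position lies outside the cropping window,
    i.e., when no base layer sample corresponds to it.
    """
    crop_x, crop_y, crop_w, crop_h = crop
    base_w, base_h = base_size
    inside = (crop_x <= ex < crop_x + crop_w) and (crop_y <= ey < crop_y + crop_h)
    if not inside:
        return None
    # Exact rational arithmetic avoids rounding issues for non-dyadic ratios.
    bx = Fraction((ex - crop_x) * base_w, crop_w)
    by = Fraction((ey - crop_y) * base_h, crop_h)
    return bx, by

# Assumed example: a 640x360 base layer representing a 960x540 window of a
# 1280x720 enhancement layer picture (ratio 1.5, which is not dyadic).
crop = (160, 90, 960, 540)
base_size = (640, 360)
print(enh_to_base(640, 360, crop, base_size))
print(enh_to_base(0, 0, crop, base_size))
```

In this assumed configuration, positions inside the window map to fractional base layer coordinates in general, while positions outside the window have no base layer counterpart.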
ESS may result in a situation wherein the edge alignment of a base layer macroblock (MB) and an enhancement layer MB is not maintained after the upsampling process. When spatial scaling is performed with a ratio of 1 or 2 and a MB edge is aligned across different layers, it is considered to be a special case of spatial scalability.
For example, when utilizing dyadic resolution scaling (i.e., scaling resolution by a power of 2), the edge alignment of MBs can be maintained. This example is illustrated in FIG. 1 where a half-resolution frame (i.e., base layer frame 100) is associated with an upsampled, full resolution version (i.e., enhancement layer frame 104). An MB 102, which comprises at least a portion of the base layer frame 100, is shown. The boundary of the MB 102 seen in the base layer frame 100 is “maintained” so that even after upsampling to give the enhancement layer 104, the boundary still exactly encompasses four full-resolution MBs, i.e., MB 106, MB 108, MB 110, and MB 112. In other words, the edges of the four enhancement layer MBs 106, 108, 110, and 112 exactly correspond to the upsampled boundary of the MB 102. Importantly, the only base layer MB covering each of the enhancement layer MBs, i.e., MB 106, MB 108, MB 110, and MB 112, is MB 102. Therefore, no other base layer MB is necessary to form the prediction for MB 106, MB 108, MB 110, and MB 112.
FIG. 2 illustrates a non-dyadic scalability scenario where the scaling factor is 1.5. In this scenario, base layer MBs, i.e., MB 202 and MB 204 of base layer 200, are upsampled from 16×16 to 24×24, as shown in the higher resolution enhancement layer 206. Enhancement layer MB 208 (outlined by dotted lines) is shown to be covered by two different upsampled MBs, i.e., MB 202 and MB 204. Thus, the two base layer MBs, i.e., MB 202 and MB 204, are required in order to form a prediction for the enhancement layer MB 208. It should be noted that depending upon the scaling factor, a single enhancement layer MB may be covered by up to four base layer MBs. In the current draft of Annex G of the H.264/AVC standard, an enhancement layer macroblock can be coded relative to the base layer, even though several base layer macroblocks may be needed to form the prediction.
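The coverage relationship described for the dyadic and non-dyadic cases can be checked with a short sketch. The function name `covering_base_mbs` is hypothetical, and the sketch assumes that no cropping offset is applied, that the same scaling factor is used in both dimensions, and that both layers use MBs of equal size (so the MB size cancels out of the calculation). It is an illustration of the geometry only, not the procedure of the draft standard.

```python
from fractions import Fraction
from math import ceil, floor

def covering_base_mbs(enh_mb_x, enh_mb_y, scale):
    """List the (x, y) indices of base layer MBs whose upsampled area
    overlaps the enhancement layer MB at index (enh_mb_x, enh_mb_y).

    scale -- upsampling factor from base layer to enhancement layer,
             e.g. 2 (dyadic) or Fraction(3, 2) (a factor of 1.5)
    """
    scale = Fraction(scale)

    def span(i):
        # Enhancement MB i occupies [i, i + 1) in MB units; base MB b
        # occupies [b * scale, (b + 1) * scale) after upsampling.
        first = floor(i / scale)
        last = ceil((i + 1) / scale) - 1
        return range(first, last + 1)

    return [(bx, by) for by in span(enh_mb_y) for bx in span(enh_mb_x)]

# Dyadic case: each enhancement layer MB is covered by one base layer MB.
print(covering_base_mbs(1, 1, 2))
# Factor 1.5: an enhancement layer MB may straddle two or four base layer MBs.
print(covering_base_mbs(1, 0, Fraction(3, 2)))
print(covering_base_mbs(1, 1, Fraction(3, 2)))
```

Under these assumptions, a scaling factor of 2 always yields a single covering base layer MB, whereas a factor of 1.5 yields one, two, or four depending on where the enhancement layer MB falls relative to the upsampled base layer MB boundaries.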
In the current draft of Annex G, it is possible for an enhancement layer MB to be coded relative to an associated base layer frame, even though several base layer MBs may be needed to form the prediction. Because coding efficiency is closely related to prediction accuracy, it is desirable to form an accurate prediction of the enhancement layer MB to improve coding efficiency.