Subject matter related to the present application can be found in co-pending U.S. patent application Ser. No. 13/528,010, entitled “Scalable Coding Video Using Multiple Coding Technologies”; co-pending U.S. patent application Ser. No. 13/529,159, entitled “Scalable Video Coding Techniques”; and Ser. No. 13/414,075, entitled “Dependency Parameter Set for Scalable Video Coding”, all of which are incorporated herein by reference in their entireties.
Video compression using scalable techniques can allow a digital video signal to be represented in the form of multiple layers. Scalable video coding techniques have been standardized, including temporal, spatial, and quality (SNR) scalability. Spatial and SNR scalability can be closely related in the sense that SNR scalability, at least in some implementations and for some video compression schemes and standards, can be viewed as spatial scalability with a spatial scaling factor of 1 in both X and Y dimensions, whereas spatial scalability can enhance the picture size of a base layer to a larger format by, for example, factors of 1.5 to 2.0 in each dimension. Because of this close relation, only spatial scalability is described henceforth.
ITU-T Rec. H.264 version 2 (2005) and later (available from International Telecommunication Union (ITU), Place des Nations, 1211 Geneva 20, Switzerland, and incorporated herein by reference in its entirety), and its ISO/IEC counterpart, ISO/IEC 14496 Part 10, include scalability mechanisms known as Scalable Video Coding (SVC) in their Annex G. All or substantially all features of temporal scalability are supported by various versions of H.264, whereas spatial and SNR scalability are specified in the SVC extension.
High Efficiency Video Coding (HEVC), specified in ITU-T Rec. H.265 (available from the ITU and incorporated herein by reference in its entirety), also includes support for temporal scalability in its first version, but lacks support for spatial or SNR scalability.
The specifications of spatial scalability in the aforementioned standards can vary, for example, due to different terminology, different coding tools of the non-scalable specification basis, and/or different tools used for implementing scalability. However, one exemplary implementation strategy for a scalable encoder configured to encode a base layer and one spatial or SNR enhancement layer includes two encoding loops: one for the base layer, the other for the enhancement layer. Additional enhancement layers can be added by adding more coding loops. Conversely, a scalable decoder can be implemented by a base decoder and one or more enhancement decoder(s). This has been discussed, for example, in Dugad, R., and Ahuja, N., “A Scheme for Spatial Scalability Using Nonscalable Encoders”, IEEE CSVT, Vol. 13, No. 10, Oct. 2003, which is incorporated by reference herein in its entirety.
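To make the multi-loop structure concrete, the following Python sketch chains one coding loop per layer, with each loop predicting from the reconstruction of the layer below. It is a toy under stated assumptions: `ToyCodingLoop` and `encode_scalable` are hypothetical names, a "coding loop" is reduced to uniform quantization of a residual, and the scaling factor is 1.0 (the SNR case), so no resampling is needed. Real coding loops add intra/inter prediction, transforms, and entropy coding.

```python
# Toy SNR-scalable encoder: one hypothetical "coding loop" per layer.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyCodingLoop:
    """Hypothetical coding loop: codes (input - prediction) with step qstep."""
    qstep: int

    def encode(self, samples: List[int],
               prediction: List[int]) -> Tuple[List[int], List[int]]:
        # Quantize the residual; the level list stands in for this layer's bitstream.
        levels = [round((s - p) / self.qstep) for s, p in zip(samples, prediction)]
        # Reconstruct exactly as a decoder would, keeping encoder and decoder in sync.
        recon = [p + lv * self.qstep for p, lv in zip(prediction, levels)]
        return levels, recon


def encode_scalable(samples: List[int], loops: List[ToyCodingLoop]):
    """Run the base loop first, then one enhancement loop per additional layer."""
    prediction = [0] * len(samples)  # the base layer has no lower layer to predict from
    layers = []
    for loop in loops:
        levels, recon = loop.encode(samples, prediction)
        layers.append((levels, recon))
        prediction = recon  # the next layer predicts from this layer's reconstruction
    return layers
```

With quantizer steps of 8 and 1, `encode_scalable([10, 23, 37, 4], [ToyCodingLoop(8), ToyCodingLoop(1)])` yields a base layer reconstruction of `[8, 24, 40, 0]` and an enhancement layer reconstruction equal to the input. Appending another `ToyCodingLoop` adds a further layer, mirroring how additional enhancement layers add coding loops.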
FIG. 1 illustrates a block diagram of such a prior art scalable encoder. It includes a video signal input (101), a downsample unit (102), a base layer coding loop (103), a base layer reference picture buffer (104), which can be part of the base layer coding loop but can also serve as an input to a reference picture upsample unit (105), an enhancement layer coding loop (106), and a bitstream generator (107).
The video signal input (101) can receive the to-be-coded video in any suitable digital format, for example according to ITU-R Rec. BT.601, March 1982 (available from International Telecommunication Union (ITU), Place des Nations, 1211 Geneva 20, Switzerland, and incorporated herein by reference in its entirety). The term “receive” can involve pre-processing actions such as filtering, resampling to, for example, the intended enhancement layer spatial resolution, and other operations. The spatial picture size of the input signal can be assumed to be the same as the spatial picture size of the enhancement layer. The input signal can be used in unmodified form (108) in the enhancement layer coding loop (106), which is coupled to the video signal input.
The video signal input can also be coupled to a downsample unit (102). A purpose of the downsample unit (102) is to downsample the pictures received by the video signal input (101) from the enhancement layer resolution to a base layer resolution. The downsample factor can be, for example, 1.0, in which case the spatial dimensions of the base layer pictures are the same as the spatial dimensions of the enhancement layer pictures, resulting in quality scalability, also known as SNR scalability. In this case, the operation of the downsample unit (102) can be a forwarding of the samples without modification. Downsample factors larger than 1.0 lead to base layer spatial resolutions lower than the enhancement layer resolution, which enables spatial scalability. Various downsample filters useful for different downsample factors are known to those skilled in the art.
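A downsample unit along these lines can be sketched as below. The picture representation (a list of rows of integer samples), the function name `downsample`, and the box (block-average) filter are illustrative assumptions, not a filter prescribed by any standard. Only integer factors are handled, so a 1.5 ratio would need a different filter.

```python
from typing import List

Picture = List[List[int]]


def downsample(picture: Picture, factor: int) -> Picture:
    """Downsample by an integer factor using a simple box (block-average) filter.

    factor == 1 forwards the samples unmodified (the SNR-scalability case).
    The box filter is only an illustrative choice of downsample filter.
    """
    if factor == 1:
        return [row[:] for row in picture]
    height, width = len(picture), len(picture[0])
    out = []
    # Trailing rows/columns that do not fill a whole block are dropped.
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [picture[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            # Rounded integer average of the factor x factor block.
            row.append((sum(block) + len(block) // 2) // len(block))
        out.append(row)
    return out
```

For example, downsampling `[[0, 2, 4, 6], [2, 4, 6, 8], [10, 10, 20, 20], [10, 10, 20, 20]]` by a factor of 2 averages each 2x2 block into one sample, giving `[[2, 6], [10, 20]]`.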
Video coding standards as well as application constraints can set constraints for the base layer resolution in relation to the enhancement layer resolution. The scalable baseline profile of H.264/SVC, for example, allows downsample ratios of 1.5 or 2.0 in both X and Y dimensions. A downsample ratio of 2.0 means that the downsampled picture includes only one quarter of the samples of the non-downsampled picture. In the aforementioned video coding standards, the details of the downsampling mechanism can be chosen freely, independently of the upsampling mechanism. In contrast, these standards can specify the filter used for upsampling, so as to avoid drift in the enhancement layer coding loop (106): the encoder and the decoder must derive identical upsampled base layer pictures for their predictions to match.
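The sample-count arithmetic can be checked with a short calculation. The 1920x1080 enhancement layer size is an example chosen here, not a value from the text, and `base_layer_size` is a hypothetical helper that simply rejects ratios that do not divide the picture size evenly; exact fractions avoid floating-point rounding.

```python
from fractions import Fraction


def base_layer_size(el_width: int, el_height: int, ratio: Fraction):
    """Return base layer width, height, and the fraction of samples it retains."""
    bl_width = el_width / ratio    # int / Fraction yields an exact Fraction
    bl_height = el_height / ratio
    if bl_width.denominator != 1 or bl_height.denominator != 1:
        raise ValueError("ratio does not divide the enhancement layer size evenly")
    # Each dimension shrinks by `ratio`, so the sample count shrinks by ratio squared.
    sample_fraction = (bl_width * bl_height) / (el_width * el_height)
    return int(bl_width), int(bl_height), sample_fraction
```

For a 1920x1080 enhancement layer, a ratio of 2.0 gives a 960x540 base layer holding one quarter of the samples, and a ratio of 1.5 gives 1280x720 holding four ninths of the samples.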
The output of the downsample unit (102) can be a downsampled version (109) of the picture produced by the video signal input (101). The base layer coding loop (103) can take the downsampled picture (109) and encode it into a base layer bitstream (110).
Certain video compression technologies rely, among others, on inter picture prediction techniques to achieve high compression efficiency. Inter picture prediction allows for the use of information related to one or more previously decoded or otherwise processed pictures, known as reference pictures, in the decoding of the current picture. Examples of inter picture prediction mechanisms include motion compensation, where, during reconstruction, blocks of pixels from a previously decoded picture are copied or otherwise employed after being moved according to a motion vector, and residual coding, where, instead of decoding pixel values, the potentially quantized difference between a pixel of a reference picture (in some cases a motion-compensated pixel) and the reconstructed pixel value is contained in the bitstream and used for reconstruction. Inter picture prediction is one of the technologies that enable coding efficiency in modern video coding.
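The two mechanisms can be illustrated for a single block. This is a minimal sketch assuming integer-sample motion vectors that keep the displaced block inside the reference picture, and 8-bit samples by default; real codecs also use fractional-sample interpolation and boundary padding, which are omitted. `reconstruct_block` is a hypothetical name.

```python
from typing import List, Tuple

Picture = List[List[int]]


def reconstruct_block(reference: Picture, x: int, y: int, size: int,
                      mv: Tuple[int, int], residual: Picture,
                      bit_depth: int = 8) -> Picture:
    """Reconstruct one size x size block at (x, y) of the current picture.

    Prediction: copy the reference block displaced by the integer motion vector
    mv = (dx, dy). Reconstruction: add the decoded residual and clip the result
    to the valid sample range for the given bit depth.
    """
    dx, dy = mv
    max_value = (1 << bit_depth) - 1
    block = []
    for j in range(size):
        row = []
        for i in range(size):
            predicted = reference[y + j + dy][x + i + dx]  # motion compensation
            value = predicted + residual[j][i]              # residual coding
            row.append(min(max(value, 0), max_value))       # clip to sample range
        block.append(row)
    return block
```

With a reference picture whose sample at row r, column c is `10 * r + c`, the 2x2 block at (0, 0) with motion vector (1, 1) is predicted from `[[11, 12], [21, 22]]`; adding the residual `[[0, 1], [-1, 0]]` reconstructs `[[11, 13], [20, 22]]`.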
Conversely, an encoder can also create reference picture(s) in its coding loop. While in non-scalable coding reference pictures are relevant for inter picture prediction, in scalable coding they can also be relevant for cross-layer prediction. Cross-layer prediction can involve the use of a base layer's reconstructed picture, as well as other base layer reference picture(s), as a reference picture in the prediction of an enhancement layer picture. This reconstructed picture or reference picture can be the same as the reference picture(s) used for inter picture prediction. However, the generation of such a base layer reference picture can be required even if the base layer is coded in a manner, such as intra picture only coding, that would, without the use of scalable coding, not require a reference picture.
While base layer reference pictures in general can be used in the enhancement layer coding loop, FIG. 1 depicts the reconstructed picture (111) (i.e., the most recent reference picture) being used by the enhancement layer coding loop. The base layer coding loop (103) can generate reference picture(s) in the aforementioned sense and store them in the reference picture buffer (104).
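A simplified reference picture buffer is sketched below. It assumes a plain sliding window of fixed capacity, which is a simplification of the reference picture management actually specified in H.264 and HEVC; `ReferencePictureBuffer` and its methods are hypothetical. Even an intra-only base layer would call `store` after each picture, because the enhancement layer needs the reconstruction.

```python
from collections import deque
from typing import Deque, List

Picture = List[List[int]]


class ReferencePictureBuffer:
    """Minimal sliding-window buffer of reconstructed pictures (hypothetical)."""

    def __init__(self, capacity: int) -> None:
        self._pictures: Deque[Picture] = deque(maxlen=capacity)

    def store(self, reconstructed: Picture) -> None:
        # Once capacity is reached, deque(maxlen=...) drops the oldest picture.
        self._pictures.append(reconstructed)

    def most_recent(self) -> Picture:
        # The most recent reference picture, i.e. the picture handed to the
        # enhancement layer in the FIG. 1 arrangement.
        return self._pictures[-1]

    def __len__(self) -> int:
        return len(self._pictures)
```

A `deque` with `maxlen` is used so that eviction of the oldest picture happens automatically on `store`, which keeps the sketch short.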
The reconstructed picture (111), stored in the reference picture buffer (104), can be upsampled by the upsample unit (105) to the resolution used by the enhancement layer coding loop (106). The enhancement layer coding loop (106) can use the upsampled base layer reference picture produced by the upsample unit (105) in conjunction with the input picture coming from the video input (101) and the reference pictures (112) created by the enhancement layer coding loop in its coding process. The nature of these uses depends on the video coding standard, and has already been briefly introduced for some video compression standards above. The enhancement layer coding loop (106) can create an enhancement layer bitstream (113), which can be processed together with the base layer bitstream (110) and control information (not shown in FIG. 1) so as to create a scalable bitstream (114).
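The upsampling and inter-layer prediction step can be sketched as follows for a downsample ratio of 2.0. Nearest-neighbor 2x upsampling is a stand-in chosen for brevity; it is not the normative upsampling filter of H.264/SVC or of any HEVC extension. `upsample2x` and `inter_layer_residual` are hypothetical names, and the sketch shows only one prediction mode, whereas an actual enhancement layer coding loop can also predict from its own reference pictures (112).

```python
from typing import List

Picture = List[List[int]]


def upsample2x(picture: Picture) -> Picture:
    """Nearest-neighbor 2x upsampling (illustrative; not a normative filter)."""
    out = []
    for row in picture:
        wide = [s for s in row for _ in range(2)]  # repeat each sample horizontally
        out.append(wide)
        out.append(wide[:])                        # repeat each row vertically
    return out


def inter_layer_residual(el_input: Picture, bl_recon: Picture) -> Picture:
    """Residual the enhancement layer loop would code if it predicted every
    sample from the upsampled base layer reconstruction."""
    prediction = upsample2x(bl_recon)
    return [[s - p for s, p in zip(srow, prow)]
            for srow, prow in zip(el_input, prediction)]
```

For a base layer reconstruction `[[1, 2], [3, 4]]`, `upsample2x` returns `[[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]`; the enhancement layer then codes only the differences between its input and this prediction.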
U.S. patent application Ser. No. 13/529,159, entitled “Scalable Video Coding Technique”, and incorporated herein by reference in its entirety, discloses scalable video coding techniques suitable for HEVC and other base video coding technologies, including multi-standard video coding. Multi-standard video coding, as outlined, for example, in U.S. patent application Ser. No. 13/528,010, can refer to mechanisms that allow a base layer bitstream (110) to use a different coding technology than the enhancement layer bitstream(s) (113). As an example, throughout this specification, it is assumed that the base layer bitstream (110) (including its temporal enhancement layers, if any) conforms to H.264, whereas the at least one enhancement layer bitstream (113), including its respective temporal enhancement layers, conforms to a future extension of HEVC that can be based on techniques disclosed in the aforementioned patent applications.
Accordingly, the base layer coding loop (103), which creates an H.264 compliant bitstream (110), uses a coding technology different from that of the enhancement layer coding loop (106), which creates HEVC scalable extension compliant enhancement layer bitstream(s) (113).