Scalable video coding refers to a coding structure in which one bitstream can contain multiple representations of the content at different bitrates, resolutions or frame rates. In these cases the receiver can extract the desired representation depending on its characteristics. Alternatively, a server or a network element can extract the portions of the bitstream to be transmitted to the receiver depending on, e.g., the network characteristics or the processing capabilities of the receiver. A scalable bitstream typically consists of a base layer providing the lowest quality video available and one or more enhancement layers that enhance the video quality when received and decoded together with the lower layers. In order to improve coding efficiency for the enhancement layers, the coded representation of an enhancement layer typically depends on the lower layers.
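As an illustration of the layer dependency described above, the following minimal sketch computes which layers have to be received in order to decode a chosen layer. The function name layers_needed and the dependency table are hypothetical and are not taken from any coding standard; the sketch only assumes that each layer lists the lower layers it directly depends on.

```python
from typing import Dict, List, Sequence


def layers_needed(target_layer: int,
                  direct_deps: Dict[int, Sequence[int]]) -> List[int]:
    """Return the target layer plus every layer it depends on, sorted.

    direct_deps maps a layer id to the ids of the layers it directly
    depends on (a hypothetical representation chosen for this sketch).
    """
    needed = set()
    stack = [target_layer]
    while stack:
        layer = stack.pop()
        if layer not in needed:
            needed.add(layer)
            # Follow the dependencies of this layer as well.
            stack.extend(direct_deps.get(layer, ()))
    return sorted(needed)


# Base layer 0, enhancement layer 1 depending on 0, and layer 2 depending on 1.
example_deps = {0: [], 1: [0], 2: [1]}
print(layers_needed(1, example_deps))  # [0, 1]
```

A receiver or network element that wants layer 1 in this example would keep layers 0 and 1 and could discard layer 2.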
A coding standard or system may use the term operation point or the like, which may indicate the scalable layers and/or sub-layers under which the decoding operates and/or may be associated with a sub-bitstream that includes the scalable layers and/or sub-layers being decoded.
In SHVC (Scalable extension to H.265/HEVC) and MV-HEVC (Multiview extension to H.265/HEVC), an operation point definition may include consideration of a target output layer set. In SHVC and MV-HEVC, an operation point may be defined as a bitstream that is created from another bitstream by operation of the sub-bitstream extraction process with that other bitstream, a target highest temporal level, and a target layer identifier list as inputs, and that is associated with a set of target output layers.
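The definition above can be sketched in simplified form as follows. The sketch assumes that each unit of the bitstream carries a layer identifier and a temporal level (corresponding to nuh_layer_id and TemporalId in H.265/HEVC); the names NalUnit, OperationPoint, extract_sub_bitstream and make_operation_point are hypothetical, and the normative extraction process includes further rules that are omitted here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class NalUnit:
    layer_id: int     # layer the unit belongs to (cf. nuh_layer_id)
    temporal_id: int  # temporal level of the unit (cf. TemporalId)
    payload: bytes = b""


@dataclass
class OperationPoint:
    nal_units: List[NalUnit]      # the extracted sub-bitstream
    output_layer_ids: List[int]   # associated target output layers


def extract_sub_bitstream(bitstream: Iterable[NalUnit],
                          target_highest_tid: int,
                          target_layer_ids: Iterable[int]) -> List[NalUnit]:
    """Keep units whose layer is in the target list and whose temporal
    level does not exceed the target highest temporal level."""
    keep = set(target_layer_ids)
    return [n for n in bitstream
            if n.layer_id in keep and n.temporal_id <= target_highest_tid]


def make_operation_point(bitstream: Iterable[NalUnit],
                         target_highest_tid: int,
                         target_layer_ids: Iterable[int],
                         output_layer_ids: Iterable[int]) -> OperationPoint:
    """Pair the extracted sub-bitstream with its target output layers."""
    layer_ids = list(target_layer_ids)
    units = extract_sub_bitstream(bitstream, target_highest_tid, layer_ids)
    return OperationPoint(units, list(output_layer_ids))
```

For example, calling make_operation_point with a target highest temporal level of 1, the target layer identifier list [0, 1] and the output layer list [1] yields a sub-bitstream that contains only units of layers 0 and 1 at temporal levels 0 and 1, with layer 1 marked for output.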
However, the current scalability designs of various video coding standards have some limitations. For example, in SHVC, all pictures of an access unit are required to have the same temporal level. This prevents encoders from determining prediction hierarchies differently across layers, thus limiting the possibilities to use frequent sub-layer up-switch points and/or to achieve a better rate-distortion performance. A further limitation is that temporal level switch pictures are not allowed at the lowest temporal level. This makes it impossible to indicate an access picture or access point to a layer that enables decoding of some temporal levels (but not necessarily all of them).