H.264 (Moving Picture Experts Group-4 (MPEG-4) Advanced Video Coding (AVC)) is the state-of-the-art video coding standard. It is a block-based hybrid video coding scheme that exploits temporal and spatial prediction. High Efficiency Video Coding (HEVC) is a new video coding standard currently being developed by the Joint Collaborative Team on Video Coding (JCT-VC). JCT-VC is a collaborative project between MPEG and the International Telecommunication Union Telecommunication Standardization Sector (ITU-T). Currently, an HEVC Working Draft (WD) is defined that includes a number of new tools and is considerably more efficient than H.264/AVC. HEVC also defines a temporal_id for each picture, corresponding to the temporal layer to which the picture belongs. The temporal_id is also present in Scalable Video Coding (SVC), the scalability extension of H.264/AVC.
An HEVC bitstream consists of Network Abstraction Layer (NAL) units which are grouped together in access units. Each access unit contains a picture associated with a decoding order value and an output order value.
A bitstream is said to conform to a standard if it fulfills that standard's requirements for bitstream conformance. For HEVC, the bitstream conformance requirements can be summarized as:
“It is a requirement of bitstream conformance that the bitstream shall be constructed according to the syntax, semantics, and constraints specified in this Specification outside of this annex. It is a requirement of bitstream conformance that the first coded picture in a bitstream shall be a Random Access Point (RAP) picture, i.e. an Instantaneous Decoder Refresh (IDR) picture or a Clean Random Access (CRA) picture or a Broken Link Access (BLA) picture.” “For conforming bitstreams, all of the following conditions shall be fulfilled.” HEVC lists ten conditions that must be fulfilled by a conforming bitstream. These ten conditions are listed in the attached Annex A.
Correspondingly, for H.264/AVC the bitstream conformance requirement can be summarized as:
1. “The bitstream is constructed according to the syntax, semantics, and constraints specified in this Recommendation|International Standard.”
2. “For conforming bitstreams, all of the following conditions shall be fulfilled . . . ” H.264/AVC lists seven such conditions, which are found in Annex B.
The temporal layers are ordered and have the property that a lower temporal layer never depends on a higher temporal layer. Thus, higher temporal layers can be removed without affecting the lower temporal layers. The removal of temporal layers can be referred to as temporal scaling.
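The layering property described above can be illustrated with a minimal sketch. It assumes a simplified picture model in which each picture carries a temporal_id and a list of the pictures it references; the names `Picture`, `pic_id`, `ref_ids`, and `respects_temporal_layering` are hypothetical and are not taken from HEVC or SVC. The sketch also assumes that every referenced picture is present in the list.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Picture:
    pic_id: int                      # hypothetical picture identifier
    temporal_id: int                 # temporal layer of the picture
    ref_ids: List[int] = field(default_factory=list)  # pictures it predicts from


def respects_temporal_layering(pictures: List[Picture]) -> bool:
    """Return True if no picture references a picture in a higher temporal layer."""
    layer_of: Dict[int, int] = {p.pic_id: p.temporal_id for p in pictures}
    return all(
        layer_of[ref] <= p.temporal_id
        for p in pictures
        for ref in p.ref_ids
    )


# A three-layer example: each picture references only equal or lower layers.
layered = [
    Picture(pic_id=0, temporal_id=0),
    Picture(pic_id=2, temporal_id=1, ref_ids=[0]),
    Picture(pic_id=1, temporal_id=2, ref_ids=[0, 2]),
]

# A violating example: a layer-0 picture references a layer-1 picture.
violating = [
    Picture(pic_id=0, temporal_id=0, ref_ids=[1]),
    Picture(pic_id=1, temporal_id=1),
]
```

When the check holds, every picture in a higher layer can be dropped without removing a reference needed by a remaining picture.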
In SVC, a sub-bitstream extraction process is defined, specifying that a conforming bitstream from which all NAL units with a temporal_id higher than a defined value are removed shall also be a conforming bitstream.
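As a rough illustration of this kind of extraction, the following sketch filters a list of NAL units by temporal_id. It is a simplified model and not the normative process of any standard: the `NalUnit` type, its fields, and the function `extract_sub_bitstream` are hypothetical, and real NAL units carry header fields and payloads that are not modeled here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class NalUnit:
    temporal_id: int      # temporal layer signaled for this NAL unit
    payload: bytes = b""  # placeholder for the coded data (not parsed here)


def extract_sub_bitstream(nal_units: Iterable[NalUnit],
                          max_temporal_id: int) -> List[NalUnit]:
    """Keep only NAL units whose temporal_id does not exceed max_temporal_id.

    The original order of the remaining NAL units is preserved.
    """
    return [nal for nal in nal_units if nal.temporal_id <= max_temporal_id]


# Hypothetical bitstream with temporal layers 0, 1 and 2.
bitstream = [NalUnit(0), NalUnit(2), NalUnit(1), NalUnit(2), NalUnit(0)]
base_and_first = extract_sub_bitstream(bitstream, max_temporal_id=1)
```

In this sketch the whole sequence is scaled to a single target layer, which corresponds to the case covered by the extraction process described above.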
The reason to extract a sub-bitstream can, for example, be to adapt the bitrate of a relayed bitstream in a network node based on changes in network conditions.
The existing sub-bitstream extraction processes, e.g. in SVC, have two major problems:
1. The sub-bitstream extraction process only covers the case when an entire sequence is scaled to a specific layer.
2. The sub-bitstream extraction process does not give any guarantees to an encoder or a network node regarding scalability. On the contrary, it puts requirements on the bitstream that the encoder and network node must fulfill in order to produce a conforming bitstream. Thus, the encoder must check each sub-bitstream for bitstream conformance. A network node that extracts a sub-bitstream must check the resulting bitstream for bitstream conformance. This makes flexible encoders and network nodes very complex.