Advanced Video Coding (AVC) is a widely deployed video coding standard that was developed jointly by MPEG and ITU-T and finalized in 2003. High Efficiency Video Coding (HEVC) is a more recent video coding standard developed by the Joint Collaborative Team on Video Coding (JCT-VC), a collaborative project between MPEG and ITU-T. Version 1 of HEVC was finalized in January 2013, and several extensions to the standard are currently being developed. One of them is a scalable extension (SHVC), part of version 2 of HEVC, that allows a single encoded bitstream to contain different versions of the same video with different resolutions and/or quality. Prediction between the layers is allowed in order to improve coding efficiency compared to sending the different versions of the video as independent streams. A special case of the scalable extension, defined as the hybrid codec scalability functionality of SHVC, is where the lowest layer, i.e. the base layer, is encoded with AVC and the higher layer or layers, i.e. the enhancement layers, are encoded with HEVC.
Both AVC and HEVC define a Network Abstraction Layer (NAL). All data in AVC and HEVC bitstreams, both video and non-video, is encapsulated in NAL units. Each NAL unit begins with a header, 1 byte in AVC and 2 bytes in HEVC, which among other things contains the NAL unit type that identifies what type of data is carried in the NAL unit. The NAL unit type is signaled in the nal_unit_type syntax element of the NAL unit header and determines how the NAL unit is parsed and decoded. A bitstream consists of a series of concatenated NAL units.
The syntax of an AVC NAL unit is shown below.
nal_unit( NumBytesInNALunit ) {                                     C     Descriptor
    forbidden_zero_bit                                              All   f(1)
    nal_ref_idc                                                     All   u(2)
    nal_unit_type                                                   All   u(5)
    NumBytesInRBSP = 0
    for( i = nalUnitHeaderBytes; i < NumBytesInNALunit; i++ ) {
        if( i + 2 < NumBytesInNALunit && next_bits( 24 ) = = 0x000003 ) {
            rbsp_byte[ NumBytesInRBSP++ ]                           All   b(8)
            rbsp_byte[ NumBytesInRBSP++ ]                           All   b(8)
            i += 2
            emulation_prevention_three_byte /* equal to 0x03 */     All   f(8)
        } else
            rbsp_byte[ NumBytesInRBSP++ ]                           All   b(8)
    }
}
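As a rough illustration of the AVC syntax above, the following Python sketch splits the one-byte header into its three fields and then follows the rbsp_byte loop, dropping every emulation_prevention_three_byte. The function and class names are hypothetical. The sketch assumes nalUnitHeaderBytes equals 1, i.e. no NAL unit header extension; NAL unit types 14, 20 and 21 carry extra header bytes and are simply rejected here.

```python
from dataclasses import dataclass


@dataclass
class AvcNalUnit:
    forbidden_zero_bit: int  # f(1)
    nal_ref_idc: int         # u(2)
    nal_unit_type: int       # u(5)
    rbsp: bytes              # payload with emulation prevention removed


def remove_emulation_prevention(data: bytes, start: int) -> bytes:
    """Mirror the rbsp_byte loop: copy bytes from `start`, dropping each
    emulation_prevention_three_byte (a 0x03 that follows 0x00 0x00)."""
    rbsp = bytearray()
    i, n = start, len(data)
    while i < n:
        if i + 2 < n and data[i:i + 3] == b"\x00\x00\x03":
            rbsp += data[i:i + 2]  # two rbsp_byte values
            i += 3                 # i += 2 plus the loop's i++, skipping 0x03
        else:
            rbsp.append(data[i])
            i += 1
    return bytes(rbsp)


def parse_avc_nal_unit(data: bytes) -> AvcNalUnit:
    header = data[0]
    nal_unit_type = header & 0x1F
    # Assumption of this sketch: nalUnitHeaderBytes = 1 (no header extension).
    if nal_unit_type in (14, 20, 21):
        raise ValueError("NAL unit header extension not handled in this sketch")
    return AvcNalUnit(
        forbidden_zero_bit=header >> 7,
        nal_ref_idc=(header >> 5) & 0x03,
        nal_unit_type=nal_unit_type,
        rbsp=remove_emulation_prevention(data, 1),
    )
```

For example, the byte sequence 0x67 0x42 0x00 0x00 0x03 0x01 parses as nal_ref_idc 3 and nal_unit_type 7 (a sequence parameter set), with the RBSP 0x42 0x00 0x00 0x01.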
The syntax of an HEVC NAL unit is shown below.
nal_unit( NumBytesInNalUnit ) {                                     Descriptor
    nal_unit_header( )
    NumBytesInRbsp = 0
    for( i = 2; i < NumBytesInNalUnit; i++ )
        if( i + 2 < NumBytesInNalUnit && next_bits( 24 ) = = 0x000003 ) {
            rbsp_byte[ NumBytesInRbsp++ ]                           b(8)
            rbsp_byte[ NumBytesInRbsp++ ]                           b(8)
            i += 2
            emulation_prevention_three_byte /* equal to 0x03 */     f(8)
        } else
            rbsp_byte[ NumBytesInRbsp++ ]                           b(8)
}
The syntax of an HEVC NAL unit header is shown below.
nal_unit_header( ) {                    Descriptor
    forbidden_zero_bit                  f(1)
    nal_unit_type                       u(6)
    nuh_layer_id                        u(6)
    nuh_temporal_id_plus1               u(3)
}
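The HEVC nal_unit( ) and nal_unit_header( ) syntax above can be sketched in Python as follows. The two header bytes are read as one 16-bit value and split into the f(1), u(6), u(6) and u(3) fields, and the payload loop starts at i = 2, as in nal_unit( ). The names are hypothetical and the sketch performs no conformance checks.

```python
from dataclasses import dataclass


@dataclass
class HevcNalUnitHeader:
    forbidden_zero_bit: int     # f(1)
    nal_unit_type: int          # u(6)
    nuh_layer_id: int           # u(6)
    nuh_temporal_id_plus1: int  # u(3)


def parse_hevc_nal_unit_header(data: bytes) -> HevcNalUnitHeader:
    """Decode the 2-byte nal_unit_header( ) field by field."""
    bits = (data[0] << 8) | data[1]  # the 16 header bits, MSB first
    return HevcNalUnitHeader(
        forbidden_zero_bit=bits >> 15,
        nal_unit_type=(bits >> 9) & 0x3F,
        nuh_layer_id=(bits >> 3) & 0x3F,
        nuh_temporal_id_plus1=bits & 0x07,
    )


def hevc_rbsp(data: bytes) -> bytes:
    """Follow the nal_unit( ) loop: start at i = 2 and drop each
    emulation_prevention_three_byte (a 0x03 that follows 0x00 0x00)."""
    rbsp = bytearray()
    i, n = 2, len(data)
    while i < n:
        if i + 2 < n and data[i:i + 3] == b"\x00\x00\x03":
            rbsp += data[i:i + 2]  # two rbsp_byte values
            i += 3                 # i += 2 plus the loop's i++, skipping 0x03
        else:
            rbsp.append(data[i])
            i += 1
    return bytes(rbsp)
```

For instance, the header bytes 0x40 0x01 decode to nal_unit_type 32 (a video parameter set), nuh_layer_id 0 and nuh_temporal_id_plus1 1, while 0x26 0x09 decode to nal_unit_type 19 (IDR_W_RADL) in layer 1.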
For single-layer coding, an access unit (AU) is the coded representation of a picture, which may consist of several video coding layer (VCL) NAL units as well as non-VCL NAL units. A coded video sequence (CVS) is a series of access units starting at a random access point (RAP) access unit up to, but not including, the next RAP access unit in decoding order. The decoding order is the order in which NAL units shall be decoded, which is the same as the order of the NAL units within the bitstream. The decoding order may differ from the output order, which is the order in which the decoder outputs decoded pictures, e.g. for display.
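The CVS definition above amounts to a simple grouping rule over access units in decoding order. The Python sketch below assumes a hypothetical AccessUnit record that already carries an is_rap flag (how that flag is derived from the NAL unit types is outside the sketch) and that the bitstream starts with a RAP access unit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AccessUnit:
    # Hypothetical minimal access unit: position in decoding order plus a RAP flag.
    decode_index: int
    is_rap: bool


def split_into_cvs(access_units: List[AccessUnit]) -> List[List[AccessUnit]]:
    """Group AUs (given in decoding order) into CVSs: each CVS starts at a
    RAP AU and runs up to, but not including, the next RAP AU."""
    sequences: List[List[AccessUnit]] = []
    for au in access_units:
        if au.is_rap:
            sequences.append([au])      # a RAP AU opens a new CVS
        elif sequences:
            sequences[-1].append(au)    # non-RAP AU continues the current CVS
        else:
            # Assumption of this sketch: the bitstream starts with a RAP AU.
            raise ValueError("bitstream does not start with a RAP access unit")
    return sequences
```

With RAP access units at decoding positions 0 and 3 in a five-AU bitstream, the sketch yields two CVSs, [0, 1, 2] and [3, 4].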
Parameter sets are one example of non-VCL NAL units. Both AVC and HEVC define a picture parameter set (PPS) and a sequence parameter set (SPS), which contain parameters valid for a picture and a sequence, respectively. HEVC additionally defines a video parameter set (VPS), which contains information valid for several layers. A new VPS can only be activated at the start of a new CVS.
The first byte of each NAL unit in AVC and HEVC contains the nal_unit_type syntax element. A decoder or bitstream parser can therefore determine how a NAL unit should be handled, e.g. parsed and decoded, after looking at its first byte. However, if there are AVC NAL units in an HEVC stream, the HEVC decoder or parser will interpret them incorrectly, since they will be decoded or parsed as HEVC NAL units unless some external method of identifying the NAL units is present. Similarly, if there are HEVC NAL units in an AVC stream, the AVC decoder or parser will interpret them incorrectly, since they will be decoded or parsed as AVC NAL units. Hence, there is a need for correctly handling hybrid codec scalable video bitstreams comprising both AVC NAL units and HEVC NAL units.
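To make the misinterpretation concrete, the following Python sketch reads a single first header byte under both syntaxes: AVC places nal_unit_type in the low five bits, whereas HEVC places it in the six bits after forbidden_zero_bit. The helper names are hypothetical; the example bytes are typical first header bytes, 0x65 for an AVC IDR slice with nal_ref_idc equal to 3 and 0x26 for an HEVC IDR_W_RADL NAL unit.

```python
def avc_nal_unit_type(first_byte: int) -> int:
    # AVC header: forbidden_zero_bit f(1), nal_ref_idc u(2), nal_unit_type u(5)
    return first_byte & 0x1F


def hevc_nal_unit_type(first_byte: int) -> int:
    # HEVC header: forbidden_zero_bit f(1), nal_unit_type u(6), then nuh_layer_id
    return (first_byte >> 1) & 0x3F


avc_idr = 0x65  # first byte of an AVC IDR slice NAL unit
print(avc_nal_unit_type(avc_idr))    # 5: coded slice of an IDR picture
print(hevc_nal_unit_type(avc_idr))   # 50: an unspecified type for an HEVC parser

hevc_idr = 0x26  # first byte of an HEVC IDR_W_RADL NAL unit
print(hevc_nal_unit_type(hevc_idr))  # 19: IDR_W_RADL
print(avc_nal_unit_type(hevc_idr))   # 6: an SEI NAL unit for an AVC parser
```

An HEVC parser would thus see an unspecified NAL unit type where the AVC decoder expects an IDR slice, and an AVC parser would treat the HEVC IDR picture as an SEI NAL unit.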
One approach [3] is to encapsulate each AVC NAL unit in HEVC by adding an additional HEVC NAL unit header of a specific NAL unit type, e.g. nal_unit_type = ENC_NUT, where ENC_NUT is one of the HEVC nal_unit_type values currently reserved for future use. Whenever an HEVC parser encounters the ENC_NUT NAL unit type, it removes the additional HEVC NAL unit header and sends the remaining data, including the original AVC NAL unit header, to the AVC decoder.
FIG. 1 shows a bitstream with an AVC base layer picture and an HEVC enhancement layer picture. The HEVC parser parses the HEVC NAL unit type (NUT) and first encounters the NAL unit type ENC_NUT in the HEVC NAL unit (NALU) header that precedes the AVC NAL unit. It then knows that the following byte and payload form the original AVC NAL unit, which can be forwarded to the AVC decoder.
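The encapsulation approach and the parser behavior described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the method of [3]. The value ENC_NUT = 47 is a hypothetical placeholder taken from HEVC's reserved non-VCL range (41 to 47), and the nuh_layer_id and nuh_temporal_id_plus1 values written into the added header (0 and 1) are likewise assumptions. The function names are hypothetical.

```python
from typing import List, Tuple

ENC_NUT = 47  # hypothetical placeholder; the text only says a reserved value is used


def encapsulate_avc(avc_nal_unit: bytes, nuh_layer_id: int = 0,
                    nuh_temporal_id_plus1: int = 1) -> bytes:
    """Prefix an AVC NAL unit with a 2-byte HEVC NAL unit header of type ENC_NUT."""
    # forbidden_zero_bit = 0 | nal_unit_type u(6) | nuh_layer_id u(6) | nuh_temporal_id_plus1 u(3)
    header = (ENC_NUT << 9) | (nuh_layer_id << 3) | nuh_temporal_id_plus1
    return header.to_bytes(2, "big") + avc_nal_unit


def route_nal_units(nal_units: List[bytes]) -> Tuple[List[bytes], List[bytes]]:
    """HEVC-side parser: for ENC_NUT NAL units, strip the added header and send
    the original AVC NAL unit to the AVC decoder; all others go to HEVC."""
    to_avc: List[bytes] = []
    to_hevc: List[bytes] = []
    for nal in nal_units:
        nal_unit_type = (nal[0] >> 1) & 0x3F  # HEVC nal_unit_type from the first byte
        if nal_unit_type == ENC_NUT:
            to_avc.append(nal[2:])  # original AVC NAL unit, its header byte included
        else:
            to_hevc.append(nal)
    return to_avc, to_hevc
```

For an access unit like the one in FIG. 1, an encapsulated AVC slice (added header bytes 0x5E 0x01 under the assumed values) followed by an HEVC enhancement layer slice, the sketch forwards the unchanged AVC NAL unit to the AVC decoder and leaves the HEVC NAL unit untouched.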
The main disadvantage of this approach is that legacy AVC decoders and AVC sub-bitstream extractors cannot handle the bitstream, since every AVC NAL unit is preceded by an HEVC NAL unit header that is not backwards compatible with AVC. In order to extract the AVC base layer, the extractor must be instructed to process the first byte of each NAL unit according to the HEVC syntax and to look for NAL units whose type is equal to ENC_NUT. Thus, the currently proposed technology for handling hybrid codec scalable video bitstreams is not backwards compatible with existing legacy AVC decoders and sub-bitstream extractors.