Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called “smart phones,” video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard presently under development, and extensions of such standards. The video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video compression techniques.
Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (i.e., a video frame or a portion of a video frame) may be partitioned into video blocks. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.
Spatial or temporal prediction results in a predictive block for a block to be coded. Residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and the residual data indicates the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual coefficients, which then may be quantized. The quantized coefficients, initially arranged in a two-dimensional array, may be scanned in order to produce a one-dimensional vector of coefficients, and entropy coding may be applied to achieve even more compression.
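As a rough illustration of the steps just described, the following minimal sketch computes a residual block, quantizes it, and scans the two-dimensional result into a one-dimensional vector. It rests on several assumptions that are not part of the description above: the function names are hypothetical, the transform step is omitted for brevity (quantization is applied directly to the residual), the quantizer is a plain uniform divider, and a zigzag scan stands in for whatever scan order a particular codec actually uses.

```python
def residual_block(original, predictive):
    """Pixel-wise difference between the original block and the predictive block."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, predictive)]


def quantize(block, step):
    """Uniform quantization (illustrative only): divide by step, truncate toward zero."""
    return [[int(v / step) for v in row] for row in block]


def zigzag_scan(block):
    """Scan a square 2-D array into a 1-D list along alternating anti-diagonals."""
    n = len(block)
    positions = [(r, c) for r in range(n) for c in range(n)]
    # Anti-diagonals are visited in order of r + c. Odd diagonals are walked
    # with the row index increasing, even diagonals with the column increasing.
    positions.sort(key=lambda rc: (rc[0] + rc[1],
                                   rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
    return [block[r][c] for r, c in positions]


if __name__ == "__main__":
    original = [[10, 12], [8, 9]]
    predictive = [[9, 12], [10, 5]]
    residual = residual_block(original, predictive)
    coefficients = zigzag_scan(quantize(residual, 2))
    print(residual, coefficients)
```

In a real codec the one-dimensional vector would then be passed to an entropy coder; that stage is left out here because its details are codec-specific.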
HEVC provides for parameter sets, such as Video Parameter Sets (VPSs), Sequence Parameter Sets (SPSs), and Picture Parameter Sets (PPSs). Such parameter sets include parameters that are applicable to one or more encoded pictures. For instance, parameters in an SPS may be applicable to an entire sequence of encoded pictures. A video decoder may need to be able to access the parameter sets applicable to an encoded picture to decode the encoded picture. In an HEVC bitstream, parameter sets are contained in Network Abstraction Layer (NAL) units separate from the NAL units containing encoded slice segments of encoded pictures. Thus, the NAL units containing encoded slice segments of an encoded picture may be in a separate part of the bitstream from the NAL units containing the parameter sets needed for decoding the encoded picture.
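To make the separation between parameter-set NAL units and slice-segment NAL units concrete, the following sketch reads the two-byte HEVC NAL unit header and reports whether a NAL unit carries a VPS, SPS, or PPS. The field layout (a 1-bit forbidden zero bit, a 6-bit NAL unit type, a 6-bit layer identifier, and a 3-bit temporal identifier plus one) and the type values 32, 33, and 34 are taken from the HEVC specification; the function names are hypothetical, and the input is assumed to be a single NAL unit with any start code or length prefix already removed.

```python
VPS_NUT, SPS_NUT, PPS_NUT = 32, 33, 34  # nal_unit_type values for parameter sets


def parse_nal_header(nal: bytes):
    """Return (nal_unit_type, nuh_layer_id, temporal_id) from a 2-byte HEVC NAL header."""
    if len(nal) < 2:
        raise ValueError("an HEVC NAL unit header is two bytes long")
    nal_unit_type = (nal[0] >> 1) & 0x3F                 # bits 1..6 of byte 0
    nuh_layer_id = ((nal[0] & 0x01) << 5) | (nal[1] >> 3)  # 6 bits spanning both bytes
    temporal_id = (nal[1] & 0x07) - 1                    # nuh_temporal_id_plus1 minus 1
    return nal_unit_type, nuh_layer_id, temporal_id


def is_parameter_set(nal: bytes) -> bool:
    """True if the NAL unit carries a VPS, SPS, or PPS."""
    return parse_nal_header(nal)[0] in (VPS_NUT, SPS_NUT, PPS_NUT)


if __name__ == "__main__":
    for header in (b"\x40\x01", b"\x42\x01", b"\x44\x01", b"\x26\x01"):
        print(header.hex(), parse_nal_header(header), is_parameter_set(header))
```

A device that cannot or should not parse the bitstream would not run code like this on every NAL unit, which is part of why where the parameter sets are stored matters.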
In the context of video coding, random access refers to decoding a bitstream starting from a coded picture that is not the first coded picture in the bitstream. An HEVC bitstream includes Intra Random Access Point (IRAP) pictures to facilitate random access. As with other types of pictures, a video decoder may need to access the parameter sets applicable to an IRAP picture to decode the IRAP picture.
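Random access can be pictured as choosing the first IRAP picture at or after a requested position and starting decoding there. The sketch below is a simplification under stated assumptions: each picture is represented only by the NAL unit type of its slice segments, the IRAP range of 16 through 23 comes from the HEVC specification, and the function names are hypothetical. It does not handle leading pictures or locate the parameter sets the IRAP picture depends on.

```python
IRAP_NAL_TYPES = range(16, 24)  # BLA, IDR, CRA, and reserved IRAP types in HEVC


def is_irap(nal_unit_type: int) -> bool:
    """True if the NAL unit type belongs to an IRAP picture."""
    return nal_unit_type in IRAP_NAL_TYPES


def random_access_start(picture_nal_types, seek_index):
    """Index of the first IRAP picture at or after seek_index, or None if there is none."""
    for i in range(seek_index, len(picture_nal_types)):
        if is_irap(picture_nal_types[i]):
            return i
    return None


if __name__ == "__main__":
    # 19 = IDR_W_RADL, 21 = CRA_NUT, 0 and 1 = trailing (non-IRAP) pictures
    types = [19, 1, 1, 0, 21, 1]
    print(random_access_start(types, 2))
```

Even when the starting picture is found this way, decoding can succeed only if the applicable parameter sets are also made available to the decoder.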
A file format for storage of multi-layer HEVC bitstreams (i.e., L-HEVC bitstreams) is under development. In the file format, each track of the file may include a series of samples. Each sample of a track may include temporally collocated encoded pictures of one or more different layers. A device that stores a file containing an L-HEVC bitstream may extract all of the L-HEVC bitstream or portions of the L-HEVC bitstream and provide the extracted data, directly or indirectly, to a video decoder. To facilitate random access, the device may extract portions of the L-HEVC bitstream starting from a sample of the file containing an IRAP picture. However, even if the device is able to provide encoded video data of the IRAP picture to the video decoder, the video decoder may be unable to decode the IRAP picture if the device is unable to provide the parameter sets needed for decoding the IRAP picture. Previous proposals for the file format do not adequately enable the device to ensure that the parameter sets needed for decoding an IRAP picture are provided to the video decoder, especially in instances where the device is not configured to parse the L-HEVC bitstream itself. Adding the ability to parse the L-HEVC bitstream may add significant complexity to the device and slow operation of the device. Furthermore, such previous proposals may lead to unnecessarily large file sizes.