H.264/AVC (Advanced Video Coding), which is also referred to as MPEG-4 Part 10/AVC, is a recommendation by the International Telecommunication Union (ITU-T), published jointly with ISO/IEC as ISO/IEC 14496-10, related to the encoding/decoding of video data. H.264/AVC can be used in a wide range of video applications, such as video conferencing, video broadcasting, and/or video streaming services, with better compression than earlier standards such as H.262 (i.e., MPEG-2 Video) and H.263.
One of the reasons that H.264 can be used for a wide range of applications is that the transport of video data is handled separately from the decoding of the data. In particular, H.264/AVC specifies a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL). The VCL specifies how video is coded/decoded/displayed, whereas the NAL specifies how that data is transmitted, for example, over a network. In this way, applications can be written differently depending on the environment in which the application will ultimately operate.
Data handled at the NAL level can be classified into VCL and non-VCL NAL units. In particular, VCL NAL units contain video data that corresponds to the samples in the video, whereas the non-VCL NAL units contain data that can describe how the data included in the VCL NAL units is to be decoded and/or displayed.
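The VCL/non-VCL distinction is signaled in the one-byte header that begins every NAL unit. A minimal sketch in Python (field names follow the standard's syntax element names; the helper names themselves are illustrative):

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the one-byte H.264 NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x1,  # must be 0
        "nal_ref_idc": (first_byte >> 5) & 0x3,         # reference importance
        "nal_unit_type": first_byte & 0x1F,             # identifies the payload
    }

def is_vcl(nal_unit_type: int) -> bool:
    # Types 1..5 carry coded slice data (VCL NAL units); other types are
    # non-VCL, e.g. 6 = SEI, 7 = SPS, 8 = PPS.
    return 1 <= nal_unit_type <= 5
```

For example, the byte 0x67 parses to nal_unit_type 7, a (non-VCL) Sequence Parameter Set, while 0x65 parses to nal_unit_type 5, a (VCL) coded slice of an IDR picture.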
According to H.264/AVC, the VCL NAL units are used to transmit a series of images, each of which includes macroblocks of coded video data. The macroblocks of data are organized into slices within each of the images. Therefore, each image in the series includes a number of slices, which contain the macroblocks of video data that, when decoded, correspond to pixels within the slice. Each slice is prefaced by a “slice header” that includes information associated with the video data in that slice.
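The image/slice/macroblock hierarchy can be pictured with a small illustrative sketch (these class and field names are not from the standard; real coded structures carry many more fields, such as prediction modes and residual coefficients):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Macroblock:
    # A 16x16 block of coded samples (simplified to an opaque payload here).
    data: bytes

@dataclass
class Slice:
    # The slice header identifies, among other things, which PPS applies
    # and where in the picture the slice's macroblocks begin.
    pps_id: int
    first_mb_in_slice: int
    macroblocks: List[Macroblock] = field(default_factory=list)

@dataclass
class Picture:
    # One image in the series: a collection of slices.
    slices: List[Slice] = field(default_factory=list)
```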
The NAL units (VCL and non-VCL) can be read as a bit stream from a buffer and parsed (in the NAL) by a syntax parser to determine, for example, which units (such as VCL NAL units) are to be processed for display. Parsed VCL NAL units can be processed by the VCL using an entropy decoder, an inverse transformer, a predictor, and a de-blocking filter.
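When the bit stream uses the Annex B byte-stream format, the parser locates NAL unit boundaries by scanning for start codes. A simplified sketch (it strips trailing zero bytes after each unit, which covers the optional four-byte start-code form but ignores details such as cabac_zero_words):

```python
def split_annex_b(stream: bytes) -> list:
    """Split an Annex B byte stream into NAL units on 0x000001 start codes.

    Four-byte start codes (0x00000001) are handled by trimming the extra
    leading zero from the preceding unit. Returns payloads without start codes.
    """
    units, start, i = [], None, 0
    while i + 3 <= len(stream):
        if stream[i:i + 3] == b"\x00\x00\x01":
            if start is not None:
                units.append(stream[start:i].rstrip(b"\x00"))
            start = i + 3
            i += 3
        else:
            i += 1
    if start is not None:
        units.append(stream[start:])
    return units
```

The first byte of each returned unit is the NAL header, so the parser can then dispatch VCL units to the entropy decoder, inverse transformer, predictor, and de-blocking filter, and route non-VCL units to parameter-set handling.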
The syntax parser can also process non-VCL NAL units that include information associated with how the video data in the VCL NAL units is to be processed. For example, the syntax parser can parse non-VCL NAL units that include environmental information indicating how individual frames of video, or a sequence of frames, are to be processed and/or displayed. Some of the non-VCL NAL units include Sequence Parameter Sets (SPS) and Picture Parameter Sets (PPS). A PPS can include a PPS ID (that identifies the PPS), an SPS ID that identifies the series of frames associated with the PPS, a flag for selecting either context-adaptive variable-length coding (CAVLC) or context-based adaptive binary arithmetic coding (CABAC) entropy coding, parameters that define slice groups, and parameters for prediction, quantization, and de-blocking. An SPS includes parameters that indicate how a series of frames of video is to be processed and/or displayed. For example, the SPS includes the SPS ID that identifies the SPS (and that is used in the PPS as a reference to the SPS), an indication of the maximum number of frames in the series, an indication of the order of frames in the series, and the width and height of a decoded frame in the series.
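The chain of references, from a slice's PPS ID to the PPS, and from the PPS's SPS ID to the SPS, can be sketched as follows (the table layout and field subset are illustrative; field names loosely follow the standard's syntax element names):

```python
# A decoder keeps tables of received parameter sets, keyed by their IDs.
sps_table = {}  # seq_parameter_set_id -> SPS fields
pps_table = {}  # pic_parameter_set_id -> PPS fields

def activate_for_slice(slice_pps_id: int):
    """Resolve the PPS named in a slice header, then the SPS it references."""
    pps = pps_table[slice_pps_id]
    sps = sps_table[pps["seq_parameter_set_id"]]
    return sps, pps

# Example: an SPS describing 1920x1080 decoded frames, and a PPS that
# references it and selects CABAC entropy coding.
sps_table[0] = {"seq_parameter_set_id": 0, "width": 1920, "height": 1080}
pps_table[0] = {"pic_parameter_set_id": 0, "seq_parameter_set_id": 0,
                "entropy_coding_mode_flag": 1}  # 1 = CABAC, 0 = CAVLC
```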
In operation, the “slice headers” included with the VCL NAL units are parsed to determine which parameters (i.e., which PPS and SPS) are to be applied to the decoding and/or display of the video data included as the payload in the VCL NAL unit. Furthermore, many syntax elements in the environmental information, the slice headers, and portions of the video data are coded using Exp-Golomb codes and, therefore, may need to be decoded prior to access.
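The unsigned Exp-Golomb code (ue(v) in the standard) encodes a value as leadingZeroBits zeros, a one bit, and leadingZeroBits suffix bits, so that the value is 2^leadingZeroBits - 1 plus the suffix. A minimal decoder sketch, operating on a bit string for clarity (a real decoder works on a bit reader over the RBSP bytes):

```python
def decode_ue(bits: str):
    """Decode one unsigned Exp-Golomb codeword (ue(v)) from a bit string.

    Returns (value, remaining_bits). E.g. "1" -> 0, "010" -> 1, "00100" -> 3.
    """
    zeros = 0
    while bits[zeros] == "0":          # count leadingZeroBits
        zeros += 1
    suffix = bits[zeros + 1: 2 * zeros + 1]
    value = (1 << zeros) - 1 + (int(suffix, 2) if suffix else 0)
    return value, bits[2 * zeros + 1:]
```

For instance, the first element of a slice header, first_mb_in_slice, is ue(v)-coded, so a header beginning "1..." indicates a slice starting at macroblock 0.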