Established video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 or ISO/IEC MPEG-4 AVC. H.264/AVC is the work output of a Joint Video Team (JVT) of ITU-T Video Coding Experts Group (VCEG) and ISO/IEC MPEG.
In addition, there are efforts working towards new video coding standards. One is the development of a scalable video coding (SVC) standard in MPEG. The second effort is the development of China video coding standards organized by the China Audio Visual coding Standard Work Group (AVS). AVS finalized its first video coding specification, AVS 1.0, targeted at SDTV and HDTV applications, in February 2004. Since then the focus has moved to mobile video services. The resulting two standards, AVS-M Stage 1 and AVS-M Stage 2, are scheduled to be published in December 2004 and April 2006, respectively.
Video coding standards earlier than H.264/AVC have specified a structure for an elementary bitstream, i.e., a self-contained bitstream that decoders can parse. The bitstream consists of several layers, typically including several of the following: a sequence layer, a picture layer, a slice layer, a macroblock layer, and a block layer. The bitstream for each layer typically consists of a header and associated data. Each header of a slice or higher layer starts with a start code for resynchronization and identification. This structure, which comprises a plurality of routines and sub-routines, is called the start code based bitstream structure.
The start code based bitstream structure can be depicted in a number of tables as follows (for simplicity, user data and extension data of sequence-level and picture-level are not included):
video_bitstream( ) {
  next_start_code( )
  do {
    sequence_header( )
    do {
      picture_header( )
      do {
        slice_header( )
        slice_data( )
        next_start_code( )
      } while( the following is a slice start code )
    } while( the following is a picture start code )
  } while( the following is not a bitstream end code )
}

sequence_header( ) {
  sequence_start_code
  sequence_header_parameter#1
  sequence_header_parameter#2
  ...
  next_start_code( )
}

picture_header( ) {
  picture_start_code
  picture_header_parameter#1
  picture_header_parameter#2
  ...
  next_start_code( )
}

slice_header( ) {
  slice_start_code
  slice_header_parameter#1
  slice_header_parameter#2
  ...
}
As can be seen in the above tables, the video_bitstream( ) routine contains a plurality of sub-routines, such as next_start_code( ) and sequence_header( ). The table for each such sub-routine contains a plurality of codes, such as a start code and a number of parameters. The next_start_code( ) sub-routine in the video_bitstream( ) routine advances the bitstream pointer to the next start code. The sequence end code (not shown) is also a type of start code. The slice_data( ) sub-routine (not shown as a table) contains the coded video data of a slice, excluding the slice header.
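By way of illustration only, the behavior of the next_start_code( ) sub-routine can be sketched as a byte search. The sketch assumes a byte-aligned, three-byte start code prefix 0x000001 followed by a one-byte start code value, as used in MPEG-2 Video; the function name list_start_codes and the sample bytes are hypothetical and are not taken from any standard.

```python
# Assumption: byte-aligned start codes with the 3-byte prefix 0x000001,
# followed by one byte that identifies the kind of start code.
START_CODE_PREFIX = b"\x00\x00\x01"


def next_start_code(data, pos):
    """Return the offset of the next start code prefix at or after pos, or -1."""
    return data.find(START_CODE_PREFIX, pos)


def list_start_codes(data):
    """Return (offset, start_code_value) for every complete start code in data."""
    found = []
    pos = next_start_code(data, 0)
    # Stop when no prefix is left or the value byte after the prefix is missing.
    while pos != -1 and pos + 3 < len(data):
        found.append((pos, data[pos + 3]))
        # Resume the search after the prefix and its value byte.
        pos = next_start_code(data, pos + 4)
    return found


# Hypothetical sample: in MPEG-2 Video, 0xB3 is a sequence header code,
# 0x00 a picture start code, and 0x01 the first slice start code.
sample = (b"\xff"
          b"\x00\x00\x01\xb3\x12\x34"
          b"\x00\x00\x01\x00\x56"
          b"\x00\x00\x01\x01\x78")
print(list_start_codes(sample))
```

A decoder that loses synchronization can call next_start_code( ) from any byte position and resume parsing at the next header, which is the resynchronization property referred to above.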
The syntax for H.264/AVC consists of Network Abstraction Layer (NAL) units. The coded video data is organized into NAL units. Each of the NAL units is effectively a packet that contains an integer number of bytes. The first byte of each NAL unit is a header byte that contains an indication of the type of data in the NAL unit, and the remaining bytes contain payload data of the type indicated by the header. The NAL unit structure definition specifies a generic format for use in both packet-oriented and bitstream-oriented transport systems. A series of NAL units generated by an encoder is referred to as a NAL unit stream. A stream of NAL units does not form an elementary bitstream as such, because there are no start codes in NAL units. Rather, when an elementary bitstream structure is required, NAL units have to be framed with start codes according to Annex B of the H.264/AVC specification to form an elementary bitstream.
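As a minimal sketch of the above, the following code splits an Annex B style byte stream into NAL units and decodes the header byte of each unit. It assumes the one-byte H.264/AVC NAL unit header layout (a 1-bit forbidden_zero_bit, a 2-bit nal_ref_idc and a 5-bit nal_unit_type) and the 0x000001 start code prefix. The function names and sample bytes are hypothetical, and the removal of emulation prevention bytes is deliberately left out.

```python
def parse_nal_header(first_byte):
    """Decode the one-byte NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,
        "nal_ref_idc": (first_byte >> 5) & 0x03,
        "nal_unit_type": first_byte & 0x1F,
    }


def split_annex_b(stream):
    """Split a byte stream framed with 0x000001 prefixes into NAL units.

    Simplification: emulation prevention bytes inside the payload are kept.
    """
    prefix = b"\x00\x00\x01"
    units = []
    pos = stream.find(prefix)
    while pos != -1:
        start = pos + len(prefix)
        nxt = stream.find(prefix, start)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes before the next prefix belong to the framing, not to the
        # NAL unit, because a NAL unit never ends with a 0x00 byte.
        unit = stream[start:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units


# Hypothetical stream with three framed NAL units (payload bytes are dummies).
stream = (b"\x00\x00\x00\x01\x67\xaa"
          b"\x00\x00\x01\x68\xbb"
          b"\x00\x00\x00\x01\x65\xcc\xdd")
for unit in split_annex_b(stream):
    print(parse_nal_header(unit[0]), len(unit))
```

In H.264/AVC, nal_unit_type values 7, 8 and 5 denote a sequence parameter set, a picture parameter set and a slice of an IDR picture, respectively, which is what the three sample header bytes 0x67, 0x68 and 0x65 decode to.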
H.264/AVC contains headers at the slice layer and below, but it does not include picture and sequence headers. Instead, these headers are replaced by one or more parameter sets. The parameter set design is used to provide for robust and efficient conveyance of header information. As the loss of a few key bits of header information (such as sequence header or picture header information) could have a severe negative impact on the decoding process, the parameter set design allows this key information to be separated for handling in a more flexible and specialized manner.
A parameter set contains information that is expected to change rarely and that applies to the decoding of a large number of slices. There are two types of such parameter sets:
        1) sequence parameter sets, which apply to a series of consecutive coded video pictures called a coded video sequence; and
        2) picture parameter sets, which apply to the decoding of one or more individual pictures within a coded video sequence.
The sequence and picture parameter-set mechanism decouples the transmission of infrequently changing information from the transmission of coded representations of the values of the samples in the video pictures. Each slice contains an identifier that refers to the content of the relevant picture parameter set and each picture parameter set contains an identifier that refers to the content of the relevant sequence parameter set. In this manner, a small amount of data (the identifier) can be used to refer to a larger amount of information (the parameter set) without repeating that information within each slice. Sequence and picture parameter sets can be sent well ahead of other NAL units that they apply to, and can be repeated to provide robustness against data loss. In some applications, parameter sets may be sent within the channel that carries other NAL units (termed “in-band” transmission). In other applications, it can be advantageous to convey the parameter sets “out-of-band” using a more reliable transport mechanism than the video channel itself.
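The identifier chain described above (slice to picture parameter set to sequence parameter set) can be sketched as two lookup tables. This is an illustrative sketch only: the class names, the params dictionaries and the example values are hypothetical and do not reflect the actual H.264/AVC syntax elements.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceParameterSet:
    sps_id: int
    params: dict = field(default_factory=dict)


@dataclass
class PictureParameterSet:
    pps_id: int
    sps_id: int  # identifier of the sequence parameter set this one refers to
    params: dict = field(default_factory=dict)


class ParameterSetStore:
    """Keeps received parameter sets, keyed by their identifiers."""

    def __init__(self):
        self.sps = {}
        self.pps = {}

    def add_sps(self, sps):
        # Receiving a repeated copy simply overwrites the stored entry.
        self.sps[sps.sps_id] = sps

    def add_pps(self, pps):
        self.pps[pps.pps_id] = pps

    def resolve(self, slice_pps_id):
        """Follow slice -> PPS -> SPS; raises KeyError if a set is missing."""
        pps = self.pps[slice_pps_id]
        sps = self.sps[pps.sps_id]
        return sps, pps


store = ParameterSetStore()
# Parameter sets may arrive in-band or out-of-band, well before the slices.
store.add_sps(SequenceParameterSet(0, {"width": 352, "height": 288}))
store.add_pps(PictureParameterSet(3, 0, {"entropy_coding": "example"}))
store.add_sps(SequenceParameterSet(0, {"width": 352, "height": 288}))  # repeat

# A slice header only has to carry the small identifier 3.
sps, pps = store.resolve(3)
print(sps.params, pps.params)
```

Because the store is filled independently of the slice data, the same lookup works whether the parameter sets were delivered in the video channel or over a separate, more reliable transport.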
The bitstream structure of H.264/AVC is called the NAL unit plus parameter set bitstream structure. Note that if H.264/AVC Annex B is used, then the bitstream structure can be considered as a start code plus parameter set bitstream structure, because the concatenation of the start code prefix of H.264/AVC Annex B and the first byte of NAL unit can be defined as a start code.
The NAL unit plus parameter set bitstream structure is a concatenation of a number of NAL units, including the sequence parameter set NAL unit, picture parameter set NAL unit and slice NAL unit, as shown below:
sequence_parameter_set_NAL_unit( ) {
  nal_unit_header
  sequence_parameter_set_id
  sequence_parameter#1
  sequence_parameter#2
  ...
}

picture_parameter_set_NAL_unit( ) {
  nal_unit_header
  picture_parameter_set_id
  sequence_parameter_set_id
  picture_parameter#1
  picture_parameter#2
  ...
}

slice_NAL_unit( ) {
  nal_unit_header
  slice_header( )
  slice_data( )
}

slice_header( ) {
  picture_parameter_set_id
  slice_header_parameter#1
  slice_header_parameter#2
  ...
}

In the above tables, the nal_unit_header code indicates the type of a NAL unit, among other things.
The start code plus parameter set bitstream structure can be depicted as follows:
video_bitstream( ) {
  next_start_code( )
  do {
    if( the following is a sequence parameter set startcode ) {
      sequence_parameter_set( )
    }
    if( the following is a picture parameter set startcode ) {
      picture_parameter_set( )
    }
    if( the following is a slice start code ) {
      slice_header( )
      slice_data( )
      next_start_code( )
    }
  } while( the following is not a bitstream end code )
}

sequence_parameter_set( ) {
  sequence_parameter_set_start_code
  sequence_parameter_set_id
  sequence_parameter#1
  sequence_parameter#2
  ...
  next_start_code( )
}

picture_parameter_set( ) {
  picture_parameter_set_start_code
  picture_parameter_set_id
  sequence_parameter_set_id
  picture_parameter#1
  picture_parameter#2
  ...
  next_start_code( )
}

slice_header( ) {
  slice_start_code
  picture_parameter_set_id
  slice_header_parameter#1
  slice_header_parameter#2
  ...
}
In the above tables, the sequence_parameter_set_id code distinguishes a sequence parameter set from any other sequence parameter set. The picture_parameter_set_id code distinguishes a picture parameter set from any other picture parameter set.
Compared to the start code based structure, the sequence header and picture header sub-routines are not needed in the start code plus parameter set structure, because the parameter sets carry the corresponding information. For this reason, the sequence header and picture header sub-routines are excluded from the start code plus parameter set structure. AVS Video 1.0 has adopted the start code based bitstream structure. It is so far not clear whether the start code based bitstream structure or the NAL unit plus parameter set structure will be used for the AVS-M and MPEG-21 SVC coding standards.
In the start code based bitstream structure, such as the bitstream structures in coding standards earlier than H.264/AVC, the parameter set technique is not used. Thus, infrequently changing information has to be signaled repeatedly, for each sequence in the sequence header or for each picture in the picture header, even when it remains unchanged. This is wasteful from a compression efficiency point of view. Further, without the parameter set technique, it is difficult to decouple the transmission of infrequently changing information from the transmission of other information. This makes the coded data more vulnerable to transmission errors, as the loss of a few key bits of infrequently changing information in the sequence or picture header could have a severe negative impact on the decoding process.
In the NAL unit plus parameter set bitstream structure and the start code plus parameter set bitstream structure, there are no picture headers. Some information that remains unchanged for a picture therefore has to be repeated in each slice header. This is also wasteful from a compression efficiency point of view. In particular, for H.264/AVC, as can be seen below, such information can take about 2% of the total bit rate in a conservative estimate.
The conventional parameter set based structure in a layer hierarchy (whether combined with NAL units or with start codes) is shown in FIG. 1.
The parameters in the H.264/AVC slice header include those that can change from slice to slice within a picture as well as those that remain unchanged throughout the picture. FIG. 2 shows the parameters in the slice header that do not change throughout the picture, with an estimate of how many bits each parameter uses. The estimate comes to 16 bits per slice. For a CIF (Common Intermediate Format) picture, with a slicing method of one macroblock row per slice, there are 18 rows, and therefore 18 slices, per frame. That gives 18×16=288 bits/frame. At 30 frames per second, this becomes 8640 bits/sec, which is about 2.3% of a 384 kbps total bit rate. For mobile video telephony, it is reasonable to assume that a QCIF (Quarter CIF) picture has 100 bytes per slice to be conveyed at 64 kbps. This is equivalent to 80 slices/sec. With 16 bits per slice, the overhead is 80×16=1280 bits/sec, or 2.0% of the 64 kbps total bit rate.
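The arithmetic of the two estimates can be reproduced with the short sketch below. The function name header_overhead is hypothetical. For the QCIF case the 64 kbps channel rate is taken as the total bit rate, since that is the assumption under which the 2.0% figure is obtained.

```python
def header_overhead(bits_per_slice, slices_per_second, total_bit_rate):
    """Return (overhead in bits/sec, overhead as a fraction of the bit rate)."""
    overhead_bps = bits_per_slice * slices_per_second
    return overhead_bps, overhead_bps / total_bit_rate


REPEATED_BITS_PER_SLICE = 16  # estimate taken from the text above

# CIF: one macroblock row per slice -> 18 slices/frame at 30 frames/sec.
cif_bps, cif_share = header_overhead(REPEATED_BITS_PER_SLICE, 18 * 30, 384_000)

# QCIF: 100-byte slices over a 64 kbps channel -> 64000 / 800 slices/sec.
qcif_slices_per_second = 64_000 // (100 * 8)
qcif_bps, qcif_share = header_overhead(
    REPEATED_BITS_PER_SLICE, qcif_slices_per_second, 64_000)

print(f"CIF:  {cif_bps} bits/sec, {cif_share:.2%} of 384 kbps")
print(f"QCIF: {qcif_bps} bits/sec, {qcif_share:.2%} of 64 kbps")
```

The CIF share evaluates to 2.25%, which corresponds to the rounded figure of about 2.3% given above.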