Transmission of encoded digital video data from one computer system over a communication medium to another computer system typically involves carrying the video bitstream as payload data within transport protocol packets. This process of inserting data into another protocol's packet is generally referred to as "encapsulation." While encapsulation adds overhead, it provides a method of transmitting data from one location to another over an intermediate communication medium. The process of dividing the input bitstream among the transport protocol packets is referred to as "segmentation." Generally, encapsulation involves segmenting the bitstream into packets at the source and reassembling the bitstream at the destination. For example, a prior encapsulation approach is described with reference to FIG. 2B and FIG. 3. Likewise, the video bitstream format is described with reference to FIG. 1.
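The segmentation and reassembly steps described above can be illustrated with a minimal sketch. It is not the prior approach or any particular transport protocol: the function names, the fixed placeholder header, and the byte-aligned splitting are all assumptions made for illustration only.

```python
from typing import List


def segment(bitstream: bytes, max_payload: int) -> List[bytes]:
    """Divide the bitstream into chunks of at most max_payload bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [bitstream[i:i + max_payload]
            for i in range(0, len(bitstream), max_payload)]


def encapsulate(bitstream: bytes, max_payload: int, header: bytes) -> List[bytes]:
    """Prefix each segment with a header (hypothetical, fixed content)."""
    return [header + chunk for chunk in segment(bitstream, max_payload)]


def reassemble(packets: List[bytes], header_len: int) -> bytes:
    """Strip the header from each packet and rejoin the payloads in order."""
    return b"".join(packet[header_len:] for packet in packets)


def overhead_bytes(packets: List[bytes], bitstream: bytes) -> int:
    """Bytes added by encapsulation beyond the original bitstream."""
    return sum(len(p) for p in packets) - len(bitstream)
```

With a 10-byte bitstream, a 4-byte maximum payload, and a 2-byte header, this sketch produces three packets and adds 6 bytes of overhead, which shows how the per-packet header cost grows with the number of segments.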
FIG. 1 illustrates the layers used by H.263 to represent a picture in an encoded digital video bitstream. The Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) has specified a coded representation useful for compressing the moving picture portion of a low bitrate audio-visual service. This coded representation is described in Draft ITU-T Recommendation H.263, entitled "Video Coding For Low Bitrate Communication," published 1995 (hereinafter H.263).
Only a brief description of the layers is presented here, as Recommendation H.263 fully describes the syntax for each layer. At the highest level is a picture layer 110. Generally, an encoded picture includes a picture header 115, one or more Groups of Blocks in a Group of Blocks (GOB) layer, and an end of sequence (EOS) code. The picture header 115 includes, among other fields, a picture start code (PSC) field and a picture type (PTYPE) information field. These fields and their purposes are described in detail in Recommendation H.263.
The GOB layer 120 includes a GOB header 125 and a macroblock (MB) layer. The GOB header 125 includes optional stuffing bits (GSTUF), a GOB start code (GBSC), a GOB group number (GN), an optional GOB sub bitstream indicator (GSBI), a GOB frame identifier (GFID), and quantizer information (GQUANT).
The macroblock layer 130 includes an MB header 135 followed by block data in a block layer 140. At the lowest level is the block layer 140. Each block includes an optional DC coefficient for INTRA blocks (INTRADC), and a variable length coded transform coefficient (TCOEF).
The five standardized picture formats are described below with reference to Table 1.
TABLE 1
Number of Pixels per Line and Number of Lines for each of the H.263 Picture Formats
______________________________________________________________
Picture    # Pixels   # Lines    # Pixels   # Lines    MBs per
Format     for Lum    for Lum    for Chro   for Chro   GOB
______________________________________________________________
sub-QCIF   128        96         64         48         8
QCIF       176        144        88         72         11
CIF        352        288        176        144        22
4CIF       704        576        352        288        88
16CIF      1408       1152       704        576        352
______________________________________________________________
Table 1 shows the sampling structure for each of the five standardized picture formats. For example, a picture in Quarter Common Intermediate Format (QCIF) has 176×144 pixels for luminance and 88×72 pixels for chrominance. The last column of Table 1 indicates the number of macroblocks (MBs) per group of blocks (GOB) for each picture format. This number may be calculated using the pixel data in Table 1, the number of GOBs per picture, and the number of pixels in each MB. The number of GOBs per picture is defined by H.263 to be 6 for sub-QCIF, 9 for QCIF, and 18 for CIF, 4CIF and 16CIF. Each macroblock contains 16×16 pixels of luminance and 8×8 pixels of chrominance. To determine the number of MBs per GOB for a QCIF picture, the number of luminance pixels (176×144=25,344) is divided by the number of luminance pixels in each MB (16×16=256; 25,344/256=99). Subsequently, the number of MBs employed to encode the given picture format (99) is divided by the number of GOBs for the picture format (9) to determine the number of MBs per GOB for the particular picture format. For this example, the result is 11 MBs per GOB for QCIF pictures. From this example, it should be apparent that the spatial location of GOBs within the video bitstream, in this prior approach, is determined based upon the picture format. Thus, for each picture format, a GOB header can appear in the video bitstream only at certain predetermined spatial locations within the bitstream.
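The calculation above can be reproduced for all five picture formats with a short sketch. The luminance dimensions and GOB counts are taken from Table 1 and the text; the dictionary layout and function names are illustrative assumptions, not part of H.263.

```python
MB_LUMA_SIDE = 16  # each macroblock covers 16x16 luminance pixels

# format name -> (luminance pixels per line, luminance lines, GOBs per picture)
PICTURE_FORMATS = {
    "sub-QCIF": (128, 96, 6),
    "QCIF": (176, 144, 9),
    "CIF": (352, 288, 18),
    "4CIF": (704, 576, 18),
    "16CIF": (1408, 1152, 18),
}


def mbs_per_picture(width: int, height: int) -> int:
    """Total macroblocks: luminance pixels divided by pixels per macroblock."""
    return (width * height) // (MB_LUMA_SIDE * MB_LUMA_SIDE)


def mbs_per_gob(width: int, height: int, gobs_per_picture: int) -> int:
    """Macroblocks per GOB: total macroblocks divided by the GOB count."""
    return mbs_per_picture(width, height) // gobs_per_picture


def gob_header_positions(width: int, height: int, gobs_per_picture: int) -> list:
    """Macroblock indices at which a GOB header may appear in the prior approach."""
    step = mbs_per_gob(width, height, gobs_per_picture)
    return [gob * step for gob in range(gobs_per_picture)]
```

For QCIF, `mbs_per_picture(176, 144)` gives 99 and `mbs_per_gob(176, 144, 9)` gives 11, so under these assumptions the only macroblock indices where a GOB header may appear are 0, 11, 22, and so on up to 88.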
Referring now to FIG. 2A, the GOB layer for a QCIF picture is depicted. As described above, the GOB layer for this particular picture format includes 9 GOBs (210-290), each including 11 macroblocks. This figure illustrates the inflexibility of this prior approach. As discussed earlier, GOB headers are only allowed at predetermined spatial locations in the video bitstream for a given picture format. For example, when encoding a QCIF picture, a GOB header can only appear after a multiple of 11 consecutive MBs as shown.
FIG. 2B shows an H.263 video packet using a prior encapsulation approach and the prior layer structure shown in FIG. 1. The H.263 video packet 201 includes a Real-Time Transport Protocol (RTP) packet header 202 and a payload area 207. Further information regarding RTP may be found in H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications," RFC 1889, 1996. Using this prior approach, the payload area 207 of the H.263 video packet 201 also includes one of three H.263 payload headers, Mode A 203, Mode B 204, or Mode C 205. Finally, the H.263 compressed video bitstream 206 is included in the payload area 207 of the H.263 video packet 201.
In this prior approach, the shortest header mode (Mode A) is recommended for GOBs that are smaller than the network packet size. In this mode, a 4 byte payload header 203 is inserted prior to the H.263 compressed video bitstream, and each packet must start at the beginning of a GOB. That is, the H.263 bitstream is packetized only at GOB boundaries.
Modes B and C allow the H.263 bitstream to be fragmented at MB boundaries. However, extra information is employed at the start of a packet to recover the decoder internal state should an H.263 video packet 201 be lost during transmission. This extra information results in a size increase of the payload header from 4 bytes to 8-12 bytes. In this prior approach, modes B and C are recommended for transmitting GOBs whose sizes are larger than the maximum packet size allowed in the underlying protocol.
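The trade-off between the header modes can be made concrete with a rough sketch. The text above gives 4 bytes for Mode A and 8-12 bytes for Modes B and C; assigning 8 bytes to Mode B and 12 bytes to Mode C is an assumption here, as are the function names. The RTP packet header is ignored, so the figures reflect only the H.263 payload header.

```python
# Payload header sizes in bytes. Mode A is from the text; the split of the
# stated 8-12 byte range into B=8 and C=12 is an assumption for illustration.
PAYLOAD_HEADER_BYTES = {"A": 4, "B": 8, "C": 12}


def recommended_modes(gob_bytes: int, max_packet_bytes: int) -> tuple:
    """Modes recommended by the prior approach for a GOB of the given size."""
    if gob_bytes < max_packet_bytes:
        return ("A",)       # GOB fits in one packet: shortest header
    return ("B", "C")       # GOB must be fragmented at MB boundaries


def header_overhead(mode: str, bitstream_bytes: int) -> float:
    """Fraction of the payload area taken up by the H.263 payload header."""
    header = PAYLOAD_HEADER_BYTES[mode]
    return header / (header + bitstream_bytes)
```

Under these assumptions, a packet carrying 96 bytes of bitstream in Mode A spends 4% of its payload area on the header, while a packet carrying 88 bytes in Mode C spends 12%, which illustrates the higher cost of fragmenting at MB boundaries.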
The prior encapsulation method has a number of disadvantages. One limitation of the prior approach is that GOB headers are limited to predetermined spatial locations within the video bitstream. Further, while modes B and C allow fragmentation at MB boundaries, they do so at the expense of higher overhead.
In general, it is desirable to provide a more efficient method and apparatus for encapsulating and transmitting a video bitstream. More specifically, it is desirable to allow fragmentation of the video bitstream at MB boundaries with less overhead.
It is also desirable to jointly optimize the video bitstream layering and the process of segmenting the video bitstream into transport protocol packets. Specifically, it would be advantageous to provide a video bitstream syntax that allows dynamic insertion of synchronization information, thereby creating a flexible video bitstream that may be efficiently packetized for a variety of transport protocols.