Audio-visual information can be digitized, compressed, and converted to formats suitable for transport over computer and digital communication networks and for storage. Digital video compression techniques are widely recognized as an effective way to reduce the amount of data required to represent a video signal. Typical compression and delivery standards, such as those of the Moving Picture Experts Group (MPEG), offer an adjustable trade-off between video quality and data usage. In addition, a compressed video signal can be treated just like other data that represents text, images, audio, database records, etc. As a result, multiple data types can be transported together over computer and communication networks and can be presented digitally to personal computers, televisions, or other display devices.
One problem with delivering compressed video is ensuring proper presentation of the video signal. Although compressed video can be delivered concurrently with other data, certain timing requirements must be met to ensure proper display of the video or audio-visual information. Because of the real-time nature of a video signal, video frames are presented to the decoder and display device in a periodic fashion. Further, compressed digital video data, when measured in bits per second (i.e., bit rate), has a highly variable nature. Specifically, the digital compression process removes temporal and spatial redundancy from the video images. As an example, in the MPEG video compression format, a video signal is encoded into three types of frames: I (intra), P (predictive) and B (bi-directional) frames. An I frame video picture is encoded as a stand-alone image, whereas P frames and B frames are encoded via the motion compensation method, in which only the difference between video frames is encoded. Further details of the compression process can be found in MPEG standards documents, for example, “Generic coding of moving pictures and associated audio information: Video,” available from the International Organization for Standardization (ISO/IEC), document no. 13818-2:2000. As a result of the compression process, video frames are encoded into data units with different sizes. Typically, I frames have more data than P or B frames. Therefore, if the compressed video data is transmitted over a communication channel at a constant bit rate, the encoded video frames will arrive at the decoder at non-uniform time intervals.
One approach for ensuring proper presentation of the video signal is using a data buffer to absorb the variation in size of encoded video frames. To complement the compression and encoding standards, decoder standards have been established to ensure that a compliant decoder can process video data (i.e., a bitstream) that complies with the MPEG specification. For example, in the MPEG digital video specification, the standards-compliant decoder has some precisely defined parameters, such as buffer size, how coded video frames are delivered to and from the buffer, how coded video frames are extracted for display, and how the decoder constructs the local timing clock. Defining parameters such as buffer size and timing ensures that application specific integrated circuits (ASICs) designed to decode compliant bitstreams are interoperable. Buffer exceptions, however, are to be avoided for proper decoder operation. A buffer exception typically includes buffer overflow or underflow conditions, which result in dropped frames or corrupted data. For proper presentation, therefore, the video data arriving at the decoder needs both data integrity and time integrity to avoid causing buffer exceptions.
Other problems such as latency are associated with extensive data buffering. As described above, video picture data is properly decoded and presented at regular intervals (e.g., 30 frames per second in NTSC television format). Buffering can be required at both ends of the communication channel to ensure proper presentation. The amount of buffering can depend on bit rate variation, such as the variation of the data size for each coded video frame described above, and the amount of buffering can be different for different bitstreams. For example, if one variable bit rate (VBR) transport stream has a duration of 100 minutes, with an average bit rate of 4 Mbps, a constant bit rate (CBR) communications channel with 4 Mbps data throughput can transport the video signal provided buffering is used to absorb the bit rate fluctuation around the average. Specifically, if the first 50 minute segment of the 100 minute video is at 7 Mbps and the second 50 minute segment of the video is at 1 Mbps, the average bit rate is 4 Mbps. But if a 4 Mbps channel is used to transmit the bitstream, then during the first 50 minutes the buffer before the channel transmission would have to hold at least 1.125 gigabytes ((7 Mbps-4 Mbps)*50 minutes*60 seconds = 9 gigabits). Further, the decoder buffer at the far end of the channel expects 30 coded frames to arrive each second but will not receive all of them, because some are still waiting in the deep buffer before the channel. A long initial delay is therefore required to avoid a decoder buffer exception.
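The buffer arithmetic in the example above can be checked with a minimal sketch. The function name and parameters are illustrative assumptions and are not part of any standard; bit rates are taken as decimal megabits per second.

```python
def prechannel_buffer_bytes(source_mbps, channel_mbps, minutes):
    """Bytes that accumulate ahead of a CBR channel while the source
    runs faster than the channel (hypothetical helper for illustration)."""
    surplus_bits_per_second = (source_mbps - channel_mbps) * 1_000_000
    seconds = minutes * 60
    return surplus_bits_per_second * seconds / 8  # 8 bits per byte


# First 50 minutes at 7 Mbps over a 4 Mbps channel.
buffer_bytes = prechannel_buffer_bytes(7, 4, 50)
print(buffer_bytes / 1e9)  # 1.125 (gigabytes)
```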
Another problem with compressed video delivery is providing decoder compliant bitstreams. A bitstream that does not cause buffer exceptions at a standards-compliant decoder is called a decoder compliant bitstream. Network transport characteristics become a factor that can affect the compliance of the compressed video bitstream arriving at the decoder. That is, variable delay in the communications system or other data transport problems can transform a bitstream that is decoder compliant at the near end of the communications channel into a non-compliant bitstream at the far end (e.g., an integrated receiver/decoder).
A further problem with delivering a decoder compliant bitstream occurs when a multiple program transport stream is demultiplexed. For example, in MPEG-2, a typical multiple program transport stream combines several CBR or VBR packetized elementary streams (PES). The data associated with each individual PES bitstream is distinguished by a packet identifier (PID) in the composite transport stream. The use of the PID field allows the streams of different programs to be logically separated from each other, yet multiplexed temporally into a single transport stream. A statistical multiplexer (statmux) can be used to allocate transport bandwidth efficiently among VBR streams. For example, if video program A contains a flat background, while at the same time video program B contains high-motion content with more textural details, more bits will be used to code frames from program B than frames from program A. As a result, if one considers video program A or B by itself, the resulting MPEG-2 transport stream will be VBR. A PID stream that is extracted out of a statmuxed transport stream is decoder compliant as long as packet presentation timing is not changed. That is, the inter-arrival time between any two consecutive transport packets of the same PID must be the same whether the packets are part of the statmuxed transport stream or have been extracted from it and presented separately. When the statmuxed transport stream is delivered to a decoder, the decoder selects data packets having a particular PID (or set of PIDs) and discards or ignores other packets. When a PID stream has been extracted or delivered separately from a multiplexed transport stream, however, the packet delivery timing may no longer be the same. For example, the packets may be uniformly presented by the multiplexing process, but after demultiplexing, there may be some intermediate buffering that causes the packets to be delivered in a bursty fashion to the decoder. Bursty delivery can cause undesirable buffer exceptions.
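PID-based selection can be illustrated with a minimal sketch. It assumes the standard MPEG-2 transport packet layout (188 bytes, sync byte 0x47, 13-bit PID in the second and third header bytes); the function names, the `make_packet` helper, and the PID values are hypothetical and serve only the demonstration. The slot index of each selected packet stands in for its position in the multiplexed stream, so the gaps between slots show the variable spacing that must be preserved.

```python
TS_PACKET_SIZE = 188  # bytes per MPEG-2 transport packet
SYNC_BYTE = 0x47      # first byte of every transport packet


def packet_pid(packet: bytes) -> int:
    """Return the 13-bit PID carried in header bytes 1 and 2."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid transport packet")
    return ((packet[1] & 0x1F) << 8) | packet[2]


def extract_pid(packets, wanted_pid):
    """Yield (slot_index, packet) for packets matching wanted_pid.

    Other packets are ignored, as a decoder would do.
    """
    for slot, packet in enumerate(packets):
        if packet_pid(packet) == wanted_pid:
            yield slot, packet


def make_packet(pid: int) -> bytes:
    """Build a zero-filled packet with the given PID (demo helper only)."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes(TS_PACKET_SIZE - len(header))


# Hypothetical multiplex of three programs with PIDs 0x100, 0x101, 0x102.
mux = [make_packet(p) for p in (0x100, 0x101, 0x100, 0x102, 0x102, 0x100)]
slots = [slot for slot, _ in extract_pid(mux, 0x100)]
gaps = [b - a for a, b in zip(slots, slots[1:])]
print(slots)  # [0, 2, 5]
print(gaps)   # [2, 3]
```

If the extracted packets were simply forwarded back to back, the gaps of 2 and 3 slots would collapse to uniform spacing, which is the change in timing described above.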
FIG. 1 further illustrates the problem. In FIG. 1, a typical statistically multiplexed transport stream 120 is shown. Packets corresponding to video 1 105, video 2 110, and video 3 115 are illustrated as part of a multiple program transport stream 120. Packets that correspond to the packet identifier (PID) of video 1 105 are extracted from the statmuxed transport stream 120. The relative timing of packets from the same video program is shown to be variable 125. This relative timing is called the packet schedule. The packet schedule reflects the variable nature of the encoded video data stream. However, extracting video 1 105 and transporting it, for example, over a CBR channel as a separate bitstream 130 can make the packet inter-arrival times or “spacing” uniform, which changes the packet schedule or presentation timing. More generally, any transport of video 1 105 may fail to preserve the original packet schedule, due either to additional buffering or to further multiplexing with other packets in a different network node or system entity. As described above, the changed packet schedule can cause buffer exceptions at the decoder.
One conventional approach to preserving the relative timing of data packets within a bitstream when demultiplexing is to replace data packets in the multiple program transport stream that are associated with other bitstreams with null packets to maintain the relative timing of the extracted bitstream. That is, the null packets take up space in the resulting transport stream to ensure proper arrival timing of the data packets at a decoder or other system entity. Null packets, however, consume additional resources and introduce further content storage or delivery inefficiencies. For example, when extracting a VBR bitstream having an average bit rate of 3.5 Mbps and a peak bit rate of 9 Mbps, inserting null packets to ensure proper timing results in a bitstream having an average bit rate of 9 Mbps. Quantitatively, this represents an inefficiency of about 61%.
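The 61% figure follows from the share of the padded stream that carries null packets. A minimal sketch, with a hypothetical function name, assuming the padded stream runs at the peak bit rate:

```python
def null_padding_inefficiency(average_mbps, peak_mbps):
    """Fraction of a peak-rate padded stream occupied by null packets
    (hypothetical helper for illustration)."""
    return (peak_mbps - average_mbps) / peak_mbps


inefficiency = null_padding_inefficiency(3.5, 9)
print(round(inefficiency * 100))  # 61 (percent)
```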
Another conventional approach to preserving the relative timing of the data packets within a bitstream when demultiplexing is to append an additional timestamp to each packet. Specifically, a timestamp that describes the correct time instance of the packet can be coded as an additional field and added to the beginning (or the end) of the packet. These timestamps can then be delivered together with the packet through the network. A downstream system entity that needs to recover the timing of the packet can do so by inspecting the timestamp on each packet. This approach is more efficient than the null packet insertion approach described above because a timestamp generally takes far fewer bits than the packet it accompanies. For example, for an MPEG-2 transport packet, which includes 188 bytes per packet, a timestamp can be a 42-bit field. A 42-bit timestamp can be carried in a 6 byte data field (rounding up to the next 8-bit byte boundary preserves byte alignment for the entire packet with the timestamp). Therefore, the approach would have an inefficiency of about 6/(188+6)=3%. This approach does have an additional limitation: the resulting packet format is no longer compatible with the original packet format. In the case of an MPEG-2 transport packet, the original 188 byte packet format is extended to a 194 byte packet format, which is not a standardized packet format. Unless all network transport systems are manufactured by the same vendor or support the same packet formats, the non-standard transport stream can no longer be transported over the network. Therefore, this approach may be useful for local storage and time reconstruction but not for interoperable long distance transport.
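The timestamp overhead in the example above can be reproduced with a minimal sketch. The function name is a hypothetical choice for illustration; the 188-byte packet and 42-bit timestamp come from the example.

```python
import math


def timestamp_overhead(packet_bytes, timestamp_bits):
    """Return (timestamp_bytes, overhead_fraction) when a timestamp,
    rounded up to whole bytes, is appended to each packet
    (hypothetical helper for illustration)."""
    timestamp_bytes = math.ceil(timestamp_bits / 8)
    extended_bytes = packet_bytes + timestamp_bytes
    return timestamp_bytes, timestamp_bytes / extended_bytes


ts_bytes, overhead = timestamp_overhead(188, 42)
print(ts_bytes)                  # 6
print(188 + ts_bytes)            # 194
print(round(overhead * 100, 1))  # 3.1 (percent)
```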
What is therefore needed is a packet schedule timestamp that: (1) encodes the bit rate profile or relative timing of data packets within a bitstream; (2) ensures timely delivery and presentation of a bitstream without long latency or buffer exceptions; and (3) preserves the relative timing of data packets within a bitstream without introducing inefficient null packets or an incompatible packet format.