1. Field of the Invention
The present invention relates to transmission of video and audio, and more particularly, to transmission of variable bit rate video and audio over asynchronous transfer mode networks.
2. Background of the Art
To store and deliver digital video in a cost effective fashion, an application must have the capability to reduce, by compression, the number of bits in the video. This is, in part, because the available bandwidth for delivery of digital video is usually far narrower than that required for delivery of uncompressed video. For example, the National Television System Committee (NTSC) describes a broadcast quality video as one that has a resolution of 720 horizontal by 480 vertical pixels/frame and a frame rate of 29.97 frames/second. As a result, a broadcast quality video typically generates a bit rate in the range of 100 to 300 Mega bits/second. The addition of audio to the video further increases the bit rate of the uncompressed stream. Therefore, at a bit rate in the range of 100 to 300 Mega bits/second, a 15 second clip of uncompressed video could occupy up to 575 Mega bytes of disk space, which is far too much for most desktop computers to dedicate to such a short video clip.
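By way of illustration only, the arithmetic behind these figures can be sketched as follows. The bits-per-pixel values are assumptions that depend on the chroma sampling format and are not stated above; the function names are hypothetical.

```python
# Illustrative arithmetic only. The bits-per-pixel values are assumptions
# (they depend on the chroma sampling format) and do not come from the text.
WIDTH, HEIGHT, FRAMES_PER_SECOND = 720, 480, 29.97

def raw_bit_rate(bits_per_pixel):
    """Uncompressed video bit rate in bits/second."""
    return WIDTH * HEIGHT * FRAMES_PER_SECOND * bits_per_pixel

def storage_bytes(bit_rate, seconds):
    """Bytes needed to store a clip of the given duration at bit_rate."""
    return bit_rate * seconds / 8

for bpp in (12, 16, 24):
    print(f"{bpp} bits/pixel: {raw_bit_rate(bpp) / 1e6:.1f} Mbit/s")

print(f"15 s at 300 Mbit/s: {storage_bytes(300e6, 15) / 1e6:.1f} Mbytes")
```

Under these assumptions the video alone falls within the 100 to 300 Mega bits/second range, and 15 seconds at the upper end of that range comes to several hundred Mega bytes, the same order as the figure given above.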
An encoder follows a set of steps, known as an encoding process, to compress a digital video and/or its associated audio. This process, however, is not standardized and may vary from application to application depending upon the application's requirements and the complexity of the video. As a result, an encoder can optimize the encoding process to meet the specific bandwidth and video quality requirements of an application. Typically, a broadcast quality compressed video may have a peak rate of 1.5 Mega bits/second to 15 Mega bits/second.
The Moving Picture Experts Group (MPEG) has defined standards, known as MPEG-2, which are described in "ISO/IEC 13818, ITU-T," 1994. MPEG-2 describes the syntax and the semantics for decoding encoded digital video and audio. MPEG-2 defines five profiles and four levels. Each profile describes the basic configuration and/or complexity of the encoding and decoding methods, and each level classifies image sizes and identifies the ranges of parameters for controlling the encoded video bit rate. Of the five profiles and the four levels, the combination of a main level and main profile defines a broadcast quality video. For this combination, an encoder can vary the bit rate by dynamically adjusting the number of bits per frame.
An MPEG-2 encoder can encode a video by exploiting the spatial and temporal redundancies of frames in the video. The encoder uses the spatial redundancy of a frame to encode the frame independent of any other frame in the video. In addition, the encoder provides further compression by taking advantage of the fact that consecutive frames of a video are often similar to each other. To take advantage of this temporal redundancy, the encoder simply encodes the difference between two or more consecutive frames instead of encoding each frame separately.
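The idea of encoding only the difference between consecutive frames can be illustrated with a deliberately simplified sketch. It treats a frame as a flat list of sample values and takes a plain sample-by-sample difference; an actual MPEG-2 encoder predicts blocks with motion compensation and then transforms and quantizes the residual. The function names and sample values are hypothetical.

```python
# Toy illustration of temporal redundancy. A frame is a flat list of
# sample values; this is NOT how MPEG-2 represents or predicts frames.

def encode_difference(previous, current):
    """Return the residual of `current` relative to `previous`."""
    return [c - p for p, c in zip(previous, current)]

def decode_difference(previous, residual):
    """Rebuild the current frame from the previous frame and the residual."""
    return [p + r for p, r in zip(previous, residual)]

frame_1 = [10, 10, 12, 200, 200, 12]   # hypothetical sample values
frame_2 = [10, 10, 12, 12, 200, 200]   # same scene, bright region shifted

residual = encode_difference(frame_1, frame_2)
changed = sum(1 for r in residual if r != 0)
print(residual, f"{changed} of {len(residual)} samples changed")
```

Because most residual samples are zero when consecutive frames are similar, the residual can be represented with fewer bits than the frame itself.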
Specifically, an MPEG-2 encoder encodes each video frame into one of three types of frames: Intracoded (I) frames, Predictive Coded (P) frames, and Bidirectionally Predicted (B) frames. The encoder compresses a frame into an I frame by using a Discrete Cosine Transform (DCT), without taking into account the redundancy, if any, between the frame and its adjacent frames. The encoder compresses a frame into a P frame, however, by encoding the difference between the frame and its previous frame. Similarly, the encoder compresses a frame into a B frame by encoding the difference between the frame and its previous and next frames. The encoder then groups several I, B, and P frames into a set pattern to form a Group of Pictures (GOP).
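As a non-limiting sketch, a set GOP pattern may be generated from two parameters. The parameters `gop_length` and `anchor_spacing`, and the function itself, are assumptions introduced for illustration; the pattern is shown in display order.

```python
def gop_pattern(gop_length, anchor_spacing):
    """Return frame types, in display order, for one Group of Pictures.

    gop_length     -- number of frames in the GOP (hypothetical parameter)
    anchor_spacing -- distance between successive I or P frames
                      (hypothetical parameter)
    """
    types = []
    for i in range(gop_length):
        if i == 0:
            types.append("I")   # coded independently of other frames
        elif i % anchor_spacing == 0:
            types.append("P")   # predicted from an earlier frame
        else:
            types.append("B")   # predicted from earlier and later frames
    return "".join(types)

print(gop_pattern(12, 3))
```

For a length of 12 and a spacing of 3 this yields the pattern IBBPBBPBBPBB; a spacing of 1 yields a GOP with no B frames.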
Commercially available MPEG-2 encoders support a wide range of transmission and storage applications by using both constant bit rate and variable bit rate encoding. A constant bit rate encoder uses a rate control buffer, known as a video buffering verifier (VBV), to maintain a constant bit rate at the output of the encoder. The fullness of the VBV dictates the quantization level and hence the number of bits per frame used in macro-block and slice layers of the encoding process. The encoder maintains a constant bit rate at its output irrespective of the number of bits needed for encoding each frame of the video. Specifically, to achieve a nearly constant bit rate, the encoder varies the quantization resolution of the video based on the complexity of the video and the fullness of the VBV. Furthermore, to achieve an exact constant bit rate, the encoder may need to perform bit stuffing when the encoder generates a smaller number of bits than the desired constant bit rate.
Constant bit rate encoding has two notable disadvantages. Because of varying quantization, the resulting encoded video generally does not have constant quality throughout the video. More importantly, in transmission applications over a network, an application cannot make efficient use of the available network bandwidth and switch buffer space since the encoder may generate excess traffic due to unnecessarily high quantization resolution and/or bit stuffing.
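The interplay between buffer fullness, quantization, and bit stuffing can be sketched with a toy model. Everything in it is an assumption made for illustration: the linear mapping from fullness to a quantizer scale in the range 1 to 31, the model in which a frame produces `complexity / scale` bits, and the function name. It is not the rate control algorithm of any actual encoder.

```python
def simulate_cbr(complexities, bits_out_per_frame, buffer_size):
    """Toy constant bit rate control loop (illustrative assumptions only).

    Returns one (quantizer_scale, produced_bits, stuffing_bits) tuple per
    frame. The buffer drains exactly bits_out_per_frame per frame interval.
    """
    fullness = buffer_size / 2
    trace = []
    for complexity in complexities:
        # Assumed mapping: a fuller buffer selects coarser quantization.
        scale = 1 + round(30 * fullness / buffer_size)
        produced = complexity / scale          # assumed frame-size model
        fullness += produced
        # Pad with stuffing bits if the buffer cannot feed the output.
        stuffing = max(0.0, bits_out_per_frame - fullness)
        fullness += stuffing
        fullness -= bits_out_per_frame
        # Toy simplification: bits beyond the buffer size are discarded.
        fullness = min(fullness, buffer_size)
        trace.append((scale, produced, stuffing))
    return trace

# Hypothetical input: a busy scene followed by a static one.
for row in simulate_cbr([90_000] * 3 + [0] * 8, 3_000, 30_000):
    print(row)
```

In this model the output is constant by construction, and the stuffing column shows the bits that carry no video information once the static scene has emptied the buffer.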
Variable bit rate encoding is commonly used in storage applications such as Digital Versatile Discs (DVDs). Unlike a constant bit rate encoder, a variable bit rate encoder generates a variable bit rate encoded video at the output of the encoder. The bit rate of the encoded video depends on the complexity of each scene, the degree of motion, and the number of scene changes. As a result, a variable bit rate encoded video generally is bursty in nature. The bursty nature of a variable bit rate encoded video can result in an inefficient use of network bandwidth and switch buffer space resources in transmission applications. To achieve constant video quality and to maximize efficient use of network bandwidth and switch buffer space, variable bit rate video applications must control the burstiness of variable bit rate encoded videos.
Known methods for controlling the burstiness of encoded variable bit rate video are 1) source rate control, 2) network feedback rate control, and 3) encoder output shaping. The source rate control method sets the quantizer scale globally at the frame level or lower, and adjusts the encoding bit rate by varying the quantization scale of different frame types in the video. This method has the disadvantage that, in transmission applications, the encoder must re-encode the entire video to conform the bit rate of the video to the network traffic parameters each time an application wishes to transmit the encoded video under different network traffic conditions, and thus, requires substantial re-processing.
The network feedback rate control method uses feedback information (in the form of signaling information) from the network to readjust the bit rate of the encoded video. The network feedback information generally identifies the availability of network bandwidth and switch buffer space. Because of the time delay in receiving the feedback information from the network, this method is not as effective as the source rate control method for controlling the burstiness of a variable bit rate video. Furthermore, this method requires implementation of additional network signaling information, which increases implementation costs.
The encoder output shaping method adjusts the bit rate of an encoded video to conform to network bandwidth and switch buffer space availability. This method does not have the disadvantages of the source rate control method because an encoded video can be transmitted to multiple destinations and under different network traffic conditions without re-encoding the video. Although the general concept of shaping an encoded video is well known, methods and/or systems for shaping an encoded variable bit rate MPEG-2 video for transmission in an asynchronous transfer mode (ATM) network are not known.
In an ATM network, a source node transmits information in the form of fixed-size cells to a destination node through a connection (referred to as a virtual circuit) established between the source node and the destination node. The source node and destination node may be a set-top-box, video equipment, a facsimile machine, a computer, an edge-router, an edge-switch, etc. The cells may include any type of digitized information, including video, audio, data, multimedia, etc.
When establishing a virtual circuit through an ATM network, a source node can select one of five different categories of service: Constant Bit Rate (CBR), Variable Bit Rate--Real Time (VBR-RT), Variable Bit Rate--Non Real Time (VBR-NRT), Available Bit Rate (ABR), and Unspecified Bit Rate (UBR). "ATM Traffic Management Specifications v. 4.0," 1996, describes each of these services.
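For illustration, the relationship between an encoded bit rate and a cell rate can be sketched as follows. The sketch relies on the standard ATM cell format of 53 bytes, of which 48 bytes are payload, and it ignores any adaptation layer overhead; that omission and the function names are simplifying assumptions.

```python
CELL_BYTES = 53      # standard ATM cell size
PAYLOAD_BYTES = 48   # payload carried by each cell (5-byte header)

def cells_for_payload(num_bytes):
    """Cells needed to carry num_bytes, padding the last cell if required."""
    return -(-num_bytes // PAYLOAD_BYTES)   # integer ceiling division

def cell_rate(bits_per_second):
    """Cells/second needed to carry an integer payload bit rate.

    Adaptation layer overhead is ignored (simplifying assumption).
    """
    payload_bits = PAYLOAD_BYTES * 8
    return -(-bits_per_second // payload_bits)

print(cells_for_payload(188))   # one 188-byte MPEG-2 transport packet
print(cell_rate(1_500_000))     # payload rate of 1.5 Mega bits/second
print(cell_rate(15_000_000))    # payload rate of 15 Mega bits/second
```

The line rate actually consumed is higher than the payload rate by a factor of 53/48, because each cell also carries its header.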
A source node negotiates a traffic contract for a CBR connection to a destination node by specifying a peak cell rate (PCR), the maximum cell rate that the source node can transmit on the connection. CBR service is ideal for applications where the source node generates a constant rate video. However, CBR service is not well suited for variable bit rate video applications because these applications, due to their bursty nature, do not transmit at the negotiated PCR during the entire duration of the video. As a result, these applications do not use the entire bandwidth that the network allocates to a CBR connection. Thus, the network cannot efficiently allocate network bandwidth resources to variable bit rate video applications that use CBR connections.
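The inefficiency described above can be quantified, in a simplified manner, as the ratio of the mean rate of the video to the peak rate reserved for the connection. In the sketch below the peak is taken over single frame intervals, and the frame sizes are hypothetical values chosen for illustration.

```python
def cbr_utilization(frame_bits, frames_per_second):
    """Fraction of a peak-rate reservation that a bursty video actually uses.

    Simplifying assumption: the reserved rate equals the largest
    per-frame rate observed in the video.
    """
    per_frame_rates = [bits * frames_per_second for bits in frame_bits]
    peak_rate = max(per_frame_rates)
    mean_rate = sum(per_frame_rates) / len(per_frame_rates)
    return mean_rate / peak_rate

# Hypothetical frame sizes in bits: one large frame, several small ones.
frame_bits = [400_000, 100_000, 100_000, 200_000]
print(cbr_utilization(frame_bits, 30))
```

With these hypothetical frame sizes, half of the bandwidth reserved for the CBR connection would go unused.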
VBR service, however, is better suited for efficient allocation of network bandwidth and switch buffer space resources to variable bit rate video applications. Specifically, a source node negotiates a traffic contract for a VBR connection to a destination node by specifying a PCR, a sustainable cell rate (SCR), and a maximum burst size (MBS). SCR is the average cell rate that the source node can transmit on the connection. MBS is the maximum number of consecutive cells that a source can transmit at PCR on the connection.
In a VBR connection, the cell rate can exceed SCR for short periods constrained by MBS, but the connection maintains the SCR as the average rate. A VBR connection provides a guaranteed quality of service regarding cell loss and bandwidth availability as long as the cell traffic meets the negotiated traffic contract.
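Conformance of a cell stream to such a traffic contract is commonly tested with the generic cell rate algorithm (GCRA), applied once for PCR and once for SCR. The following is a minimal sketch of the virtual scheduling form of that algorithm. It assumes a cell delay variation tolerance of zero and derives the burst tolerance from MBS as (MBS - 1)(1/SCR - 1/PCR); the class and function names are hypothetical.

```python
from fractions import Fraction

class GCRA:
    """Virtual scheduling form of the generic cell rate algorithm."""
    def __init__(self, increment, limit):
        self.increment = increment   # nominal spacing between cells
        self.limit = limit           # how early a cell may arrive
        self.tat = None              # theoretical arrival time

    def conforms(self, t):
        return self.tat is None or t >= self.tat - self.limit

    def update(self, t):
        start = t if self.tat is None else max(t, self.tat)
        self.tat = start + self.increment

def burst_tolerance(pcr, scr, mbs):
    return (mbs - 1) * (Fraction(1, scr) - Fraction(1, pcr))

def check_vbr(arrivals, pcr, scr, mbs):
    """Return, per arrival time, whether the cell meets PCR, SCR, and MBS.

    Assumes zero cell delay variation tolerance (illustrative assumption).
    """
    peak = GCRA(Fraction(1, pcr), Fraction(0))
    sustained = GCRA(Fraction(1, scr), burst_tolerance(pcr, scr, mbs))
    results = []
    for t in arrivals:
        ok = peak.conforms(t) and sustained.conforms(t)
        if ok:   # only conforming cells advance the two schedules
            peak.update(t)
            sustained.update(t)
        results.append(ok)
    return results

# Hypothetical contract: PCR = 10 cells/s, SCR = 2 cells/s, MBS = 3 cells.
back_to_back = [Fraction(i, 10) for i in range(4)]   # four cells at PCR
print(check_vbr(back_to_back, pcr=10, scr=2, mbs=3))
```

With this hypothetical contract, three consecutive cells sent at PCR conform and the fourth does not, which matches the definition of MBS given above.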
For transmission of already encoded variable bit rate video on a VBR connection, the source rate control method has the disadvantage that the encoder must re-encode the entire video to conform the bit rate of the video to the traffic contract parameters PCR, SCR, and MBS each time an application wishes to transmit the encoded video, and thus, requires substantial re-processing.
Likewise, the use of the network feedback rate control method for transmitting an encoded variable bit rate video on a VBR connection has the disadvantage that an application must re-negotiate the traffic contract parameters PCR, SCR, and MBS during the life of a VBR connection. Furthermore, because of the time delay in receiving feedback information from the network, the application cannot effectively control the burstiness of a variable bit rate video. Finally, the network feedback rate control method requires implementation of additional ATM network signaling information, which increases implementation costs.
Therefore, it is desirable to have a method and system for transmitting an encoded variable bit rate MPEG-2 video on a VBR connection in an ATM network that does not have the above-mentioned disadvantages. Furthermore, it is desirable to maximize the number of video streams that the network can support for a given network bandwidth, switch buffer space, and video quality.