1. Field of the Invention
The present invention is drawn to optimally allocating bits among a plurality of data channels to be encoded by a plurality of encoders. The present invention utilizes a closed loop controller to optimally allocate bits among the plurality of channels by parsing a stream of complexity data generated by each encoder, and using the complexity data to determine the optimal number of bits to allocate to each channel. The complexity data of the present invention includes information about timing, frame type, and the bit requirements for every frame.
2. Discussion of the Related Art
The past decade has seen the rapid emergence of multi-media data transmitted across networks, and particularly the transmission of digital video to provide services such as video-on-demand, digital television broadcasts, and video content distribution. Digital video, however, contains an enormous amount of data in its raw or uncompressed state, making video compression both necessary and enabling for the delivery of video content. Further, content providers, publishers, or broadcasters often need to combine and send, or multiplex, multiple channels of compressed video data into a single output stream with a fixed bit rate in real-time or near real-time. Accordingly, there is a need in the art for systems, methods, and computer program products to optimally allocate a fixed number of bits among multiple channels of multi-media data.
Multi-media data takes many forms known in the art, including audio, picture, and video data. For example, picture data is stored as files of binary data using various raw or compressed formats including GIF, JPEG, TIFF, BMP, and PDF. Audio data includes waveform audio (WAV), MP3, audio interchange file format (AIFF), musical instrument digital interface (MIDI), and sound files (SND). Video data includes QuickTime and the Moving Picture Experts Group format (MPEG). Further treatment of the subject is provided in the book Video Communication. (1) Image and Video Compression Standards, V. Bhaskaran and K. Konstantinides, Kluwer Academic, 1995, the contents of which are hereby incorporated by reference in their entirety.
FIG. 1 is a diagram illustrating an exemplary system for delivering multi-media data using networked computer systems. The computer systems 101, 102, 103 and networks 105 can be of the types described in the embodiment of FIG. 3, which is discussed in more detail below. On a network 105, a process called a client process (hereinafter, simply “client”) operating on one computer, called a client device, makes a request of another process called a server process (hereinafter “server”) executing on a computer, called a server device 103, connected to the network 105. The server 103 performs the service, often sending information back to the client.
A server device 103 contains multi-media data and a media transmission process 104 that communicates the data over a network 105 to the media server device 102. The media server device 102 includes a media server process 107 that conditions the data for communication over network 105 to a media presentation process 112 on media client device 101. The media presentation process 112 presents the multi-media data to a user.
In some embodiments, the local networks 105 may comprise a direct connection between media client device 101 and media server device 102. In other embodiments, the networks 105 include one or more transcoders that convert from one type of signal to another, encoders for compressing data, or multiplexers that combine a plurality of data channels into one output channel. In various embodiments, the networks 105 include one or more wired, wireless, or optical links.
Networks 105 can be networks that use the Internet Protocol (IP) in various embodiments. In other embodiments, networks 105 are non-IP networks, such as a network of satellite communication links or cable television links. On a cable television link, the media server device 102 is at the cable headend and the media client device 101 may be a television set-top box or personal computer.
Video data can be encoded or transcoded into a variety of formats depending on factors such as the computing resources needed for real-time or near real-time encoding and delivery, storage limitations, bandwidth limitations, or display limitations of the media client device 101.
In some embodiments, video data is encoded into MPEG compatible data streams. MPEG is a video compression standard that defines operation of an MPEG video decoder and the composition of an MPEG stream. The video data within the MPEG stream represents a sequence of video pictures or frames. The amount of information used in MPEG to represent a frame of video varies greatly depending on factors such as visual content including color space, temporal variability, spatial variability, the human visual system, and the techniques used to compress the video data.
MPEG data may be encoded using three types of picture or frame data: Intra-frame (“I-frame”) data, forward Predicted frame (“P-frame”) data, and Bi-directional predicted frame (“B-frame”) data. I-frame data includes all of the information required to completely recreate a frame. P-frame data contains information that represents the difference between a frame and the frame that corresponds to a previous I-frame or P-frame data. B-frame data contains information that represents relative movement between preceding I-frame data or P-frame data and succeeding I-frame data or P-frame data. MPEG comprises various encoding standards, including MPEG 1, MPEG 2, and MPEG 4. MPEG 2 is defined in the international standards ISO/IEC 13818-1, 13818-2, and 13818-3, and these standards are herein incorporated by reference in their entirety.
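The relationship between I-frame and P-frame data described above can be illustrated with a minimal sketch. It treats a frame as a flat list of pixel values and stores a P-frame as the per-pixel difference from its reference frame. The function names `encode_p_frame` and `decode_p_frame` are hypothetical, and the sketch omits motion compensation, transform coding, and entropy coding, so it is not an MPEG encoder.

```python
def encode_p_frame(reference, current):
    """Return the per-pixel difference between the current frame and its reference.

    Hypothetical helper: real MPEG P-frames also use motion vectors and the DCT.
    """
    return [c - r for r, c in zip(reference, current)]


def decode_p_frame(reference, residual):
    """Rebuild the current frame by adding the residual back onto the reference."""
    return [r + d for r, d in zip(reference, residual)]


# An "I-frame" carries every pixel; the following frame changes in only one place.
i_frame = [10, 10, 10, 10, 50, 50]
next_frame = [10, 10, 10, 12, 50, 50]

residual = encode_p_frame(i_frame, next_frame)
rebuilt = decode_p_frame(i_frame, residual)
```

When successive frames are similar, most residual values are zero, which is why predicted frames generally need fewer bits than I-frames.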
MPEG reduces the number of bits required to represent video data by removing spatial redundancy within a video frame (intra-frame coding) and removing temporal redundancy between successive video frames (inter-frame coding). Each frame is made up of two interlaced fields that are alternate groups of rows of pixels. Each field is made up of multiple macroblocks (MBs). Each MB is a two dimensional array of pixels, typically 16 rows of 16 pixels. Each MB comprises four luminance blocks, typically 8 rows of 8 pixels each, and two chrominance blocks, also 8 rows of 8 pixels each. Motion compensation is used to reduce temporal redundancy, typically on a macroblock basis. Spatial redundancy is reduced using the Discrete Cosine Transform (DCT), typically on a block basis. During motion compensation, a motion vector is computed that indicates pixel locations on a reference frame that is the basis for a particular macroblock on a different, current frame. Differences between the reference macroblock and the particular macroblock are then encoded using the DCT.
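To illustrate how the DCT reduces spatial redundancy on a block basis, the following sketch applies an orthonormal two dimensional DCT-II to a single 8 by 8 block using only the Python standard library. The function name `dct_2d` and the test block are assumptions made for illustration; production encoders use fast, fixed-point transforms rather than this direct summation.

```python
import math

N = 8  # block size taken from the text: 8 rows of 8 pixels


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of lists."""
    def alpha(k):
        # Normalization factor that makes the transform orthonormal.
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


# A perfectly flat block (every pixel 128) is maximally redundant.
flat_block = [[128] * N for _ in range(N)]
coeffs = dct_2d(flat_block)

# Count coefficients that are not (numerically) zero.
nonzero = sum(1 for row in coeffs for c in row if abs(c) > 1e-6)
```

For the flat block, all of the energy lands in the single DC coefficient `coeffs[0][0]`, so 64 pixel values are represented by one significant number. Blocks with more spatial detail spread energy over more coefficients and therefore need more bits.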
Each MPEG video sequence is composed of a series of Groups of Pictures (GOPs). Each GOP is composed of a series of I, P, and B frames, and each GOP begins with an I frame. As known in the art, a “slice” is a series of macroblocks and may make up a field or a portion of a field.
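As a simple illustration of GOP structure, the sketch below generates a display-order frame-type pattern for one GOP that begins with an I frame. The function `gop_pattern` and its parameters `gop_length` and `anchor_spacing` are hypothetical names; the pattern shown is one common arrangement, and actual encoders may choose frame types differently.

```python
def gop_pattern(gop_length, anchor_spacing):
    """Return a display-order string of frame types for one GOP.

    gop_length: number of frames in the GOP (assumed parameter).
    anchor_spacing: distance between successive I/P anchor frames (assumed).
    """
    frames = []
    for i in range(gop_length):
        if i == 0:
            frames.append("I")   # every GOP begins with an I frame
        elif i % anchor_spacing == 0:
            frames.append("P")   # anchor frames after the first are P frames
        else:
            frames.append("B")   # frames between anchors are B frames
    return "".join(frames)


pattern = gop_pattern(12, 3)
```

With a length of 12 and an anchor spacing of 3, `pattern` is `"IBBPBBPBBPBB"`.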
For decoding and display, the data in the MPEG stream is sent to a client computer system such as the computer system in the embodiment of FIG. 3. For example, the MPEG stream is sent over networks 105 to media client device 101.
An MPEG stream must conform to certain criteria set forth in the MPEG standards. For example, the MPEG stream may provide 30 frames per second but should not provide so many bits per second that a client computer's buffers overflow. Buffer overflow can be mitigated by requiring that the received MPEG stream be of a constant or fixed bit rate. MPEG data channels may also contain variable bit rate streams, wherein a set of frames such as a GOP is encoded using a variable number of total bits.
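The buffer constraint described above can be sketched with a simplified model of a client buffer: bits arrive at a constant channel rate, and once per frame period the decoder removes the bits for one frame. The function `simulate_decoder_buffer` and all numeric values are hypothetical, and the model is far simpler than the buffer model defined in the MPEG standards.

```python
def simulate_decoder_buffer(frame_bits, bit_rate, frame_rate, buffer_size):
    """Track buffer fullness frame by frame under a constant arrival rate.

    Returns (history, status) where status is "ok", "overflow", or "underflow".
    All parameters are illustrative assumptions, expressed in bits and Hz.
    """
    fullness = 0.0
    history = []
    for bits in frame_bits:
        fullness += bit_rate / frame_rate  # bits arriving during one frame period
        if fullness > buffer_size:
            return history, "overflow"     # channel delivered more than fits
        if bits > fullness:
            return history, "underflow"    # frame not fully received in time
        fullness -= bits                   # decoder removes one frame
        history.append(fullness)
    return history, "ok"


# Arrival rate matches the frame sizes: the buffer stays balanced.
balanced = simulate_decoder_buffer([100, 100, 100], 3000, 30, 200)

# Arrival rate is twice what the frames consume: the buffer fills and overflows.
too_fast = simulate_decoder_buffer([100, 100, 100], 6000, 30, 250)

# A frame larger than what has arrived so far cannot be decoded on time.
too_big = simulate_decoder_buffer([150], 3000, 30, 200)
```

In this sketch the second case overflows on the second frame period, which illustrates why the delivered bit rate and the per-frame bit counts must be kept consistent with one another.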
Often, entities such as content providers, distributors, and publishers need to combine a plurality of MPEG compatible streams and transmit the output over a fixed bandwidth channel. In this case, the individual bit rates of the channels are free to change as long as the total bit rate is fixed. FIG. 2 provides a logical overview of a group of encoders 201, 202, 203 encoding input streams to produce a plurality of variable bit rate (VBR) output streams, which are then each input to multiplexer 204 (“mux”). The multiplexer 204 accepts the input streams, buffers data, and sends out the desired constant bit rate (CBR) data stream. Accordingly, a challenge exists in the art to optimally assign bit rates to each encoder so that the multiplexer can deliver the output stream at a constant bit rate without overflowing buffers, while at the same time optimally assigning bits to each channel to provide the best possible output quality.
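The allocation problem stated above can be made concrete with a minimal sketch that divides a fixed bit budget among channels in proportion to a per-channel complexity value. This is only an illustrative baseline under assumed inputs, not the allocation method of the present invention; the function `allocate_bits` and the complexity numbers are hypothetical.

```python
def allocate_bits(total_bits, complexities):
    """Split total_bits among channels in proportion to their complexity.

    Uses largest-remainder rounding so the integer allocations sum exactly
    to total_bits. Falls back to an equal split if all complexities are zero.
    """
    n = len(complexities)
    total_complexity = sum(complexities)
    if total_complexity <= 0:
        shares = [total_bits / n] * n
    else:
        shares = [total_bits * c / total_complexity for c in complexities]

    alloc = [int(s) for s in shares]       # round each share down
    leftover = total_bits - sum(alloc)     # bits lost to rounding
    # Hand the leftover bits to the channels with the largest fractional parts.
    order = sorted(range(n), key=lambda i: shares[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


# Three channels sharing a fixed budget; the third is twice as complex.
allocation = allocate_bits(1000, [1, 1, 2])
```

Because the allocations always sum to the fixed budget, the combined output rate stays constant while each channel's share follows its relative complexity.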