The present invention relates generally to digital data processing of video information. More particularly, the invention provides techniques, including methods and systems, for encoding video information.
The video information is usually transmitted from a video provider to a video user over a transmission system, such as a transmission line or a wireless network. The video information includes motion pictures, live news broadcasts, and real-time information updates. The video provider can be, among others, an Internet service provider, a videoconference participant, an entertainment service provider, or a news agency. The video user can be, among others, an Internet service subscriber, a videoconference participant, a video-enabled cellular phone user, a CODE C user, a PALM user, or a user of any other type of handheld device with video capability.
The transmitted video information usually comprises thousands of frames, each of which is composed of numerous microblocks. Each microblock usually represents one pixel of the image illustrated in a frame, and comprises multiple bits of data. During the transmission, each frame lasts from about 25 to 30 msec in order to generate a continuous motion effect for the video viewer. The unencoded video information hence needs an average transmission rate of millions of bits per second. In addition, because different frames may contain drastically different amounts of data, the required transmission rate can vary from frame to frame.
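The arithmetic behind the "millions of bits per second" figure can be sketched as follows. The frame dimensions and bits per pixel are illustrative assumptions (a small 176 x 144 image at 12 bits per pixel), not values from the text; only the frame duration of about 30 msec comes from the paragraph above. The function name is hypothetical.

```python
def raw_bit_rate(width, height, bits_per_pixel, frame_duration_ms):
    """Return the unencoded bit rate in bits per second.

    Each frame carries width * height * bits_per_pixel bits and is
    replaced by the next frame after frame_duration_ms milliseconds.
    """
    bits_per_frame = width * height * bits_per_pixel
    return bits_per_frame * 1000 / frame_duration_ms


# Assumed example values: 176 x 144 pixels, 12 bits per pixel,
# one frame every 30 msec (the upper end of the range given above).
rate = raw_bit_rate(176, 144, 12, 30)
print(f"{rate / 1e6:.1f} Mbit/s")  # prints "10.1 Mbit/s"
```

Even at this small assumed image size the unencoded stream needs roughly ten million bits per second, which is why encoding is required before transmission.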
In contrast, at the hardware level, the transmission rate of the transmission system is usually much lower than millions of bits per second, and it usually does not vary with time. This limited transmission rate cannot typically support the unencoded transmission of the video information; therefore the video data must be encoded in order to reduce the required transmission rate.
The required transmission rate of the encoded data must, on average, be equal to or smaller than the transmission rate of the transmission system. However, if the video data are overly compressed, the video images that the viewer sees after the transmission may not possess a satisfactory quality. Furthermore, if the required transmission rate for the encoded video data drops below the transmission rate of the transmission system, the unused transmission capacity is filled with meaningless information, and is wasted. Hence, the optimization of encoding parameters is critical for transmitting high quality video images with the limited transmission rate.
The search for the optimal encoding parameters is complicated by the significant variation in data size of various frames. With current encoding schemes, encoded video data have file sizes that are not strictly proportional to the number of microblocks in each frame, but rather proportional to the spatial complexity and motion activity of the image. The various frames, if encoded with the same set of parameters, may require different transmission rates, but the transmission rate of the transmission system is fixed. Therefore the optimal encoding parameters vary frame by frame.
One method for determining a set of encoding parameters for each frame uses a buffer that stores the encoded data prior to their transmission over the transmission system. When the incoming rate of the encoded data is higher than the transmission rate of the transmission system, the buffer stores the incoming data. As the buffer fills, the encoding parameters are adjusted so that the transmission rate required for the subsequent encoded data is reduced. If the encoding parameters are not adjusted, the buffer will eventually overflow, and the incoming data will be irretrievably lost. On the other hand, when the incoming rate of the encoded data falls below the transmission rate of the transmission system, the buffer maintains its output rate at the transmission rate of the transmission system. In this case, the encoding parameters are adjusted to raise the transmission rate required for the subsequent encoded data. If the encoding parameters are not adjusted, the filled percentage of the buffer will drop below a predetermined level, and the buffer will have to be filled with bits of random data to prevent the encoding system from crashing. In short, the adjustment triggered by the buffer can compensate for the adverse effects of buffer underrun and overrun.
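The buffer-driven adjustment described above can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the method of any particular encoder: the model in which a frame of complexity c encoded at quantizer scale q produces c / q bits, the proportional feedback rule, the half-full target, and all names and default values are hypothetical. For simplicity, underrun is modeled as the buffer reaching zero rather than a predetermined level.

```python
def simulate_buffer_feedback(complexities, channel_bits, capacity,
                             q_start=8.0, gain=1.0, q_min=1.0, q_max=31.0):
    """Simulate buffer-fullness feedback over a sequence of frames.

    Toy model (an assumption): a frame with complexity c encoded at
    quantizer scale q produces c / q bits. The channel drains
    channel_bits from the buffer during every frame period.
    """
    fullness = capacity / 2.0  # start half full
    q = q_start
    history = []
    for c in complexities:
        produced = c / q
        fullness += produced - channel_bits    # encoder fills, channel drains
        lost = max(0.0, fullness - capacity)   # overflow: bits that do not fit
        stuffed = max(0.0, -fullness)          # underrun: filler bits needed
        fullness = min(max(fullness, 0.0), capacity)
        history.append({"q": q, "produced": produced, "fullness": fullness,
                        "lost": lost, "stuffed": stuffed})
        # Feedback for the NEXT frame: above half full, quantize more
        # coarsely (fewer bits); below half full, quantize more finely.
        error = fullness / capacity - 0.5
        q = min(max(q * (1.0 + gain * error), q_min), q_max)
    return history


# Hypothetical sequence: steady frames, then complex frames, then simple ones.
frames = [8000.0] * 3 + [80000.0] * 3 + [800.0] * 3
for i, h in enumerate(simulate_buffer_feedback(frames, 1000.0, 10000.0)):
    print(i, round(h["q"], 2), round(h["fullness"]), round(h["lost"]),
          round(h["stuffed"]))
```

Note that the quantizer scale for each frame is chosen from the buffer state left by earlier frames, so the controller reacts to a change in complexity only after that change has already affected the buffer.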
A drawback to this method is that the buffer cannot completely solve the problem associated with the lack of optimization. The adjustment of the encoding parameters based on the buffer fullness is predicated upon the dubious assumption that future frames have complexity similar to that of previous frames. In reality, this inference of future behavior from recent historical data is often incorrect. Under this assumption, a simple image following a complicated frame would be overly compressed at the expense of image quality, and a complex image following a simple frame could be undercompressed and thus fill up the buffer.
Another drawback is that large buffers are required. In particular, a very large buffer may be used to allow a significant mismatch between the required transmission rate of the encoded data and the transmission rate of the transmission system over an extended period without using a complicated rate control algorithm. However, the large buffer increases the cost of the transmission system, and introduces a great deal of latency. Such added latency is unsuitable, especially for communications between parties responding to each other interactively, such as in a video conferencing system.
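The link between buffer size and delay follows from simple arithmetic: data entering a full buffer must wait until everything ahead of it has been transmitted. The sketch below computes that worst-case wait. The channel rate and buffer sizes are assumed example values, and the function name is hypothetical.

```python
def worst_case_buffer_delay(buffer_bits, channel_bits_per_second):
    """Return the seconds needed to drain a full buffer at the channel rate.

    This is the longest time newly encoded data can wait in the buffer
    before it is transmitted.
    """
    if channel_bits_per_second <= 0:
        raise ValueError("channel rate must be positive")
    return buffer_bits / channel_bits_per_second


# Assumed example: a 64 kbit/s channel with three candidate buffer sizes.
for size in (16_000, 64_000, 256_000):
    delay = worst_case_buffer_delay(size, 64_000)
    print(f"{size} bits -> {delay:.2f} s")
```

Under these assumed values, a buffer large enough to absorb four seconds of channel capacity can also add up to four seconds of delay, which is far too long for an interactive conversation.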
Some other methods for solving the rate mismatch problem make adjustments based on historical data and the fullness of the buffer. However, these methods cannot solve the problems related to the lack of intra-frame optimization. For example, under these methods, the modifications to the encoding parameters often create a cyclic effect that is undesirable for videoconferencing. In a videoconference, the scene is usually composed of a very simple background at the top of the frame, talking heads with increased motion in the center, and hands manipulating objects on a table top at the bottom of the frame. As an encoder processes the frame from the simple background at the top downward, the compression becomes progressively more aggressive in order to compensate for the accumulated buffer fullness. By the time the table top is reached, the objects to which the speaker is drawing the viewer's attention are overly compressed, and are poorly represented.
In order to solve the problems related to the lack of optimization, past researchers have suggested various methods, such as those described in the MPEG-4 Standard (see ISO/IEC FDIS 14496-2, Annex L), for analyzing the data content of the video frames before they are encoded. However, these methods require complex mathematical operations, and are not suitable for fast encoding on a low-speed microprocessor.
In summary, a predictive encoding technique that adjusts the encoding parameters based on the complexity of the upcoming frame, and that uses only simple mathematical operations, is highly desirable.