The present invention relates to video compressors and, more particularly, to video compressors that produce motion-vector based predictive picture blocks without searching previous or subsequent frames for reference blocks.
Uncompressed digital video signals for televisions, computer displays and the like can involve high data rates (bit rates). For example, a typical digitized National Television System Committee (NTSC) television signal has a data rate of about 165 megabits per second; an uncompressed high-definition television (HDTV) signal has an even higher data rate. To reduce these data rates, digital video signals are commonly compressed. Exemplary digital video compression standards include Moving Picture Experts Group 1 (MPEG-1), MPEG-2, MPEG-4, Windows Media Video and QuickTime.
Compression techniques used with these and other standards exploit spatial and temporal redundancies typically present in uncompressed video signals to avoid sending redundant data. In other words, compression techniques take advantage of the fact that many parts of a typical video frame are identical or similar to other parts of the same frame or to parts of other frames. To reduce temporal redundancy, these techniques send references to similar or identical, previously sent portions of the video signal. Compression by reference to other portions of a video signal is commonly referred to as “motion-compensated inter-frame prediction.” These techniques are, however, compute-intensive. Consequently, conventional hardware compressors are very expensive and software compressors are slow.
A video signal often consists of a series of pictures (“frames”) displayed in rapid succession. A typical system displays between about 25 frames per second and about 75 frames per second. Each frame consists of a rectangular array of picture elements (“pixels”). For example, a high-definition television (HDTV) signal can contain 30 frames per second, each frame representing 1,080 rows (“lines”) of pixels, with 1,920 pixels per line. These frames need not, however, be sent in the order in which they are to be displayed. That is, a compressor can send the frames slightly out of sequence. A decompressor generally includes a buffer to store received frames, and the decompressor rearranges the frames before displaying them. Sending the frames out of sequence can facilitate compressing and decompressing the video signal, as discussed below.
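The frame dimensions given above can be turned into a rough uncompressed data rate. The following sketch assumes 24 bits per pixel, which is an illustrative figure and not one stated in the text; the function name `raw_bit_rate` is likewise hypothetical.

```python
def raw_bit_rate(width, height, frames_per_second, bits_per_pixel):
    """Return the uncompressed data rate in bits per second."""
    return width * height * frames_per_second * bits_per_pixel


# 1,920 pixels per line, 1,080 lines, 30 frames per second.
# The 24 bits per pixel is an assumption made only for this illustration.
rate = raw_bit_rate(1920, 1080, 30, 24)
print(f"{rate / 1e9:.2f} Gbit/s")  # prints "1.49 Gbit/s"
```

Under that assumption the raw rate is roughly nine times the 165 megabits per second quoted for a digitized NTSC signal.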
When attempting to compress a frame, a typical compressor operates on rectangular arrays of pixels, such as 16×16 pixel arrays (“macroblocks”), within the frame. For each macroblock, the compressor attempts to exploit temporal redundancy in the original video signal. Specifically, the compressor searches for a macroblock in one or more recently-sent frames whose contents are similar or identical to the macroblock that the compressor is attempting to compress. If the compressor finds one or more such macroblocks in one or more such frames (each such frame is referred to as a “reference frame”), the compressor replaces the macroblock that is being compressed with a reference to the found macroblock(s). Each found macroblock is referred to as a “reference macroblock.”
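The search described above can be sketched as an exhaustive block-matching loop. This is a minimal illustration, not the method of any particular compressor: the sum of absolute differences (SAD) as the similarity measure, the square search window, and the names `sad` and `find_reference_block` are all assumptions. Frames are modeled as lists of rows of integer pixel values.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def find_reference_block(cur, ref, cx, cy, size=16, search_range=7):
    """Search ref around (cx, cy) and return (dx, dy, cost) of the best match."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidate blocks that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Demo: the current frame holds the reference content moved 2 pixels to the
# right and 1 pixel down, so the matching block lies at offset (-2, -1).
ref = [[(x * 31 + y * 17) % 251 for x in range(12)] for y in range(12)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(12)]
       for y in range(12)]
print(find_reference_block(cur, ref, 4, 4, size=4, search_range=3))
```

Each candidate position costs one full block comparison, so the work grows with the square of the search range and with the number of reference frames examined.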
If the compressor finds a reference macroblock in a reference frame, but the reference macroblock is not exactly identical to the macroblock that is being compressed, the compressor can include correction information (an “error term”) that the decompressor can use to correct the macroblock. A reference to the reference macroblock, even if it includes the optional error term, requires fewer data bits than would be otherwise required to store the contents of the macroblock.
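As a rough illustration of the error term, the compressor can be thought of as sending the pixel-wise difference between the macroblock and its reference, which the decompressor adds back. The sketch below assumes a lossless difference and uses hypothetical function names; practical encoders typically transform and quantize the difference, so the correction is approximate.

```python
def error_term(current, reference):
    """Pixel-wise difference between the current block and its reference."""
    return [[c - r for c, r in zip(c_row, r_row)]
            for c_row, r_row in zip(current, reference)]


def reconstruct(reference, residual):
    """Decoder side: add the error term back onto the reference block."""
    return [[r + e for r, e in zip(r_row, e_row)]
            for r_row, e_row in zip(reference, residual)]


current = [[52, 55], [61, 59]]
reference = [[50, 55], [60, 60]]
residual = error_term(current, reference)
print(residual)  # small values, cheaper to encode than the raw pixels
print(reconstruct(reference, residual) == current)
```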
If the compressor cannot find a suitable reference macroblock, the compressor sends the contents of the macroblock that the compressor is attempting to compress. In addition, periodically the compressor unconditionally sends the contents of a full frame, without references to other frames.
The MPEG standards define an encoding methodology that is used when the contents of a macroblock are to be sent. This methodology includes a discrete cosine transform (DCT), quantization and entropy coding.
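The first two stages named above can be sketched on a single 8×8 block. This is a simplified illustration under stated assumptions: it uses an orthonormal DCT-II written out directly and a single uniform quantization step, whereas the standards specify their own quantization tables and scaling. Entropy coding is omitted, and the function names are hypothetical.

```python
import math

N = 8  # block edge length used for the transform in this sketch


def dct_2d(block):
    """Orthonormal two-dimensional DCT-II of an N x N block."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step):
    """Divide each coefficient by a uniform step (an assumption) and round."""
    return [[round(c / step) for c in row] for row in coeffs]


# A flat block has all of its energy in the single DC coefficient.
flat = [[100] * N for _ in range(N)]
levels = quantize(dct_2d(flat), 16)
print(levels[0])
```

After quantization most coefficients of smooth image content become zero, which is what makes the subsequent entropy coding effective.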
As noted, compression by reference to other macroblocks is referred to as motion-compensated inter-frame prediction. “Prediction” in this context refers to a process of ascertaining how a macroblock will appear in relation to one or more previously sent macroblocks. Sending this information to a matching decompressor enables that decompressor to regenerate a correct macroblock from information in the other frame(s).
These predictions are possible due to redundancies, from frame to frame, in most video content. For example, if a video signal depicts a moving object in front of a stationary background (relative to the frame boundaries), many of the macroblocks that depict the background may remain relatively constant from frame to frame. On the other hand, from frame to frame, a different set of macroblocks, i.e., macroblocks having different coordinates within the frame, depicts the moving object. However, the contents of the macroblocks that depict the object may remain relatively constant from frame to frame; only the locations of these macroblocks within the frames change. Thus, macroblocks that represent background in a given frame can be compressed by referring to previously sent macroblocks, and macroblocks that represent the moving object can be compressed by referring to previously sent macroblocks that represent the object when it was located at a different location within the frame.
If a reference macroblock is not located in the same position within the reference frame as the macroblock that is being compressed is located within its frame, the compressor includes a “motion vector” in the encoded video signal. The motion vector specifies how the macroblock in the reference frame should be repositioned before it is inserted into a reconstructed frame by the decoder. Continuing the previous example, assume the object can be depicted by a 3×4 set of adjacent macroblocks, i.e., by 48×64 pixels. In addition, assume the same 3×4 set of macroblocks can be used to depict the object in a sequence of frames, except, because the object is moving within the frame, each macroblock that depicts the object appears at a different location within each of the frames. Thus, all the frames in the sequence can reference the same set of reference macroblocks to depict the object; however, in each of these compressed frames, different motion vectors are used, so the decompressor can ascertain where to depict the macroblocks in each of the reconstructed frames.
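On the decompressor side, applying a motion vector amounts to copying a block from a displaced position in the reference frame. The sketch below assumes the vector is a whole-pixel offset (dx, dy) from the position of the macroblock being reconstructed to the position of the reference macroblock; the function name and the sign convention are illustrative assumptions.

```python
def motion_compensate(reference_frame, x, y, motion_vector, size=16):
    """Fetch the size x size block that the motion vector points at.

    (x, y) is the top-left corner of the block being reconstructed;
    motion_vector is an assumed (dx, dy) offset into the reference frame.
    """
    dx, dy = motion_vector
    rows = reference_frame[y + dy : y + dy + size]
    return [row[x + dx : x + dx + size] for row in rows]


# Tiny reference frame whose pixel value encodes its position as 10*row + col.
reference_frame = [[10 * r + c for c in range(6)] for r in range(6)]
block = motion_compensate(reference_frame, 2, 2, (-1, 1), size=2)
print(block)  # prints [[31, 32], [41, 42]]
```

In the moving-object example, every frame in the sequence would call this with the same reference frame but a different motion vector for each macroblock of the object.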
As noted, a reference frame can be one that is to be displayed before or after a frame that is being compressed. If the reference frame is to be displayed before the frame containing a macroblock that is being compressed, the compressed macroblock is said to be “forward predicted.” On the other hand, if the reference frame is to be displayed after the frame containing the macroblock that is being compressed, the compressed macroblock is said to be “backward predicted.”
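Backward prediction implies that a reference frame can arrive before a frame that is displayed ahead of it, which is why the decompressor buffers and rearranges frames. A minimal sketch of that rearrangement follows, assuming each received frame carries a hypothetical display index starting at zero; the generator name is also an assumption.

```python
def display_order(received):
    """Yield frames in display order from (display_index, frame) pairs.

    Frames are held in a buffer until every earlier frame has been shown.
    """
    buffer = {}
    next_index = 0
    for index, frame in received:
        buffer[index] = frame
        while next_index in buffer:
            yield buffer.pop(next_index)
            next_index += 1


# Frame 3 is sent early so that frames 1 and 2 can refer back to it.
sent = [(0, "frame0"), (3, "frame3"), (1, "frame1"), (2, "frame2")]
shown = list(display_order(sent))
print(shown)  # prints ['frame0', 'frame1', 'frame2', 'frame3']
```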
If the compressor identifies two different reference frames, i.e., one before and the other one after the frame that is being compressed, each reference frame containing a macroblock that is similar to the macroblock that is being compressed, the compressor sends information about the macroblocks in both reference frames, and the decompressor essentially averages the two reference macroblocks. In this case, the compressed macroblock is said to be “bidirectionally predicted.”
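The averaging step can be illustrated as follows. The round-half-up integer average and the function name are assumptions made for this sketch; a given standard defines its own rounding rule.

```python
def bidirectional_prediction(past_block, future_block):
    """Average two reference blocks pixel by pixel, rounding halves up."""
    return [[(p + f + 1) // 2 for p, f in zip(p_row, f_row)]
            for p_row, f_row in zip(past_block, future_block)]


past_block = [[10, 20], [30, 40]]
future_block = [[20, 21], [30, 44]]
print(bidirectional_prediction(past_block, future_block))
# prints [[15, 21], [30, 42]]
```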
Prior-art video encoders produce compressed frames by referring to other frames. FIG. 1 is a block diagram of a prior-art video encoder 100 in operation. The video encoder 100 receives uncompressed video contents 102 (such as uncompressed video frames) and produces compressed video 104 (such as compressed video frames). The video encoder 100 includes a memory buffer (not shown), in which the video encoder stores recently encoded frames. While encoding a macroblock of a predicted frame, the video encoder 100 searches the previously encoded frames for macroblocks that contain similar or (ideally) identical contents, so the compressed video 104 can contain references (motion vectors) to similar or identical elements in the previously encoded frames. Searching the previously encoded frames for a suitable reference block that requires a small error term is compute-intensive. The compressed video 104 includes standard encoded frames, forward predicted macroblocks, backward predicted macroblocks, error terms, etc.
As noted, compression is a compute-intensive operation. For each macroblock that is to be compressed, the compressor typically searches a large number of macroblocks in many frames to find the most suitable reference macroblocks, i.e., the macroblocks that require a small error term and that yield good compression and good picture quality. The compressor chooses among the prediction modes described above (forward predicted, backward predicted or bidirectionally predicted), as well as other modes (not described here). The quality of the search algorithm and the amount of compute power available to the compressor influence the quality of the resulting decompressed video.
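One simple way to picture the choice among the three prediction modes is to compare how closely each candidate prediction matches the macroblock being compressed. The sketch below assumes the best reference blocks have already been found and selects the mode with the smallest sum of absolute differences; real compressors may also weigh the number of bits each mode costs. All names are hypothetical.

```python
def block_sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for a_row, b_row in zip(a, b)
               for p, q in zip(a_row, b_row))


def choose_mode(current, past_ref, future_ref):
    """Return the prediction mode whose prediction is closest to current."""
    average = [[(p + f + 1) // 2 for p, f in zip(p_row, f_row)]
               for p_row, f_row in zip(past_ref, future_ref)]
    costs = {
        "forward": block_sad(current, past_ref),      # earlier-displayed reference
        "backward": block_sad(current, future_ref),   # later-displayed reference
        "bidirectional": block_sad(current, average),
    }
    return min(costs, key=costs.get)


# A block that lies halfway between its two references favors averaging.
mode = choose_mode([[10, 20]], [[0, 10]], [[20, 30]])
print(mode)  # prints "bidirectional"
```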
Some compressors require dedicated hardware. Software compressors, on the other hand, can be executed by general-purpose processors, and some software compressors can operate as fast as, or faster than, real time. In either case, high-quality compression requires a sophisticated, and therefore expensive, platform.