With the advent of digital video products and services, such as Digital Satellite Service (DSS) and the storage and retrieval of video streams on the Internet and, in particular, the World Wide Web, digital video signals are becoming ever present and drawing more attention in the marketplace. Because of limitations in (i) digital signal storage capacity, (ii) network and broadcast bandwidth, and (iii) client computer system processing bandwidth, compression of digital video signals has become paramount to digital video storage and transmission. As a result, many standards for compression and encoding of digital video signals have been promulgated. For example, the International Telecommunication Union (ITU) has promulgated the H.261 and H.263 standards for digital video encoding. Additionally, the International Organization for Standardization (ISO), through its Moving Picture Experts Group (MPEG), has promulgated the MPEG-1 and MPEG-2 standards for digital video encoding.
These standards specify with particularity the form of encoded digital video signals and how such signals are to be decoded for presentation to a viewer. However, significant discretion is left as to how the digital video signals are to be transformed from a native, uncompressed format to the specified encoded format. As a result, many different digital video signal encoders currently exist and many approaches are used to encode digital video signals with varying degrees of compression achieved.
In general, significant degrees of compression of motion video signals achieved by today's motion video signal encoders force a compromise between encoded signal quality and bandwidth consumed by the encoded video signal. Specifically, encoding a motion video signal such that more of the original content and quality of the motion video signal is preserved consumes additional bandwidth. Conversely, encoding a motion video signal so as to minimize consumed bandwidth generally degrades the quality of the encoded motion video signal.
In current systems for delivering motion video signals to a client computer system for display to a user, there are primarily two types of bandwidth which are limited. The first is delivery bandwidth, i.e., the bandwidth of the transmission medium through which the encoded motion video signal is delivered to the client computer system. The second is client processing bandwidth, i.e., the amount of processing capacity which is available to decode the encoded motion video signal within the client computer system. As greater and greater degrees of compression are realized by today's motion video signal encoders, greater and greater processing capacity is required by client computer systems which decode these encoded motion video signals. In general, such client computer systems must decode and display as many as thirty frames per second. If the available processing bandwidth is exceeded, some or all of the sequence of video images is lost and, therefore, so is the integrity of the motion video signal. If an encoded motion video signal errs on the side of conserving processing bandwidth, the quality of the motion video image can be compromised significantly.
The format of H.263 encoded digital video signals is known and is described more completely in "ITU-T H.263: Line Transmission of Non-Telephone Signals, Video Coding for Low Bitrate Communication" (hereinafter "ITU-T Recommendation H.263"). Briefly, a digital motion video image, which is sometimes called a video stream, is organized hierarchically into groups of pictures, each of which includes one or more frames. Each frame represents a single image of a sequence of images of the video stream and includes a number of macroblocks which define respective portions of the video image of the frame. An I-frame is encoded independently of all other frames and therefore completely represents an image of the sequence of images of the video stream. P-frames are motion-compensated frames and are therefore encoded in a manner which is dependent upon other frames. Specifically, a P-frame is a predictively motion-compensated frame and depends only upon one I-frame or, alternatively, another P-frame which precedes the P-frame in the sequence of frames of the video image. The H.263 standard also describes PB-frames; however, for the purposes of description herein, a PB-frame is treated as a P-frame.
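By way of illustration only, the dependency of P-frames upon preceding frames can be sketched as follows. The function name frames_needed and the representation of a video stream as a string of frame-type letters are hypothetical and are not part of ITU-T Recommendation H.263. The sketch further assumes that each P-frame depends upon the immediately preceding frame, which is one arrangement consistent with the description above.

```python
def frames_needed(frame_types, k):
    """Return the indices of the frames that must be decoded to display frame k.

    frame_types is a string such as "IPPIPP" (hypothetical representation).
    Assumes each P-frame depends upon the frame immediately preceding it, so
    decoding walks back to the most recent I-frame.
    """
    start = k
    while frame_types[start] != "I":
        start -= 1
        if start < 0:
            raise ValueError("no I-frame precedes frame %d" % k)
    return list(range(start, k + 1))


print(frames_needed("IPPIPP", 5))  # frames 3, 4 and 5 are required
```

Under these assumptions, an I-frame can be displayed by decoding that frame alone, while a P-frame requires every frame back to the most recent I-frame.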
All frames are compressed by reducing redundancy of image data within a single frame. Motion-compensated frames are further compressed by reducing redundancy of image data within a sequence of frames. Since a motion video signal includes a sequence of images which differ from one another only incrementally, significant compression can be realized by encoding a number of frames as motion-compensated frames, i.e., as P-frames. In addition, reconstructing motion-compensated frames represents a significant portion of the processing required to decode an encoded motion video signal.
In motion estimation, each macroblock of a frame is compared to a number of different equivalent-sized portions of a previous frame. In an exhaustive motion estimation search, a macroblock, which represents a 16-pixel by 16-pixel block of a frame, is compared to every possible 16-pixel by 16-pixel block of a previous frame, even blocks which are not aligned on macroblock boundaries. In pursuit of even better motion estimation, some systems interpolate half-pixels between pixels of the previous frame and compare each macroblock to each 16-pixel by 16-pixel block of half-pixels of the previous frame. However, exhaustive searching of every possible 16-pixel by 16-pixel block of pixels and every possible 16-pixel by 16-pixel block of half-pixels is, in terms of computation and processing resources, prohibitively expensive.
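By way of illustration only, an exhaustive whole-pixel search of the kind described above can be sketched as follows. The sketch assumes that frames are represented as lists of rows of integer luminance values and that blocks are compared by the sum of absolute differences; the text above does not specify a comparison measure, and the function names sad and exhaustive_search are hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n-by-n block of cur at
    (cx, cy) and the n-by-n block of ref at (rx, ry)."""
    total = 0
    for dy in range(n):
        cur_row = cur[cy + dy]
        ref_row = ref[ry + dy]
        for dx in range(n):
            total += abs(cur_row[cx + dx] - ref_row[rx + dx])
    return total


def exhaustive_search(cur, ref, cx, cy, n=16):
    """Compare the block of cur at (cx, cy) to every n-by-n block of ref,
    including blocks not aligned on macroblock boundaries.

    Returns (dx, dy, cost): the displacement to the best-matching block of
    ref and its sum of absolute differences. Ties keep the first block found.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for ry in range(height - n + 1):
        for rx in range(width - n + 1):
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[2]:
                best = (rx - cx, ry - cy, cost)
    return best
```

The cost of such a search is apparent from the loop bounds. For a 176-pixel by 144-pixel previous frame, each macroblock is compared to 161 x 129 = 20,769 candidate blocks, and each comparison involves 256 pixel differences, or roughly 5.3 million pixel differences per macroblock before any half-pixels are considered.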
Accordingly, conventional motion estimation systems try to derive a motion vector between a macroblock and a 16-pixel by 16-pixel block of a previous frame without exhaustively comparing all possible blocks. One such system is the "three-stage log search plus half-pixel." However, encoding macroblocks to include motion vectors to 16-pixel by 16-pixel blocks of half-pixels requires substantial processing by a decoder of a client computer system which reconstructs the macroblocks from the motion vectors, since such a decoder must re-derive the half-pixels in decoding the macroblocks. Client computer systems in which decoders typically operate are generally smaller, slower, less expensive computer systems and therefore have less processing capacity than computers in which encoders operate. In addition, each frame must be decoded within a particular amount of time to display successive frames quickly enough that the viewer perceives motion in the video image. Additional processing requirements introduced by half-pixel encoding can push the aggregate processing requirements of motion video image decoding beyond the capability of the client computer system.
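By way of illustration only, one common form of a three-stage log search followed by a half-pixel stage can be sketched as follows. The text above names the technique without defining it, so the step sizes of four, two, and one pixels, the sum of absolute differences measure, the rounding used in interpolation, and all function names are assumptions made for this sketch.

```python
def upsample_half_pel(ref):
    """Build a half-pixel grid of size (2H-1) by (2W-1) from ref.

    Whole pixels land on even coordinates; half-pixels are rounded averages
    of their two or four whole-pixel neighbors (rounding is an assumption).
    """
    h, w = len(ref), len(ref[0])
    out = [[0] * (2 * w - 1) for _ in range(2 * h - 1)]
    for y in range(2 * h - 1):
        for x in range(2 * w - 1):
            y0, x0 = y // 2, x // 2
            y1, x1 = y0 + (y % 2), x0 + (x % 2)
            s = ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]
            out[y][x] = (s + 2) // 4
    return out


def half_pel_sad(cur, ref2, cx, cy, hx, hy, n):
    """SAD between the block of cur at (cx, cy) and the block of the
    half-pixel grid ref2 whose top-left sample is at (hx, hy)."""
    total = 0
    for dy in range(n):
        for dx in range(n):
            total += abs(cur[cy + dy][cx + dx] - ref2[hy + 2 * dy][hx + 2 * dx])
    return total


def log_search_half_pel(cur, ref, cx, cy, n=16):
    """Return (dx, dy, cost) with dx and dy in pixels, to half-pixel precision."""
    ref2 = upsample_half_pel(ref)
    max_hx = 2 * (len(ref[0]) - n)
    max_hy = 2 * (len(ref) - n)
    best_hx, best_hy = 2 * cx, 2 * cy
    best_cost = half_pel_sad(cur, ref2, cx, cy, best_hx, best_hy, n)
    # Steps are in half-pixel units: 8, 4, 2 are the three whole-pixel
    # stages (4, 2, 1 pixels); the final step of 1 is the half-pixel stage.
    for step in (8, 4, 2, 1):
        center_hx, center_hy = best_hx, best_hy
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                hx, hy = center_hx + ox, center_hy + oy
                if (ox, oy) == (0, 0):
                    continue
                if not (0 <= hx <= max_hx and 0 <= hy <= max_hy):
                    continue  # candidate block would fall outside the frame
                cost = half_pel_sad(cur, ref2, cx, cy, hx, hy, n)
                if cost < best_cost:
                    best_hx, best_hy, best_cost = hx, hy, cost
    return ((best_hx - 2 * cx) / 2, (best_hy - 2 * cy) / 2, best_cost)
```

In this sketch at most thirty-three blocks are compared per macroblock, rather than every possible block. The interpolation performed by upsample_half_pel is the step which a decoder must repeat whenever a motion vector refers to a block of half-pixels, which is the source of the additional decoding burden described above.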
What is needed is a motion video signal encoding mechanism in which the benefits of half-pixel motion estimation are realized while minimizing the processing burden imposed upon the decoder of the client computer system.