This invention relates to video compression and, more particularly, to the detection and generation of motion vectors which describe the movement, namely rotational and zoom movement, of video picture information from one video frame to another.
The transmission and recording of video information, such as television signals, in digital form generally requires a large amount of digital information to assure the accurate reproduction of high quality video pictures. As an example, if each video frame of picture information is digitized, approximately 250 Mbps are needed. With the advent of high definition television (HDTV), the amount of information needed to reproduce an HDTV picture is significantly increased. In addition, proposed digital standards for high fidelity sound reproduction are expected to require still additional digital data, so that the anticipated transmission (or data transfer) rate in a digital video system is on the order of 1000 Mbps.
The foregoing data requirements have made it impractical to digitize, on a frame-by-frame basis, all of the video information included in each frame of a video picture. Moreover, since the video information contained in one video frame of a particular scene is quite similar to (and in many cases almost identical to) the video information included in an immediately preceding frame, it is appreciated that the complete digitization of a video frame consists, to a large degree, of redundant data. This redundancy suggests that substantial savings in bandwidth and data transfer rate can be realized by relying upon data compression techniques. Two types of video compression processing have been proposed heretofore: intraframe compression, wherein the spatial redundancy within a given frame of video information is exploited to reduce the amount of digital data needed to represent that frame; and interframe compression, wherein the redundancy of information from one frame to the next is exploited so that only digital data representing changes need be transmitted.
Various mathematical models have been proposed for eliminating spatial redundancy in a given frame. One technique which has proved to be quite successful and has been implemented by digital processing relies upon orthogonal transformation of the video information included in a video frame, such as discrete cosine transformation (DCT). As is known to those of ordinary skill in the art, DCT processing is carried out by segmenting a video frame of digitized video information, such as pixels, into blocks formed of n×n arrays of pixels and then taking the discrete cosine transformation of each block. DCT coefficients of different frequency components are produced and only those coefficients which exceed a threshold level are processed further. This results in a significant reduction of data needed to represent a video frame, with only a small sacrifice in picture quality that is not easily perceived by a viewer. Further compression is achieved by quantizing these DCT coefficients and then relying upon variable length encoding, such as Huffman coding, for still further data reduction or compression. As a result, the amount of data needed to represent a frame of video information, such as an HDTV frame, is significantly reduced.
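By way of illustration only, the block transformation, thresholding and quantizing steps described above may be sketched as follows. This is a minimal sketch and not a definitive implementation: the function names dct_2d and threshold_and_quantize, the orthonormal scaling of the transform, and the use of a single uniform quantizing step are assumptions made for the example and are not taken from any particular encoder. Variable length encoding is omitted.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square n x n block given as a list of rows."""
    n = len(block)

    def scale(k):
        # Orthonormal scaling: the zero-frequency term is weighted differently.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs

def threshold_and_quantize(coeffs, threshold, step):
    """Zero every coefficient whose magnitude does not exceed `threshold`
    (hypothetical parameter), then quantize the survivors with a uniform
    step size `step` (also hypothetical)."""
    return [[round(c / step) if abs(c) > threshold else 0 for c in row]
            for row in coeffs]
```

For a block of uniform brightness, only the zero-frequency coefficient is expected to survive thresholding, which is the source of the data reduction for flat picture areas.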
Interframe encoding refers to the process by which only those changes in a new frame (referred to herein as a "present frame") relative to a preceding frame, such as the immediately preceding frame, are transmitted or recorded. If there is virtually no change between frames, only minimal data is needed to describe the present frame. However, if there is little resemblance of the new frame to the preceding frame, as will be the case when the first frame of a new scene is present, then the amount of data needed to describe the present frame may be so large that it would be more efficient simply to rely upon intraframe encoding to represent the present frame. Thus, in a practical transmission or recording system, intraframe and interframe encoding are used in an adaptive manner to achieve optimum reduction or compression of the digital data needed to represent a video frame. The frame-to-frame changes which are transmitted or recorded in interframe encoding generally represent localized movement in the video picture of the preceding frame which results in the present frame, and such changes are referred to as motion vectors. As is understood, the addition of motion vectors to the digital data which describes a preceding video frame results in the present frame. The addition of motion vectors to a preceding video frame also is known as motion compensation or motion prediction. That is, a present frame may be "predicted" simply by adding motion vectors to the data which describes the preceding frame.
Conventional motion compensation operates by detecting rectilinear motion of a present frame relative to a preceding frame. That is, changes in the vertical and horizontal directions of, for example, the blocks which constitute a video frame are detected and used to predict the corresponding blocks of the present frame. However, such rectilinear motion compensation assumes that the objects in a preceding frame may move only in x and y directions. In fact, it has been found that the objects in a preceding frame may undergo rotational movement from that frame to the present frame. The generation of rectilinear motion vectors, that is, rectilinear motion compensation, often does not provide an accurate or acceptable indication of such rotational movement. Hence, the use of rectilinear motion vectors to describe rotational movement may not be satisfactory.
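By way of illustration only, rectilinear motion detection of the kind described above may be sketched as an exhaustive block search. This is a minimal sketch under stated assumptions: the function names sad and best_vector, the sum of absolute differences as the matching criterion, and the search range parameter are hypothetical choices for the example and are not asserted to be those of any particular conventional system.

```python
def sad(prev, cur, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of the present
    frame `cur` at (bx, by) and the block of the preceding frame `prev`
    displaced by (dx, dy). Frames are lists of rows of pixel values."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - prev[by + y + dy][bx + x + dx])
    return total

def best_vector(prev, cur, bx, by, n, search):
    """Try every x and y displacement within +/- `search` pixels and return
    (cost, dx, dy) for the displacement giving the smallest difference."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that fall outside the preceding frame.
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cost = sad(prev, cur, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Because the search considers only displacements in the x and y directions, a block whose content has rotated between frames will generally leave a nonzero residual difference at every candidate displacement.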
Another drawback in limiting motion compensation to rectilinear motion vectors is the inability to account for commonly used camera techniques, such as zoom-in and zoom-out. When a zooming factor that is positive or greater than unity is used, that is, when a cameraman zooms in on a subject, the effective size of a given block in the present frame appears to increase relative to that same block in the preceding frame. Conversely, when the zooming factor is negative or less than unity, that is, when the cameraman zooms out from a subject, the apparent size of a given block in the present frame appears to decrease relative to that same block in the preceding frame. Similarly, the movement of a subject toward or away from the camera gives rise to a zooming effect. Rectilinear motion compensation does not account for this zooming factor. Hence, conventional two-dimensional motion compensation often does not provide a satisfactory reconstruction of a video picture which was produced with relatively simple video camera techniques, such as zoom-in or zoom-out.
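By way of illustration only, the effect of zoom and rotation on pixel positions may be sketched with a simple geometric model. This is a minimal sketch under stated assumptions: the function names transform_point and displacement_field, the multiplicative zoom factor (greater than unity for zoom-in, less than unity for zoom-out), and the choice of a center point about which zoom and rotation occur are hypothetical and are offered only to show why a single x, y displacement per block may be inadequate.

```python
import math

def transform_point(x, y, zoom, angle_rad, cx=0.0, cy=0.0):
    """Map a pixel position in the preceding frame to its position in the
    present frame under a zoom by `zoom` and a rotation by `angle_rad`,
    both taken about the assumed center (cx, cy)."""
    dx, dy = x - cx, y - cy
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (cx + zoom * (c * dx - s * dy),
            cy + zoom * (s * dx + c * dy))

def displacement_field(points, zoom, angle_rad, cx=0.0, cy=0.0):
    """Return the (x, y) displacement of each listed pixel position."""
    field = []
    for (x, y) in points:
        nx, ny = transform_point(x, y, zoom, angle_rad, cx, cy)
        field.append((nx - x, ny - y))
    return field
```

Under this model, with a zoom factor of 2 and no rotation, a pixel one unit from the center is displaced by one unit while a pixel three units from the center is displaced by three units. The displacement therefore varies from pixel to pixel within a block, which a single rectilinear motion vector cannot express.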