1. Field of the Invention
The present invention relates generally to motion compensation, and in particular, to methods and apparatus for the detection of motion vectors.
2. Description of the Related Art
Advances in audio and video compression and decompression techniques, together with very large scale integration technology, have enabled the creation of new capabilities and markets. These include the storage of digital audio and video in computers and on small optical discs as well as the transmission of digital audio and video signals from direct broadcast satellites.
Such advances were made possible, in part, by international standards which provide compatibility between different approaches to compression and decompression. One such standard is known as "JPEG," for the Joint Photographic Experts Group. A later developed standard is known as "MPEG1." This was the first set of standards agreed to by the Moving Picture Experts Group. Yet another standard is known as "ITU-T H.261," which is a video compression standard particularly useful for video teleconferencing. Although each standard is designed for a specific application, all of the standards have much in common.
MPEG1 was designed for the storage and distribution of audio and motion video, with emphasis on video quality. Its features include random access, fast forward and reverse playback. MPEG1 serves as the basis for video compact disks and for many video games. The original channel bandwidth and image resolution for MPEG1 were established based upon the recording media then available. The goal of MPEG1 was the reproduction of recorded digital audio and video using a 12 centimeter diameter optical disc with a bit rate of 1.416 Mbps, 1.15 Mbps of which are allocated to video.
The compressed bit streams generated under the MPEG1 standard implicitly define the decompression algorithms to be used for such bit streams. The compression algorithms, however, can vary within the specifications of the MPEG1 standard, thereby allowing the possibility of a proprietary advantage in regard to the generation of compressed bit streams.
A later developed standard known as "MPEG2" extends the basic concepts of MPEG1 to cover a wider range of applications. Although the primary application of the MPEG2 standard is the all digital transmission of broadcast-quality video at bit rates of 4 Mbps to 9 Mbps, it appears that the MPEG2 standard may also be useful for other applications, such as the storage of full length motion pictures on Digital Video Disk ("DVD") optical discs, with resolution at least as good as that presently provided by 12 inch diameter laser discs. DVD is now sometimes referred to as "Digital Versatile Disk," since it is intended that such disks supplement and/or replace current CD-ROMs, as their data storage capacity is much greater.
The MPEG2 standard relies upon three types of coded pictures. I ("intra") pictures are fields or frames coded as stand-alone still images. Such I pictures allow random access points within a video stream. As such, I pictures should occur about two times per second. I pictures should also be used where scene cuts (such as in a motion picture) occur.
P ("predicted") pictures are fields or frames coded relative to the nearest previous I or P picture, resulting in forward prediction processing. P pictures allow more compression than I pictures through the use of motion compensation, and also serve as a reference for B pictures and future P pictures.
B ("bidirectional") pictures are fields or frames that use the closest (with respect to display order) past and future I or P pictures as references, resulting in bidirectional prediction. B pictures provide the most compression and increase the signal to noise ratio by averaging two pictures. Such I, P and B pictures are more thoroughly described in U.S. Pat. Nos. 5,386,234 and 5,481,553, both of which are assigned to Sony Corporation, and said U.S. Patents are incorporated herein by reference.
A group of pictures ("GOP") is a series of one or more coded pictures which assists in random access and editing. A GOP value is configurable during the encoding process. The smaller the GOP value, the closer together the I pictures are, and the better the response to movement. The level of compression is, however, lower.
In a coded bitstream, a GOP must start with an I picture and may be followed by any number of I, P or B pictures in any order. In display order, a GOP must start with an I or B picture and end with an I or P picture. Thus, the smallest GOP size is a single I picture, with the largest size of the GOP being unlimited.
Both MPEG1 and MPEG2 rely upon a video compression coding method that utilizes discrete cosine transformation, motion estimation and variable length coding.
In further detail, in the picture coding apparatus of FIG. 13, picture data is supplied to an input terminal T1. This picture data is inputted to motion vector detection circuit 21 and to subtraction circuit 22. Motion vector detection circuit 21 detects or finds a motion vector based upon the reference frame and the search frame using the inputted picture data. Motion vector detection circuit 21 then provides the motion vector to motion compensation circuit 23. As explained further below, picture data of the reference frame is stored in a frame memory 24, and such stored picture data is supplied to motion compensation circuit 23. Such stored picture data changes as the content of the reference frame changes.
Motion compensation circuit 23 compensates for motion from frame to frame based upon the picture data from frame memory 24 and the motion vector provided by motion vector detection circuit 21. The output of motion compensation circuit 23 is provided to subtraction circuit 22 and to an addition circuit 25. Subtraction circuit 22 generates error data based upon the difference between the inputted data and the motion compensated data output from motion compensation circuit 23. This error data is provided to a discrete cosine transform ("DCT") circuit 26, which orthogonally transforms such error data. The orthogonally transformed error data is then provided to a quantization circuit 27. Quantization circuit 27 quantizes such orthogonally transformed error data. The output of quantization circuit 27 is then provided to variable length coding circuit 28. Variable length coding circuit 28 variable length codes the output of quantization circuit 27 and provides such coded data to an output terminal T2.
The output of quantization circuit 27 is also provided to inverse quantization circuit 29. Inverse quantization circuit 29 inverse quantizes the output of quantization circuit 27. The output of inverse quantization circuit 29 is then provided to an inverse discrete cosine transform ("IDCT") circuit 30, which inverse orthogonally transforms the output of inverse quantization circuit 29. The result is that the original error data output from subtraction circuit 22 is essentially regenerated or estimated and is then provided to addition circuit 25. Addition circuit 25 adds this estimated error data to the output data of motion compensation circuit 23. As stated above, motion compensation circuit 23 compensates for motion from frame to frame based upon the picture data from frame memory 24 and the motion vector provided by motion vector detection circuit 21. The output of addition circuit 25 is stored in frame memory 24, where it serves as the picture data of the reference frame for the coding of following frames.
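The loop just described can be sketched in simplified form. The following Python is an illustration only, not the described apparatus: the DCT and IDCT are replaced by the identity transform so that only the quantizer is lossy, motion compensation is omitted, and the quantization step and function names are assumptions.

```python
# A simplified, illustrative sketch of the coding loop of FIG. 13.  The DCT
# and IDCT are replaced by the identity transform so only the quantizer is
# lossy; QSTEP and the function names are assumptions, not the actual design.

QSTEP = 4  # quantization step size (illustrative)

def quantize(error):
    return [round(x / QSTEP) for x in error]

def dequantize(coded):
    return [x * QSTEP for x in coded]

def encode_frame(frame, reference):
    """One pass of the loop: subtract the prediction, quantize the error,
    and locally reconstruct the frame exactly as a decoder would."""
    prediction = reference                              # motion compensation omitted
    error = [f - p for f, p in zip(frame, prediction)]  # subtraction circuit 22
    coded = quantize(error)                             # -> variable length coder
    est_error = dequantize(coded)                       # inverse quantization path
    reconstructed = [p + e for p, e in zip(prediction, est_error)]  # addition circuit 25
    return coded, reconstructed                         # reconstructed -> frame memory 24
```

Because the encoder reconstructs each frame through the same inverse path a decoder would use, the reference stored in frame memory 24 matches the decoder's reference; this is why the inverse quantization and IDCT circuits appear inside the encoder.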
A known method of detecting a motion vector in the picture coding apparatus of FIG. 13 is the block matching method. The block matching method divides a frame into small rectangular areas ("blocks") and detects motion for every block. Such a block, for example, consists of 8 picture elements horizontally by 8 picture elements vertically, for a total of 64 picture elements within such block. Such picture elements are also often referred to as "pixels" or "pels."
With reference now to FIG. 14, the block matching method is explained. The block matching method establishes a reference block RB in a reference frame 41. The block matching method also establishes a search block SB0 within a search frame 42, where the size of search block SB0, M×N, is the same as that of reference block RB in reference frame 41. Search block SB0 is moved within a fixed search area 43 in order to find the highest degree of correlation between the search block and the reference block RB. Once this highest degree of correlation is found (at search block SBk, for example), a motion vector (u,v) is determined based upon the amount of horizontal and/or vertical displacement from search block SB0 to search block SBk. Thus, when search block SB0 is shifted by motion vector (u,v), the degree of correlation between reference block RB and search block SBk is basically highest. Stated differently, the correlation is highest in the position where search block SB0 is shifted by the motion vector (u,v), where search block SB0 is initially in the position which is the same as that of reference block RB. The block matching method bases the degree of correlation upon the sum of the absolute values of the differences between corresponding pixels.
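The search just described can be illustrated with a short sketch. The Python below is an assumption-laden illustration, not the apparatus itself: it assumes 8×8 blocks, a ±16 pixel search range, and a sum-of-absolute-differences correlation measure, with frames represented as lists of pixel rows.

```python
# Illustrative block matching: exhaustively test every candidate position of
# the search block within the search area and keep the offset (u, v) with
# the lowest sum of absolute differences (highest correlation).

def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def find_motion_vector(ref_frame, search_frame, bx, by, n=8, search=16):
    """Return (u, v) minimizing SAD for the n x n reference block at (bx, by)."""
    ref_block = [row[bx:bx + n] for row in ref_frame[by:by + n]]
    h, w = len(search_frame), len(search_frame[0])
    best = None
    for v in range(-search, search + 1):
        for u in range(-search, search + 1):
            x, y = bx + u, by + v
            if 0 <= x <= w - n and 0 <= y <= h - n:   # stay inside the frame
                cand = [row[x:x + n] for row in search_frame[y:y + n]]
                cost = sad(ref_block, cand)
                if best is None or cost < best[0]:
                    best = (cost, u, v)
    return best[1], best[2]
```

For a search frame whose content is the reference frame shifted right by 3 pixels and down by 2, the sketch recovers the motion vector (3, 2) for an interior block.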
With respect to MPEG, the block matching method is applied to each frame of a group of pictures. With reference now to FIG. 15, I0 is an I picture which serves as a reference frame. P3 is a P picture which is a reference frame which includes motion with respect to the I picture. Picture B1 is a B picture which can function as a search frame, by detection of motion both in the direction from picture I0 and in the direction from picture P3. Similarly, picture B2 is a B picture which can function as a search frame, by detection of motion both in the direction from picture I0 and in the direction from picture P3.
With reference now to FIGS. 16A, 16B and 16C, generally, for the block matching method, it is desirable for the size of the search area to be proportional to the interval between the reference frame and the search frame. This is due to the fact that a greater amount of motion is more likely to occur if the interval between frames is greater. For example, when the search area for an interval of one frame is 32×32 pixels (±16 horizontally by ±16 vertically) as shown in FIG. 16A, if the interval is two frames, it is desirable that the search area be 64×64 pixels (±32 horizontally by ±32 vertically) in size as shown in FIG. 16B. Similarly, when the interval is three frames, it is desirable that the search area be 96×96 pixels (±48 horizontally by ±48 vertically) in size as shown in FIG. 16C. However, when the search area is selected to be proportional to the frame interval in this way, the size of the search area is increased by 4 times in the case of a two frame interval, and is increased 9 times in the case of a three frame interval, when compared with the size of the search area for an interval of one frame. In other words, because of the size of the search area, in order to detect a motion vector for the frame P3 from the I0 reference frame, a large quantity of data must be processed to determine the best correlation.
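The quadratic growth just described follows directly from the arithmetic; a minimal illustration (the base 32 pixel side is taken from the example above):

```python
# Illustrative arithmetic: when the side of the search area grows in
# proportion to the frame interval, the pixel count grows with its square.
def search_area_pixels(interval, base_side=32):
    side = base_side * interval       # 32, 64, 96 ... pixels per side
    return side * side

base = search_area_pixels(1)          # 32 x 32 = 1024 pixels
print(search_area_pixels(2) // base)  # 4 (two frame interval)
print(search_area_pixels(3) // base)  # 9 (three frame interval)
```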
A method referred to as "telescopic search" is known as a way of expanding a search area without increasing the amount of data to be transferred and/or processed. With the telescopic search, the range which is searched is one which essentially covers a large search area by always using a 32×32 (±16 horizontally by ±16 vertically) pixel search area, where the center of the search area is successively offset for each successive frame from the reference frame. For example, as shown in FIG. 17, when searching at search frame 1 (which is 1 frame to the right of the reference frame), the search area has an area of 32×32 pixels. Utilizing the reference block of the reference frame, the telescopic search looks for motion vector MV1 with respect to the reference block. Then, a search area within search frame 2 (which is 2 frames to the right of the reference frame) is established, such that the search area is centered at the coordinates (mvx1, mvy1) of the endpoint of motion vector MV1. The telescopic search then looks for motion vector MV2 within such search area. As shown in FIG. 17, the search areas within each of search frames 1, 2 and 3 have an area of 32×32 pixels. Thus, the search area within search frame 2, from the perspective of the reference block, is effectively 64×64 pixels.
Next, a search area within search frame 3 (which is 3 frames to the right of the reference frame) is established such that the search area is centered at the coordinates (mvx2, mvy2) of the endpoint of motion vector MV2. The telescopic search then looks for motion vector MV3 within such search area (which has an area of 32×32 pixels). The search area within search frame 3, from the perspective of the reference block, is effectively 96×96 pixels. Thus, by utilizing the telescopic search method, the required data transfer rate is basically that for a search area of 32×32 pixels.
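The telescopic procedure just described can be sketched as follows. This Python is a hedged illustration, not the apparatus: it assumes 8×8 blocks, a fixed ±16 pixel window, a sum-of-absolute-differences matcher, and frames as lists of pixel rows; only the window center is carried forward from frame to frame.

```python
# Illustrative telescopic search: each successive search frame is searched
# over a fixed +/-16 window centered on the endpoint of the motion vector
# found in the previous frame, so the effective range grows per frame.

def sad(block_a, block_b):
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def block(frame, x, y, n):
    return [row[x:x + n] for row in frame[y:y + n]]

def match(ref_frame, search_frame, bx, by, center, n=8, search=16):
    """Best (u, v) within +/-search of the given window center, by SAD."""
    ref = block(ref_frame, bx, by, n)
    h, w = len(search_frame), len(search_frame[0])
    cx, cy = center
    best = None
    for dv in range(-search, search + 1):
        for du in range(-search, search + 1):
            u, v = cx + du, cy + dv
            x, y = bx + u, by + v
            if 0 <= x <= w - n and 0 <= y <= h - n:
                cost = sad(ref, block(search_frame, x, y, n))
                if best is None or cost < best[0]:
                    best = (cost, u, v)
    return best[1], best[2]

def telescopic_search(ref_frame, search_frames, bx, by, n=8, search=16):
    """Track the block at (bx, by) through successive frames; return the
    accumulated motion vector from the reference frame to the last frame."""
    cx, cy = 0, 0                       # accumulated offset = window center
    for frame in search_frames:
        cx, cy = match(ref_frame, frame, bx, by, (cx, cy), n, search)
    return cx, cy
```

With uniform motion of (6, 4) pixels per frame interval, three successive ±16 windows recover the total vector (18, 12), even though it lies outside a single ±16 window.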
FIGS. 18A and 18B provide a visual comparison of the amount of search data generated by a normal search (i.e., when the search area is proportional to the interval between the reference frame and the search frame) and the amount of data generated by a telescopic search. When establishing a search area in the case of the normal search, because the search areas overlap (indicated by the cross hatching), only that data of the 32×32 picture element search area which does not overlap that of a horizontally neighboring reference block is transferred. Such a configuration is illustrated by reference blocks RB0 and RB1 in FIG. 18A.
Referring now to FIG. 18B, in the case of the telescopic search, because the search area is different for every reference block (i.e., RB0 and RB1), data regarding the search area for every reference block must be forwarded for every set of clock cycles, such as for every 256 clock cycles. In other words, if the search area of the telescopic search is 32×32 pixels, data must be forwarded at a rate of 3 times that of the search data. When the search area increases in area, e.g., in the case of 64×64 pixels, the data must be forwarded at a rate of 5 times that of the search data. Thus, if a picture element is represented by 8 bits, and the picture element clock has a frequency of 13.5 MHz, a data transfer rate of 337.5 Mbytes/second ((80×80/256)×13.5 MHz×1 byte = 337.5 Mbytes/sec) is necessary. Such a data transfer rate requires expensive hardware.
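The quoted figure can be checked with simple arithmetic; the sketch below assumes the 80×80 pixel area and the 256 clock cycle forwarding period stated above.

```python
# Checking the transfer-rate figure: an 80 x 80 pixel area forwarded every
# 256 clock cycles, with 8-bit (1 byte) pixels and a 13.5 MHz pixel clock.
area_pixels = 80 * 80                 # 6400 bytes per 256-cycle period
bytes_per_cycle = area_pixels / 256   # 25 bytes per clock cycle
rate = bytes_per_cycle * 13.5e6 * 1   # x pixel clock x 1 byte per pixel
print(rate / 1e6)                     # 337.5 (Mbytes/second)
```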
Additionally, with the telescopic search, if it is determined that the endpoint of the motion vector lies in the upper right corner of the 96×96 picture element search area, the telescopic search sets the search area to have an area of 32×32 pixels. However, when the horizontal component of the motion vector is relatively small, such horizontal component cannot be detected. In such an instance, picture quality degrades significantly because of the change in the rate necessary to transfer reference data.
Accordingly, it would be desirable to provide a method and apparatus for detecting motion vectors which overcome the problems inherent in the block matching method and the telescopic search method.