(1) Field of the Invention
The present invention relates to a motion estimation method that is used in a picture coding method and the like for efficiently compressing a moving picture signal by use of the correlation between pictures.
(2) Description of the Related Art
In the age of multimedia, which integrally handles audio, video, and other data, existing information media (i.e., newspapers, magazines, television, radio, telephones, and other means through which information is conveyed to people) have recently come to be included in the scope of multimedia. Generally, multimedia refers to a representation that associates not only characters but also graphics, audio, and especially pictures. However, in order to include the aforementioned existing information media in the scope of multimedia, such information must first be represented in digital form.
However, when the amount of information carried by each of these media is calculated as an amount of digital information, characters require 1 to 2 bytes per character, whereas audio (telephone quality) requires 64 Kbits per second and moving pictures (current television reception quality) require 100 Mbits per second. It is therefore not realistic for these information media to handle such an enormous amount of information directly in digital form. For example, although video phones are already in actual use over the Integrated Services Digital Network (ISDN), which offers transmission speeds of 64 Kbits/s to 1.5 Mbits/s, it is not practical to transmit television or camera video directly over ISDN.
Against this backdrop, information compression techniques have become necessary. For example, video phones employ moving picture compression techniques compliant with the H.261 and H.263 standards recommended by ITU-T (International Telecommunication Union-Telecommunication Standardization Sector). Moreover, information compression techniques compliant with the MPEG-1 standard make it possible to store picture information together with sound information on an ordinary music CD (compact disc).
Here, MPEG (Moving Picture Experts Group) refers to international standards for compressing moving picture signals, standardized by ISO/IEC (International Organization for Standardization, International Electrotechnical Commission). MPEG-1 is a standard for compressing television signal information to approximately one hundredth of its original size so that moving picture signals can be transmitted at a rate of 1.5 Mbit/s. Since MPEG-1 achieves only a middle-quality transmission speed of about 1.5 Mbit/s, MPEG-2 was standardized to meet demands for further improved picture quality; it allows data transmission equivalent in quality to television broadcasting, in which moving picture signals are transmitted at 2 to 15 Mbit/s. Moreover, MPEG-4 was standardized by the working group (ISO/IEC JTC1/SC29/WG11) that had standardized MPEG-1 and MPEG-2. MPEG-4, which provides a higher compression ratio than MPEG-1 and MPEG-2 and enables object-based coding, decoding, and manipulation, is capable of providing new functionality required in this age of multimedia. In the early stage of standardization, MPEG-4 aimed at providing a low bit rate coding method, but it has since been extended into a more general coding standard that also handles interlaced images and high bit rate coding. Currently, ISO/IEC and ITU-T are jointly standardizing MPEG-4 AVC/ITU-T H.264 as a next-generation picture coding method offering a higher compression ratio. As of August 2002, a committee draft (CD) has been issued for this next-generation picture coding method.
In general, moving picture coding compresses the amount of information by reducing redundancies in the temporal and spatial directions. In inter picture prediction coding, which aims at reducing temporal redundancies, motion estimation and predictive image generation are carried out block by block with reference to forward or backward picture(s), and the difference between the resulting predictive image and the image in the current picture to be coded is then coded. Here, “picture” is a term denoting one image.
A picture to be coded using intra picture prediction without generating any predictive images shall be referred to as an I picture. A picture to be coded using inter picture prediction with reference to only one picture shall be referred to as a P picture. And, a picture to be coded using inter picture prediction with reference to two pictures at the same time shall be referred to as a B picture. It is possible for a B picture to refer to two pictures which can be arbitrarily combined from forward/backward pictures in display order.
P pictures and B pictures are coded using motion compensated inter picture prediction, i.e., inter picture prediction coding that employs motion compensation. Rather than predicting simply from the pixel values at the same positions in a reference picture, this technique estimates the amount of motion (hereinafter referred to as “motion vector”) of each part of a picture and performs prediction taking that motion into account, which improves prediction accuracy and reduces the amount of data. For example, the amount of data can be reduced by estimating motion vectors for the current picture to be coded (hereinafter also referred to simply as “current picture”) and then coding the prediction residuals between the current picture and the prediction values obtained by shifting the reference picture by the respective motion vectors. Since motion vector information is required for decoding, the motion vectors are also recorded or transmitted in coded form.
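The motion-compensated residual described above can be illustrated with a minimal sketch. It assumes integer-pixel motion vectors only, pictures stored as lists of rows of integer pixel values, and no handling of picture boundaries; the names `predict_block` and `residual` are hypothetical and not taken from any standard.

```python
from typing import List, Tuple

Picture = List[List[int]]  # picture[y][x]: one list per row, integer pixel values


def predict_block(ref: Picture, x: int, y: int, size: int,
                  mv: Tuple[int, int]) -> Picture:
    """Return the size x size block of `ref` displaced by the integer motion
    vector mv = (dx, dy) from the block whose upper-left pixel is (x, y)."""
    dx, dy = mv
    return [[ref[y + dy + j][x + dx + i] for i in range(size)]
            for j in range(size)]


def residual(cur: Picture, ref: Picture, x: int, y: int, size: int,
             mv: Tuple[int, int]) -> Picture:
    """Prediction residual: current block minus motion-compensated prediction."""
    pred = predict_block(ref, x, y, size, mv)
    return [[cur[y + j][x + i] - pred[j][i] for i in range(size)]
            for j in range(size)]


if __name__ == "__main__":
    # Reference picture with a simple ramp; in the current picture the
    # content has moved one pixel to the right.
    ref = [[10 * yy + xx for xx in range(8)] for yy in range(8)]
    cur = [[10 * yy + xx - 1 for xx in range(8)] for yy in range(8)]
    print(residual(cur, ref, 2, 2, 4, (0, 0)))   # every entry is -1
    print(residual(cur, ref, 2, 2, 4, (-1, 0)))  # every entry is 0
```

With the zero vector every residual sample is nonzero, while the vector (-1, 0), which points back to where the content came from, leaves an all-zero residual that is far cheaper to code.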
Motion vectors are estimated on a per-block basis. More specifically, a block is fixed in advance in the current picture, and its motion vector is estimated by finding the position of the reference block most similar to that block within the search area of a reference picture. This processing of searching for motion vectors is called “motion estimation”.
FIG. 1 is a schematic diagram illustrating motion estimation. The position of the block in a reference picture that is most similar to the current block is estimated by comparing the current block with arbitrary blocks in the reference picture. In general, whether the current block and a reference block are similar is judged from the error between them, typically the sum of absolute differences (SAD). Searching all the reference blocks in a reference picture would require an enormous amount of computation, so the search is limited to a certain area within each reference picture. This limited area is referred to as a search area.
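Block matching with the SAD criterion over a search area can be sketched as an exhaustive (full) search. This is an illustrative sketch, not the method of any particular encoder: it assumes integer-pixel vectors, a square search area of plus or minus `search_range` pixels, and that candidate blocks extending outside the picture are simply skipped. The names `sad` and `full_search` are hypothetical.

```python
from typing import List, Optional, Tuple

Picture = List[List[int]]  # picture[y][x]


def sad(cur: Picture, ref: Picture, cx: int, cy: int,
        rx: int, ry: int, size: int) -> int:
    """Sum of absolute differences between the size x size current block at
    (cx, cy) and the reference block at (rx, ry)."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def full_search(cur: Picture, ref: Picture, cx: int, cy: int, size: int,
                search_range: int) -> Tuple[Optional[Tuple[int, int]], Optional[int]]:
    """Evaluate every candidate offset (dx, dy) with |dx|, |dy| <= search_range
    and return (motion_vector, smallest SAD)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates whose block would extend outside the picture.
            if not (0 <= rx <= width - size and 0 <= ry <= height - size):
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


if __name__ == "__main__":
    # Synthetic pictures built so that the best match for the current block
    # at (6, 6) is the reference block at (8, 5).
    ref = [[x * x + 3 * y * y for x in range(16)] for y in range(16)]
    cur = [[(x + 2) ** 2 + 3 * (y - 1) ** 2 for x in range(16)] for y in range(16)]
    print(full_search(cur, ref, 6, 6, 4, 3))  # ((2, -1), 0)
```

Every candidate in the search area is evaluated, which is exactly what makes this approach expensive and motivates the cheaper search strategies discussed next.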
Motion estimation is known to require the largest amount of computation of all the processes in moving picture coding. For this reason, many methods have been proposed for reducing the amount of computation it requires. One such method, sequential search, requires little computation and is often used in devices with limited computing power, such as mobile terminals (see, e.g., Japanese Laid-Open Patent Application No. 2000-333184).
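To give a rough sense of the computational burden, assume (this is not specified above) an exhaustive search over a square search area extending plus or minus R pixels around the current block, N x N blocks, and the SAD criterion. Each candidate costs N^2 absolute differences and there are (2R + 1)^2 candidates, so the cost per block is:

```latex
\begin{aligned}
C_{\text{full}} &= (2R + 1)^2 \, N^2 \\
C_{\text{full}}\big|_{N = 16,\; R = 16} &= 33^2 \cdot 16^2 = 1089 \cdot 256 = 278{,}784
\end{aligned}
```

The values N = 16 and R = 16 are illustrative only; the point is that the cost grows with the square of the search range, which is why low-power devices favor methods such as sequential search.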
FIG. 2 is a schematic diagram showing how the reference blocks to be searched and the reference block serving as the search center shift in a conventional motion estimation method. In this drawing, each circle represents the position of the upper-left pixel of a reference block, so that the position of each reference block is indicated in simplified form by a circle. Each number in the drawing indicates the position of a reference block for which a SAD is calculated as the search center shifts in the example described below.
(1) First, a SAD is computed for the reference block Ba located at a search start position X, and for each of the reference blocks Bb, Bc, Bd, and Be located at the positions reached by shifting one pixel up, down, left, and right from the reference block Ba (these positions are hereinafter referred to simply as the up, down, left, and right positions, respectively, or collectively as the up/down/left/right positions). If, for example, the SAD of the reference block Bc at the right position is the smallest of the SADs of the reference blocks Ba to Be, the search center is shifted to the right.
(2) Next, a SAD is determined for each of the reference blocks at the up/down/left/right positions of the reference block serving as the new search center (here, the reference block Bc). If the SAD of the reference block at the right position is the smallest, the search center is shifted to the right.
(3) Next, a SAD is determined for each of the reference blocks at the up/down/left/right positions of the new search center. If the SAD of the reference block at the right position is the smallest, the search center is shifted to the right.
(4) Next, a SAD is determined for each of the reference blocks at the up/down/left/right positions of the new search center. If the SAD of the reference block at the down position is the smallest, the search center is shifted downward.
(5) Next, a SAD is determined for each of the reference blocks at the up/down/left/right positions of the new search center. If the SAD of the reference block at the down position is the smallest, the search center is shifted downward.
(6) Next, a SAD is determined for each of the reference blocks at the up/down/left/right positions of the new search center. If the SAD of the reference block at the right position is the smallest, the search center is shifted to the right.
(7) Next, a SAD is determined for each of the reference blocks at the up/down/left/right positions of the new search center. If the SAD of the reference block at the right position is the smallest, the search center is shifted to the right.
(8) Next, a SAD is determined for each of the reference blocks at the up/down/left/right positions of the new search center. If the SAD of the reference block at the down position is the smallest, the search center is shifted downward.
(9) Next, a SAD is determined for each of the reference blocks at the up/down/left/right positions of the new search center. If the SAD of the reference block serving as the current search center is the smallest, the search is terminated. The position Y of this reference block is then regarded as the position with the smallest SAD, and the motion vector pointing to it is taken as the optimum motion vector.
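The steps above can be summarized as a loop that repeatedly compares the search center with its four one-pixel neighbors. The sketch below is an interpretation, not the exact procedure of the cited application: the tie-breaking rule (the center wins ties, otherwise the first neighbor in up, down, left, right order), the `max_rounds` safety cap, and the assumption that `cost(x, y)` returns the SAD of the reference block at (x, y) and itself handles positions outside the search area are all illustrative choices. Coordinates follow image convention, with y increasing downward.

```python
from typing import Callable, Tuple

Position = Tuple[int, int]  # (x, y) of a reference block's upper-left pixel


def sequential_search(cost: Callable[[int, int], int], start: Position,
                      max_rounds: int = 10_000) -> Tuple[Position, int]:
    """One-pixel-step sequential search.

    Returns (final search center, number of times the center was shifted).
    """
    center = start
    center_cost = cost(*center)
    shifts = 0
    for _ in range(max_rounds):
        x, y = center
        best, best_cost = center, center_cost
        # Evaluate the up, down, left and right neighbors of the search center.
        for nx, ny in ((x, y - 1), (x, y + 1), (x - 1, y), (x + 1, y)):
            c = cost(nx, ny)
            if c < best_cost:  # strict comparison: the center wins ties (assumed rule)
                best, best_cost = (nx, ny), c
        if best == center:
            # The current search center has the smallest SAD: terminate.
            return center, shifts
        center, center_cost = best, best_cost
        shifts += 1
    return center, shifts  # safety cap reached


if __name__ == "__main__":
    def toy_sad(x: int, y: int) -> int:
        # Toy cost surface whose minimum lies 5 pixels right and 3 pixels down.
        return abs(x - 5) + abs(y - 3)

    print(sequential_search(toy_sad, (0, 0)))  # ((5, 3), 8)
```

On this toy surface the search ends at the same 5-right, 3-down displacement as the FIG. 2 example after 8 center shifts, although the order of moves differs because of the assumed tie-breaking rule.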
However, sequential search as described above cannot perform motion estimation at high speed: because the search center shifts by only one pixel per search, the reference block serving as the search center must be shifted to another reference block many times when motion estimation is performed on pictures with large motion.
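The limitation can be quantified. Since each shift moves the search center by exactly one pixel along one axis, the number of shifts S needed to travel from the search start position X = (x_X, y_X) to the final position Y = (x_Y, y_Y) is at least their city-block distance, and the number of rounds of SAD evaluation is one more than S (the last round only confirms the center). Applied to the FIG. 2 example, where steps (1) to (9) move the center five pixels right and three pixels down:

```latex
\begin{aligned}
S &\ge |x_Y - x_X| + |y_Y - y_X|, \qquad \text{rounds} = S + 1 \\
\text{FIG. 2:}\quad S &= 5 + 3 = 8, \qquad \text{rounds} = 9
\end{aligned}
```

A displacement of, say, 24 pixels would therefore require at least 24 shifts and 25 rounds, so the number of iterations grows linearly with the magnitude of the motion.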