1. Field of the Invention
The present invention relates to a motion estimation method and apparatus for calculating a motion vector, and more particularly to a method and apparatus for calculating a motion vector to estimate a current picture partially forming a video sequence on the basis of a reference picture partially forming the video sequence.
2. Description of the Prior Art
In recent years, information media such as newspapers, TV and radio have been flooded with information on "multimedia", which is attracting keen attention throughout the world. Although variously interpreted, the term "multimedia" as used herein is considered to be information presented in a combination of text, graphics, video, sound and the like. Since such information is generally handled by a computer, data representative of the video and sound as well as the text and graphics are required to be digitized. When data representative of a video sequence such as a motion picture are digitized, the amount of digitized data is extremely large in comparison with data indicative of sound, text or graphics. For this reason, the data of a motion picture to be handled by the computer are required to be compressed when the data are stored in a storage device or transmitted over a communication line.
A wide variety of data compression processes have been proposed for compressing the data of a motion picture in accordance with a correlation between two pictures (occasionally referred to as frames) partly forming the motion picture. Such a data compression process is applied to a so-called basic inter-frame predicting coding method and a so-called motion compensation inter-frame predicting coding method, which will become apparent as this description proceeds.
First, the former basic inter-frame predicting coding method will be described with reference to FIG. 43. This method comprises a step of calculating a difference between pel data of each pel (picture element) of a current picture 12 and pel data of each pel of a reference picture 11 corresponding in position to each other, the current picture 12 and the reference picture 11 partially forming a motion picture. The reference picture 11 may be either a future or a past picture with respect to the current picture 12, on condition that data indicative of the reference picture 11 have already been encoded. The method further comprises steps of comparing the difference with a predetermined threshold value, and dividing the pel data of the reference picture 11 into two data groups consisting of a significant pel data group having differences each larger than the threshold value and an insignificant pel data group having differences each equal to or less than the threshold value. The significant pel data are considered to be useful data that are not allowed to be omitted when the current picture 12 is estimated on the basis of the reference picture 11. By contrast, the insignificant pel data are considered to be unnecessary data that are allowed to be omitted when the current picture 12 is estimated on the basis of the reference picture 11.
As shown in FIG. 43, if a person image 10 in the reference picture 11 has moved to the right in the current picture 12, two significant pel data regions, indicated by reference numerals 13 and 14, respectively, are produced together with an insignificant pel data region indicated by the blank area surrounding the significant pel data regions 13 and 14. The pel data of a pel of the current picture 12 within the significant pel data regions 13 and 14 can be estimated by adding, to the pel data of the pel of the reference picture 11 corresponding in position, the difference between the pel data of the pel of the current picture 12 and the pel data of that pel of the reference picture 11. Pel data of each pel of the current picture 12 within the insignificant pel data region are represented by the pel data of the pel of the reference picture 11 corresponding in position to that pel of the current picture 12.
In the case that the basic inter-frame predicting coding method is utilized, the amount of difference data to be encoded decreases as the number of significant pels decreases. This means that compression efficiency can be enhanced. The number of significant pels is decreased by setting the threshold value large, and as a consequence the compression efficiency can be further enhanced. If, however, the threshold value becomes extremely large, motion of the image looks jerky, or a moving portion of the image appears to stand still in part, resulting in the drawback that image quality becomes poor.
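The thresholding step described above can be illustrated with a minimal sketch. It assumes pictures are stored as lists of rows of integer pel values; the function names `encode_basic_interframe` and `decode_basic_interframe` and the sample values are hypothetical and are not taken from the cited method.

```python
def encode_basic_interframe(reference, current, threshold):
    """Keep only the differences whose magnitude exceeds the threshold.

    Returns a dict mapping (row, col) of each significant pel to its
    difference (current minus reference). Insignificant pels are omitted.
    """
    significant = {}
    for r, (ref_row, cur_row) in enumerate(zip(reference, current)):
        for c, (ref_pel, cur_pel) in enumerate(zip(ref_row, cur_row)):
            diff = cur_pel - ref_pel
            if abs(diff) > threshold:
                significant[(r, c)] = diff
    return significant


def decode_basic_interframe(reference, significant):
    """Estimate the current picture from the reference and the kept differences."""
    estimate = [row[:] for row in reference]  # insignificant pels copy the reference
    for (r, c), diff in significant.items():
        estimate[r][c] += diff
    return estimate


# Toy pictures: a bright pel moves one position to the right,
# and one background pel changes slightly (10 -> 11).
reference = [[10, 10, 10, 10],
             [10, 80, 10, 10],
             [10, 10, 10, 10]]
current = [[10, 10, 11, 10],
           [10, 10, 80, 10],
           [10, 10, 10, 10]]

kept = encode_basic_interframe(reference, current, threshold=2)
estimate = decode_basic_interframe(reference, kept)
```

With a threshold of 2, only the two pels touched by the moving object are kept, and the small background change is dropped from the estimate. Raising the threshold further would discard more differences at the cost of fidelity, which is the trade-off noted above.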
In view of the property of the basic inter-frame predicting coding method, the compression efficiency is enhanced under the condition that variation between the current picture and the reference picture is small, because the difference data decrease as the standstill image regions of the current picture with respect to the reference picture become larger. The following motion compensation inter-frame predicting coding method, however, realizes higher compression efficiency in comparison with the basic inter-frame predicting coding method.
Likewise, on the assumption that the person image 10 in the reference picture 11 has moved to the right in the current picture 12, the motion compensation inter-frame predicting coding method is explained hereinafter with reference to FIG. 44. The motion compensation inter-frame predicting coding method comprises a step of calculating a motion vector MV indicating the movement distance and movement direction of the person image 10 between the reference picture 11 and the current picture 12. The motion compensation inter-frame predicting coding method further comprises a step of estimating the person image 10 in the current picture 12 with the aid of the motion vector MV and pel data defining the person image 10 in the reference picture 11. In this case, only one significant pel data region 13 is produced, as shown in FIG. 44. Accordingly, the motion compensation inter-frame predicting coding method is superior to the basic inter-frame predicting coding method in that the number of significant pels can be sharply decreased and accordingly the compression efficiency can be greatly enhanced.
The motion compensation inter-frame predicting coding method will be described hereinafter in detail with reference to FIGS. 45 to 47. According to ITU-T (International Telecommunication Union-Telecommunication Standardization Sector) H.261, the motion compensation inter-frame predicting coding method comprises steps of dividing a current picture 20 shown in FIG. 45 into a plurality of blocks including a block (referred to hereinafter as a current block) 21, specifying a search window 31 including blocks (referred to hereinafter as candidate blocks) in a reference picture 30, and calculating distortion values each indicative of a difference between the current block 21 and each of the candidate blocks. The distortion value is calculated by converting, into positive numbers, local distortion values each indicative of a difference between pel data of each pel of the current block 21 and pel data of each pel of the candidate block corresponding in position to each other, and summing up the converted local distortion values. The motion compensation inter-frame predicting coding method further comprises steps of specifying a candidate block 32 which provides a minimum distortion value, i.e. the smallest of the distortion values calculated in the above-mentioned manner, and calculating a motion vector representative of the distance between, and the direction defined by, the current block 21 and the candidate block 32. The motion vector MV thus calculated and the distortion value between the candidate block 32 included in the reference picture 30 and the current block 21 are encoded by an encoder (not shown).
FIGS. 46(a) and 46(b) represent relations between the current block 21, the search window 31 and the candidate blocks 32. If the current block 21 and the search window 31 contain N columns of M pels and H columns of L pels as shown in FIGS. 46(b) and 46(a), respectively, the search window 31 includes (L-M+1)×(H-N+1) candidate blocks 32 identical in size to the current block 21. In the case that pel data of a pel at the top left-hand corner of the current block 21 in FIG. 46(b) is indicated by a(0,0), pel data of each of the candidate blocks 32 corresponding in position to the pel data a(0,0) of the current block 21 are included in an area defined by oblique lines in FIG. 46(a).
FIGS. 47(a) and 47(b) represent the relation between pel data of the current block 21 and pel data of each of the candidate blocks 32 corresponding in position to each other. In FIG. 47(a), b(l+m, h+n) indicates the pel data of each of the candidate blocks corresponding in position to the pel data a(m, n) of the current block 21 shown in FIG. 47(b). Pel data b(l,h) in the search window 31 shown in FIG. 47(a) is the pel data at the upper left-hand corner of the candidate block 32 and accordingly corresponds in position to the pel data a(0, 0) of the current block 21. For the current block 21, search window 31 and candidate block 32 shown in FIGS. 47(a) and 47(b), the distortion value between the current block 21 and the candidate block 32 is indicated by D(l,h) defined as follows:
$$D(l,h) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \| d(l,h) \|$$
Note that "|| ||" is a notation called the norm and that d(l,h) is defined by the equation d(l,h)=b(m+l, n+h)-a(m,n) and represents a local distortion value indicative of a difference between pel data of two pels corresponding in position to each other. The norm arithmetic is absolute-value arithmetic, square arithmetic or the like. The above-mentioned process of comparing a block of the current picture with each of the blocks of the reference picture in the motion compensation inter-frame predicting coding method is called a block matching method, and in particular a full search block matching method if the current block is compared with all the candidate blocks included in the search window.
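The full search block matching just described can be sketched as follows. This is an illustrative sketch only: it assumes the first index of a(m,n) and b(l,h) is horizontal and the second vertical, stores blocks as lists of rows, and uses the absolute value as the default norm. The names `distortion` and `full_search` are hypothetical.

```python
def distortion(current_block, window, l, h, norm=abs):
    """D(l, h): sum of norm(b(m+l, n+h) - a(m, n)) over all pels of the block.

    Blocks are lists of rows, so a(m, n) is current_block[n][m] and
    b(x, y) is window[y][x] (assumed index convention).
    """
    return sum(
        norm(window[n + h][m + l] - current_block[n][m])
        for n in range(len(current_block))
        for m in range(len(current_block[0]))
    )


def full_search(current_block, window, norm=abs):
    """Compare the current block with every candidate block in the window.

    Returns (minimum distortion, l, h) for the best-matching candidate.
    """
    n_rows, m_cols = len(current_block), len(current_block[0])
    h_rows, l_cols = len(window), len(window[0])
    best = None
    for h in range(h_rows - n_rows + 1):       # H-N+1 vertical positions
        for l in range(l_cols - m_cols + 1):   # L-M+1 horizontal positions
            d = distortion(current_block, window, l, h, norm)
            if best is None or d < best[0]:
                best = (d, l, h)
    return best


# Toy data: a 4x4 search window and a 2x2 current block (9 candidates).
window = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12],
          [13, 14, 15, 16]]
current_block = [[7, 8],
                 [11, 12]]

best_match = full_search(current_block, window)
```

Here the candidate at (l, h) = (2, 1) reproduces the current block exactly, so its distortion is zero. Passing `norm=lambda v: v * v` would switch to the square arithmetic mentioned above.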
MPEG (Moving Picture Experts Group) standards have been proposed, including MPEG1 and MPEG2, which are specified in ISO/IEC (International Organization for Standardization/International Electrotechnical Commission) 11172-2 and 13818-2, respectively. In MPEG, the pels handled in the foregoing ITU-T H.261 are referred to as integer pels. In addition to the integer pels, MPEG generates and handles half-pels each interposed between adjoining integer pels. Four integer pels b(0,0), b(0,1), b(1,0) and b(1,1) are shown in FIG. 48 as forming a matrix array of two columns of two pels. Half-pels Hh, Hv and Hd are generated between and based on the integer pels b(0,0) and b(1,0), the integer pels b(0,0) and b(0,1), and the integer pels b(0,0) and b(1,1), respectively, and are defined as follows:
Hh={b(0,0)+b(1,0)}/2
Hv={b(0,0)+b(0,1)}/2
Hd={b(0,0)+b(1,0)+b(0,1)+b(1,1)}/4
The generation of the half-pels makes it possible to search the search window for candidate blocks aligned horizontally, vertically and diagonally at half-pel pitches rather than one-pel pitches, so that the precision of the motion estimation can be enhanced.
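As a small worked illustration of the half-pel definitions above, the following sketch evaluates Hh, Hv and Hd as plain means. It assumes the first index of b is horizontal and stores pels as a list of rows; any rounding rule a real encoder applies is deliberately left out, and the name `half_pels` is hypothetical.

```python
def half_pels(b, x=0, y=0):
    """Return (Hh, Hv, Hd) around integer pel b(x, y).

    b is a list of rows, so b(x, y) is b[y][x] (assumed index convention).
    """
    hh = (b[y][x] + b[y][x + 1]) / 2                                   # between b(0,0) and b(1,0)
    hv = (b[y][x] + b[y + 1][x]) / 2                                   # between b(0,0) and b(0,1)
    hd = (b[y][x] + b[y][x + 1] + b[y + 1][x] + b[y + 1][x + 1]) / 4   # centre of the four pels
    return hh, hv, hd


# Toy 2x2 array of integer pels: b(0,0)=10, b(1,0)=20, b(0,1)=30, b(1,1)=60.
pels = [[10, 20],
        [30, 60]]
hh, hv, hd = half_pels(pels)
```

For these values the sketch gives Hh = 15, Hv = 20 and Hd = 30.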
As shown in FIG. 49, MPEG provides, for the motion estimation, a bi-directional prediction mode utilizing not only a forward prediction mode wherein the current picture is estimated with reference to the past picture, but also a backward prediction mode wherein the current picture is estimated with reference to the future picture. The results of the forward and backward prediction modes are compared with each other in prediction ability, and whichever of the forward prediction and the backward prediction is superior is chosen. The result of the chosen mode is adopted, so that the prediction ability can be enhanced.
MPEG also provides, for the motion estimation, a bi-directional interpolation prediction mode wherein the current picture is estimated with reference to an interpolated picture that is representative of a mean between the past picture and the future picture. The bi-directional interpolation prediction mode is used in order to further enhance the prediction ability in comparison with the bi-directional prediction mode.
The interpolated picture is formed in a manner which is described hereinafter with reference to FIGS. 50(a) and 50(b). In FIG. 50(a), reference numerals 50, 60 and 70 designate the current picture, a first reference picture previous to the current picture and a second reference picture subsequent to the current picture, respectively. By utilizing the aforementioned motion compensation inter-frame prediction technique, a reference block B included in the first reference picture 60 and most similar to a current block A included in the current picture 50 is specified, thereby obtaining a motion vector MV indicative of a displacement between the reference block B and the current block A. The motion vector MV is scaled with respect to the second reference picture 70 to obtain a scaled vector SV. In this instance, the motion vector MV is identical with the scaled vector SV because the time lag between the current picture 50 and the first reference picture 60 is equal to that between the current picture 50 and the second reference picture 70. Based on the scaled vector SV, a search window 72 partially forming the second reference picture 70 is specified. The search window 72 extends from a pel 71 included in the second reference picture 70 and closest to the end point of the scaled vector SV.
The search window 72 includes a plurality of reference blocks C1, C2 . . . Cn. In accordance with the expression (B+Cn)/2, interpolated blocks Dn (not shown), each indicative of a mean between the reference block B and each of the reference blocks Cn, are calculated. The distortion values, each indicative of a difference between each of the interpolated blocks Dn and the current block A, are calculated by the expression ||Dn-A||. This means that the number of the distortion values is "n". The minimum distortion value is selected from among the distortion values, and as a consequence a target interpolated block is specified from the interpolated blocks Dn.
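The interpolated-block search described above can be sketched as follows. The sketch assumes the reference block B and the candidate blocks Cn have already been extracted as lists of rows, and uses the sum of absolute differences as the norm. The function names and sample values are hypothetical.

```python
def interpolate(block_b, block_c):
    """Interpolated block Dn = (B + Cn) / 2, computed pel by pel."""
    return [[(pb + pc) / 2 for pb, pc in zip(row_b, row_c)]
            for row_b, row_c in zip(block_b, block_c)]


def block_distortion(block_d, block_a):
    """||Dn - A|| using absolute-value arithmetic (one possible norm)."""
    return sum(abs(pd - pa)
               for row_d, row_a in zip(block_d, block_a)
               for pd, pa in zip(row_d, row_a))


def best_interpolated(block_a, block_b, candidates_c):
    """Return (index of best Cn, minimum distortion) over all candidates."""
    best_index, best_value = None, None
    for index, block_c in enumerate(candidates_c):
        # One interpolated block and one distortion value per candidate.
        value = block_distortion(interpolate(block_b, block_c), block_a)
        if best_value is None or value < best_value:
            best_index, best_value = index, value
    return best_index, best_value


# Toy blocks: A is the current block, B the block found in the first reference.
block_a = [[10, 20], [30, 40]]
block_b = [[8, 18], [28, 38]]
candidates_c = [
    [[0, 0], [0, 0]],
    [[12, 22], [32, 42]],   # averages with B to exactly A
    [[12, 22], [32, 50]],
]

choice = best_interpolated(block_a, block_b, candidates_c)
```

Note that the loop builds one interpolated block per candidate before any distortion can be evaluated, so the work grows with the number of candidate blocks in the search window.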
MPEG further provides, for the motion estimation, a dual-prime prediction mode partly described hereinafter. In FIG. 51, reference numerals 81 and 82 designate a first field and a second field, respectively, of a current picture, while reference numerals 91 and 92 designate a first field and a second field, respectively, of a reference picture. The current picture has a current block which consists of a field block 81a included in the first field 81 and a field block 82b included in the second field 82. The dual-prime prediction mode comprises a step of applying the aforementioned motion compensation inter-frame prediction to an estimation of the field block 81a with reference to the first field 91 of the reference picture to calculate a field motion vector FV. The field motion vector FV is scaled with respect to the second field 92 of the reference picture to obtain a scaled vector SV1. Assuming that a half-pel of the second field 92 closest to the starting point of the scaled vector SV1 forms the uppermost half-pel 92P of a reference block included in the second field 92, a search window for the motion estimation is formed by nine overlapping field blocks surrounding the half-pel 92P. Specifically, the nine field blocks consist of a center reference block including the half-pel 92P as an uppermost pel, and eight field blocks deviated vertically, horizontally and diagonally by a half-pel pitch from the center reference block. The dual-prime prediction mode further comprises steps of calculating interpolated blocks each indicative of a mean between the reference block 91a of the first field 91 and each of the nine reference blocks included in the search window, and calculating distortion values each indicative of a difference between each interpolated block and the current field block 81a.
From among the calculated distortion values, the minimum distortion value is selected to obtain a differential motion vector representative of a displacement between the half-pel 92P and the uppermost pel of the reference block of the second field 92 which is used for the calculation of the minimum distortion value. In the case that a reference block 92a is used for the calculation of the minimum distortion value, a vector DMV shown in FIG. 51 serves as the differential motion vector.
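The nine-candidate search that yields the differential motion vector can be sketched under several simplifying assumptions: the second field is assumed to be already available as a grid sampled at half-pel pitch (`half_grid`), the "uppermost pel" is taken to be the top-left pel of a block, positions are expressed in half-pel units, and the norm is the absolute value. The names `block_at` and `dual_prime_dmv` are hypothetical, and the scaling of the field motion vector is not modeled.

```python
def block_at(half_grid, px, py, width, height):
    """Block whose top-left pel sits at half-pel position (px, py).

    Neighbouring pels of a block are one pel apart, i.e. two half-pel steps.
    """
    return [[half_grid[py + 2 * n][px + 2 * m] for m in range(width)]
            for n in range(height)]


def dual_prime_dmv(current, ref_first, half_grid, px, py):
    """Search the nine blocks around half-pel (px, py).

    Returns (minimum distortion, dx, dy), where (dx, dy) is the displacement
    in half-pel units that plays the role of the differential motion vector.
    """
    height, width = len(current), len(current[0])
    best = None
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            candidate = block_at(half_grid, px + dx, py + dy, width, height)
            # Distortion between the mean of the two reference blocks and the current block.
            d = sum(abs((f + c) / 2 - a)
                    for row_f, row_c, row_a in zip(ref_first, candidate, current)
                    for f, c, a in zip(row_f, row_c, row_a))
            if best is None or d < best[0]:
                best = (d, dx, dy)
    return best


# Toy half-pel grid of the second field and toy 2x2 blocks.
half_grid = [[10 * y + x for x in range(5)] for y in range(5)]
current = [[12, 14], [32, 34]]
ref_first = [[12, 14], [32, 34]]

dmv = dual_prime_dmv(current, ref_first, half_grid, px=1, py=1)
```

In this toy case the block one half-pel to the right of the center gives zero distortion, so the displacement (1, 0) is selected.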
The field motion vector FV is shifted from the field block 81a to the field block 82b until the end point of the field motion vector FV coincides with the uppermost pel of the field block 82b. The shifted field motion vector FV is denoted by "FV'" in FIG. 51. A field block 92b of the second field 92 is specified by the starting point of the field motion vector FV'. The field motion vector FV' is scaled with respect to the first field 91 of the reference picture to obtain a scaled vector SV2, so that a half-pel 91P of the first field 91 closest to the starting point of the scaled vector SV2 is specified. The differential motion vector DMV is shifted from the half-pel 92P to the half-pel 91P, and consequently a field block 91b is specified. On the basis of the field block 91b and the field block 92b, an interpolated block indicative of a mean between the field block 91b of the first field 91 and the field block 92b is calculated. Based on the interpolated block and the current field block 82b, a distortion value indicative of a difference between those blocks is calculated. In the case of the dual-prime prediction mode, the field motion vector FV and the differential motion vector DMV serve as motion vectors for the motion estimation.
A drawback is encountered in the motion estimation according to the prior-art bi-directional interpolation prediction in that not only does an apparatus or a circuit for performing the motion estimation become extremely large, but the motion vector calculation also takes an extremely long time. The reason is that it is required not only to calculate distortion values equal in number to the candidate blocks Cn included in the search window, but also to calculate interpolated blocks equal in number to the candidate blocks. In addition, the same drawback is encountered in the motion estimation according to the prior-art dual-prime prediction because it is required to calculate interpolated blocks equal in number to the candidate blocks.
The present invention contemplates provision of a motion estimation method and apparatus overcoming the drawbacks of the prior-art motion estimation methods and apparatuses of the described general nature.