1. Field of the Invention
The present invention relates to a scheme for identifying a gray-scale image. In particular, the present invention relates to a technique of simultaneously improving noise tolerance and distortion tolerance in gray-scale-image identification and recognition that are essential for image pattern recognition, motion analysis, and stereo vision.
2. Description of the Background Art
Distortion tolerance and noise tolerance are serious problems to be solved for gray-scale-image identification techniques.
The techniques to improve distortion tolerance fall into three approaches. They are (1) combinational search, (2) energy minimization, and (3) affine parameter determination.
The first approach, i.e., the combinational search, binarizes an input gray-scale image into an input black-point set and then matches the input black-point set against a target black-point set. This first approach searches for an optimal solution among black-point combinations whose number grows as the factorial of the number of points contained in the input black-point set, so that the number of processes needed to obtain an optimal solution diverges.
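The factorial growth noted above can be illustrated with a minimal sketch. It is not taken from any of the cited works: the function name brute_force_match and the cost used (the sum of Euclidean distances between paired points) are assumptions chosen only to make the count of examined combinations visible.

```python
import math
from itertools import permutations


def brute_force_match(input_pts, target_pts):
    """Exhaustively pair each input point with a distinct target point.

    Hypothetical illustration: the cost of a pairing is the sum of Euclidean
    distances between paired points. Returns (best_cost, best_pairing,
    number_of_pairings_examined).
    """
    best_cost, best_perm, examined = float("inf"), None, 0
    for perm in permutations(range(len(target_pts))):
        examined += 1
        cost = sum(math.dist(p, target_pts[j]) for p, j in zip(input_pts, perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm, examined


input_pts = [(0, 0), (1, 0), (0, 1)]
target_pts = [(0, 1), (0, 0), (1, 0)]
cost, pairing, examined = brute_force_match(input_pts, target_pts)
print(cost, pairing, examined)  # 3 points already require 3! = 6 pairings
```

With n points the loop runs n! times, so ten points already require 3,628,800 pairings, which is why pruning by constraints was studied.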
A technique of restricting the number of candidate solutions by setting constraints has been studied to prune the branches of a decision search tree and thereby limit the number of processes needed for an optimal solution. This is disclosed in, for example, H. S. Baird, “Model-Based Image Matching Using Location,” Cambridge, Mass.: MIT Press, 1985. Under such constraints, solution algorithms have been proposed for a problem of determining whether or not two point-sets match with each other through congruent transformation (rotation and translation) and a problem of determining whether or not two point-sets match with each other through similar transformation (rotation, scale change, and translation). The number of processes involved in these algorithms is on the order of a power of the number of points contained in a point-set. These algorithms are described in, for example, S. Umeyama, “Parametrized point pattern matching and its application to recognition of object families,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 15, No. 2, pp. 135-144, 1993.
It is difficult, however, to find general constraints for the above algorithms, and the above algorithms still involve a large number of processes and provide no solution for affine transformation (e.g., rotation, scale change, shearing, and translation) that includes shearing in addition to similar transformation.
On the other hand, the constraints cause local contradiction, and to resolve the local contradiction, discrete relaxation has been proposed. The discrete relaxation method employs interpoint matching coefficients to successively update matching states and converge into a consistent solution, as disclosed in, for example, A. Rosenfeld, R. A. Hummel, and S. W. Zucker, “Scene labeling by relaxation operations,” IEEE Trans., Vol. SMC-6, No. 6, pp. 420-433, 1976. The discrete relaxation, however, provides no guidance for rules for updating matching states or a way of setting matching coefficients, involves many processes due to iterations, and guarantees no convergence.
Moreover, these techniques are based on the binarization of a gray-scale image. If the image involves noise, degradation, or background texture, the binarization of the image will fail. Therefore, these techniques cannot achieve distortion tolerance in the first place.
The second approach, i.e., the energy minimization, is based on dynamic analogy. This approach formulates an image identification problem as an optimization problem based on the energy minimization principle. One effective technique based on this approach introduces image identification constraints into energy functions based on the regularization theory, as disclosed in, for example, T. Poggio, V. Torre, and C. Koch, “Computational vision and regularization theory,” Nature, Vol. 317, No. 6035, pp. 314-319, 1985.
Solutions for the energy minimization problem based on a calculus of variations, stochastic relaxation, etc., are disclosed in, for example, B. K. P. Horn and B. G. Schunck, “Determining optical flow,” Artificial Intelligence, Vol. 17, pp. 185-203, 1981; M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: active contour models,” Int. Journal of Computer Vision, Vol. 1, No. 4, pp. 321-331, 1988; and S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 6, No. 6, pp. 721-741, 1984.
These are advantageous in analytically or algebraically handling matching problems. They, however, find local optimal solutions from continuous translations based on iterated infinitesimal translations. Accordingly, it is difficult for them to deal with finite or discontinuous translations, or guarantee a convergence to a global optimal solution. In addition, they involve a large number of processes.
The third approach, i.e., the affine parameter determination, binarizes an input gray-scale image into an input black-point set and matches it against a target black-point set. This approach directly finds affine parameters that maximize the matching of the input and target images from the iterated solutions of simultaneous linear equations. To evaluate the matching of two images, one technique checks to see if an average of the distances between the proximal black points of the two images has been minimized, as disclosed in T. Wakahara and K. Odaka, “Adaptive normalization of handwritten characters using global/local affine transformation,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 20, No. 12, pp. 1332-1341, 1998. Another technique to evaluate the matching of two images checks to see if a likelihood between the two images has been maximized on an assumption that the positions of black points vary according to a normal distribution, as disclosed in Japanese Patent Application No. Hei10-255042 (1998) “Point Pattern Normalization Method and Apparatus.” This affine parameter determination is a promising image identification approach in which an image can be identified under arbitrary affine parameters. This approach, however, is based on binarization like the above combinational search approach. Accordingly, if an image involves superimposed noise, degradation, or background texture, the binarization itself will fail, and this approach cannot then obtain distortion tolerance.
On the other hand, to improve noise tolerance, there is a technique of employing normalized cross-correlation as a matching measure for gray-scale images, as disclosed in, for example, A. Rosenfeld and A. C. Kak, Digital Picture Processing, Second edition, San Diego, Calif.: Academic Press, 1982, Chap. 9. It has theoretically been verified that the normalized cross-correlation has a tolerance for a blurring operation on images, as described in, for example, T. Iijima, “Pattern Recognition,” Tokyo: Corona, 1973, Chap. 6. The normalized cross-correlation is effective to identify an image that involves superimposed noise, degradation, or background texture, as described in, for example, M. Uenohara and T. Kanade, “Use of Fourier and Karhunen-Loeve decomposition for fast pattern matching with a large set of templates,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 19, No. 8, pp. 891-898, 1997; and M. Sawaki and N. Hagita, “Recognition of degraded machine-printed characters using a complementary similarity measure and error-correction learning,” IEICE Trans. Information and Systems, Vol. E79-D, No. 5, pp. 491-497, 1996. An image identification operation based on the normalized cross-correlation may handle a congruent transformation (e.g., rotation or translation) of an image by thoroughly scanning using templates. This technique, however, has an intrinsic problem of deteriorating correlation values when an affine transformation involving scale change and shearing is applied to an image. In addition, it is practically impossible to thoroughly cover templates to cope with scale change and shearing because the number of processes diverges. Consequently, this normalized cross-correlation approach cannot realize distortion tolerance.
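For illustration only, a normalized cross-correlation between two equally sized gray-scale images may be sketched as follows. The zero-mean form used here is an assumption, since this description does not fix a particular normalization, and the function name normalized_cross_correlation and the list-of-rows image representation are likewise hypothetical.

```python
import math


def normalized_cross_correlation(f, g):
    """Zero-mean NCC of two gray-scale images given as lists of rows.

    Assumption: gray levels are normalized by subtracting the mean and
    dividing by the norm; the result lies in [-1, 1]. A constant image has
    no defined correlation, so 0.0 is returned in that case.
    """
    fv = [v for row in f for v in row]
    gv = [v for row in g for v in row]
    if len(fv) != len(gv):
        raise ValueError("images must contain the same number of points")
    mean_f = sum(fv) / len(fv)
    mean_g = sum(gv) / len(gv)
    num = sum((a - mean_f) * (b - mean_g) for a, b in zip(fv, gv))
    den = math.sqrt(sum((a - mean_f) ** 2 for a in fv)
                    * sum((b - mean_g) ** 2 for b in gv))
    return num / den if den else 0.0


f = [[0, 1], [2, 3]]
brighter = [[5, 7], [9, 11]]   # f with gain 2 and offset 5
shuffled = [[3, 0], [1, 2]]    # same gray levels, displaced
print(normalized_cross_correlation(f, brighter))
print(normalized_cross_correlation(f, shuffled))
```

Under this assumed form, a uniform change of gain or offset leaves the value unchanged, whereas a geometric displacement of the gray levels lowers it, which is consistent with the deterioration described above.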
In this way, various techniques for directly recognizing and identifying gray-scale images have tried to improve their distortion tolerance and noise tolerance. In terms of improving distortion tolerance, there are (1) the combinational search carried out between binarized black-point sets, (2) the energy minimization that iterates infinitesimal translations to match gray-scale images with each other, and (3) the affine parameter determination that employs an iterative solution to directly determine affine parameters that maximize the matched area of binarized black-point sets. In terms of improving noise tolerance, there is the normalized cross-correlation.
However, in terms of improving distortion tolerance, there is no technique that is capable of handling a wide range of finite translations and distortions that are not infinitesimal, with a practical number of processes. In terms of improving noise tolerance, the normalized cross-correlation may be effective. This, however, considerably deteriorates correlation values when an affine transformation involving scale change and shearing is applied to images. In other words, there is no technique that simultaneously improves distortion tolerance and noise tolerance. If an input image to be processed involves noise, degradation, or background texture, a binarization operation, which is imperative for these conventional techniques, on the input image will fail. At the same time, the input image will lose gray-scale gradient information that is useful for image matching. It is required, therefore, to provide an accurate image identification technique that directly handles gray-scale images without binarization.
The present invention has been made to solve the above-mentioned problem of the conventional technique.
An object of the present invention is to provide a technique of identifying an input gray-scale image by directly handling the input image while realizing distortion tolerance and noise tolerance. The technique applies an optimal affine transformation (rotation, scale change, shearing, and translation) to the input image in such a way as to maximize normalized cross-correlation between an affine-transformation-superimposed input gray-scale image and a target gray-scale image. The technique identifies the input image with a maximal cross-correlation value. Affine parameters used for the affine transformation are determined by iteratively solving simultaneous linear equations through a practical number of processes. These simultaneous linear equations are derived by maximizing a weighted normalized cross-correlation that employs gray-scale gradient information to enhance image matching. By maximizing normalized cross-correlation between an affine-transformation-superimposed input gray-scale image and the target image, the technique accurately identifies the input image. Consequently, this technique covers a wide range of translations and distortions represented with arbitrary affine transformations and realizes noise tolerance.
In order to accomplish the object, an aspect of the present invention provides a method of matching input gray-scale image data F with target image data G, the data F being composed of a set of gray levels representative of points that form an image. The method comprises the steps of (a) calculating weighting coefficients based on interpoint distances between points of the input gray-scale image data F and points of the target image data G and inner products of gray-scale gradients at the points of the input gray-scale image data F and target image data G; (b) determining affine parameters for the input gray-scale image data F based on the calculated weighting coefficients; (c) applying an affine transformation to the input gray-scale image data F based on the determined affine parameters to shape the input gray-scale image data F into affine-transformation-superimposed input gray-scale image data F*; (d) calculating a normalized cross-correlation value between the affine-transformation-superimposed input gray-scale image data F* and the target image data G; and (e) providing, as a matching result for the target image data G, at least one of the affine-transformation-superimposed input gray-scale image data F* that provides the correlation value and the correlation value itself.
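As a purely illustrative sketch of step (a), one possible weighting coefficient combines a Gaussian function of the interpoint distance with the inner product of the gray-scale gradients. The exact formula is not given at this point of the description, so everything below is an assumption: the Gaussian form, the spread parameter, the clipping of negative inner products to zero, and the name gaussian_weight are all hypothetical.

```python
import math


def gaussian_weight(r, r_prime, grad_f, grad_g, spread):
    """Hypothetical weighting coefficient for a point pair (r, r_prime).

    Assumed form: exp(-||r - r'||^2 / spread^2) multiplied by the inner
    product of the gradients grad_f at r and grad_g at r_prime, with
    negative inner products clipped to zero so that points whose gradients
    oppose each other do not contribute.
    """
    dist_sq = (r[0] - r_prime[0]) ** 2 + (r[1] - r_prime[1]) ** 2
    inner = grad_f[0] * grad_g[0] + grad_f[1] * grad_g[1]
    return math.exp(-dist_sq / spread ** 2) * max(0.0, inner)


near = gaussian_weight((0, 0), (1, 0), (1.0, 0.0), (1.0, 0.0), spread=2.0)
far = gaussian_weight((0, 0), (4, 0), (1.0, 0.0), (1.0, 0.0), spread=2.0)
opposed = gaussian_weight((0, 0), (1, 0), (1.0, 0.0), (-1.0, 0.0), spread=2.0)
print(near, far, opposed)
```

Under these assumptions, nearby points with similarly directed gradients receive the largest weights, so they dominate the determination of the affine parameters in step (b).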
Another aspect of the present invention provides a method of matching input gray-scale image data F and target image data G, the data F being composed of a set of gray levels representative of points that form an image. The method comprises the steps of (aa) calculating Gaussian kernel interpoint weighting coefficients based on each interpoint distance ∥r−r′∥ between a point r in the input gray-scale image data F and a point r′ in the target image data G and an inner product ∇f(r)·∇g(r′) of gray-scale gradients for a gray level f(r) at the point r and a gray level g(r′) at the point r′; (bb) determining affine parameters for the input gray-scale image data F based on the calculated Gaussian kernel interpoint weighting coefficients in such a way as to maximize a weighted normalized cross-correlation; (cc) applying an affine transformation to the input gray-scale image data F based on the determined affine parameters to shape the input gray-scale image data F into affine-transformation-superimposed input gray-scale image data F*; (dd) calculating a normalized cross-correlation value C1 between the affine-transformation-superimposed input gray-scale image data F* and the target image data G as well as a normalized cross-correlation value C0 between the input gray-scale image data F and the target image data G; and (ee) comparing the values C1 and C0 with each other, and if C1 > C0, substituting the transformed data F* for the input gray-scale image data F and repeating the steps (aa) to (dd), and if C1 is not greater than C0, providing at least one of the value C0 and the affine-transformation-superimposed input gray-scale image data F* corresponding to the value C0 as a matching result for the target image data G.
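The control flow of steps (aa) to (ee) may be sketched as below. The sketch shows only the accept-while-improving loop; the callables estimate_affine, apply_affine, and ncc are hypothetical placeholders for steps (aa)-(bb), (cc), and (dd) respectively, and the max_iters safety cap is an assumption that does not appear in the steps themselves.

```python
def iterate_matching(f, g, estimate_affine, apply_affine, ncc, max_iters=100):
    """Repeat the transformation while the correlation with g keeps rising.

    Returns (c0, f): the last accepted correlation value and the image data
    that produced it. All three callables are supplied by the caller.
    """
    c0 = ncc(f, g)                        # correlation before any transform
    for _ in range(max_iters):            # assumed cap, not part of the steps
        params = estimate_affine(f, g)    # steps (aa) and (bb)
        f_star = apply_affine(f, params)  # step (cc)
        c1 = ncc(f_star, g)               # step (dd)
        if c1 > c0:                       # step (ee): accept and repeat
            f, c0 = f_star, c1
        else:                             # no improvement: stop
            break
    return c0, f


# Toy stand-ins: "images" are integers and the transform is a unit shift.
result = iterate_matching(
    0, 3,
    estimate_affine=lambda f, g: (g > f) - (g < f),
    apply_affine=lambda f, step: f + step,
    ncc=lambda f, g: -abs(f - g),
)
print(result)
```

Because a transformed image is accepted only when C1 > C0, the correlation value never decreases from one repetition to the next, and the loop stops at the first repetition that brings no improvement.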
Still another aspect of the present invention provides a method of retrieving desired image data that includes target image data G from stored gray-scale image data by matching each data piece (F) inputted from the stored image data and the target image data G, the data F being composed of a set of gray levels representative of points that form an image. The method comprises the steps of (aaa) calculating weighting coefficients based on interpoint distances between points of the input gray-scale image data F and points of the target image data G and inner products of gray-scale gradients at the points of the input gray-scale image data F and target image data G; (bbb) determining affine parameters for the target image data G based on the calculated weighting coefficients; (ccc) applying an affine transformation to the target image data G based on the determined affine parameters to shape the target image data G into affine-transformation-superimposed target gray-scale image data G*; (ddd) calculating a maximal normalized cross-correlation value between the affine-transformation-superimposed target gray-scale image data G* and the input gray-scale image data F; and (eee) providing at least one of the input gray-scale image data F with which the maximal normalized cross-correlation value exceeds a prescribed threshold and the maximal normalized cross-correlation value itself as a retrieval result of gray-scale image data containing the target image data G.
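The thresholding of step (eee) may be sketched as follows. The callable max_ncc is a hypothetical placeholder for steps (aaa) to (ddd), the dictionary of named stored images is an assumed representation, and sorting the hits by descending correlation is a convenience that the steps do not require.

```python
def retrieve(stored_images, target, max_ncc, threshold):
    """Return (name, score) for each stored image whose score exceeds threshold.

    stored_images maps a name to image data F; max_ncc(f, target) is assumed
    to return the maximal normalized cross-correlation for that pair.
    """
    hits = []
    for name, f in stored_images.items():
        score = max_ncc(f, target)
        if score > threshold:             # step (eee): keep only strong matches
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy stand-in: each stored "image" is already its own correlation score.
stored = {"a": 0.90, "b": 0.40, "c": 0.95}
print(retrieve(stored, target=None, max_ncc=lambda f, g: f, threshold=0.8))
```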
Still another aspect of the present invention provides an apparatus for matching input gray-scale image data F and target image data G, the data F being composed of a set of gray levels representative of points that form an image. The apparatus comprises (a) a unit for calculating weighting coefficients based on interpoint distances between points of the input gray-scale image data F and points of the target image data G and inner products of gray-scale gradients at the points of the input gray-scale image data F and target image data G; (b) a unit for determining affine parameters for the input gray-scale image data F based on the calculated weighting coefficients; (c) a unit for applying an affine transformation to the input gray-scale image data F based on the determined affine parameters to shape the input gray-scale image data F into affine-transformation-superimposed input gray-scale image data F*; (d) a unit for calculating a normalized cross-correlation value between the affine-transformation-superimposed input gray-scale image data F* and the target image data G; and (e) a unit for providing, as a matching result for the target image data G, at least one of the affine-transformation-superimposed input gray-scale image data F* that provides the correlation value and the correlation value itself.
Still another aspect of the present invention provides an apparatus for matching input gray-scale image data F and target image data G, the data F being composed of a set of gray levels representative of points that form an image. The apparatus comprises (aa) a unit for calculating Gaussian kernel interpoint weighting coefficients based on each interpoint distance ∥r−r′∥ between a point r in the input gray-scale image data F and a point r′ in the target image data G and an inner product ∇f(r)·∇g(r′) of gray-scale gradients for a gray level f(r) at the point r and a gray level g(r′) at the point r′; (bb) a unit for determining affine parameters for the input gray-scale image data F based on the calculated Gaussian kernel interpoint weighting coefficients in such a way as to maximize a weighted normalized cross-correlation; (cc) a unit for applying an affine transformation to the input gray-scale image data F based on the determined affine parameters to shape the input gray-scale image data F into affine-transformation-superimposed input gray-scale image data F*; (dd) a unit for calculating a normalized cross-correlation value C1 between the affine-transformation-superimposed input gray-scale image data F* and the target image data G as well as a normalized cross-correlation value C0 between the input gray-scale image data F and the target image data G; and (ee) a unit for comparing the values C1 and C0 with each other, and if C1 > C0, substituting the affine-transformation-superimposed input gray-scale image data F* for the input gray-scale image data F and repeating the operations carried out by the units (aa) to (dd), and if C1 is not greater than C0, providing at least one of the value C0 and the affine-transformation-superimposed input gray-scale image data F* corresponding to the value C0 as a matching result for the target image data G.
Still another aspect of the present invention provides an apparatus for retrieving desired image data that includes target image data G from stored gray-scale image data by matching each data piece (F) inputted from the stored image data and the target image data G, the data F being composed of a set of gray levels representative of points that form an image. The apparatus comprises (aaa) a unit for calculating weighting coefficients based on interpoint distances between points of the input gray-scale image data F and points of the target image data G and inner products of gray-scale gradients at the points of the input gray-scale image data F and target image data G; (bbb) a unit for determining affine parameters for the target image data G based on the calculated weighting coefficients; (ccc) a unit for applying an affine transformation to the target image data G based on the determined affine parameters to shape the target image data G into affine-transformation-superimposed target gray-scale image data G*; (ddd) a unit for calculating a maximal normalized cross-correlation value between the affine-transformation-superimposed target gray-scale image data G* and the input gray-scale image data F; and (eee) a unit for providing at least one of the input gray-scale image data F with which the maximal normalized cross-correlation value exceeds a prescribed threshold and the maximal normalized cross-correlation value itself as a retrieval result of gray-scale image data containing the target image data G.
Still another aspect of the present invention provides a computer readable recording medium recording a program for causing a computer to execute processing for matching input gray-scale image data F and target image data G, the data F being composed of a set of gray levels representative of points that form an image. The processing includes (a) a process for calculating weighting coefficients based on interpoint distances between points of the input gray-scale image data F and points of the target image data G and inner products of gray-scale gradients at the points of the input gray-scale image data F and target image data G; (b) a process for determining affine parameters for the input gray-scale image data F based on the calculated weighting coefficients; (c) a process for applying an affine transformation to the input gray-scale image data F based on the determined affine parameters to shape the input gray-scale image data F into affine-transformation-superimposed input gray-scale image data F*; (d) a process for calculating a normalized cross-correlation value between the affine-transformation-superimposed input gray-scale image data F* and the target image data G; and (e) a process for providing, as a matching result for the target image data G, at least one of the affine-transformation-superimposed input gray-scale image data F* that provides the correlation value and the correlation value itself.
Still another aspect of the present invention provides a computer readable recording medium recording a program for causing a computer to execute processing for matching input gray-scale image data F with target image data G, the data F being composed of a set of gray levels representative of points that form an image. The processing includes (aa) a process for calculating Gaussian kernel interpoint weighting coefficients based on each interpoint distance ∥r−r′∥ between a point r in the input gray-scale image data F and a point r′ in the target image data G and an inner product ∇f(r)·∇g(r′) of gray-scale gradients for a gray level f(r) at the point r and a gray level g(r′) at the point r′; (bb) a process for determining affine parameters for the input gray-scale image data F based on the calculated Gaussian kernel interpoint weighting coefficients in such a way as to maximize a weighted normalized cross-correlation; (cc) a process for applying an affine transformation to the input gray-scale image data F based on the determined affine parameters to shape the input gray-scale image data F into affine-transformation-superimposed input gray-scale image data F*; (dd) a process for calculating a normalized cross-correlation value C1 between the affine-transformation-superimposed input gray-scale image data F* and the target image data G as well as a normalized cross-correlation value C0 between the input gray-scale image data F and the target image data G; and (ee) a process for comparing the values C1 and C0 with each other, and if C1 > C0, substituting the affine-transformation-superimposed input gray-scale image data F* for the input gray-scale image data F and repeating the processes (aa) to (dd), and if C1 is not greater than C0, providing at least one of the value C0 and the affine-transformation-superimposed input gray-scale image data F* corresponding to the value C0 as a matching result for the target image data G.
Still another aspect of the present invention provides a computer readable recording medium recording a program for causing a computer to execute processing for retrieving desired image data that includes target image data G from stored gray-scale image data by matching each data piece (F) inputted from the stored image data and the target image data G, the data F being composed of a set of gray levels representative of points that form an image. The processing includes (aaa) a process for calculating weighting coefficients based on interpoint distances between points of the input gray-scale image data F and points of the target image data G and inner products of gray-scale gradients at the points of the input gray-scale image data F and target image data G; (bbb) a process for determining affine parameters for the target image data G based on the calculated weighting coefficients; (ccc) a process for applying an affine transformation to the target image data G based on the determined affine parameters to shape the target image data G into affine-transformation-superimposed target gray-scale image data G*; (ddd) a process for calculating a maximal normalized cross-correlation value between the affine-transformation-superimposed target gray-scale image data G* and the input gray-scale image data F; and (eee) a process for providing at least one of the input gray-scale image data F with which the maximal normalized cross-correlation value exceeds a prescribed threshold and the maximal normalized cross-correlation value itself as a retrieval result of gray-scale image data containing the target image data G.