1. Field of the Invention
The present invention relates to a method and apparatus for detecting or locating an object in an image. In particular, the present invention relates to a method and apparatus for matching a template to an image, to locate an object corresponding to the template, when the object has been subject to a geometric transformation. The present invention further relates to a method for determining a geometric transformation of an object in an image.
2. Description of the Background Art
Template matching (TM) is a standard computer vision tool for finding objects or object parts in images. It is used in many applications including remote sensing, medical imaging, and automatic inspection in industry. The detection of real-world objects is a challenging problem due to the presence of illumination and colour changes, partial occlusions, noise and clutter in the background, and dynamic changes in the object itself.
A variety of template matching algorithms have been proposed. For example, P. Viola, M. Jones, “Rapid object detection using a boosted cascade of simple features” IEEE CVPR, pp. 511-518, 2001 (reference 1, infra) and EP-A-1 693 783 (reference 2, infra) describe extremely fast computation based on simple rectangular features. Other examples, such as Jain, Y. Zhong, S. Lakshmanan, “Object Matching Using Deformable Templates”, IEEE TPAMI, Vol. 18(3), pp 267-278, 1996 (reference 3, infra) and S. Yoshimura, T. Kanade, “Fast template matching based on the normalized correlation by using multiresolution eigenimages”, IEEE/RSJ/GI Int. Conf. on Intelligent Robots and Systems (IROS'94), Vol. 3, pp. 2086-2093, 1994 (reference 4, infra) describe fitting rigidly or non-rigidly deformed templates to image data.
The general strategy of template matching is the following: for every possible location, rotation, scale, or other geometric transformation, compare each image region to a template and select the best matching scores. This computationally expensive approach requires O(Nl × Ng × Nt) operations, where Nl is the number of locations in the image, Ng is the number of transformation samples, and Nt is the number of pixels used in matching score computation. Many methods try to reduce the computational complexity. Nl and Ng are usually reduced by the multiresolution approach (such as in reference 4, infra). Often the geometric transformations are not included in the matching strategy at all, on the assumption that the template and the image patch differ by translation only (such as in reference 11, infra).
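By way of illustration only, the exhaustive strategy above can be sketched as follows. The sketch is not taken from any of the cited references: the function names, the use of a sum of absolute differences as the matching score, and the restriction of the transformation samples to four 90° rotations of a square template are all assumptions made for brevity. The returned counter shows that the number of pixel comparisons equals Nl × Ng × Nt.

```python
def rotate90(t):
    """Rotate a 2-D list by 90 degrees clockwise (hypothetical helper)."""
    return [list(row) for row in zip(*t[::-1])]


def exhaustive_match(image, template, n_rotations=4):
    """Brute-force search over locations and rotations.

    Returns (best_score, (row, col, rotation_index), pixel_comparisons).
    The score is a sum of absolute differences, so lower is better;
    this score is an assumption, not prescribed by the text.
    """
    best = None
    ops = 0
    variant = template
    for k in range(n_rotations):                        # Ng transformation samples
        th, tw = len(variant), len(variant[0])
        for r in range(len(image) - th + 1):            # Nl locations (rows)
            for c in range(len(image[0]) - tw + 1):     # Nl locations (columns)
                score = 0
                for i in range(th):                     # Nt template pixels
                    for j in range(tw):
                        score += abs(image[r + i][c + j] - variant[i][j])
                        ops += 1
                if best is None or score < best[0]:
                    best = (score, (r, c, k))
        variant = rotate90(variant)
    return best[0], best[1], ops


# A 4x4 image containing the template rotated once, placed at row 1, column 2.
template = [[1, 2], [3, 4]]
image = [[0, 0, 0, 0],
         [0, 0, 3, 1],
         [0, 0, 4, 2],
         [0, 0, 0, 0]]
score, (row, col, rot), comparisons = exhaustive_match(image, template)
```

In this toy case there are 9 candidate locations, 4 rotation samples and 4 template pixels, giving 144 pixel comparisons for a 4×4 image; the cost grows quickly with image size and with the number of transformation samples.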
Another way to perform template matching is direct fitting of the template using gradient descent or gradient ascent optimization methods to iteratively adjust the geometric transformation until the best match is found. Such a technique is described in Lucas, T. Kanade, “An iterative image registration technique with an application to stereo vision” Proc. of Imaging understanding workshop, pp 121-130, 1981 (reference 10, infra). These techniques need initial approximations that are close to the right solution.
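As a hedged, one-dimensional illustration of such iterative fitting, the sketch below estimates a single translation parameter by Gauss-Newton updates. It is a toy example and not the method of reference 10: the Gaussian profile standing in for the template, the function names, the sampling grid, and the numerical derivative are all assumptions.

```python
import math


def profile(x, sigma=2.0):
    """Hypothetical smooth 1-D 'template': a Gaussian bump."""
    return math.exp(-x * x / (2.0 * sigma * sigma))


def estimate_shift(observed, xs, d0=0.0, iterations=20, eps=1e-4):
    """Iteratively adjust d so that profile(x + d) matches the observed samples.

    Each iteration linearises the template around the current d and takes
    the least-squares (Gauss-Newton) step for the single parameter.
    """
    d = d0
    for _ in range(iterations):
        num = 0.0
        den = 0.0
        for x, g in zip(xs, observed):
            f = profile(x + d)
            # Central-difference derivative of the template with respect to d.
            df = (profile(x + d + eps) - profile(x + d - eps)) / (2.0 * eps)
            num += df * (g - f)
            den += df * df
        if den == 0.0:
            break  # no gradient information left; give up
        d += num / den
    return d


xs = [i * 0.5 for i in range(-20, 21)]
true_shift = 0.5
observed = [profile(x + true_shift) for x in xs]
estimated = estimate_shift(observed, xs)  # starts from d0 = 0.0, close to the answer
```

The starting value d0 plays the role of the initial approximation: when it is close to the true shift, the linearisation is accurate and the updates settle quickly, whereas a starting value many profile widths away leaves little gradient information and the iteration may stall or diverge.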
In rapid template matching methods (such as those described in references 1, 2, 5, 6, 7, infra) the term Nt in the computational complexity defined above is reduced by template simplification, e.g., by representing the template as a combination of rectangles. By using a special image preprocessing technique, the so-called integral image, and by computing a simplified similarity score, namely the normalized contrast between “positive” and “negative” image regions defined by the template, the computational speed of rapid template matching becomes independent of the template size and depends only on the template complexity (the number of rectangles comprising the template). However, such rectangle-based (Haar-like) features are not rotation-invariant, and a few extensions of this framework have been proposed to handle image rotation. For example, M. Jones, P. Viola, “Fast Multi-view Face Detection”, IEEE CVPR, June 2003 (reference 5, infra), proposed an additional set of diagonal rectangular templates. R. Lienhart, J. Maydt. “An extended set of Haar-like features for rapid object detection”, ICIP'02, pp. 900-903, V.1, 2002 (reference 6, infra), proposed 45° twisted Haar-like features computed via 45° rotated integral images. Messom, C. H. and Barczak, A. L, “Fast and Efficient Rotated Haar-like Features using Rotated Integral Images”, Australasian Conf. on Robotics and Automation, 2006 (reference 7, infra) further extended this idea and used multiple sets of Haar-like features and integral images rotated by whole integer-pixel based rotations.
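The following minimal sketch illustrates why the cost of a rectangle-based feature does not depend on the rectangle size once an integral image has been computed: each rectangle sum needs only four table lookups. The function names and the two-rectangle layout (left half “positive”, right half “negative”) are assumptions chosen for illustration, and the normalization of the contrast is omitted for brevity.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[r][c] is the sum of img[:r][:c]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle using four lookups, whatever its size."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_contrast(ii, top, left, height, width):
    """Unnormalized contrast: left half ("positive") minus right half ("negative")."""
    half = width // 2
    positive = rect_sum(ii, top, left, height, half)
    negative = rect_sum(ii, top, left + half, height, half)
    return positive - negative


img = [[9, 9, 1, 1],
       [9, 9, 1, 1]]
ii = integral_image(img)
contrast = two_rect_contrast(ii, 0, 0, 2, 4)  # bright left half, dark right half
```

The integral image is built once per image in a single pass; after that, a template made of a given number of rectangles costs the same to evaluate at every location and scale, which is the property the paragraph above relies on.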
The rapid template matching framework described above has a few implicit drawbacks, which are not present in computationally expensive correlation-based TM methods.
A first drawback is that it is not easy to generalize two-region Haar-like features to the case of three or more pixel groups. In addition, a rectangle-based representation is redundant for curvilinear object shapes, e.g. circles. In such cases, using curved templates instead of rectangular ones should result in higher matching scores and, therefore, in better detector performance.
Moreover, whilst impressive results with Haar-like features may be achieved by using powerful classifiers based on boosting (as in reference 1, infra), such techniques require training on large databases. Therefore, matching using a single object template (achievable at no additional cost in correlation-based template matching using a grayscale template) cannot be easily performed in this framework, or it can be performed only for objects having a simple shape and a bimodal intensity distribution.
The present application proposes a new approach that can be placed in between rapid template matching methods and standard correlation-based template matching methods in terms of computational complexity and thus matching speed. The proposed approach addresses some of the limitations of existing techniques described above and, optionally, can also be extended to an iterative refinement framework for precise estimation of object location and transformation.