Techniques of visual object (and/or pattern) recognition are increasingly important in automated manufacturing, biomedical engineering, cartography and many other fields. Model-based recognition techniques typically must solve the problem of finding, in an image acquired by a camera, an occurrence of a previously defined model that has been subjected to an affine transformation. Affine transformations may be defined as transformations in which straight lines remain straight and parallelism is preserved. Angles, however, may change, and differential scale changes may be introduced.
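As a minimal illustration of this definition (not part of any specific recognition method), the following Python sketch applies a hypothetical affine transformation that combines a shear, a differential scale and a translation. It checks that collinear points stay collinear and parallel segments stay parallel, while a right angle is not preserved. The helper names (affine, direction, cross, dot) and the coefficients are assumptions chosen for the example.

```python
def affine(a, b, c, d, tx, ty):
    """Return a map (x, y) -> (a*x + b*y + tx, c*x + d*y + ty)."""
    def apply(p):
        x, y = p
        return (a * x + b * y + tx, c * x + d * y + ty)
    return apply


def direction(p, q):
    """Direction vector from point p to point q."""
    return (q[0] - p[0], q[1] - p[1])


def cross(u, v):
    """2D cross product; zero when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]


def dot(u, v):
    """Dot product; zero when u and v are perpendicular."""
    return u[0] * v[0] + u[1] * v[1]


# Hypothetical transform: x scaled by 2, y scaled by 0.5, plus a shear and a translation.
T = affine(2.0, 1.0, 0.0, 0.5, 3.0, -1.0)

# Collinear points remain collinear.
p0, p1, p2 = T((0, 0)), T((1, 1)), T((2, 2))
collinear = cross(direction(p0, p1), direction(p0, p2))

# Parallel segments remain parallel.
d1 = direction(T((0, 0)), T((1, 0)))
d2 = direction(T((0, 1)), T((1, 1)))
parallel = cross(d1, d2)

# The right angle between the x and y axes is not preserved.
right_angle = dot(direction(T((0, 0)), T((1, 0))),
                  direction(T((0, 0)), T((0, 1))))

print(collinear, parallel, right_angle)
```

Run as written, this prints `0.0 0.0 2.0`: the first two values confirm that straightness and parallelism survive, while the nonzero dot product shows that the originally perpendicular axes are no longer perpendicular.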
Images, which are projections of a three-dimensional world onto a plane, depend on the position, orientation and intrinsic properties of the camera acquiring them. Image distortion may be introduced by different scale factors in the X and Y directions. Perspective distortion may be introduced when the optical axis of the camera's lens is at an oblique angle to the object plane. Distortion may also be introduced by optical imperfections of the camera's lens. Finally, distortion may appear when the object does not lie on a planar surface.
Known object recognition algorithms process acquired images to find an occurrence of a model that has been subjected to an affine transformation. When images are distorted (e.g., by perspective or lens distortion), finding a match with the model requires a matching algorithm capable of handling more than affine transformations.
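To see why perspective distortion exceeds what an affine model can represent, the following Python sketch maps two parallel segments through a hypothetical 3x3 projective (perspective) transformation and through a hypothetical affine one, both written in homogeneous coordinates. An affine transformation is the special case whose bottom row is (0, 0, 1); a nonzero entry elsewhere in that row models an oblique view. The matrices and helper names are illustrative assumptions.

```python
def project(H, p):
    """Apply a 3x3 transformation H to point p using homogeneous coordinates."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def direction(p, q):
    """Direction vector from point p to point q."""
    return (q[0] - p[0], q[1] - p[1])


def cross(u, v):
    """2D cross product; zero when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]


def parallel_error(H):
    """Map two parallel horizontal segments through H and measure how far from parallel they end up."""
    d1 = direction(project(H, (0, 0)), project(H, (1, 0)))
    d2 = direction(project(H, (0, 1)), project(H, (1, 1)))
    return cross(d1, d2)


# Hypothetical affine transform: bottom row (0, 0, 1).
H_affine = [[2.0, 1.0, 3.0],
            [0.0, 0.5, -1.0],
            [0.0, 0.0, 1.0]]

# Hypothetical perspective transform: the 0.1 term models an oblique view.
H_persp = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.1, 0.0, 1.0]]

affine_err = parallel_error(H_affine)
persp_err = parallel_error(H_persp)
print(affine_err, persp_err)
```

The affine case returns exactly 0.0, whereas the perspective case returns a nonzero value because the two segments now converge toward a vanishing point. Since every affine transformation keeps parallel lines parallel, no choice of affine parameters can reproduce that convergence.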
Geometric hashing, as described in “Affine Invariant Model-Based Object Recognition” (Y. Lamdan, J. T. Schwartz, H. J. Wolfson, IEEE Transactions on Robotics and Automation, Vol. 6, No. 5, October 1990), the generalized Hough transform, as described in “Computer Vision” (D. H. Ballard, C. M. Brown, pp. 128-131, Prentice Hall, 1982), and other geometry-based pattern matching methods that work in the presence of affine transformations are sensitive to image distortion because of their global nature: they rely on a global description of the model, and that description is altered by perspective and non-linear distortions. Consequently, distortion introduces errors that may cause these methods to fail. Even when occurrences of a model are correctly identified, the computed position, angle and scale of the occurrences are frequently inaccurate.
When used with a known object or world surface, camera calibration can be regarded as defining a one-to-one mapping (or transformation function) between the world surface and its distorted projection in “image space”. The transformation function thus maps any coordinates in the image coordinate system to the corresponding world coordinates on the known world surface, and vice versa. Well-known methods of camera calibration are described by Tsai (R. Tsai, “A Versatile Camera Calibration Technique for High Accuracy 3D Machine Vision Metrology Using Off the Shelf TV Cameras and Lenses”, IBM Research Report, RC 11413, 1985) and by Faugeras (O. Faugeras, “Three Dimensional Computer Vision, A Geometric Point Of View”, chap. 3: “Modeling and calibrating cameras”, pp. 33-68, MIT Press, 1993).
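Under the simplifying assumption of a pinhole camera viewing a planar world surface with no lens distortion, this one-to-one mapping reduces to a 3x3 homography, and its matrix inverse gives the opposite direction. The Python sketch below uses a made-up homography to show the round trip world to image to world. It is not the output of Tsai's or Faugeras's procedure; full calibration methods estimate the mapping from calibration data and can account for effects this sketch ignores, such as lens distortion. The names apply_homography, invert3 and WORLD_TO_IMAGE are hypothetical.

```python
def apply_homography(H, p):
    """Map point p through the 3x3 matrix H using homogeneous coordinates."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def invert3(M):
    """Invert a 3x3 matrix via its adjugate (transposed cofactor matrix)."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular; the mapping would not be one-to-one")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[val / det for val in row] for row in adj]


# Hypothetical world-to-image homography (world units -> pixels); values are illustrative only.
WORLD_TO_IMAGE = [
    [2.0, 0.1, 50.0],
    [0.05, 1.9, 30.0],
    [1e-4, 2e-4, 1.0],
]
IMAGE_TO_WORLD = invert3(WORLD_TO_IMAGE)

world_pt = (12.5, -4.0)
pixel = apply_homography(WORLD_TO_IMAGE, world_pt)   # world -> image
back = apply_homography(IMAGE_TO_WORLD, pixel)       # image -> world
print(pixel, back)  # back equals world_pt up to floating-point rounding
```

Keeping both matrices lets results computed in image coordinates be reported in world coordinates and vice versa, which is the property the calibration definition relies on.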
When image distortion is negligible, camera calibration can be used to convert the results of an operation performed in the image into the user's real-world coordinate system. For example, an acquired image can be processed (in image space coordinates) to estimate the location of the object (in world space). This information can then be used to control a robot arm (operating in world space coordinates) to pick up the object. However, image distortion can prevent such operations from being performed correctly or accurately.
One method of dealing with image distortion is to calibrate and warp an acquired image to obtain a comparatively non-distorted image before applying a pattern matching algorithm to find model occurrences. All processing of image features is then done in the calibrated “non-distorted image space”, and the results are transformed to world space coordinates for display to a user (and/or for controlling other operations). However, producing a non-distorted image from an acquired image requires intensive image processing, which slows down object recognition. In addition, the pixel values of the “non-distorted image” must be interpolated from the pixel values of the acquired image. This interpolation introduces its own imprecision, thereby degrading the precision of subsequent matching operations.
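The cost and imprecision of this warping step can be illustrated with a small Python sketch. It warps an image by inverse mapping: for every output pixel, a hypothetical mapping function out_to_src gives a real-valued source position, and the value is bilinearly interpolated from the neighboring source pixels. Bilinear interpolation is assumed here only as one common choice; the text does not name a specific method. A half-pixel horizontal shift stands in for the real calibration mapping.

```python
import math


def bilinear(img, x, y):
    """Sample img (a list of rows) at real-valued (x, y), clamping to the image border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
    bottom = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
    return (1 - fy) * top + fy * bottom


def warp(img, out_w, out_h, out_to_src):
    """Inverse-mapping warp: each output pixel (u, v) is sampled at out_to_src(u, v) in img."""
    return [[bilinear(img, *out_to_src(u, v)) for u in range(out_w)]
            for v in range(out_h)]


# A 4x4 image containing a sharp vertical step edge.
step = [[0, 0, 255, 255] for _ in range(4)]

# Hypothetical stand-in for a calibration mapping: a half-pixel horizontal shift.
warped = warp(step, 4, 4, lambda u, v: (u + 0.5, float(v)))
print(warped[0])  # [0.0, 127.5, 255.0, 255.0]
```

The half-pixel shift produces an intermediate value of 127.5 that appears nowhere in the source, smearing the sharp edge that a matcher would rely on. The work also grows with the number of output pixels, since every one of them requires its own interpolation.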
Accordingly, a method and apparatus enabling efficient recognition of an object remains highly desirable.