Advanced machine vision systems and their underlying software are increasingly employed in a variety of manufacturing and quality control processes. Machine vision enables quicker, more accurate and repeatable results to be obtained in the production of both mass-produced and custom products. Typical machine vision systems include one or more cameras (typically having solid-state charge coupled device (CCD) or CMOS-based imaging elements) directed at an area of interest, a frame grabber/image processing elements that capture and transmit CCD images, a computer or onboard processing device, a user interface for running the machine vision software application and manipulating the captured images, and appropriate illumination on the area of interest.
Many applications of machine vision involve the determination of the relative position of a part in multiple degrees of freedom with respect to the field of view. Machine vision is also employed in varying degrees to assist in manipulating manufacturing engines in the performance of specific tasks, particularly those where distance information on an object is desirable. A particular task using 3D machine vision is visual servoing of robots, in which a robot end effector is guided to a target using machine vision feedback, based upon conventional control systems and processes (not shown). Other applications also employ machine vision to locate stationary and/or moving patterns.
The advent of increasingly fast, high-performance computers has enabled the development of machine vision tools that can perform complex calculations in analyzing the pose of a viewed part in multiple dimensions. Such tools enable a previously trained/stored image pattern to be acquired and registered/identified regardless of its viewed position. In particular, existing commercially available search tools can register such patterns transformed by at least three degrees of freedom, including at least three translational degrees of freedom (x and y-axis image plane and the z-axis) and two or more non-translational degrees of freedom (rotation, for example) relative to a predetermined origin.
One form of 3D vision system is based upon stereo cameras employing at least two cameras arranged in a side-by-side relationship with a baseline of one-to-several inches therebetween. Stereo-vision based systems in general are based on epipolar geometry and image rectification. They use correlation-based methods, optionally combined with relaxation techniques, to find the correspondence in rectified images from two or more cameras. The limitations of stereo vision systems are in part a result of the small baselines among cameras, which require more textured features in the scene and a reasonable estimate of the distance range of the object from the cameras. Thus, the accuracy achieved may be limited to pixel level (as opposed to a finer sub-pixel level accuracy), and more computation and processing overhead is required to determine dense 3D profiles on objects.
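The effect of a small baseline on depth accuracy can be illustrated with the standard relation for rectified stereo pairs, in which depth equals focal length times baseline divided by disparity. The following is a minimal sketch only; the function names, the focal length in pixels, and the numeric values are assumptions chosen for illustration and are not taken from any particular system.

```python
def depth_from_disparity(focal_px, baseline, disparity_px):
    """Depth of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline / disparity_px


def depth_error(focal_px, baseline, depth, disparity_error_px=1.0):
    """Depth error caused by under-measuring the disparity by a given amount.

    The true disparity is recovered from the true depth, reduced by the
    matching error, and converted back to depth.
    """
    true_disparity = focal_px * baseline / depth
    if true_disparity <= disparity_error_px:
        raise ValueError("disparity error exceeds the true disparity")
    measured = depth_from_disparity(
        focal_px, baseline, true_disparity - disparity_error_px
    )
    return measured - depth
```

Under these assumed values (a focal length of 1000 pixels and an object 40 inches away), `depth_error(1000.0, 2.0, 40.0)` is roughly 0.82 inches for a 2-inch baseline, while `depth_error(1000.0, 8.0, 40.0)` is roughly 0.20 inches for an 8-inch baseline. A one-pixel matching error therefore costs about four times as much depth accuracy at the smaller baseline.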
Pattern searching in multiple-camera systems (for example, a rotation and scale-invariant search application, such as the PatMax® system, available from Cognex Corporation of Natick, Mass.) can locate features in an acquired image of an object after these features have been trained, using either training features acquired from the actual object or synthetically provided features. Obtaining the feature correspondences in this manner is desirable where high accuracy and high speed are required, since a geometric pattern-based searching vision system can achieve much higher accuracy and faster speed. However, there are significant challenges to obtaining accurate results with training models. When the same trained model is used for all cameras in a 3D vision system, performance decreases as the viewing angle between the cameras increases, since the same object may present a significantly different appearance in each camera's field of view. More particularly, the vision system application's searching speed and accuracy are affected by the changes in feature contrast level and shape (due to homographic projections) between cameras.
More generally, an object in 3D can be registered from a trained pattern using at least two discrete images of the object generated from cameras observing the object from different locations. In any such arrangement there are challenges to registering an object in three dimensions from trained images. For example, when non-coplanar object features are imaged using a perspective camera with a conventional perspective (also termed “projective” in the art) lens (one in which the received light rays cross), different features of the acquired image undergo different transformations, and thus, a single affine transformation can no longer be relied upon to provide the registered pattern. Also, any self-occlusions in the acquired image will tend to appear as boundaries in the simultaneously (contemporaneously) acquired images. This effectively fools the 2D vision system into assuming an acquired image has a different shape than the trained counterpart, and more generally complicates the registration process.
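The statement that non-coplanar features undergo different transformations can be illustrated with an idealized pinhole projection. The sketch below is an assumption-laden illustration, not part of any described system: the `project` and `shift` helpers, the focal length, and the point coordinates are hypothetical. Two features at different depths are chosen so that they coincide in the first image; after the object translates sideways they land on different pixels, so no single 2D mapping of image coordinates (affine or otherwise) can account for both.

```python
def project(point, focal_px=800.0):
    """Ideal pinhole projection of a 3D point (x, y, z) onto the image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (focal_px * x / z, focal_px * y / z)


def shift(point, offset):
    """Rigidly translate a 3D point by an (dx, dy, dz) offset."""
    return tuple(p + o for p, o in zip(point, offset))


# Two features on the same viewing ray, at different depths (non-coplanar
# with respect to any plane parallel to the image plane).
near_feature = (1.0, 1.0, 10.0)
far_feature = (2.0, 2.0, 20.0)

# The whole object moves 3 units along the x axis.
motion = (3.0, 0.0, 0.0)

before = (project(near_feature), project(far_feature))
after = (project(shift(near_feature, motion)),
         project(shift(far_feature, motion)))
```

With these values both features project to (80.0, 80.0) before the motion, but to (320.0, 80.0) and (200.0, 80.0) afterward. The nearer feature moves twice as far in the image as the farther one, which is the depth-dependent behavior that a single affine fit cannot capture.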
The challenges in registering two perspective images of an object are further explained by way of example with reference to FIG. 1. The camera 110 is arranged to image the same object 120 as it moves to two different positions 130 and 132 (shown respectively in dashed lines and solid lines) relative to the camera's field of view, which is centered around the optical axis 140. Because the camera 110 and associated lens 112 image a perspective view of the object 120, the resulting 2D image 150 and 2D image 152 of the object 120 at each respective position 130 and 132 are different in both size and shape. Note that the depicted change in size and shape due to perspective becomes more pronounced as the tilt of the object increases.
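The size and shape changes described with reference to FIG. 1 can be reproduced numerically with an idealized pinhole model. This is a sketch under stated assumptions: the square object, its dimensions, the focal length, and the helper names `tilted_square` and `edge_heights` are all hypothetical and are not drawn from the figure itself.

```python
import math


def project(point, focal_px=800.0):
    """Ideal pinhole projection of a 3D point (x, y, z) onto the image plane."""
    x, y, z = point
    return (focal_px * x / z, focal_px * y / z)


def tilted_square(center, side, tilt_deg):
    """Corners of a square rotated about its vertical (y) axis by tilt_deg.

    Returned in the order: bottom-left, bottom-right, top-right, top-left.
    """
    cx, cy, cz = center
    half = side / 2.0
    cos_t = math.cos(math.radians(tilt_deg))
    sin_t = math.sin(math.radians(tilt_deg))
    corners = []
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        # Tilting moves the left edge toward the camera and the right edge away.
        corners.append((cx + sx * half * cos_t,
                        cy + sy * half,
                        cz + sx * half * sin_t))
    return corners


def edge_heights(corners, focal_px=800.0):
    """Projected heights (in pixels) of the left and right vertical edges."""
    p = [project(c, focal_px) for c in corners]
    left = abs(p[3][1] - p[0][1])
    right = abs(p[2][1] - p[1][1])
    return left, right
```

For a square of side 2 facing the camera, `edge_heights(tilted_square((0.0, 0.0, 10.0), 2.0, 0.0))` gives equal edges of 160 pixels, and moving it to a depth of 20 halves both to 80 pixels (a change in size). Tilting it by 30 degrees at depth 10 yields a left edge of about 168 pixels and a right edge of about 152 pixels, so the square images as a trapezoid (a change in shape).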
One known implementation for providing 3D poses of objects, in which a plurality of cameras are used to generate a 3D image of an object within a scene, employs triangulation techniques to establish all three dimensions. Commonly assigned, published U.S. Patent Application No. 2007/0081714 A1, entitled METHODS AND APPARATUS FOR PRACTICAL 3D VISION SYSTEM, by Aaron S. Wallack, et al., the teachings of which are incorporated herein as useful background information, describes a technique for registering 3D objects via triangulation of 2D features (derived, for example, using a robust 2D vision system application, such as PatMax®) when using perspective cameras. This technique relies upon location of trained features in the object from each camera's image. The technique triangulates the position of the located feature in each image, based upon the known spatial position and orientation of the camera within the world coordinate system (x, y, z), to derive the pose of the object within the coordinate system. While this approach is effective, it and other approaches are not optimal for systems using, for example, a differing type of lens arrangement, and would still benefit from increased accuracy of correspondences and decreased processor overhead.
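As a general illustration of triangulating a located feature from two cameras of known position, the following sketch uses the common midpoint method: each camera contributes a ray from its center through the located feature, and the 3D estimate is the midpoint of the shortest segment joining the two rays. This is not asserted to be the method of the above-referenced application; the function names and the example coordinates are assumptions for illustration.

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def _scale(a, k):
    return tuple(x * k for x in a)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(center1, dir1, center2, dir2):
    """Estimate a 3D point from two viewing rays in world coordinates.

    Each ray starts at a camera center and points along the direction in
    which that camera located the feature. The result is the midpoint of
    the closest points on the two rays.
    """
    w = _sub(center1, center2)
    a = _dot(dir1, dir1)
    b = _dot(dir1, dir2)
    c = _dot(dir2, dir2)
    d = _dot(dir1, w)
    e = _dot(dir2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = _add(center1, _scale(dir1, s))
    p2 = _add(center2, _scale(dir2, t))
    return _scale(_add(p1, p2), 0.5)
```

For example, with assumed camera centers at (0, 0, 0) and (5, 0, 0) and rays of direction (1, 2, 10) and (-4, 2, 10) respectively, the function returns a point at (1, 2, 10), where the two rays intersect. With noisy feature locations the rays generally do not intersect, and the midpoint serves as a simple compromise estimate.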
It is, therefore, desirable to provide a 3D vision system arrangement that allows for more efficient determination of 3D pose of an object. This can, in turn, benefit the throughput and/or efficiency of various underlying operations that employ 3D pose data, such as robot manipulation of objects.