1. Field of the Invention
This disclosure generally relates to systems and methods of three-dimensional pose estimation employing machine vision, for example useful in robotic systems.
2. Description of the Related Art
The ability to determine a three-dimensional pose (i.e., three-dimensional position and orientation) of an object can be useful in a number of settings. For example, three-dimensional pose estimation may be useful in various robotic systems that employ machine-vision.
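As a purely illustrative aside, a three-dimensional pose can be represented as a position vector together with a rotation matrix. The sketch below assumes a roll/pitch/yaw (Z-Y-X) angle convention and uses hypothetical names (`Pose`, `rotation_from_euler`, `transform_point`) that do not appear in this disclosure; other orientation representations, such as quaternions, would serve equally well.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    """Hypothetical container: position (x, y, z) plus a 3x3 rotation matrix."""
    position: tuple
    rotation: list


def rotation_from_euler(roll, pitch, yaw):
    """Build R = Rz(yaw) * Ry(pitch) * Rx(roll); angles in radians (assumed convention)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def transform_point(pose, point):
    """Map a point from the object frame into the reference frame of the pose."""
    return tuple(
        sum(pose.rotation[i][j] * point[j] for j in range(3)) + pose.position[i]
        for i in range(3)
    )


# Example: a part rotated 90 degrees about the vertical axis and offset from the origin.
part_pose = Pose(position=(0.5, 0.2, 0.3),
                 rotation=rotation_from_euler(0.0, 0.0, math.pi / 2))
grasp_point = transform_point(part_pose, (1.0, 0.0, 0.0))
```

Here a feature located at (1, 0, 0) on the part maps to approximately (0.5, 1.2, 0.3) in the reference frame, which is the kind of quantity a robot member would need in order to engage the part.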
One type of machine-vision problem is known as bin picking. Bin picking typically takes the form of identifying an object collocated in a group of identical or similar objects, for example objects such as parts collocated in a bin or other container. Identification may include three-dimensional pose estimation of the object to allow engagement of the object by a robot member and removal of the object from the group of objects.
There are many object recognition methods available for locating complex industrial parts having a large number of machine-vision detectable features. A complex part with a large number of features provides redundancy, and typically can be reliably recognized even when some fraction of the features are not properly detected. However, many parts are simple parts that do not have a sufficient level of redundancy in machine-vision detectable features, and/or that have rough edges or other geometric features which are not clearly defined. In addition, the features typically used for recognition, such as edges detected in captured images, are notoriously difficult to extract consistently from image to image when a large number of parts are jumbled together in a bin. The parts therefore cannot be readily located, especially given the potentially harsh nature of the environment, e.g., uncertain lighting conditions, varying amounts of occlusion, etc.
The problem of recognizing a simple part among many parts lying jumbled in a bin, such that a robotic system is able to grasp and manipulate the part in an industrial or other process, is quite different from the problem of recognizing a complex part having many detectable features. Machine-vision based systems that recognize and locate three-dimensional objects, using either (a) two-dimensional data from a single image or (b) three-dimensional data from stereo images or range scanners, are known. Single image methods can be subdivided into model-based and appearance-based approaches.
The model-based approaches suffer from difficulties in feature extraction under harsh lighting conditions, including significant shadowing and specularities. Furthermore, simple parts do not contain a large number of machine-vision detectable features, which degrades the accuracy of a model-based fit to noisy image data.
The appearance-based approaches have no knowledge of the underlying three-dimensional structure of the object, merely knowledge of two-dimensional images of the object. These approaches have problems in segmenting out the object for recognition, have trouble with occlusions, and may not provide a three-dimensional pose estimation that is accurate enough for grasping purposes.
Approaches that use three-dimensional data for recognition have somewhat different issues. Lighting effects cause problems for stereo reconstruction, and specularities can create spurious data both for stereo and laser range finders. Once the three-dimensional data is generated, there are the issues of segmentation and representation. On the representation side, more complex models are often used than in the two-dimensional case (e.g., superquadrics). These models contain a larger number of free parameters, which can be difficult to fit to noisy data.
Assuming that a part can be located, it must then be picked up by the robotic system. The current standard for motion trajectories leading up to the grasping of an identified part is known as image-based visual servoing (IBVS). A key problem for IBVS is that image-based servo systems control image error, but do not explicitly consider the physical camera trajectory. Image error results when image trajectories cross near the center of the visual field (i.e., requiring a large-scale rotation of the camera). The conditioning of the image Jacobian results in a phenomenon known as camera retreat. Namely, the robotic system is also required to move the camera back and forth along the optical axis direction over a large distance, possibly exceeding the robotic system's range of motion. Hybrid approaches decompose the robotic system motion into translational and rotational components either through identifying homographic relationships between sets of images, which is computationally expensive, or through a simplified approach which separates out the optical axis motion. The simplified hybrid approaches introduce a second key problem for visual servoing, which is the need to keep features within the image plane as the robotic system moves.
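The camera retreat phenomenon can be illustrated with a small numerical sketch. The code below is not taken from this disclosure; it assumes the point-feature image Jacobian (interaction matrix) commonly used in the visual servoing literature, normalized image coordinates, a single known depth for all features, and the proportional control law v = -gain * pinv(L) * e. All function names are hypothetical. Four features are placed so that the current image is the desired image rotated 180 degrees about the optical axis.

```python
def interaction_matrix(points, depth):
    """Stack the 2x6 point-feature image Jacobian for each (x, y) feature."""
    rows = []
    for x, y in points:
        rows.append([-1 / depth, 0.0, x / depth, x * y, -(1 + x * x), y])
        rows.append([0.0, -1 / depth, y / depth, 1 + y * y, -x * y, -x])
    return rows


def solve_least_squares(A, b):
    """Least-squares solve via the normal equations and Gauss-Jordan elimination."""
    n = len(A[0])
    # Augmented system [A^T A | A^T b].
    M = [[sum(r[i] * r[j] for r in A) for j in range(n)]
         + [sum(r[i] * bi for r, bi in zip(A, b))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ibvs_velocity(current, desired, depth, gain=1.0):
    """Camera velocity (vx, vy, vz, wx, wy, wz) that drives the image error to zero."""
    L = interaction_matrix(current, depth)
    error = [c - d for p, q in zip(current, desired) for c, d in zip(p, q)]
    return solve_least_squares(L, [-gain * e for e in error])


# Desired features at the corners of a square; the current view is the same
# square rotated 180 degrees about the optical axis (assumed test configuration).
desired = [(0.1, 0.1), (-0.1, 0.1), (-0.1, -0.1), (0.1, -0.1)]
current = [(-x, -y) for x, y in desired]
v = ibvs_velocity(current, desired, depth=1.0)
```

Under these assumptions the commanded velocity is a pure translation backward along the optical axis (v[2] is approximately -2 with unit gain and unit depth), while the rotational components are approximately zero. The controller therefore backs the camera away rather than rotating it, even though a rotation is what the task physically requires.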
Conventional bin picking systems are relatively deficient in at least one of the following: robustness, accuracy, and speed. Robustness is required since there may be no cost savings to the manufacturer if the error rate in picking an object from a bin is not close to zero (as the picking station will still need to be manned). Location accuracy is necessary so that the grasping operation will not fail. Finally, solutions which take too long between picks would slow down entire production lines, and would not be cost effective.