Industrial robots are typically designed to perform the same task repeatedly with high accuracy and precision. In several industrial applications, such as manufacturing and assembly, robots are used to ‘pick’ parts from a bin (part acquisition) and place the parts in a correct position and orientation (pose) for subsequent processing.
Robots rely on a consistent pose of the parts to be picked. Any deviation can result in damage to the robot or the part, which increases costs. Typically, custom-designed mechanical and electromechanical systems are used to feed parts in a specific pose to the robot. In some cases, the parts are pre-positioned manually so that the robot can easily pick them up.
More recently, computer vision techniques have been used to automate the process of part location and picking. Most conventional automated techniques can only pick a single non-occluded part, or parts lying apart from others, e.g., parts loosely scattered on a conveyor belt.
Some vision-assisted systems can pick stacked parts, but only with sophisticated mechanical systems or manual intervention. Most vision-assisted systems lack reliability, accuracy, and robustness, and use expensive vision sensors and hardware. Conventional vision-assisted systems lack the capability of 3D part acquisition when parts are placed randomly, in a haphazard manner, on top of each other in a pile or a bin.
The problem of 3D pose estimation and part acquisition is well known. Manual part acquisition requires humans to acquire parts and place them for assembly. This poses a risk for humans working with heavy parts. In addition, a certain level of skill is required from the human operators. It is desired to reduce costs by replacing human operators.
Automated part acquisition systems typically use electromechanical devices such as a robot arm equipped with a specially designed grasper for the parts to be picked. However, the robot needs to know the pose of the part to be picked. Methods such as precision fixturing could be used to present the part in a specific pose to the robot arm. These systems are costly, lack interoperability, i.e., the systems need to be designed specifically for a given part, and cannot handle a bin of randomly stacked parts.
Computer vision systems can be used to determine the pose of objects. Those systems typically use one or more cameras. Images acquired by the cameras can be analyzed to locate the objects and to provide feedback to the robot arm for subsequent operations. Most vision systems are 2D and can only be used for 2D tasks such as inspection and simple part acquisition. Those systems can only determine an in-plane orientation and location of the part, but cannot determine any out-of-plane rotation or the distance to the part. Typically, those 2D systems require parts to be non-overlapping and placed on a flat surface. Thus, those systems cannot operate on a pile of randomly placed objects.
Some systems augment the 2D vision system by basing the distance to the object on the size of the object in an image. However, those 2.5D systems cannot estimate the out-of-plane rotation, and are often unreliable in their distance estimates.
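The 2.5D size-based distance estimate described above can be sketched with a simple pinhole camera model. The focal length and part dimensions below are hypothetical example values, not parameters of any particular system; note that any out-of-plane rotation foreshortens the apparent size and so biases the estimate, which is one source of the unreliability noted above.

```python
def estimate_distance(focal_length_px, real_width_mm, image_width_px):
    """Pinhole camera model: distance Z = f * W / w.

    focal_length_px : camera focal length expressed in pixels
    real_width_mm   : known physical width of the part
    image_width_px  : measured width of the part in the image
    """
    return focal_length_px * real_width_mm / image_width_px

# A part 100 mm wide spanning 200 pixels under a 1000-pixel focal length
# is estimated to lie 500 mm from the camera.
print(estimate_distance(1000.0, 100.0, 200.0))  # 500.0
```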
3D vision systems typically use sensors for estimating the 3D geometry of the scene. A stereo system uses two cameras to estimate the distance to an object. First, corresponding features are located in the stereo images. The geometric relationship between the cameras can be used to identify the depth (distance) of the features. However, locating corresponding features is itself a challenging problem, especially for machine parts, which are often highly reflective and homogeneous in appearance. Stereo systems can erroneously estimate depth if the images are noisy with respect to the features. Another problem with stereo systems is that depths are recovered only for the features and not over the entire object. The reduced accuracy is insufficient for accurate bin-picking.
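For a rectified stereo pair, the depth computation described above reduces to Z = f·B/d, where d is the disparity between corresponding features. The following sketch uses hypothetical focal length and baseline values; the comment at the end illustrates the noise sensitivity noted above, where a single-pixel matching error shifts the depth estimate considerably.

```python
def stereo_depth(focal_px, baseline_mm, x_left_px, x_right_px):
    """Depth of a matched feature in a rectified stereo pair: Z = f * B / d."""
    disparity = x_left_px - x_right_px  # horizontal offset between matches
    if disparity <= 0:
        raise ValueError("non-positive disparity: bad correspondence")
    return focal_px * baseline_mm / disparity

# f = 800 px, baseline = 60 mm, feature at x = 320 (left) and x = 300 (right):
# Z = 800 * 60 / 20 = 2400 mm.
# A one-pixel matching error (disparity 19 instead of 20) moves the estimate
# to about 2526 mm, i.e., more than 5% depth error from a single bad pixel.
print(stereo_depth(800.0, 60.0, 320.0, 300.0))  # 2400.0
```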
Laser triangulation uses structured light to generate a pattern on the surface of an object while images are acquired by a camera. Laser triangulation can recover the 3D shape of the object surface. That technology has been used for applications involving edge tracking for welding, sealing, glue deposition, grinding, waterjet cutting, and deburring of flexible and dimensionally unstable parts, for example.
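A minimal sketch of the triangulation principle behind such sheet-of-light systems follows, under an assumed geometry (camera at the origin looking along Z, laser offset along X by the baseline and tilted toward the camera axis); the geometry and numbers are illustrative, not those of any particular commercial system.

```python
import math

def triangulate_depth(baseline_mm, focal_px, laser_angle_rad, spot_x_px):
    """Depth of a laser spot seen at image coordinate x.

    Camera ray:  X = (x / f) * Z
    Laser ray:   X = b - Z * tan(theta)
    Intersecting the two rays gives Z = b * f / (x + f * tan(theta)).
    """
    return baseline_mm * focal_px / (
        spot_x_px + focal_px * math.tan(laser_angle_rad)
    )

# b = 100 mm, f = 800 px, tan(theta) = 0.5, spot on the optical axis (x = 0):
# Z = 100 * 800 / (0 + 400) = 200 mm.
print(triangulate_depth(100.0, 800.0, math.atan(0.5), 0.0))  # 200.0
```

Sweeping the laser sheet across the object and repeating this computation per pixel yields the surface profile; registering the resulting profiles, and accounting for shadows and occlusions, is where the difficulty discussed below arises.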
Laser triangulation requires image registration and accounting for shadows and occlusions. Those systems have not yet been perfected for general, random bin-picking applications. In addition, lasers often lead to safety issues when deployed in close proximity to human operators.