Advances in sensing technology and computer vision have enabled the automation of many industrial tasks that require interaction with uncontrollable and dynamic environments. However, several problems persist with objects that do not have near-Lambertian reflectance. Specular objects with mirror-like, transparent, or translucent surfaces possess material properties that are frequently treated as noise sources, and several methods attempt to suppress the impact of those effects. Consequently, objects that are either highly specular or significantly transparent cannot be handled by such methods, because those material effects cannot be completely suppressed.
One application of interest is handling screws. Screws form a fundamental class of objects used in manufacturing systems. More threaded screws are produced each year than any other machine element. In conventional assembly lines, the screws are placed into a part holder with a known pose (3D translational position and 3D rotational orientation) before a robot arm grasps and manipulates the screws. This operation requires either specially designed hardware, such as a part feeder, for each screw type, or manual manipulation.
The majority of screws are made from shiny metallic materials and have specular surfaces. The presence of threads further complicates the problem. Therefore, the screws cannot be handled easily by conventional computer vision methods. In addition, pose estimation of a screw in a bin is a very challenging problem because of clutter and occlusions. The problem is particularly difficult because the bin can contain hundreds of screws in innumerable possible poses.
Computer-Vision-Based Bin Picking
The primary problem in bin picking systems using computer vision is to estimate the poses of industrial parts. The problem is challenging mainly because of specular reflections from the metallic surfaces of industrial parts and occlusions in a cluttered bin.
Model-based pose estimation methods use correspondences between 2D image features and 3D model points. Unfortunately, the 2D-to-3D point correspondences are hard to obtain for industrial parts due to their specular surfaces. The problem is particularly severe when multiple identical objects overlap each other.
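As an illustration of what 2D-to-3D point correspondences provide, the following minimal sketch scores a candidate pose by projecting 3D model points into the image and measuring their distance to the matched 2D features. It assumes an ideal pinhole camera without lens distortion; the intrinsic matrix `K`, the model points, and the function names are hypothetical values chosen for the example and are not taken from any method cited here.

```python
import numpy as np

def project_points(model_points, R, t, K):
    """Project Nx3 model points into the image for pose (R, t) and intrinsics K."""
    cam = model_points @ R.T + t       # model frame -> camera frame
    uvw = cam @ K.T                    # apply pinhole intrinsics
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide -> pixel coordinates

def reprojection_error(model_points, image_points, R, t, K):
    """Mean pixel distance between matched 2D features and projected 3D points."""
    projected = project_points(model_points, R, t, K)
    return float(np.mean(np.linalg.norm(projected - image_points, axis=1)))

# Hypothetical camera intrinsics (focal length 800 px, principal point 320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Hypothetical model points in meters, in the object's own frame.
model_points = np.array([[0.00, 0.00, 0.00],
                         [0.01, 0.00, 0.00],
                         [0.00, 0.01, 0.00],
                         [0.00, 0.00, 0.02]])

# Pose: 3D rotational orientation R and 3D translational position t.
R_true = np.eye(3)
t_true = np.array([0.0, 0.0, 0.5])

# Simulated 2D features that match the model points exactly.
image_points = project_points(model_points, R_true, t_true, K)

err_correct = reprojection_error(model_points, image_points, R_true, t_true, K)
err_wrong = reprojection_error(model_points, image_points, R_true,
                               t_true + np.array([0.01, 0.0, 0.0]), K)
```

With correct correspondences, the error is near zero at the true pose and grows as the candidate pose moves away from it. When specular highlights corrupt the 2D features or the matching, this score no longer discriminates between poses, which is the difficulty described above.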
Object contours can provide information about object identities and their poses, and there exist various contour matching methods. However, for specular objects, the contour information is difficult to obtain in a cluttered bin, because the objects do not have an appearance of their own. Instead, the objects reflect the surrounding environment, e.g., other objects, the bin, and the gripper.
Range sensors have been used for pose estimation. Range data can be used to group surface features, which are used to generate and verify an estimated object pose. Several pair features can be used in a voting framework for bin picking using a range sensor. However, in the presence of specularities, range sensors fail to produce accurate depth maps. In addition, range sensors are expensive when compared with camera-based systems.
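The text does not specify which pair features are meant. As one common possibility, the sketch below computes a four-dimensional descriptor for two oriented surface points: their distance and three angles involving the surface normals. This particular descriptor is an assumption made for illustration, and all function names and values are hypothetical.

```python
import numpy as np

def angle_between(a, b):
    """Angle in radians between two non-zero 3D vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.arccos(np.clip(np.dot(a, b), -1.0, 1.0)))

def point_pair_feature(p1, n1, p2, n2):
    """Descriptor of two oriented points: (distance, angle(n1, d), angle(n2, d), angle(n1, n2))."""
    d = p2 - p1
    return (float(np.linalg.norm(d)),
            angle_between(n1, d),
            angle_between(n2, d),
            angle_between(n1, n2))

# Two hypothetical surface points (meters) with unit normals, as measured by a range sensor.
p1, n1 = np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])
p2, n2 = np.array([0.02, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])
feature = point_pair_feature(p1, n1, p2, n2)

# Apply an arbitrary rigid motion (rotation about z, then translation) to the same pair.
c, s = np.cos(0.7), np.sin(0.7)
R = np.array([[c, -s, 0.0],
              [s, c, 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([0.1, -0.2, 0.3])
feature_moved = point_pair_feature(R @ p1 + t, R @ n1, R @ p2 + t, R @ n2)
```

Because the descriptor is unchanged by a rigid motion, pairs measured in the scene can be matched against pairs stored from the model, and each match can cast a vote for a candidate pose. The descriptor depends on accurate points and normals, so it degrades when specularities corrupt the depth map.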
Specularities
Specularities are generally treated as a source of noise in computer vision applications. Most computer vision methods identify and remove specularities to reduce interference. Accurate feature extraction in the presence of strong specularities is a challenging task.
U.S. application Ser. No. 12/950,357 describes a method for extracting features from images of specular objects using an active illumination camera. That method uses multiple lights surrounding a lens of the camera, and acquires multiple images by flashing one light at a time. By analyzing motion of specular highlights appearing in the multiple images acquired, that method extracts the features on high-curvature regions on specular objects facing the camera. That method requires multiple images to be acquired, which is time-consuming.
U.S. application Ser. No. 13/549,791 describes a method for determining depth edges of objects using an active illumination camera. One or more images are acquired while flashing all the lights forming a hue circle around a lens of a camera at the same time. Depth edges are extracted by analyzing the colors of shadows cast by the lights. That method does not exploit specular reflections.