Visually locating a tool tip on a robot is essential for vision-based robot control, which uses feedback from vision sensors to control the motion of a robot. The visual feedback reduces the reliance on precise calibration of the camera lens system as well as the entire robot system. To improve positioning accuracy and stability, real-world robot applications demand highly reliable algorithms for localizing robot end-effectors in unstructured and dynamic environments.
Most researchers use template matching to locate the end-effector. The image template describes color, texture, and gradient-based edges, and the region providing the maximal similarity measure is selected as the location of the object in the image. This kind of modeling includes assumptions about ambient lighting and background color that are not object features and, therefore, lacks robustness to lighting and background variation (see, for example, the List of Cited Literature References, Literature Reference No. 4). Other researchers apply feature-based methods, such as Harris corner features (see Literature Reference No. 5), KLT features (see Literature Reference No. 6) and SIFT features (see Literature Reference No. 7). Significant work has also been reported in object detection and recognition (see Literature Reference No. 8). Such methods usually require that the object have a rich surface texture, which most end-effectors, such as a drill bit, do not provide. Another problem in feature-based segmentation is separating the features that belong to the object from the features that belong to the background. Binocular disparity and consistent optical flow may be included to allow separation of the object from the background. However, disparity is not suitable when the difference in depth between an object and its background is small. Further, existing optical flow-based object segmentation methods often result in noisy and inconsistent flow patterns, especially if the motion of the object is large. Moreover, they require oscillation-like movement of the tool (see Literature Reference No. 9), which is undesirable for a tool like a drill.
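By way of illustration only, the template-matching step described above (selecting the region with the maximal similarity measure) can be sketched as follows. This is a minimal sketch and not a method taken from the cited references: it assumes a grayscale image stored as nested lists and uses zero-mean normalized cross-correlation as the similarity measure, and the names ncc and match_template are hypothetical.

```python
import math


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-size 2D lists.

    Returns a value in [-1, 1]; a patch with no variation scores 0.0.
    """
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p = sum(p) / len(p)
    mean_t = sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = math.sqrt(
        sum((a - mean_p) ** 2 for a in p) * sum((b - mean_t) ** 2 for b in t)
    )
    return num / den if den > 0 else 0.0


def match_template(image, template):
    """Exhaustively slide the template over the image.

    Returns ((row, col), score) for the top-left corner of the window
    with the highest similarity score.
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best_pos, best_score = None, -2.0  # below the minimum possible score
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score
```

Subtracting the means and normalizing compensates only for a uniform change in brightness or contrast within the window. Background pixels that fall inside the window still enter the score, which is consistent with the sensitivity to lighting and background variation noted above.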
As another prior art example, active contours, or so-called snakes, are able to segment rigid and semi-rigid objects and are better able to preserve the shape of an object (see Literature Reference No. 10). Such snakes allow tracking of arbitrary shapes and are relatively robust to occlusions. However, snakes are sensitive to their parameters and to the initialization of the algorithm. Moreover, snakes have a limited capture range and fail to detect concavities.
The performance of visual servoing depends heavily on the robustness of end-effector localization. In a cluttered environment, the visual appearance of an end-effector depends upon a variety of parameters, including geometry, surface characteristics, illumination, and the geometric relation between the camera and the object(s). The large number of parameters often results in a noisy and inconsistent appearance of the end-effector, which hinders its extraction. Therefore, almost all of the existing techniques either are model-based and require an off-line calibration or require special markers or fiducial points on the end-effector. Moreover, the lighting and environment are usually contrived (for instance, using white backgrounds and dark objects) to yield high contrast and thereby naturally distinguish foreground objects from the background. Vision under natural conditions remains very challenging.
Thus, a continuing need exists for a device and system that can robustly locate a drill bit or other tool tip in a variety of conditions.