1. Field of Invention
The present invention is generally directed to the field of robotic manipulation of objects. More specifically, it is directed towards machine recognition and manipulation of non-rigid objects, such as cable harnesses, by visual inspection of the non-rigid objects.
2. Description of Related Art
In the field of automated, or robotic, manufacturing or assembly, the ability to identify assembly components and to manipulate and attach them to other components is very important. Often, this is achieved by the use of assembly stations, where each assembly station is limited to one component having one known orientation and requiring only simplified manipulation.
It would be advantageous, however, for a machine to be able to visually select a needed component from a supply of multiple components, identify any key assembly features of the component, and manipulate the selected component as needed for assembly. This would require that the machine have some capacity for computer vision, object recognition and manipulation.
Before discussing some details of computer vision, it may be beneficial to first discuss how computer vision has previously been used in the field of robotic (or machine) vision. Two important aspects of robotic vision are the identifying of an object and the estimating of its pose, i.e. its 3-dimensional (3D) orientation relative to a known reference point and/or plane.
Since most cameras take 2-dimensional (i.e. 2D) images, many approaches attempt to identify objects in a 2D image and infer some 3D information from the 2D image. For example, in “Class-specific grasping of 3D objects from a single 2D image”, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, Oct. 18-22, 2010, Chiu et al. describe superimposing 2D panels in the form of simplified 2D shapes on the surface of objects in a 2D image. The 2D panels on each imaged object form a set that defines the object in the 2D image. The generated 2D panels can then be compared with a library of panel sets that define different types of predefined 3D objects, such as a car. Each library panel set is compared from different view directions with the generated 2D panels of the imaged object in an effort to find a relatively close match. If a match is found, then in addition to having identified the object, one has the added benefit of having a good estimate of its orientation, given the matched orientation of the 2D panel set of the predefined 3D object in the library.
A second example is found in “Human Tracking using 3D Surface Colour Distributions”, Image and Vision Computing, 2006, by Roberts et al. In this example, Roberts et al. describe a system where simplified 2D shapes are superimposed on known rigid parts of a human body (such as the head, torso, arms, etc.) in a 2D video image. The movements of the superimposed, simplified 2D shapes follow the movements of the moving human in the 2D video. By analyzing the movements of the 2D shapes, it is possible to discern the movement of the imaged human.
As is stated above, however, identifying a desired object in an image is only part of the solution, particularly when dealing with moving objects. In such cases, one further needs to discern information about the viewed object's pose, or orientation, in three-dimensional (i.e. 3D) space and its possible movement through 3D space. Various approaches have been used to address this need.
For example, in “3D Pose Estimation for Planes”, 2009 IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), Sep. 27-Oct. 4, 2009, Xu et al. describe using a plane outlined on the surface of a target object in a non-stereo image, and estimating the plane's normal direction to estimate the object's pose orientation.
A second example is found in “Robust 3D Pose Estimation and Efficient 2D Region-Based Segmentation from a 3D Shape Prior”, by Dambreville et al., European Conference on Computer Vision (ECCV), 2008. Dambreville et al. describe segmenting a rigid, known target object in a 2D image, and estimating its 3D pose by fitting the best 2D projection of known 3D poses of the known target object onto the segmented target object.
A third example is provided in “Spatio-temporal 3D Pose Estimation of Objects in Stereo Images”, by Barrois et al., Proceedings of the 6th International Conference on Computer Vision Systems, ICVS'08. Barrois et al. describe using a 3D object's normal velocity (defined by the object's main direction of movement) at one point in time to estimate its pose at a later point in time along a movement path.
Returning to the subject of computer vision, it is generally desirable that an image not only be captured, but that a computer is able to identify and label various features within the captured image. Basically, a goal of computer vision is for the computer to duplicate the abilities of human vision by electronically perceiving and understanding the contents of a captured image. This involves extracting symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory. Thus, the field of computer vision includes methods for acquiring, processing, analyzing, and gleaning an understanding of imaged objects, in order to form decisions.
Various approaches for identifying features within a captured image are known in the industry. Many early approaches centered on the concept of identifying shapes of rigid bodies. For example, if a goal was to identify a specific rigid item, such as a wrench or a type of wrench, then a library of the different types of acceptable wrenches (i.e. examples of “true” wrenches) would be created. The outline shapes of the true wrenches would be stored, and a search for the acceptable outline shapes would be conducted on a captured image.
Outline shapes within a captured image might be identified by means of a segmentation process, which is a process by which the outlines (or masks) of foreground objects within a digital image are defined by differentiating the image's foreground pixels from the image's background pixels. This would define an outline of the foreground object, such as a wrench, and the defined outline could then be compared with a library of known wrench outlines in various pose positions. This approach of searching for the outline of a shape was successful when one had an exhaustive library of acceptable outline shapes, the library of known outline shapes was not overly large, the outline shape of the target object within the digital image did not deviate much from the predefined true outline shapes, and the background surrounding the target object was not overly complicated.
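The segmentation process described above can be illustrated with a minimal sketch. This is not the invention's method, merely a generic global-threshold segmentation followed by a crude outline extraction; the function names, the threshold value, and the toy image are all hypothetical, and practical systems use far richer cues than a single brightness threshold.

```python
import numpy as np

def segment_foreground(image, threshold):
    """Label pixels brighter than `threshold` as foreground pixels
    (a simple global-threshold segmentation, for illustration only)."""
    return image > threshold

def outline(mask):
    """Return a mask of foreground pixels that touch at least one
    background pixel, i.e. a crude object outline usable for
    comparison against a library of known outline shapes."""
    padded = np.pad(mask, 1, constant_values=False)
    # A pixel is "interior" if all four of its direct neighbors are foreground.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask & ~interior

# Toy 5x5 "image": a bright 3x3 object on a dark background.
img = np.zeros((5, 5))
img[1:4, 1:4] = 200.0
mask = segment_foreground(img, threshold=100.0)
edge = outline(mask)
print(int(mask.sum()), int(edge.sum()))  # 9 foreground pixels, 8 on the outline
```

The resulting outline mask stands in for the "defined outline" that would be matched against stored true shapes.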
For complex searches, however, this approach is not effective. The limitations of this approach become readily apparent when the subject (i.e. object) being sought within an image is not rigid, but is prone to change and/or deformation. For example, a human face has definite characteristics and limited distortion, but it still does not have an easily definable number of shapes and/or appearances it may adopt. It is to be understood that the term appearance is herein used to refer to color and/or light differences across an object, as well as other surface/texture variances. Other types of target objects may be prone to far more deformation than a human face. For example, cable harnesses have definite characteristics, but may take many different shapes and arrangements because their wiring lacks many, if any, rigid structures.
Although an exhaustive library of samples of a known rigid body may be compiled for identification purposes, it is self-evident that compiling an exhaustive library of non-rigid or amorphous objects and their many variations due to pose angle, color, and lighting differences is a practical impossibility. Thus, statistical methods have been developed to address these difficulties.
Developments in image recognition of objects that change their shape and appearance, are discussed in “Statistical Models of Appearance for Computer Vision”, by T. F. Cootes and C. J. Taylor (hereinafter Cootes et al.), Imaging Science and Biomedical Engineering, University of Manchester, Manchester M13 9PT, U.K. available at Hypertext Transfer Protocol address “www.isbe.man.ac.uk,” Mar. 8, 2004, which is hereby incorporated in its entirety by reference.
To better mimic human vision, it is advantageous for machines to incorporate stereo vision, and thereby obtain depth information from captured images. Images of a common scene taken from different view angles are the basis for stereo vision and depth perception. In this case, corresponding feature points in two images taken from different view angles (and/or different fields of vision) of the same subject (or scene) can be combined to create a perspective view of the scene. Thus, imaging a scene from two different viewpoints (i.e. from two different fields of view, FOVs) creates stereo vision, which provides depth information about objects in the scene.
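The way corresponding feature points yield depth can be sketched with the standard triangulation relation for a rectified stereo pair, Z = f·B/d, where f is the focal length in pixels, B the baseline between the cameras, and d the horizontal disparity between the matched points. The rig parameters and pixel coordinates below are hypothetical, chosen only to make the arithmetic concrete.

```python
def depth_from_disparity(focal_px, baseline_m, x_left, x_right):
    """Triangulate the depth of a feature matched in a rectified stereo
    pair: Z = f * B / d, with disparity d = x_left - x_right (pixels)."""
    d = x_left - x_right
    if d <= 0:
        raise ValueError("disparity must be positive for a point "
                         "in front of both cameras")
    return focal_px * baseline_m / d

# Hypothetical rig: 700 px focal length, 10 cm baseline; the same feature
# point appears at x = 420 px in the left image and x = 385 px in the right.
z = depth_from_disparity(700.0, 0.10, x_left=420.0, x_right=385.0)
print(round(z, 2))  # 2.0 metres
```

Note that nearer objects produce larger disparities, which is why depth resolution degrades with distance.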
This ability would be particularly helpful in the field of robotics and automated assembly/construction. In these applications, a machine having stereo vision and the ability to discern (i.e. identify) target items would ideally have the ability to independently retrieve the target item and use it in an assembly.
Implementing such vision capabilities, however, is still a challenge, even in a specialized assembly line where the number of possible target object variants is limited. The challenges become even more daunting when the target objects are amorphous, or non-rigid, and prone to change in shape and/or appearance, such as in the case of wire harnesses.
It is an object of the present invention to provide a system for identifying and manipulating cable harnesses for use in robotic assembly lines.
It is a further object of the present invention to make use of 3D information for determining pose information of cable harnesses.
It is yet another object of the present invention to provide an automated 3D visual system to facilitate machine manipulation of cable harnesses.