Machine vision systems, also termed “vision systems” herein, are used to perform a variety of tasks in a manufacturing environment. In general, a vision system consists of one or more two-dimensional (2D) cameras with an image sensor (or “imager”) that acquires grayscale or color images of a scene that contains an object under manufacture. 2D images of the object can be analyzed to provide data/information to users and associated manufacturing processes. The data produced by the 2D camera is typically analyzed and processed by the vision system in one or more vision system processors that can be purpose-built, or part of one or more software application(s) instantiated within a general purpose computer (e.g. a PC, laptop, tablet or smartphone).
Common vision system tasks include alignment and inspection. In an alignment task, vision system tools, such as the well-known PatMax® system commercially available from Cognex Corporation of Natick, Mass., compare features in a 2D image of a scene to a 2D pattern trained using an actual or synthetic model, and determine the presence/absence and pose of the 2D pattern in the 2D imaged scene. This information can be used in subsequent inspection (or other) operations to search for defects and/or perform other operations, such as part rejection.
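PatMax itself is proprietary, and its internals are not described here. As a much simpler stand-in that shows the two outputs named above (presence/absence and pose), the following sketch slides a trained 2D pattern over a grayscale image, restricts the pose to integer translations, and scores each placement by mean squared intensity difference. The function name `find_pattern` and the `max_mean_sq_err` acceptance threshold are hypothetical; practical alignment tools also handle rotation, scale, partial occlusion, and lighting changes.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]  # grayscale intensities I(x, y), indexed [row][col]


def find_pattern(
    image: Image, pattern: Image, max_mean_sq_err: float
) -> Optional[Tuple[int, int, float]]:
    """Exhaustive translation-only search (a hypothetical stand-in, not PatMax).

    Returns (row, col, error) for the best placement of `pattern` in `image`,
    or None when even the best placement exceeds `max_mean_sq_err`
    (i.e. the pattern is judged absent).
    """
    ih, iw = len(image), len(image[0])
    ph, pw = len(pattern), len(pattern[0])
    best: Optional[Tuple[int, int, float]] = None
    for r in range(ih - ph + 1):
        for c in range(iw - pw + 1):
            # Mean squared difference between the pattern and this image window.
            err = sum(
                (image[r + i][c + j] - pattern[i][j]) ** 2
                for i in range(ph)
                for j in range(pw)
            ) / (ph * pw)
            if best is None or err < best[2]:
                best = (r, c, err)
    if best is None or best[2] > max_mean_sq_err:
        return None  # absent: no placement matches well enough
    return best  # present: (row, col) is the estimated pose
```

For example, embedding `[[1, 2], [3, 4]]` at row 2, column 1 of a 5×5 zero image and searching for it returns `(2, 1, 0.0)`, while searching the same image for a uniform patch of 9s returns `None`.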
A particular task employing vision systems is the alignment of a three-dimensional (3D) target shape during runtime based upon a trained 3D model shape. 3D cameras can be based on a variety of technologies, such as a laser displacement sensor (profiler), a stereoscopic camera, a sonar, laser or LIDAR range-finding camera, and a variety of other passive or active range-sensing technologies. Such cameras can produce a range image, in which an array of image pixels (typically characterized as positions along orthogonal x and y axes) also contains a third (height) dimension for each pixel (typically characterized along a z axis perpendicular to the x-y plane). Alternatively, such cameras can generate a point cloud representation of an imaged object. A point cloud is a collection of 3D points in space where each point i can be represented as (Xi, Yi, Zi). A point cloud can represent a complete 3D object, including the object's back and sides, top and bottom. 3D points (Xi, Yi, Zi) represent locations in space where the object is visible to the camera. In this representation, empty space is represented by the absence of points.
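To make the two representations concrete, the following minimal sketch turns a range image into a point cloud of (Xi, Yi, Zi) tuples. It assumes uniform pixel spacings `dx` and `dy` with the grid origin at (0, 0), and it marks missing pixels with `None`; these conventions and the function name `range_image_to_point_cloud` are hypothetical, since real sensors supply their own calibration. Note how empty space simply produces no point.

```python
from typing import List, Optional, Tuple

RangeImage = List[List[Optional[float]]]  # Z(x, y) indexed [y][x]; None = missing pixel
Point3D = Tuple[float, float, float]      # (Xi, Yi, Zi)


def range_image_to_point_cloud(rng: RangeImage, dx: float, dy: float) -> List[Point3D]:
    """Map each valid pixel (column x, row y) to a 3D point (x*dx, y*dy, Z).

    dx and dy are assumed uniform pixel spacings (hypothetical calibration).
    Missing pixels yield no point: empty space is the absence of points.
    """
    cloud: List[Point3D] = []
    for y, row in enumerate(rng):
        for x, z in enumerate(row):
            if z is not None:
                cloud.append((x * dx, y * dy, z))
    return cloud
```

With `rng = [[1.0, None], [2.0, 3.0]]`, `dx = 0.5`, and `dy = 2.0`, the result is `[(0.0, 0.0, 1.0), (0.0, 2.0, 2.0), (0.5, 2.0, 3.0)]`: three points, because the missing pixel contributes nothing.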
By way of comparison, a 3D range image representation Z(x, y) is analogous to a 2D image representation I(x, y), where the depth or height Z replaces what would be the brightness/intensity I at a location (x, y) in an image. A range image exclusively represents the front face of an object that is directly facing a camera, because only a single depth is associated with any point location (x, y). The range image typically cannot represent an object's back or sides, top or bottom. A range image typically has data at every location (x, y), even where the camera has no valid information for that location. Sometimes the camera image data explicitly indicates that no information is present by including “missing pixel” labels at certain locations. A “missing pixel” could mean that the imaging conditions were poor at that location in the image, or it could mean that a hole is present at that location in the object. 3D range images can sometimes be processed with conventional 2D image processing techniques, where the height dimension Z is substituted for brightness/intensity I and missing pixels are either handled specially or ignored.
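As an illustration of applying a conventional 2D operation to Z(x, y), the following sketch runs a 3×3 mean filter over the heights. Missing pixels (`None`, an assumed convention) are skipped when averaging their neighbors and are left missing in the output rather than filled in, because, as noted above, a missing pixel may mark a real hole in the object. The function name `smooth_heights` is hypothetical.

```python
from typing import List, Optional

RangeImage = List[List[Optional[float]]]  # Z(x, y) indexed [y][x]; None = missing pixel


def smooth_heights(rng: RangeImage) -> RangeImage:
    """3x3 mean filter on heights: a 2D intensity filter with Z in place of I.

    Missing pixels stay missing and are excluded from their neighbors' averages.
    """
    h, w = len(rng), len(rng[0])
    out: RangeImage = []
    for y in range(h):
        row: List[Optional[float]] = []
        for x in range(w):
            if rng[y][x] is None:
                row.append(None)  # do not invent a height: it may be a real hole
                continue
            vals = [
                rng[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
                if rng[yy][xx] is not None
            ]
            row.append(sum(vals) / len(vals))  # center is valid, so vals is non-empty
        out.append(row)
    return out
```

For `[[0.0, None, 6.0]]` the output is unchanged, `[[0.0, None, 6.0]]`: the hole neither receives a height nor drags its neighbors toward an arbitrary fill value.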
By way of further background, converting 3D images between range image representations and 3D point cloud representations can be accomplished by appropriate techniques, but not without loss of information and/or loss of accuracy. Some 3D cameras can directly produce either 3D point cloud images OR 3D range images at the time the images are acquired (concurrently). Even for cameras that can produce either representation, converting images from one representation to the other after acquisition can cause a loss of accuracy. Thus, a 3D point cloud or range image is most accurate when it is acquired by the camera in that respective mode.
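The following sketch shows where information is lost when a point cloud is converted into a range image. It assumes a uniform grid with spacings `dx` and `dy` and origin at (0, 0), assigns each point to its nearest cell with Python's built-in `round` (which rounds halves to even), and assumes the camera looks along the z axis so that the largest Z in a cell is the visible front face. These conventions and the function name `point_cloud_to_range_image` are hypothetical.

```python
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]
RangeImage = List[List[Optional[float]]]  # Z(x, y) indexed [y][x]; None = missing pixel


def point_cloud_to_range_image(
    cloud: List[Point3D], dx: float, dy: float, width: int, height: int
) -> RangeImage:
    """Project points onto a width x height grid, keeping one Z per cell.

    Assumes larger Z is nearer the camera, so the maximum Z in a cell wins.
    Points behind it (back faces) and points outside the grid are discarded,
    and each X, Y is snapped to its cell, which loses sub-pixel position.
    """
    rng: RangeImage = [[None] * width for _ in range(height)]
    for px, py, pz in cloud:
        x, y = round(px / dx), round(py / dy)
        if 0 <= x < width and 0 <= y < height:
            if rng[y][x] is None or pz > rng[y][x]:
                rng[y][x] = pz
    return rng
```

Projecting `[(0.0, 0.0, 5.0), (0.0, 0.0, 1.0), (1.2, 0.0, 2.0)]` with unit spacing onto a 2×1 grid gives `[[5.0, 2.0]]`. The point at Z = 1.0, behind the one at Z = 5.0, is gone, and the cell at x = 1 now stands for X = 1.0 rather than the original 1.2. Both losses are irreversible.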
In aligning a target image (either acquired or generated by a synthetic process) to a model image (also either acquired or synthetic), one approach involves the matching/comparison of a 3D point cloud in the target to one in the model in an effort to find the best-matching pose. The comparison can involve scoring the coverage of the target with respect to the model. A score above a certain threshold is considered an acceptable match/pose estimation, and this information is used to generate an alignment result. It is nevertheless challenging to generate an alignment result accurately and efficiently based upon 3D images, and practical, generalized techniques that employ matching of model and target 3D point clouds are generally unavailable.
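The following minimal sketch illustrates the scoring-and-threshold approach described above. It defines coverage as the fraction of model points that lie within a distance `tol` of some target point after the target is moved by a candidate pose. This is only one plausible reading of "coverage", and it is an assumption. The pose is also reduced, hypothetically, to a rotation about z plus a 3D translation. Candidate poses are scored by brute force, and the best is accepted only if its score reaches `accept`. All names (`apply_pose`, `coverage_score`, `best_pose`) are hypothetical, and the O(N·M) nearest-point search is for clarity, not speed.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
Pose = Tuple[float, float, float, float]  # hypothetical reduced pose: (theta_z, tx, ty, tz)


def apply_pose(p: Point3D, pose: Pose) -> Point3D:
    """Rotate p about the z axis by theta_z, then translate by (tx, ty, tz)."""
    theta, tx, ty, tz = pose
    c, s = math.cos(theta), math.sin(theta)
    x, y, z = p
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)


def coverage_score(model: List[Point3D], target: List[Point3D], pose: Pose, tol: float) -> float:
    """Fraction of model points within tol of at least one posed target point."""
    posed = [apply_pose(p, pose) for p in target]
    covered = sum(1 for m in model if any(math.dist(m, t) <= tol for t in posed))
    return covered / len(model)


def best_pose(
    model: List[Point3D], target: List[Point3D],
    candidates: Sequence[Pose], tol: float, accept: float,
) -> Optional[Tuple[Pose, float]]:
    """Score every candidate pose; return (pose, score) if the best passes the threshold."""
    scored = [(coverage_score(model, target, pose, tol), pose) for pose in candidates]
    score, pose = max(scored, key=lambda sp: sp[0])
    return (pose, score) if score >= accept else None
```

For a unit-square model and a target that is the same square shifted by +2 in x, the candidates `(0, 0, 0, 0)`, `(0, -2, 0, 0)`, and `(0, -1, 0, 0)` score 0.0, 1.0, and 0.5. The shift back by −2 is therefore selected whenever the acceptance threshold is at most 1.0.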