Machine vision systems, also termed “vision systems” herein, are used to perform a variety of tasks in a manufacturing environment. In general, a vision system consists of one or more camera assemblies with an image sensor (or “imager”) that acquires grayscale or color images of a scene that contains an object under manufacture. Images of the object can be analyzed to provide data/information to users and associated manufacturing processes. The data produced by the camera is typically analyzed and processed by the vision system in one or more vision system processors that can be purpose-built, or part of one or more software application(s) instantiated within a general purpose computer (e.g. a PC, laptop, tablet or smartphone).
Common vision system tasks include alignment and inspection. In an alignment task, vision system tools, such as the well-known PatMax® system commercially available from Cognex Corporation of Natick, Mass., compare features in a two-dimensional (2D) image of a scene to a trained (using an actual or synthetic model) 2D pattern, and determine the presence/absence and pose of the 2D pattern in the 2D imaged scene. This information can be used in subsequent inspection (or other operations) to search for defects and/or perform other operations, such as part rejection.
A particular task employing vision systems is the alignment of a three-dimensional (3D) target shape during runtime based upon a trained 3D model shape. 3D cameras can be based on a variety of technologies, for example, a laser displacement sensor (profiler), a stereoscopic camera, a sonar-, laser- or LIDAR-based range-finding camera, a time-of-flight sensor, or a variety of other passive or active range-sensing technologies. Such cameras produce a range image wherein an array of image pixels (typically characterized as positions along orthogonal x and y axes) is produced that also contains a third (height) dimension for each pixel (typically characterized along a z axis perpendicular to the x-y plane). Alternatively, such cameras can generate a point cloud representation of an imaged object. A point cloud is a collection of 3D points in space where each point i can be represented as (Xi, Yi, Zi). A point cloud can represent a complete 3D object, including the object's back, sides, top and bottom. The 3D points (Xi, Yi, Zi) represent locations in space where the object is visible to the camera. In this representation, empty space is represented by the absence of points.
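The point cloud representation described above can be illustrated with a minimal sketch (not any particular vendor's data structure): the cloud is simply an (N, 3) array whose row i holds the point (Xi, Yi, Zi), and empty space is implied by the absence of rows.

```python
import numpy as np

# Illustrative point cloud: each row is one 3D point (Xi, Yi, Zi).
# The coordinates below are arbitrary example values.
point_cloud = np.array([
    [0.0, 0.0, 10.0],   # a point on the top face of an object
    [1.0, 0.0, 10.0],
    [0.0, 1.0,  9.5],
    [5.0, 2.0,  0.0],   # a point on a side face
])

# Because only visible surface points are stored, basic queries reduce
# to operations over the rows that exist.
centroid = point_cloud.mean(axis=0)                      # mean position
z_extent = point_cloud[:, 2].max() - point_cloud[:, 2].min()
```

Note that, unlike a range image, nothing restricts the rows to a single depth per (x, y) location, which is why a point cloud can hold the back and sides of an object as well as its front face.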
By way of comparison, a 3D range image representation Z(x, y) is analogous to a 2D image representation I(x, y), where the depth or height Z replaces what would be the brightness/intensity I at a location x, y in an image. A range image exclusively represents the front face of an object that is directly facing the camera, because only a single depth is associated with any point location x, y. A range image is a dense representation, and it can also represent that an (x, y) location is not observed by the camera. It is possible to convert a range image to a 3D point cloud in a manner clear to those of skill in the art.
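The range-image-to-point-cloud conversion mentioned above can be sketched as follows, under simplifying assumptions (unit pixel pitch and no camera calibration, which a real conversion would apply): each observed pixel (x, y) with depth Z[y, x] becomes a 3D point (x, y, Z), and unobserved pixels, here marked with NaN, are dropped.

```python
import numpy as np

def range_image_to_point_cloud(Z):
    """Convert a range image Z(x, y) to an (N, 3) point cloud.

    Pixels holding NaN mark locations not observed by the camera and
    produce no point, so the sparse cloud encodes the same "missing
    data" information the dense range image carried.
    """
    ys, xs = np.indices(Z.shape)          # pixel coordinates of every cell
    valid = ~np.isnan(Z)                  # mask of observed pixels
    return np.column_stack([xs[valid], ys[valid], Z[valid]])

Z = np.array([[5.0, 5.0],
              [np.nan, 4.0]])            # one unobserved pixel
cloud = range_image_to_point_cloud(Z)    # yields three 3D points
```

The resulting cloud necessarily covers only the front face seen by the camera; it contains one point per observed (x, y) location, reflecting the single-depth limitation of the range image.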
In aligning a target image, either acquired or generated by a synthetic (e.g. CAD) process, to a model image (also either acquired or synthetic), one approach involves matching/comparing a 3D point cloud in the target to one in the model in an effort to find the best matching pose. The comparison can include use of one or more 3D alignment algorithm(s) that involve a metric for scoring the coverage of the target with respect to the model. For any 3D alignment algorithm, there are instances where the trained model consists of a larger (or more complete) view of the part than what is viewed at runtime, e.g. a portion of the runtime object image can be outside the field of view or otherwise cut off due to, for example, self-occlusion. Likewise, there are instances in which the runtime scene contains more information than the model. For example, if both the model and the runtime scene were acquired (from a respective object) using a single camera, but the object is presented in different orientations in each version, then the runtime scene may contain regions of the part that are absent from the model, and vice versa. If the alignment scoring metric does not account for these scenarios, then it may calculate a lower score than the match actually warrants, and the candidate pose may be rejected. More generally, this concern exists in any application where a stage of the solution is to align to an object with significant self-occlusion, or where multiple images are fused at training time to obtain a full 360-degree view of the object, but only one acquired image of the object is used at runtime to perform alignment. In addition, portions of either the runtime or training-time object image(s) can be outside of the union of the working section (field of view) of the image sensor. These conditions can affect the application of the 3D alignment algorithm and its results in a variety of ways.
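The scoring concern described above can be illustrated with a simple coverage metric. The sketch below is not the PatMax algorithm or any specific commercial scoring method; it merely shows how an unweighted coverage score penalizes a correct candidate pose when the model contains regions that are invisible at runtime, and how excluding those regions (via a hypothetical `visible_mask`) restores the score.

```python
import numpy as np

def coverage_score(model, target, tol=0.5, visible_mask=None):
    """Fraction of model points explained by the target cloud.

    A model point counts as covered if some target point lies within
    `tol` of it. If `visible_mask` marks which model points could be
    seen at runtime (inside the field of view, not self-occluded),
    points outside the mask are excluded from the denominator.
    """
    if visible_mask is not None:
        model = model[visible_mask]
    # Brute-force nearest-neighbor distances (adequate for a sketch;
    # a real system would use a spatial index such as a k-d tree).
    d = np.linalg.norm(model[:, None, :] - target[None, :, :], axis=2)
    return float((d.min(axis=1) <= tol).mean())

# Full 360-degree model of a part; the runtime camera sees only half.
model  = np.array([[0., 0., 0.], [1., 0., 0.], [2., 0., 0.], [3., 0., 0.]])
target = np.array([[0.1, 0., 0.], [1.0, 0., 0.]])

naive = coverage_score(model, target)   # 0.5: unseen points drag it down
aware = coverage_score(model, target,
                       visible_mask=np.array([True, True, False, False]))
# aware == 1.0: the occluded half no longer penalizes a correct pose
```

This is the failure mode the passage describes: a naive metric reports a low score for a perfectly matched pose, which may then be wrongly rejected.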
A further concern with 3D vision system processes is that, when a 3D image sensor acquires a measurement of an object, it typically measures only the portion of the object that is facing the sensor, and thus the sensor only acquires the portion of the object that lies within its measurement region. Sometimes the acquired 3D image data includes spurious, or incorrect, measurements (possibly due to inter-reflections). This incorrect or spurious data can affect the efficiency and accuracy of the measurement process.
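One common way to mitigate such spurious measurements, sketched below, is statistical outlier removal: points whose mean distance to their k nearest neighbors greatly exceeds the cloud-wide average are discarded. This is an illustrative cleanup heuristic, not the method of any particular system; real pipelines may use different criteria for inter-reflection artifacts.

```python
import numpy as np

def remove_spurious_points(cloud, k=3, ratio=2.0):
    """Drop points whose mean k-nearest-neighbor distance exceeds
    `ratio` times the average such distance over the whole cloud.

    Brute-force distances for clarity; a real implementation would use
    a spatial index for large clouds.
    """
    d = np.linalg.norm(cloud[:, None, :] - cloud[None, :, :], axis=2)
    d.sort(axis=1)
    mean_knn = d[:, 1:k + 1].mean(axis=1)   # column 0 is the self-distance
    keep = mean_knn <= ratio * mean_knn.mean()
    return cloud[keep]

cloud = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.],
                  [1., 1., 0.], [50., 50., 50.]])  # last point is spurious
clean = remove_spurious_points(cloud)              # isolated point removed
```

The isolated point far from the surface is rejected because its neighbor distances are two orders of magnitude larger than those of the genuine surface points.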