The present invention relates to the art of machine vision, and more specifically, to a system for training a three dimensional object identification system and identifying three dimensional objects using three dimensional semantic segments.
Machine Vision (MV) describes technology and methods used to provide imaging-based automatic inspection and analysis for a variety of applications. Machine vision employs captured images to identify, inspect, and/or manipulate objects undergoing one or more processes. Imaging devices may be combined with a processing unit or maintained separately. When separated, a connection may be made to specialized intermediate hardware, such as a frame grabber. Many MV applications utilize digital cameras capable of direct connection to an image processor without the need for intermediate hardware.
Many MV systems rely upon two-dimensional (2D) imaging processes that employ various light sources. For example, a 2D imaging system may employ visible light, infra-red light, line scan imaging, or X-ray imaging. Typically, 2D imaging systems fall into one of two categories: monochromatic imaging and color imaging. 2D imaging systems may also detect portions of an object or process an entire image. Systems that process an entire image are often employed in moving or production line processes. Three-dimensional (3D) imaging may also be employed in MV systems. Generally, 3D imaging includes scanning-based triangulation, time-of-flight, grid-based, and stereoscopic processes.
Part-based object recognition techniques employ 3D models with semantic points to determine a part center. Such an approach may be time consuming, requiring that the semantic points be manually annotated into the 3D models. Further, this approach does not identify specific regions of a part. Part-based object recognition techniques also employ color recognition to aid in identification. Colors are sampled at multiple points on an object's surface through pose-annotated training images, a two-dimensional projection of the object is generated for each candidate pose, and a pixel-wise template is created. The pixel-wise template is used for matching color and/or texture in a query image. While this approach may be better at identifying specific regions of an object, several drawbacks still exist. For example, distortions in the pixel-wise template may exist. That is, color at a particular point may vary in each candidate pose due to illumination shortcomings, reflections, and occlusions on the part itself.
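By way of a non-limiting illustration, the pixel-wise template approach described above may be sketched as follows. The sketch is a simplification under stated assumptions and is not drawn from any particular prior system: candidate poses are limited to rotations about a single axis, the projection is orthographic, and matching uses a mean absolute color difference. All function names, parameters, and the grid size are hypothetical.

```python
import math

def rotate_y(point, angle):
    """Rotate a 3D point about the y-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def build_template(points, colors, angle, size=8, scale=3.0):
    """Project colored surface points for one candidate pose (assumed orthographic).

    Returns a dict mapping (row, col) pixels to the color of the surface
    point with the largest z that lands on that pixel.
    """
    template, depth = {}, {}
    for point, color in zip(points, colors):
        x, y, z = rotate_y(point, angle)
        col = int(round(x * scale + size / 2))
        row = int(round(y * scale + size / 2))
        if 0 <= row < size and 0 <= col < size:
            if (row, col) not in depth or z > depth[(row, col)]:
                depth[(row, col)] = z
                template[(row, col)] = color
    return template

def render_query(template, size=8, background=(0, 0, 0)):
    """Paint a template onto a blank image to simulate a query image."""
    image = [[background for _ in range(size)] for _ in range(size)]
    for (row, col), color in template.items():
        image[row][col] = color
    return image

def template_score(template, query):
    """Mean absolute per-channel color difference over template pixels.

    Lower is better; an empty template scores infinity.
    """
    if not template:
        return float("inf")
    total = 0.0
    for (row, col), color in template.items():
        observed = query[row][col]
        total += sum(abs(a - b) for a, b in zip(color, observed)) / len(color)
    return total / len(template)

def best_pose(points, colors, query, candidate_angles):
    """Return the candidate angle whose template best matches the query."""
    scores = {
        angle: template_score(build_template(points, colors, angle), query)
        for angle in candidate_angles
    }
    return min(scores, key=scores.get)

# Hypothetical sampled surface points and their RGB colors.
points = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (-1, -1, 0)]
colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]
candidates = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]

# Simulate a query image of the object in the second candidate pose.
query = render_query(build_template(points, colors, candidates[1]))
recovered = best_pose(points, colors, query, candidates)
```

In this sketch, each template stores a single color per pixel. Any change in the observed color at a pixel, for instance from dimmer illumination or a reflection, therefore raises the score of the correct pose, which illustrates the distortion drawback noted above.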