1. Field of the Invention
The present invention relates to an image generation apparatus, an image generation method, and a storage medium and, more particularly, to an image generation apparatus that generates an orientation-specific image of a target object, an image generation method, and a storage medium.
2. Description of the Related Art
In a production site, a robot arm having a hand mechanism for gripping a target object is used to, for example, assemble a product or convey a part. An image capturing apparatus such as a camera has recently been introduced as the “eye” of a robot system that controls such a robot arm. An image captured by the image capturing apparatus is used, for example, to perform pattern recognition processing to specify the position of a target part to be gripped by the hand, or to conduct visual inspection to check whether the target part has a defect.
As for parts to be handled by such a robot system, a plurality of parts have conventionally often been regularly arranged on an array pallet and handled. If the parts are lined up in this way, each part is guaranteed to have a predetermined orientation. For this reason, specifying only the position of each part by relatively simple pattern recognition processing allows the hand to grip it. However, arranging the parts on the array pallet in advance requires a manual operation or a dedicated line-up machine, leading to higher cost.
Hence, there is a growing demand for causing a hand mechanism attached to an arm with a high degree of freedom to directly grip a target part out of a number of parts in a “pile-up state”, that is, parts simply piled up on a tray in various orientations. To perform this control, it is necessary to sense the tray in the “pile-up state” and estimate not only the position but also the orientation (direction) of the target part as accurately as possible.
A household robot, which has recently been developed for entertainment, household assistance, care applications, and the like, needs to identify various objects in a daily space and cause its hand to grip a target object as needed. For this purpose, it is important to know not only the position but also the orientation (direction) of the target object, as in the above-described production robot system.
To estimate the orientation of the target object from a captured image, teaching data is generally necessary to make the system learn target object orientations to be used for pattern recognition processing. As the teaching data, for example, orientation-specific target object images are used, which are obtained by capturing the target object in several representative orientations (orientations relative to the image capturing apparatus, which will be referred to as “representative orientations” hereinafter) viewed from every direction so as to spherically envelop the target object.
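The representative orientations described above can be pictured as viewpoints distributed roughly uniformly over a sphere enclosing the target object. As a minimal illustrative sketch only (the sampling scheme and function name are assumptions, not part of this description), such viewpoints can be generated with a Fibonacci (golden-spiral) lattice:

```python
import math

def representative_viewpoints(n):
    """Sample n viewing directions roughly uniformly on the unit sphere
    using the Fibonacci (golden-spiral) lattice."""
    golden = math.pi * (3.0 - math.sqrt(5.0))  # golden angle in radians
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n          # height in (-1, 1)
        r = math.sqrt(max(0.0, 1.0 - z * z))   # radius of the circle at height z
        theta = golden * i                      # azimuth around the sphere
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points

views = representative_viewpoints(72)
```

Each returned triple is a unit vector from which the object would be viewed; an image captured from each such direction would constitute one orientation-specific teaching image.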
To reproduce accurate representative orientations, target object images to be used as the teaching data are often generated in advance using a dedicated teaching data generation apparatus.
However, the teaching data generation apparatus is shared by a plurality of apparatuses from the viewpoint of space, cost, and the like, and is therefore often installed in a place different from the site where the above-described robot system is used. For this reason, it is very difficult to make the illumination, the image capturing conditions of the image capturing apparatus, and the like match those of the robot system on site. That is, the degree of coincidence between the teaching data of the target object and image data captured on site decreases even though they should represent the same orientation, resulting in lower orientation estimation accuracy. In addition, introducing the dedicated teaching data generation apparatus leads to an increase in cost.
Several methods of generating teaching data using the robot system itself, which is used on site, have been proposed in place of the dedicated teaching data generation apparatus.
For example, in Japanese Patent Laid-Open No. 2005-1022, a visual sensor capable of obtaining and outputting 3D data obtains the 3D data of a target object gripped by the hand, thereby extracting a shape feature that includes the hand (a “feature with hand”).
A portion corresponding to the hand is removed from the feature with hand, based on the position and orientation of the hand and a hand model created and stored in advance, thereby outputting a “feature without hand”. This data is registered as an object model. If an object model of the same target object has already been registered, it is updated by superimposing the newly obtained object model on it.
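The removal step in this related art can be pictured as transforming the measured 3D points into the hand coordinate system and discarding those that fall inside the stored hand model. The following is only an illustrative sketch under simplifying assumptions (the function names, and the approximation of the hand model by an axis-aligned bounding box in the hand frame, are not taken from the reference):

```python
import numpy as np

def remove_hand_points(points, hand_rotation, hand_translation,
                       hand_box_min, hand_box_max):
    """Drop 3D points that fall inside a box-shaped hand model.

    points           : (N, 3) array of measured points in the sensor/world frame
    hand_rotation    : (3, 3) rotation of the hand frame (world <- hand)
    hand_translation : (3,) position of the hand frame origin in the world frame
    hand_box_min/max : axis-aligned bounding box of the hand model, in the hand frame
    """
    # Convert world-frame points into the hand coordinate system:
    # q = R^T (p - t), computed row-wise as (p - t) @ R.
    local = (points - hand_translation) @ hand_rotation
    inside = np.all((local >= hand_box_min) & (local <= hand_box_max), axis=1)
    return points[~inside]  # the "feature without hand"

pts = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]])
kept = remove_hand_points(pts, np.eye(3), np.zeros(3),
                          np.array([-1.0, -1.0, -1.0]), np.array([1.0, 1.0, 1.0]))
```

In the reference itself, as noted below, this kind of coordinate conversion against a prestored hand model is part of the cumbersome calculation the present invention seeks to avoid.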
When teaching data for orientation estimation is obtained by capturing images of a target object gripped by the hand mechanism while changing its orientation, as in the above-described related art, not only the target object but also (at least part of) the hand mechanism is included in each captured image. If such an image is directly used as a learning image, the originally unnecessary image features of the hand mechanism are learned together. It is therefore difficult to accurately estimate the orientation from an image containing only the target object on the tray.
As described above, one of the merits of generating teaching data with the on-site robot system is that the data can be generated under the same image capturing conditions (ambient light and image capturing apparatus) as those of the environment in actual use. However, the ambient light in particular is not always guaranteed to satisfy a predetermined condition and may change during the day. In this case, the orientation estimation accuracy lowers.
In Japanese Patent Laid-Open No. 2005-1022 described above, the feature corresponding to the hand portion is removed. However, a visual sensor capable of obtaining 3D data needs to be used. Cumbersome calculation is also required, for example, holding a hand model in advance and converting a feature obtained from the acquired 3D data into the hand coordinate system. Moreover, no countermeasure is taken against changes in the ambient light.
In consideration of the above-described problems, the present invention provides a technique of generating image data including only a target object by removing, through simple processing, unnecessary portions such as the hand from a 2D image obtained by capturing the target object gripped by a hand mechanism while changing its orientation.
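Purely as an illustration of what "simple processing" on a 2D image could look like (the differencing scheme, threshold, and names below are assumptions for the sketch, not the claimed method), the hand portion can be masked out by comparing the captured image against a reference image of the empty hand in the same orientation:

```python
import numpy as np

def remove_hand_2d(grabbed_img, hand_only_img, threshold=10):
    """Zero out hand/background pixels in a grayscale image of the gripped object.

    grabbed_img   : (H, W) uint8 image of the hand gripping the target object
    hand_only_img : (H, W) uint8 reference image of the empty hand
                    captured in the same orientation
    Pixels whose intensity matches the reference within `threshold` are
    assumed to belong to the hand (or background) and are set to zero,
    leaving only the target object.
    """
    # Widen to a signed type before subtracting to avoid uint8 wraparound.
    diff = np.abs(grabbed_img.astype(np.int16) - hand_only_img.astype(np.int16))
    object_mask = diff > threshold
    return np.where(object_mask, grabbed_img, 0).astype(grabbed_img.dtype)

grabbed = np.array([[100, 50], [200, 50]], dtype=np.uint8)
hand_only = np.array([[100, 50], [50, 50]], dtype=np.uint8)
object_only = remove_hand_2d(grabbed, hand_only)
```

Such pixel-level differencing needs no 3D sensor and no hand coordinate conversion, which is in the spirit of the "simple processing" the passage above refers to, though the actual processing of the invention is described in the later sections.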