1. Field of the Invention
The present invention relates to a technique of handling a feature amount in an image.
2. Description of the Related Art
There is a growing demand for robots that perform work such as assembly in factories and similar settings. When a robot handles a work target object whose position and orientation are not always constant, a means for measuring the position and orientation of the work target object is required. A visual sensor is generally used as this means.
For a robot to perform more complicated work such as assembly, the parts to be assembled and the like need to be recognized using a visual sensor. Methods that collate shape information of a part, such as CAD data, with 2D or 3D information obtained by a visual sensor or the like to recognize the type, position, and orientation of the part have long been studied. Recognition methods in which a computer learns feature amounts extracted from images of a target object obtained by an image sensing means, and then recognizes the type of object captured in an input image, have also been studied extensively.
At present, various kinds of products are sometimes assembled on the same production line in a factory to cope with the shift from mass production to multi-product production. In each assembly process, similar parts or falsely similar parts need to be assembled. To automate such an assembly process, a recognition method capable of discriminating similar parts or falsely similar parts from each other is necessary.
To meet this demand, active vision has been studied. Active vision assumes the vision of robots that discriminate and convey articles in a factory or the like, and it can control and change the position and orientation of a visual sensor such as a camera with respect to a target object.
In non-patent reference 1 (Noboru Nishikawa, Masaki Onishi, Takuya Matsumoto, Masao Izumi and Kunio Fukunaga, “Object Recognition Based on Camera Control”, T.IEE Japan, Vol. 118-C, No. 2, 1998), the result of recognizing which model object matches a recognition target object in an input image is expressed by a degree of recognition-ambiguity, given by the sum of the basic probabilities that the recognition target object is each of the models. The degree of recognition-ambiguity is treated as a function of the position and orientation of the camera, and the camera is moved by a steepest descent method so as to minimize it.
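To make the optimization step concrete, the following is a minimal sketch of steepest descent over a camera pose. It assumes a two-parameter pose (pan, tilt) and a hypothetical, smooth `ambiguity` surface standing in for the degree of recognition-ambiguity; it does not reproduce the basic-probability formulation of non-patent reference 1. The gradient is estimated by central differences, and the step size, tolerance, and iteration limit are illustrative choices.

```python
import math
from typing import Callable, List, Sequence


def ambiguity(pose: Sequence[float]) -> float:
    """Hypothetical degree of recognition-ambiguity for pose = (pan, tilt) in radians.

    This toy surface is an assumption for illustration; it is smallest at
    pan = 0.4, tilt = -0.2.
    """
    pan, tilt = pose
    return 1.0 - math.exp(-((pan - 0.4) ** 2 + (tilt + 0.2) ** 2))


def numerical_gradient(f: Callable[[Sequence[float]], float],
                       x: Sequence[float], h: float = 1e-5) -> List[float]:
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        forward, backward = list(x), list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad


def steepest_descent(f: Callable[[Sequence[float]], float], x0: Sequence[float],
                     step: float = 0.5, tol: float = 1e-8,
                     max_iter: int = 1000) -> List[float]:
    """Move the camera pose against the gradient until the gradient vanishes."""
    x = list(x0)
    for _ in range(max_iter):
        g = numerical_gradient(f, x)
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


best_pose = steepest_descent(ambiguity, [0.0, 0.0])
```

Because steepest descent stops at a local minimum that depends on the starting pose, a surface with several minima can yield different camera poses for the same object.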
In non-patent reference 2 (Mitsuru Jindai, Hirokazu Osaki, Satoru Shibata and Akira Shimizu, “A Recognition Method Combining CAD Information with Image Processing Using a Movable Camera”, Transactions of the Japan Society of Mechanical Engineers, Vol. 66, No. 650), images are acquired from multiple directions using a movable camera and are recognized. To discriminate similar objects, the difference between the two sets of CAD information is obtained to estimate the part in which the objects differ. The position and direction of the camera are then determined so that this differing part can be discriminated easily, and an input image captured at that position and in that direction is compared with a CAD image to discriminate the similar objects.
In non-patent reference 3 (H. Borotschnig, “Appearance-based active object recognition”, Image and Vision Computing 18 (2000), pp. 715-727), the appearance changes of a target object caused by viewpoint and illumination variations are learned as a manifold in an eigenspace. In recognition, the recognition result is expressed as a probability distribution and optimized on the manifold, thereby discriminating the object and estimating its position and orientation.
In patent reference 1 (Japanese Patent Laid-Open No. 5-288531), a 3D object is recognized from 2D image data acquired by an image sensing device. The recognition result of a neural network is input to a determination unit. When the degree of neuron excitement is lower than a preset level, the position of the image sensing device is changed, and 2D image data of the 3D object is acquired from the new position. The image sensing position change process, recognition process, and determination process are repeated until the degree of neuron excitement exceeds the preset level.
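The control flow described above, in which the image sensing position is changed until the recognition output is confident enough, can be sketched as a simple loop. This is a minimal illustration, not the method of patent reference 1: the candidate positions, the `capture` and `recognize` callables, and the example scores are hypothetical placeholders for the image sensing device, the neural network, and its degree of neuron excitement.

```python
from typing import Callable, Optional, Sequence, Tuple


def recognize_with_repositioning(
    positions: Sequence[str],
    capture: Callable[[str], object],
    recognize: Callable[[object], Tuple[str, float]],
    threshold: float,
) -> Optional[Tuple[str, str, float]]:
    """Repeat position change, recognition, and determination.

    Returns (position, label, excitement) for the first position whose
    excitement exceeds the threshold, or None if no position qualifies.
    """
    for position in positions:
        image = capture(position)             # acquire 2D image data at this position
        label, excitement = recognize(image)  # hypothetical network output and excitement degree
        if excitement > threshold:            # determination step
            return position, label, excitement
    return None


# Hypothetical recognition results at three camera positions.
fake_outputs = {"front": ("part_A", 0.55), "side": ("part_B", 0.62), "top": ("part_B", 0.91)}
result = recognize_with_repositioning(
    ["front", "side", "top"],
    capture=lambda position: position,        # stand-in: the "image" is just the position name
    recognize=lambda image: fake_outputs[image],
    threshold=0.8,
)
```

The candidate positions here are fixed in advance and visited in order; nothing in the loop derives the next position from a recognition or learning result.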
In patent reference 2 (Japanese Patent No. 3154501), the object captured in sensed object image information, and the viewing direction from which that information was obtained, are determined by comparing the shape and color of the sensed object image information with knowledge information on the 3D shape, appearance shape, and object surface color of known objects. If the object shape is discriminated from the determination result, the process ends. If the object shape cannot be discriminated because the image information is insufficient, an image sensing position for acquiring the missing image information is obtained, and the image sensing device is moved to that position to sense an image that supplements the missing information.
In patent reference 3 (Japanese Patent Laid-Open No. 2000-285198), an image sensing condition suited to the recognition operation of a recognition apparatus is calculated from distance information between a recognition target and an image sensing device, which is input from the image sensing device. The image sensing operation of the image sensing device is controlled based on the calculated condition. In one embodiment, the angle of view of a camera is calculated from the distance between the license number of a license plate and the lens, the focal length, the license number size, and the size of the license number on the image sensor, so that the size of the license number in the image becomes almost equal to that of a license number stored as an image.
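The relationship between distance, object size, and image size can be illustrated with a pinhole camera model. The sketch below is an assumption-based illustration rather than the calculation of patent reference 3: it solves image size = focal length * object size / distance for the focal length that gives a desired license number size on the sensor, then converts that focal length into a horizontal angle of view. The pinhole approximation holds when the distance is much larger than the focal length, and all numeric values are hypothetical.

```python
import math


def focal_length_for_target_size(distance_mm: float, object_size_mm: float,
                                 target_image_size_mm: float) -> float:
    """Solve the pinhole relation image_size = f * object_size / distance for f."""
    return target_image_size_mm * distance_mm / object_size_mm


def horizontal_angle_of_view_deg(sensor_width_mm: float, focal_length_mm: float) -> float:
    """Full horizontal angle of view of a pinhole camera, in degrees."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))


# Hypothetical values: number 300 mm wide at 5 m, stored size 3.6 mm, sensor 7.2 mm wide.
f_mm = focal_length_for_target_size(5000.0, 300.0, 3.6)
angle_deg = horizontal_angle_of_view_deg(7.2, f_mm)
```

With these values the required focal length is 60 mm, which corresponds to a horizontal angle of view of roughly 6.9 degrees.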
Among studies on recognition techniques that extract a given feature amount from an image, map it into a feature space defined by the feature vector, and learn a discriminant function, many have aimed at discriminating similar target objects or falsely similar target objects from each other.
In patent reference 4 (Japanese Patent No. 3841481), for each feature vector, nearby feature vectors are selected in a feature vector space. A subspace that maximizes the local dispersion of the selected feature vectors when they are orthogonally projected onto it is then output. By projecting the selected feature vectors onto the generated subspace and discriminating them there, even data that are difficult to distinguish can be discriminated with higher precision than with conventional methods.
Patent reference 5 (Japanese Patent No. 3945971) discloses the following arrangement for discriminating similar categories. Learning data of the category of interest are projected onto the axes of a subspace derived from a weighted average that mixes the covariance matrix of the category of interest with the covariance matrix of a category similar to it, and the variance along each axis is obtained. A quadratic discriminant function is constructed from these variances, and erroneous recognition of the similar category is prevented on the basis of this function.
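One way to read the mixed-covariance construction is sketched below for 2D feature vectors. Assumptions not stated in patent reference 5: the subspace axes are taken as the eigenvectors of the weighted average w * cov_target + (1 - w) * cov_similar, the weight w defaults to 0.5, and the quadratic discriminant takes the usual Gaussian-style form, the sum over axes of (projected deviation)^2 / variance + log(variance). Smaller scores mean a sample is closer to the category of interest. The sample data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def mean(points: Sequence[Vec]) -> Vec:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def covariance(points: Sequence[Vec]) -> List[List[float]]:
    """Sample covariance (divides by n - 1) of 2D points."""
    mx, my = mean(points)
    n = len(points)
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def eigenvectors_sym2(m: List[List[float]]) -> List[Vec]:
    """Orthonormal eigenvectors of a symmetric 2x2 matrix via its diagonalizing rotation."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    return [(math.cos(theta), math.sin(theta)), (-math.sin(theta), math.cos(theta))]


def fit_mixed_quadratic(target: Sequence[Vec], similar: Sequence[Vec], w: float = 0.5):
    """Variances of the target data along the axes of the mixed covariance (assumed form)."""
    ca, cb = covariance(target), covariance(similar)
    mixed = [[w * ca[i][j] + (1.0 - w) * cb[i][j] for j in range(2)] for i in range(2)]
    axes = eigenvectors_sym2(mixed)
    mu = mean(target)
    variances = []
    for e in axes:
        proj = [(p[0] - mu[0]) * e[0] + (p[1] - mu[1]) * e[1] for p in target]
        variances.append(sum(t * t for t in proj) / (len(proj) - 1))
    return mu, axes, variances


def discriminant(x: Vec, model) -> float:
    """Quadratic discriminant score; smaller means closer to the category of interest."""
    mu, axes, variances = model
    score = 0.0
    for e, var in zip(axes, variances):
        t = (x[0] - mu[0]) * e[0] + (x[1] - mu[1]) * e[1]
        score += t * t / var + math.log(var)
    return score


target = [(1.0, 0.1), (-1.0, -0.1), (0.1, 0.5), (-0.1, -0.5), (0.0, 0.0)]
similar = [(2.0, 1.0), (3.0, 1.5), (2.5, 0.5), (3.5, 2.0)]
model = fit_mixed_quadratic(target, similar)
near_score = discriminant((0.0, 0.0), model)
far_score = discriminant((3.0, 1.5), model)
```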
In patent reference 6 (Japanese Patent Laid-Open No. 2003-345830), the number of images saved during learning is reduced while the search (recognition) precision is maintained. For this purpose, the range occupied by the search (recognition) target in a feature amount space is obtained using feature amounts extracted from images containing the search (recognition) target. Distances in the feature space to learned images that do not contain the search (recognition) target are then calculated, and each image whose distance falls within a preset range is registered as a similar image. Only the search (recognition) target region (space) and the similar image region (space) in the feature space are saved.
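The registration step can be sketched as follows, assuming (as a simplification not stated in patent reference 6) that the range occupied by the search (recognition) target is a hypersphere around the centroid of its feature vectors, and that the preset range is a single upper bound `margin` on the distance from that hypersphere. Image identifiers and feature values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def target_range(target_features: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Approximate the target region as a hypersphere (center, radius); an assumption."""
    n = len(target_features)
    center = [sum(v[i] for v in target_features) / n for i in range(len(target_features[0]))]
    radius = max(math.dist(center, v) for v in target_features)
    return center, radius


def register_similar(target_features: Sequence[Sequence[float]],
                     non_target: Dict[str, Sequence[float]], margin: float) -> List[str]:
    """Return ids of non-target images whose distance to the target region is within margin."""
    center, radius = target_range(target_features)
    similar = []
    for image_id, feature in non_target.items():
        gap = max(0.0, math.dist(center, feature) - radius)  # distance outside the hypersphere
        if gap <= margin:
            similar.append(image_id)
    return similar


target_features = [[0.0, 0.0], [2.0, 0.0], [1.0, 1.0], [1.0, -1.0]]
non_target = {"img1": [2.5, 0.0], "img2": [5.0, 0.0], "img3": [1.0, 1.8]}
similar_ids = register_similar(target_features, non_target, margin=1.0)
```

Only the hypersphere and the feature vectors of the registered similar images would then need to be kept, which is how this kind of scheme limits the number of saved images.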
However, with conventional recognition techniques that extract a given feature amount from an image, map it into a feature space defined by the feature amount vector, and learn a discriminant function, it is difficult to discriminate similar target objects or falsely similar target objects from each other with high precision.
When features are extracted from images of similar target objects or falsely similar target objects captured under a given image sensing condition and are mapped into the feature space, they lie close to each other in that space. If the distance between the features is smaller than the measurement error or the like, the features cannot be discriminated from each other. In this case, it is necessary to change the image sensing condition and acquire a target object image again, or to perform secondary discrimination or the like.
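The discriminability condition in the preceding paragraph reduces to comparing a feature-space distance with the measurement error. The following minimal sketch assumes Euclidean distance and a single scalar error bound; the two image sensing conditions and their feature values are hypothetical.

```python
import math
from typing import Sequence


def discriminable(feature_a: Sequence[float], feature_b: Sequence[float],
                  measurement_error: float) -> bool:
    """True if two feature vectors lie farther apart than the measurement error."""
    return math.dist(feature_a, feature_b) > measurement_error


# Hypothetical features of two similar parts under two image sensing conditions.
features_by_condition = {
    "front view": ([0.52, 0.31], [0.55, 0.29]),  # nearly identical appearance
    "side view": ([0.40, 0.60], [0.70, 0.20]),   # the differing part is visible
}
MEASUREMENT_ERROR = 0.1
usable = [name for name, (a, b) in features_by_condition.items()
          if discriminable(a, b, MEASUREMENT_ERROR)]
```

Here only the side view separates the two parts by more than the error bound; under the front view, the image sensing condition would have to be changed or a secondary discrimination performed.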
In non-patent reference 1, the degree of recognition-ambiguity is expressed as a function whose variables are the position and orientation of the camera, and it is decreased by optimization using a steepest descent method. However, an optimum image sensing device position is not uniquely determined for a recognition target object. Furthermore, the image sensing condition parameters that can be changed are limited to the position and orientation of the image sensing device.
In non-patent reference 2, recognition using 3D CAD data and a movable camera whose viewpoint is changed is subject to many restrictive conditions. The method presupposes that only one target object is placed in the recognition area, that the target object is irradiated with uniform light vertically from above, and that the recognition area is painted black so that its luminance value is lower than that of the target object during processing. The method relies essentially on the geometric information of the CAD data and on luminance values, and does not use a feature amount obtained from an image. In addition, an optimum camera position and orientation are not uniquely determined.
Non-patent reference 3 addresses only the estimation of the orientation of a single object and does not discuss the discrimination of similar target objects or falsely similar target objects. Nor does the method uniquely determine an image sensing condition for distinguishing such target objects.
In patent reference 1, the position of the image sensing device is changed and recognition is repeated until the degree of neuron excitement exceeds a preset level. An optimum image sensing device position is not uniquely determined for a recognition target object, and the new position is not determined from a recognition or learning result. As in non-patent reference 1, the image sensing condition parameters that can be changed are limited to the position and orientation of the image sensing device.
In patent reference 2, as in patent reference 1, the only image sensing condition parameter that can be changed is the position of the image sensing device. Moreover, the embodiment addresses occlusion of a target object and does not change the position of the image sensing device in order to discriminate similar target objects or falsely similar target objects.
In patent references 4 and 5, similar target objects captured from a given viewpoint may look the same. If the feature amounts or the like extracted from the images are almost equal, it is difficult to discriminate the objects. Even in this case, these methods do not change the image sensing condition used to acquire a target object image.
In patent reference 6, distances are calculated in the feature space, and a similar image region is learned in the feature space according to those distances. However, like patent references 4 and 5, this method does not solve the problem that similar target objects captured from a given viewpoint look the same, so that a search (recognition) image and a similar image cannot be distinguished in discrimination.