The present invention relates to the recognition of objects in images. More specifically, the embodiments relate to detecting the number, position, and orientation of each recognition target object when a plurality of recognition target objects are present in a search target image.
In recent years, object recognition using rotation- and scale-invariant local image features known as keypoints has reached the practical stage. See Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” 2004, hereinafter referred to as Lowe, and Rublee et al., “ORB: An Efficient Alternative to SIFT or SURF,” 2011, hereinafter referred to as Rublee et al.
In the technique described in Lowe, known as SIFT, Gaussian filters with different spatial scales are applied to an image. The differences between the outputs of filters with adjacent scales are computed, and a set of images known as “Difference of Gaussians” (DoG) images is obtained. Coordinates at which the absolute value of a DoG image is at a local maximum in both the spatial directions and the scale direction are called keypoints, and a plurality of keypoints is usually detected in an image with shading patterns. The orientation of each keypoint is determined from the intensity gradient of the pixels surrounding the keypoint, and the scale at which the DoG reaches its maximum is used as the keypoint scale. The pixels surrounding each keypoint are divided into 16 blocks, and a histogram of the intensity gradients of the pixels in each block is extracted for use as a feature value of the keypoint.
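The DoG construction and the search for maxima in space and scale can be illustrated with a minimal sketch. The sketch is not the implementation of Lowe: the function names, the scale values, the threshold, and the synthetic test image are all assumptions chosen for illustration, and refinements such as octaves, sub-pixel interpolation, and edge-response rejection are omitted.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at roughly 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma):
    """Separable Gaussian blur; pixels beyond the border repeat the edge value."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])

    def clamp(v, size):
        return min(max(v, 0), size - 1)

    # Horizontal pass followed by a vertical pass.
    rows = [[sum(kernel[k + radius] * image[y][clamp(x + k, width)]
                 for k in range(-radius, radius + 1))
             for x in range(width)] for y in range(height)]
    return [[sum(kernel[k + radius] * rows[clamp(y + k, height)][x]
                 for k in range(-radius, radius + 1))
             for x in range(width)] for y in range(height)]

def difference_of_gaussians(image, sigmas):
    """Subtract each blurred image from the one at the next larger scale."""
    blurred = [blur(image, s) for s in sigmas]
    return [[[hi - lo for lo, hi in zip(row_lo, row_hi)]
             for row_lo, row_hi in zip(fine, coarse)]
            for fine, coarse in zip(blurred, blurred[1:])]

def find_keypoints(dog, threshold):
    """Return (x, y, layer) where |DoG| strictly exceeds all 26 neighbours
    in the two spatial directions and the scale direction."""
    height, width = len(dog[0]), len(dog[0][0])
    offsets = [(ds, dy, dx) for ds in (-1, 0, 1) for dy in (-1, 0, 1)
               for dx in (-1, 0, 1) if (ds, dy, dx) != (0, 0, 0)]
    keypoints = []
    for s in range(1, len(dog) - 1):
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                value = abs(dog[s][y][x])
                if value >= threshold and all(
                        value > abs(dog[s + ds][y + dy][x + dx])
                        for ds, dy, dx in offsets):
                    keypoints.append((x, y, s))
    return keypoints

# Hypothetical test image: a single bright blob centred at (10, 10).
size, blob_sigma = 21, 2.0
image = [[math.exp(-((x - 10) ** 2 + (y - 10) ** 2) / (2 * blob_sigma ** 2))
          for x in range(size)] for y in range(size)]
sigmas = [1.0, 1.6, 2.56, 4.096]  # assumed geometric progression of scales
dog = difference_of_gaussians(image, sigmas)
keypoints = find_keypoints(dog, threshold=0.05)
print(keypoints)
```

Each returned tuple gives the pixel coordinates and the index of the DoG layer at which the response peaks, which corresponds to the keypoint scale described above.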
In SIFT, feature values are expressed as 128-dimensional vectors whose elements are real numbers. In the technique described in Rublee et al., known as oFAST, corner portions of shading patterns of pixels are used as keypoints. As in the case of SIFT, each oFAST keypoint has both a scale and a direction. FIGS. 1A and 1B are diagrams showing an example of keypoint detection using the oFAST method of the prior art. In this detection example, the image in FIG. 1A is a dictionary image, and the patterns in the dictionary image are the recognition target objects. The detected keypoints are shown in FIG. 1B. The circles (o) containing cross-hatching are the detected keypoints. The techniques described in both Lowe and Rublee et al. are able to determine with great accuracy whether or not the recognition target objects in a dictionary image are present in a search target image.
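The direction assigned to an oFAST keypoint can be illustrated with a short sketch. In Rublee et al., the direction is taken from the intensity centroid of the patch around the corner, that is, the angle of the vector from the patch centre to the centroid of the pixel intensities. The function name and the square patch used below are assumptions made for brevity (Rublee et al. use a circular patch), and image coordinates are assumed, with y increasing downward.

```python
import math

def intensity_centroid_angle(patch):
    """Return the patch direction in radians from its intensity centroid.

    `patch` is a square list of rows of intensities centred on the keypoint.
    """
    size = len(patch)
    centre = (size - 1) / 2.0
    m10 = m01 = 0.0  # first-order image moments about the patch centre
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            m10 += (x - centre) * value
            m01 += (y - centre) * value
    return math.atan2(m01, m10)

# Hypothetical 5x5 patches: one brightening to the right, one brightening downward.
brighter_right = [[x for x in range(5)] for _ in range(5)]
brighter_down = [[y for _ in range(5)] for y in range(5)]
angle_right = intensity_centroid_angle(brighter_right)
angle_down = intensity_centroid_angle(brighter_down)
```

Because the angle follows the shading pattern itself, rotating the pattern rotates the computed direction by the same amount, which is what allows the feature value to be described relative to the keypoint direction.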
However, in situations such as the recognition of store products, a plurality of objects may be present in the search target image. In such situations, the existing techniques can determine whether or not recognition target objects are present, but they cannot determine the number of recognition target objects or the position and orientation of each object in the search target image.