One method of discriminating an object in an image is based on statistical pattern recognition technology. This method collects texture data of the object in advance, performs learning processing, and then performs discrimination processing based on parameters calculated by the learning processing. As a result, it is determined whether the texture in the image to be discriminated belongs to the object. The method is known to achieve high discrimination precision.
On the other hand, in order to obtain high discrimination precision with the statistical method, it is necessary to prepare the data needed for the learning processing (hereinafter referred to as learning data). The learning data can be regarded as a set consisting of a group of image templates, each obtained by cutting out a target object to be discriminated, and the class information (called a label) of each individual object. In order to create this learning data, it is necessary to prepare images in which the object appears, and further to prepare information such as the precise position, size and rotation angle of the target object in each image. This information on the position, size and rotation angle of the target object is usually created manually by a person who views the image.
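As an illustration of how such learning data could be assembled, the following minimal sketch cuts a fixed-size image template out of an image from a manually specified position, size and rotation angle, and pairs it with its label. The names `Annotation`, `cut_template` and `make_learning_data`, the square region, and the nearest-neighbor sampling are assumptions made for this example and are not taken from the cited literature.

```python
import math
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical record of the manually created information for one object."""
    cx: float         # x coordinate of the region center, in pixels
    cy: float         # y coordinate of the region center, in pixels
    size: float       # side length of the square region, in pixels
    angle_deg: float  # rotation angle of the region, in degrees
    label: int        # class information (label) of the object


def cut_template(image, ann, out_size=4):
    """Sample an out_size x out_size template from a 2-D list image.

    Nearest-neighbor sampling is used; samples outside the image are
    clamped to the nearest border pixel.
    """
    height, width = len(image), len(image[0])
    cos_a = math.cos(math.radians(ann.angle_deg))
    sin_a = math.sin(math.radians(ann.angle_deg))
    step = ann.size / out_size
    template = []
    for r in range(out_size):
        row = []
        for q in range(out_size):
            # Offset of this sample from the region center before rotation.
            dx = (q + 0.5) * step - ann.size / 2
            dy = (r + 0.5) * step - ann.size / 2
            # Rotate the offset and translate it to image coordinates.
            x = ann.cx + cos_a * dx - sin_a * dy
            y = ann.cy + sin_a * dx + cos_a * dy
            xi = min(max(int(math.floor(x)), 0), width - 1)
            yi = min(max(int(math.floor(y)), 0), height - 1)
            row.append(image[yi][xi])
        template.append(row)
    return template


def make_learning_data(image, annotations, out_size=4):
    """Return the learning data as a list of (image template, label) pairs."""
    return [(cut_template(image, a, out_size), a.label) for a in annotations]
```

Each `Annotation` in this sketch corresponds to one piece of manual work, which is why the effort grows with the number of objects to be annotated.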
A problem of the method based on statistical pattern recognition technology is that the amount of work required to create this learning data becomes enormous. Moreover, it is not known before learning and discrimination are actually performed how the criteria, such as the position, size and rotation angle of the target object, should be set when the image templates are cut out in order to obtain good discrimination precision. For example, consider a case where the target object is a human body. In order to discriminate whether the target object is a human body, it is not known in advance whether it is better to use an image of the whole body as the image template, or to use an image of only the trunk, excluding parts such as the hands and feet whose positions vary greatly.
If learning can be performed automatically and appropriately even when the position, size and rotation angle of the target object are not accurate, the amount of work required to create the learning data can be reduced substantially. One such learning scheme is disclosed in non-patent literature 1.
The method disclosed in non-patent literature 1 is called Multiple Instance Learning Boosting. In this method, a large number of image templates are prepared for a given target by perturbing the position, size and rotation angle. Learning processing is performed using sets of these image templates (hereinafter, each set is referred to as a bag). The method described in non-patent literature 1 learns using an evaluation measure based on “the probability that at least one of the many image templates in the bag is an image template of the object based on the correct position, size and rotation angle”. Even if image templates based on an incorrect position, size or rotation angle are included in the set of image templates, the method automatically finds and learns the characteristics common among the bags in the course of the learning processing. In other words, the method described in non-patent literature 1 can be understood as performing the learning processing while automatically selecting characteristics for which the position, size and rotation angle of the object are aligned.
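The bag construction and the "at least one" evaluation measure can be sketched as follows. This is a minimal illustration under stated assumptions: the perturbation steps are arbitrary, the per-template probabilities are assumed to come from some discriminator that is not shown, and the noisy-OR combination is one common way of expressing "at least one template in the bag is correct" rather than a confirmed detail of non-patent literature 1. The function names are hypothetical.

```python
import itertools
import math


def perturb(cx, cy, size, angle_deg,
            d_pos=(-2, 0, 2), scales=(0.9, 1.0, 1.1), d_angles=(-10, 0, 10)):
    """Return a bag of perturbed (cx, cy, size, angle) candidates.

    The rough annotation itself is included (zero shift, scale 1.0).
    """
    return [(cx + dx, cy + dy, size * sc, angle_deg + da)
            for dx, dy, sc, da in itertools.product(d_pos, d_pos, scales, d_angles)]


def bag_probability(instance_probs):
    """Noisy-OR: probability that at least one template in the bag is positive."""
    prob_none = 1.0
    for p in instance_probs:
        prob_none *= (1.0 - p)
    return 1.0 - prob_none


def bag_log_likelihood(bags_probs, bag_labels, eps=1e-12):
    """Log-likelihood over bags; label 1 marks a bag built around the object."""
    total = 0.0
    for probs, label in zip(bags_probs, bag_labels):
        p = min(max(bag_probability(probs), eps), 1.0 - eps)
        total += math.log(p) if label == 1 else math.log(1.0 - p)
    return total
```

Because the measure is defined per bag, a positive bag scores well as long as a single well-aligned template receives a high probability, so the remaining misaligned templates in the bag need not be labeled individually.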
Various learning methods exist in the technical field of statistical pattern recognition. One kind is the so-called ensemble type learning method. In this method, the final discrimination result is obtained by combining a plurality of discriminators called weak discriminators. The series of discrimination processing, consisting of the discrimination performed by the individual weak discriminators and the final discrimination that combines their results, is regarded as being performed by a single discriminator. This single discriminator, which performs the whole series of discrimination processing, is called a strong discriminator. A characteristic of this method is that high discrimination precision can be obtained by the strong discriminator even if the discrimination precision of each weak discriminator is not necessarily high. Non-patent literature 1 adopts a method that improves on the ensemble type learning method called “boosting”.
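The relationship between weak discriminators and a strong discriminator can be illustrated with a minimal sketch of ordinary discrete AdaBoost using threshold tests on single characteristics as weak discriminators. This is the standard textbook form of boosting, not the improved method of non-patent literature 1, and all function names are chosen for this example only.

```python
import math


def stump_predict(stump, x):
    """Weak discriminator: threshold test on one dimension, returns +1 or -1."""
    dim, thresh, sign = stump
    return sign if x[dim] > thresh else -sign


def train_adaboost(X, y, rounds=3):
    """Train a list of (weight, weak discriminator) pairs; labels are +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n  # sample weights, uniform at the start
    ensemble = []
    for _ in range(rounds):
        # Pick the weak discriminator with the smallest weighted error.
        best = None
        for dim in range(len(X[0])):
            for thresh in sorted({x[dim] for x in X}):
                for sign in (1, -1):
                    stump = (dim, thresh, sign)
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump_predict(stump, xi) != yi)
                    if best is None or err < best[0]:
                        best = (err, stump)
        err, stump = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weights of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def strong_predict(ensemble, x):
    """Strong discriminator: sign of the weighted vote of the weak discriminators."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# One-dimensional example that no single threshold test can separate.
X = [[0], [1], [2], [3], [4], [5]]
y = [1, 1, -1, -1, 1, 1]
ensemble = train_adaboost(X, y, rounds=3)
predictions = [strong_predict(ensemble, x) for x in X]
```

In this example every individual threshold test misclassifies at least two of the six samples, while the weighted combination of three of them classifies all six correctly.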
On the other hand, the method described in patent literature 1 is a kind of statistical pattern recognition method called learning vector quantization (LVQ). To facilitate understanding of the present invention, patent literature 1 is described here as related art. Although the LVQ method itself is not an ensemble type learning method, patent literature 1 deals with a concept close to ensemble learning.
The method described in patent literature 1 sequentially selects and adds dimensions of the characteristics vector of the pattern that are effective for discrimination. Accordingly, the method first performs discrimination using a low-dimensional vector and then performs discrimination using a higher-dimensional vector to which dimensions have been added. If the discrimination by each added characteristics dimension is regarded as a weak discriminator, this method can be regarded as a kind of ensemble type learning method, because a strong discriminator can be considered to be composed by combining the group of weak discriminators.
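The idea of sequentially adding effective dimensions can be sketched as a greedy selection loop. The sketch below is an assumption-laden illustration and not the procedure of patent literature 1: class means stand in for the reference vectors that LVQ would learn, the selection criterion is simply the accuracy on the learning data, and all function names are hypothetical.

```python
def class_means(X, y):
    """Compute one prototype per class as the mean characteristics vector."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(x, prototypes, dims):
    """Nearest-prototype discrimination using only the selected dimensions."""
    def dist2(p):
        return sum((x[d] - p[d]) ** 2 for d in dims)
    return min(prototypes, key=lambda label: dist2(prototypes[label]))


def accuracy(X, y, prototypes, dims):
    """Fraction of samples discriminated correctly with the given dimensions."""
    return sum(classify(x, prototypes, dims) == t for x, t in zip(X, y)) / len(X)


def select_dimensions(X, y, max_dims):
    """Greedily add the dimension that most improves discrimination."""
    prototypes = class_means(X, y)
    selected, history = [], []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_dims:
        best = max(remaining,
                   key=lambda d: accuracy(X, y, prototypes, selected + [d]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), accuracy(X, y, prototypes, selected)))
    return selected, history


# Dimension 0 carries no class information; dimensions 1 and 2 do.
X = [[1, -2, 2], [2, 2, -2], [3, 0, 0],
     [1, 0, 4], [2, 4, 0], [3, 2, 2]]
y = [0, 0, 0, 1, 1, 1]
selected, history = select_dimensions(X, y, max_dims=2)
```

Here `history` records the accuracy after each added dimension, which corresponds to discriminating first with a low-dimensional vector and then with a higher-dimensional one.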
In addition, as technologies related to the present invention, learning methods using sample images are disclosed in patent literature 2 and patent literature 3.