Extracting specific parameters from an input pattern is a common task in pattern information processing. Examples include extracting the positions of eyes and ears from an image of a human face and extracting the position of a number plate from an image of a vehicle.
Conventionally, the most popular method for such processing is the matched filter method, summarized below, for which an extremely large number of applications have been proposed. As an example, a method of extracting facial features is described below with reference to FIG. 1.
As illustrated in the operation flow chart in FIG. 1, templates of eye and ear regions are stored in template database 1601 in advance. As illustrated in FIG. 2, a plurality of eye templates 1701 is stored in template database 1601.
When an input image is provided from a camera (S81), a single template 1701 is obtained from template database 1601 (S82). Next, as illustrated in FIG. 3, input image 2001 is searched using search window 2002, and the similarity degree between an image within search window 2002 and template 1701 is obtained (S83). The computation of the similarity degree usually uses the normalized correlation between the image within search window 2002 and template 1701.
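For reference, one common form of the normalized correlation can be written as follows. This is a sketch under stated assumptions: the symbols w_i (pixels of the image within the search window), t_i (pixels of the template) and r (the similarity degree) are introduced here for illustration, T is the number of pixels in the template, and a mean-subtracted variant is also widely used.

```latex
\[
  r = \frac{\sum_{i=1}^{T} w_i \, t_i}
           {\sqrt{\sum_{i=1}^{T} w_i^{2}} \; \sqrt{\sum_{i=1}^{T} t_i^{2}}}
\]
```

For non-negative pixel values, r lies between 0 and 1 and equals 1 only when the window and the template agree up to a brightness scale factor.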
It is then judged whether the above processing has been executed on the whole of input image 2001 (S84). If not, search window 2002 is moved to the next position in input image 2001 (S85) and the processing of S83 is executed again. This is repeated until the whole of input image 2001 has been scanned.
Then, it is judged whether the above search has been performed for all the templates 1701 contained in template database 1601 (S86). When it has not, the target template 1701 is changed (S87) and the processing flow returns to S83, so that the processing of S83 to S85 is executed for every template.
Based on the similarity degrees between the image within search window 2002 and templates 1701 obtained in the processing of S83 to S87, the position of the local area (the region of search window 2002) that is most similar to a template 1701 is found in input image 2001, and the position corresponding to that local area is output (S88).
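The flow of S81 to S88 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular system: the function names, the list-of-lists image representation and the use of the normalized correlation without mean subtraction are all assumptions made for the example.

```python
import math


def normalized_correlation(window, template):
    """Normalized correlation of two equally sized flat pixel lists.

    Assumption: the plain (not mean-subtracted) form is used.
    """
    dot = sum(w * t for w, t in zip(window, template))
    norm_w = math.sqrt(sum(w * w for w in window))
    norm_t = math.sqrt(sum(t * t for t in template))
    if norm_w == 0.0 or norm_t == 0.0:
        return 0.0  # an all-zero region is treated as no match
    return dot / (norm_w * norm_t)


def crop(image, top, left, height, width):
    """Flatten the height x width search window whose corner is (top, left)."""
    return [image[top + r][left + c]
            for r in range(height) for c in range(width)]


def matched_filter_search(image, templates):
    """Return (score, top, left, template_index) of the best match."""
    best = (-1.0, -1, -1, -1)
    for index, template in enumerate(templates):      # S86/S87: every template
        th, tw = len(template), len(template[0])
        flat_t = [p for row in template for p in row]
        for top in range(len(image) - th + 1):         # S84/S85: scan the image
            for left in range(len(image[0]) - tw + 1):
                window = crop(image, top, left, th, tw)
                score = normalized_correlation(window, flat_t)   # S83
                if score > best[0]:
                    best = (score, top, left, index)
    return best                                        # S88: most similar area


# Hypothetical toy data: a 4 x 5 image and two 2 x 2 templates.
image = [
    [1, 1, 1, 1, 1],
    [1, 9, 2, 1, 1],
    [1, 3, 8, 1, 1],
    [1, 1, 1, 1, 1],
]
templates = [
    [[1, 2], [2, 1]],
    [[9, 2], [3, 8]],
]
score, top, left, index = matched_filter_search(image, templates)
```

In this toy example the second template appears verbatim with its upper-left corner at row 1, column 1, so the search reports that position with a score of approximately 1. The nested loops make the cost visible: every template is compared against every window position.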
An example of a method based on the aforementioned approach is described in detail in R. Brunelli, T. Poggio, “Face Recognition: Features versus Templates”, IEEE Trans. Patt. Anal. Machine Intell., vol. 15, no. 10, pages 1042 to 1052, 1993.
A difficulty with the aforementioned conventional method is its computational cost. Assume that the size of the input image to be searched is S, the template size is T, and the normalized correlation is used as the criterion of similarity degree. Taking one multiplication as the unit of computation, the time complexity is 2×T×S multiplications. For example, in extracting the coordinates of a feature point from a typical face image, under the assumption that T=50×20=1000 (pel) and S=150×150=22500 (pel), 2×1000×22500=45×1,000,000=45 million multiplications are required. Such a large number of multiplications incurs an enormous computation cost, even as the computation speed of computers improves.
The templates used in the processing are usually typical data such as an average of all learning data, so the matching often fails to work well depending on the environment. One remedy is to compute the similarity degree using a plurality of templates prepared for different input patterns. However, such a method multiplies the amount of processing by the number of templates, and therefore also imposes a heavy load on the computer in terms of processing cost.
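The cost estimate above, and its growth with the number of templates, can be checked with a short calculation. The sketch below assumes the 2×T×S count given in the text and one window position per image pixel. The function name and the figure of ten templates are hypothetical, chosen only for illustration.

```python
def multiplication_count(template_pixels, image_pixels, num_templates=1):
    """Approximate multiplications for an exhaustive matched filter search.

    Assumes 2 multiplications per template pixel per window position
    (the 2 x T x S figure from the text) and one window position per
    image pixel. Both are simplifications.
    """
    return 2 * template_pixels * image_pixels * num_templates


T = 50 * 20   # template size in pels, from the text
S = 22500     # image size in pels, from the text

print(multiplication_count(T, S))                    # 45000000
print(multiplication_count(T, S, num_templates=10))  # 450000000
```

The count grows linearly with the number of templates, so ten templates require ten times as many multiplications as one.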