Many conventional software applications employ high speed calculations by which, in an N dimensional feature space (N is a natural number) populated by multiple sampling points, the sampling point nearest to an input point (a point given by input N dimensional feature data) is acquired. Such calculations have been studied extensively, and one of their applications is pattern recognition. When they are employed for pattern recognition, the sampling points are groups of prototypes for the categories to be recognized, and the input point is the sample to be identified. When they are used in a character recognition apparatus, such as a so-called OCR (optical character reading) apparatus, the value for each dimension in the N dimensional feature space corresponds to a feature element value extracted from a character pattern. According to the simplest recognition method, the category to which the sampling point nearest to the input point (its nearest neighbor) belongs is regarded as the recognition result.
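The simplest recognition method described above can be illustrated with a minimal sketch. The function name `nearest_category`, the tiny three dimensional prototype set, and the use of a squared Euclidean distance are all assumptions made for illustration; the text does not specify a distance measure.

```python
from typing import List, Sequence, Tuple

Prototype = Tuple[str, Sequence[float]]  # (category label, feature vector)


def nearest_category(x: Sequence[float],
                     prototypes: List[Prototype]) -> Tuple[str, float]:
    """Exhaustive nearest neighbor search over all prototypes.

    Assumption: distance is squared Euclidean distance.
    Returns the category of the nearest prototype and its distance.
    """
    best_category = ""
    best_dist = float("inf")
    for category, p in prototypes:
        # Full distance calculation over every dimension, for every prototype.
        d = sum((a - b) ** 2 for a, b in zip(x, p))
        if d < best_dist:
            best_category, best_dist = category, d
    return best_category, best_dist


# Hypothetical prototypes: one feature vector per category.
PROTOTYPES: List[Prototype] = [
    ("A", [0.0, 0.0, 0.0]),
    ("B", [1.0, 1.0, 1.0]),
    ("C", [0.0, 2.0, 0.0]),
]

category, dist = nearest_category([0.9, 1.1, 0.8], PROTOTYPES)
print(category)  # prints "B"
```

The cost of this exhaustive search grows with the number of prototypes multiplied by the number of dimensions, which is why screening is needed for feature spaces of high dimension.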
For a small number of dimensions N (only several dimensions), some theoretically fast detection methods are available. When N becomes larger (e.g., several tens of dimensions or more), however, the number of calculations required to acquire the distances between an input point and multiple sampling points increases drastically, and no theoretical method is available that can accurately acquire the nearest point at high speed. Therefore, for an actual application, a trade-off between the speed that can be attained and keeping the reduction in the recognition rate to a minimum is more important than a guarantee that the nearest point will be acquired.
Many methods have been devised for detecting a nearest neighbor point at high speed and with a high rate of success, while keeping the reduction in the recognition rate as small as possible. With these conventional detection methods, before or during the calculation of distances, sampling points that are very unlikely to be the nearest are excluded (i.e., screening is performed) in order to reduce the number of calculations and thereby increase the speed. For example, when features of 200 dimensions are mainly employed in an OCR apparatus, one method performs screening with a feature obtained by compressing the 200 dimensions to 20 dimensions, and another method performs screening in advance with a different, simpler feature. Further, in "A Simple, Fast Recognition Method For Hierarchial Pattern Matching", (33rd National Conference Of Information Processing Academy, p. 1643 (1986); Sone, Kato and Takahashi) a method is disclosed by which, during the distance calculation for 200 dimensions, sampling points are compared with simple threshold values to determine which are to be discarded, and those sampling points that are unlikely to be the nearest are screened out sequentially. The above described methods can be combined for effective use.
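Screening during the distance calculation, as described above, can be sketched roughly as follows. This is not the procedure of the cited paper; the function name `screened_nearest`, the `checkpoints` table of fixed thresholds, the squared Euclidean distance, and the four dimensional data are assumptions chosen only to show the idea of discarding a sampling point once its partial distance exceeds a fixed limit.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Prototype = Tuple[str, Sequence[float]]  # (category label, feature vector)


def screened_nearest(x: Sequence[float],
                     prototypes: List[Prototype],
                     checkpoints: Dict[int, float]
                     ) -> Tuple[Optional[str], float, int]:
    """Nearest neighbor search with fixed threshold screening.

    checkpoints maps "number of dimensions processed" to the largest
    partial distance a sampling point may have at that stage
    (hypothetical values, fixed before the search starts).
    Returns (category or None, distance, per-dimension operations used).
    """
    best: Optional[str] = None
    best_dist = float("inf")
    operations = 0
    for category, p in prototypes:
        partial = 0.0
        discarded = False
        for i, (a, b) in enumerate(zip(x, p), start=1):
            partial += (a - b) ** 2
            operations += 1
            limit = checkpoints.get(i)
            if limit is not None and partial > limit:
                discarded = True  # screened out; skip remaining dimensions
                break
        if not discarded and partial < best_dist:
            best, best_dist = category, partial
    return best, best_dist, operations


PROTOTYPES: List[Prototype] = [
    ("A", [0.0, 0.0, 0.0, 0.0]),
    ("B", [1.0, 1.0, 1.0, 1.0]),
    ("C", [5.0, 5.0, 5.0, 5.0]),
]
x = [0.9, 1.0, 1.1, 1.0]

# A loose threshold after 2 dimensions: "A" and "C" are dropped early.
print(screened_nearest(x, PROTOTYPES, {2: 1.0})[0])    # prints "B"
# A threshold that is too strict also drops the true nearest neighbor.
print(screened_nearest(x, PROTOTYPES, {2: 0.001})[0])  # prints "None"
```

In this sketch the thresholds are fixed in advance, so the result depends entirely on how well they were chosen for the data.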
With the above conventional nearest neighbor detection methods, a specific threshold value or a simplified feature quantity is employed to perform screening. If the screening is made strict enough to provide a sufficiently high processing speed, a sampling point that is the nearest neighbor may also be screened out. If the screening requirements are relaxed to improve the detection rate for the nearest neighbor, an adequately high processing speed cannot be attained. In other words, the conventional techniques cannot provide satisfactory processing efficiency, which is determined by the trade-off between the speed that can be attained and keeping the reduction in the recognition rate to a minimum.
To overcome the above described shortcoming, it is one object of the present invention to provide a fast nearest neighbor detection method by which the threshold value used for screening is dynamically changed during the distance calculation in order to dramatically increase the processing efficiency, and which performs fast processing at a high recognition rate.
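The text at this point does not state how the threshold value is changed. Purely as an illustration of a threshold that changes during the distance calculation, the sketch below uses one well known rule, in which the threshold is the smallest complete distance found so far. This rule, the function name `nearest_with_dynamic_threshold`, the squared Euclidean distance, and the sample data are assumptions, and should not be read as the method of the invention.

```python
from typing import List, Optional, Sequence, Tuple

Prototype = Tuple[str, Sequence[float]]  # (category label, feature vector)


def nearest_with_dynamic_threshold(x: Sequence[float],
                                   prototypes: List[Prototype]
                                   ) -> Tuple[Optional[str], float, int]:
    """Nearest neighbor search with a threshold that tightens as it runs.

    Assumed rule: the threshold is the best complete distance so far.
    Returns (category or None, distance, per-dimension operations used).
    """
    best: Optional[str] = None
    threshold = float("inf")  # no candidate yet, so nothing is screened
    operations = 0
    for category, p in prototypes:
        partial = 0.0
        for a, b in zip(x, p):
            partial += (a - b) ** 2
            operations += 1
            if partial >= threshold:
                break  # cannot beat the current best; screen it out
        else:
            # Loop finished without a break: this is the new best point,
            # and its distance becomes the new, tighter threshold.
            best, threshold = category, partial
    return best, threshold, operations


PROTOTYPES: List[Prototype] = [
    ("A", [0.0, 0.0, 0.0, 0.0]),
    ("B", [1.0, 1.0, 1.0, 1.0]),
    ("C", [5.0, 5.0, 5.0, 5.0]),
]

category, dist, operations = nearest_with_dynamic_threshold(
    [0.9, 1.0, 1.1, 1.0], PROTOTYPES)
print(category, operations)  # prints "B 9"
```

Because each squared term is non-negative, this particular rule never discards the true nearest neighbor; an exhaustive search over the same data would use 12 per-dimension operations instead of 9.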