1. Field of the Invention
The present invention relates to an apparatus for simultaneously extracting local features of an image such as a character or any other figure and serially discriminating the degree of similarity between the image and a reference image, the apparatus being suitably used in an image recognition apparatus such as an OCR (Optical Character Reader).
2. Description of the Prior Art
A conventional image recognition apparatus for recognizing an image such as a character or any other figure is designed to perform processing mainly by electronic techniques. Such an image recognition apparatus uses a bottom-up type technique wherein an input image is recognized on the basis of global feature extraction information obtained by a structural analysis.
A conventional image recognition apparatus will be briefly described. An image (input image) pattern subjected to image recognition and written on an original by printing or the like is focused by an optical lens on a light-receiving surface of an image sensor comprising a CCD or a MOS sensor. A digital signal as image information is output from the image sensor and is binarized by a proper threshold value (if there are multiple threshold values, multi-value conversion different from that described above is performed). The binarized signal is stored in a memory. The binarized image information is subjected to preprocessing for shaping the image, as needed. The preprocessed image information is stored in the above memory or another memory. Preprocessing includes noise reduction processing, normalization processing for positions, sizes, inclinations, and widths, and the like.
Feature extraction required for discriminating an image is performed on the image information stored in the memory. A projection feature extraction method is used as one of the techniques for extracting features. These projection features are extracted by a feature-processing section.
In order to extract features of an image on a given axis (e.g., the X-axis), the memory which stores the image information is scanned in a direction (e.g., the Y-axis) having a predetermined relationship with the given axis, and the image information is read out time-serially or parallel-time-serially. The readout image information is transferred to the feature-processing section. Pieces of the transferred image information are sequentially measured by the feature-processing section. Measured values sequentially obtained by such measurements are stored at predetermined positions corresponding to the given axis in the memory or another memory. A curve of an intensity distribution obtained by extracting features on the given axis is calculated on the basis of the stored measured values.
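The projection feature extraction described above can be illustrated by a minimal sketch. It assumes that the binarized image is held in memory as a list of rows of 0/1 pixels and that "measuring" means counting the set pixels along each scan line; the function name, the memory layout, and the sample pattern are hypothetical and are not taken from any particular conventional apparatus.

```python
def projection_profile(image, axis="x"):
    """Return the intensity distribution of a binarized image on one axis.

    image: list of rows, each a list of 0/1 pixels (assumed layout).
    axis="x": scan each column along the Y-axis, one count per X position.
    axis="y": scan each row along the X-axis, one count per Y position.
    """
    if not image:
        return []
    if axis == "x":
        width = len(image[0])
        return [sum(row[x] for row in image) for x in range(width)]
    if axis == "y":
        return [sum(row) for row in image]
    raise ValueError("axis must be 'x' or 'y'")


# Hypothetical 5x5 binarized pattern resembling the letter "T".
IMAGE_T = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

x_profile = projection_profile(IMAGE_T, "x")  # [1, 1, 5, 1, 1]
y_profile = projection_profile(IMAGE_T, "y")  # [5, 1, 1, 1, 1]
```

Each call scans the whole stored image once, which is why extracting features on many axes in a single feature-processing section requires the scan to be repeated for every axis.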
In recognition processing (to be described later) or the like, feature extraction along one axis is not sufficient in order to improve reliability of similarity discrimination if a pattern is a two-dimensional pattern such as an image. For this reason, feature extraction of single image information must be performed on a large number of axes, thereby extracting various types of features. In order to extract features on a large number of axes, one of the following procedures is required:
(1) The above-mentioned feature extraction is repeated in a single feature-processing section; or
(2) A large number of feature-processing sections are arranged and at the same time, pieces of image information read out from the memories are respectively transferred to the feature-processing sections. The above-mentioned feature extraction operations are simultaneously performed in the large number of feature-processing sections.
Class selection (class classification) is performed to determine which class an input image of interest belongs to, according to data of a large number of intensity distribution curves concerning the input image. This class classification is achieved by time-serial digital correlation calculations between the data of various types of intensity distribution curves concerning the input image and data of the intensity distribution curves of various types of reference patterns prestored in a dictionary for class classification. The class to which the input image belongs is thus identified as the class of the reference pattern giving a maximum correlation with the input image pattern.
Individual recognition processing is performed for an image group belonging to a selected class on the basis of the similarity recognition results. This recognition processing is performed by individual digital correlation calculations between data of the intensity distribution curve concerning the reference pattern of the image group stored in a recognition dictionary and data of the intensity distribution curve concerning the input image in the same manner as in class classification.
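The two-stage flow described above (class classification followed by individual recognition within the selected class) can be sketched as follows. This is only an illustration: the normalized correlation measure, the dictionary layout, and all class names and reference values are assumptions introduced for the example.

```python
import math


def correlation(u, v):
    """Normalized correlation of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(vector, dictionary):
    """Return the key of the reference pattern giving the maximum correlation."""
    return max(dictionary, key=lambda name: correlation(vector, dictionary[name]))


def recognize(vector, class_dictionary, recognition_dictionaries):
    """Stage 1: select a class. Stage 2: match only within that class."""
    selected_class = best_match(vector, class_dictionary)
    member = best_match(vector, recognition_dictionaries[selected_class])
    return selected_class, member


# Hypothetical dictionaries of intensity distribution curves.
CLASS_DICTIONARY = {
    "vertical": [1, 1, 5, 1, 1],
    "flat": [3, 3, 3, 3, 3],
}
RECOGNITION_DICTIONARIES = {
    "vertical": {"T": [1, 1, 5, 1, 1], "I": [0, 0, 5, 0, 0]},
    "flat": {"E": [5, 3, 3, 3, 3], "H": [5, 1, 1, 1, 5]},
}

result = recognize([1, 1, 4, 1, 1], CLASS_DICTIONARY, RECOGNITION_DICTIONARIES)
```

Restricting the second stage to the selected class reduces the number of correlation calculations, but every remaining comparison is still carried out one after another.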
A large number of digital accumulated values constituting the intensity distribution curves are made to correspond to vector components, and each intensity distribution curve is treated as one vector. The total of the intensity distribution curves is treated as a set of vectors. Alternatively, the set of intensity distribution curves may be treated as a single vector, in which case the individual digital accumulated values of each intensity distribution curve correspond to the vector components constituting the single vector.
In the same manner as described above, each intensity distribution curve of the reference pattern can also be defined in the form of a vector.
Special-purpose machinery incorporated in the image recognition apparatus digitally and time-serially calculates correlations between the input image vectors and reference pattern vectors. The special-purpose machinery may be a vector calculator of the kind practically used in a conventional parallel pipeline type computer.
In the correlation calculations between the input image vectors and the reference pattern vectors, a distance and an angle between the vectors can be used as factors for evaluating the degree of correlation therebetween. In practice, the distance between the vectors is used as a measure for the degree of deviation, and the cosine of the angle is used as a measure for the degree of similarity.
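The two measures named above can be written out as a short sketch. It assumes the Euclidean distance as the distance between vectors, which the description does not specify; the function names and sample vectors are hypothetical.

```python
import math


def euclidean_distance(u, v):
    """Distance between two equal-length vectors: a measure of deviation."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: a measure of similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical intensity distribution vectors: the input has the same shape
# as the reference but twice the amplitude.
reference = [1, 1, 5, 1, 1]
input_vec = [2, 2, 10, 2, 2]

deviation = euclidean_distance(input_vec, reference)   # sqrt(29), about 5.39
similarity = cosine_similarity(input_vec, reference)   # about 1.0
```

In this example the cosine is approximately 1 although the distance is not zero, which shows why the two measures are used for different purposes: the cosine is insensitive to overall amplitude, whereas the distance is not.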
Variations in input image patterns are present due to a variety of expression formats of the original image, so that the input images constitute a cluster. Positional errors also occur in the input image. For this reason, the reference point of the intensity distribution of the input image does not normally match that of the reference pattern. Therefore, in vector correlation calculations, an optimal correlation must be found by matching the reference point of the input pattern with that of the reference pattern.
Optimal correlation between the input image vector and the reference pattern vector can be obtained by shifting both vectors and repeating the vector correlation calculations according to time-serial digital processing for every shift.
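The shift-and-repeat search described above can be sketched as follows. The sketch makes several assumptions: it shifts the reference vector relative to the input vector (only the relative shift matters), pads with zeros outside the vector, uses an un-normalized correlation, and limits the search to a hypothetical `max_shift` parameter.

```python
def shifted_correlation(input_vec, ref_vec, shift):
    """Correlation of input_vec with ref_vec displaced by `shift` positions.

    Components shifted outside the vector are treated as zero (assumption).
    """
    total = 0
    for i, value in enumerate(input_vec):
        j = i - shift
        if 0 <= j < len(ref_vec):
            total += value * ref_vec[j]
    return total


def best_shift(input_vec, ref_vec, max_shift):
    """Repeat the correlation for every shift and keep the maximum."""
    scores = {
        s: shifted_correlation(input_vec, ref_vec, s)
        for s in range(-max_shift, max_shift + 1)
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical curves: the input is the reference displaced by two positions.
reference = [0, 1, 5, 1, 0, 0, 0]
input_vec = [0, 0, 0, 1, 5, 1, 0]

shift, score = best_shift(input_vec, reference, max_shift=3)  # (2, 27)
```

One full correlation is computed for each candidate shift, so the number of calculations grows with both the vector length and the shift range. This repetition contributes to the long processing time of time-serial digital processing.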
The above-mentioned vector correlation calculation processing allows discrimination of the reference pattern having the highest degree of similarity to the input image, i.e., the reference pattern that most closely resembles it.
As described above, expression formats of the original images to be recognized vary to present various patterns of identical images. If an image comprises, for example, characters, they include printed characters having uniform styles of penmanship as well as handwritten characters, thus presenting a variety of styles of character patterns. In particular, the forms of handwritten characters are deformed, and their patterns depend on individual handwriting habits.
In order to recognize handwritten characters subjected to deformation, use is made of the fact that local features are preserved in handwritten characters. Global features of the characters are discriminated by induction on the basis of these local features, so that a top-down type technique is effectively performed.
The technique utilizing the top-down characteristics is to recognize an image by induction utilizing knowledge concerning an object to be recognized. In this sense, class classification can be effectively performed without matching all patterns of the image with the reference patterns.
The above-mentioned knowledge utilization introduces primitive patterns such as geometric features of characters, in particular, character parts associated with radicals of the Chinese characters, such as a left-hand radical, a right-hand radical, and an embracing radical for information recognition processing. Therefore, recognition efficiency can be improved.
A typical example of hardware utilizing the top-down characteristics is an associative memory system utilizing an information processing algorithm inherent to the human brain. In recent years, extensive studies have been made to develop such a system.
However, in the conventional image recognition apparatus employing the bottom-up type technique described above, processing is performed mainly by electronic techniques, and processing time is inevitably prolonged for the following reasons.
In order to improve discrimination precision of the degree of similarity, features on a large number of axes must be extracted in feature extraction. However, in procedure (1), when the single feature-processing section is used to repeat feature extraction, the memory which stores the image information is scanned in predetermined directions to sequentially read out the image information from the memory. The pieces of image information are transferred to the feature-processing section and are measured there. The intensity distribution curve must then be obtained on the basis of the measured values. Because this operation must be repeated for every axis, the feature processing time is prolonged and the efficiency of feature processing is degraded.
In procedure (2), wherein a large number of feature-processing sections are arranged and simultaneous processing is performed, the intensity distribution curves are still obtained only after the image information is transferred and measured. The feature processing time is therefore prolonged, although procedure (2) is no worse than procedure (1) in this respect. Procedure (2) also requires a large number of feature-processing sections, and thus the overall system configuration becomes undesirably complicated and costly.
In correlation calculations for discriminating the degree of similarity, processing time is prolonged in the same manner as in feature extraction. More specifically, the objects to be calculated are a large number of digital vector components. Because an optimal correlation must be found, discrimination of the degree of similarity between the input image and the reference pattern must be performed by repeating correlation calculations of a large number of vectors according to time-serial digital processing.
In order to shorten the processing time, the above-mentioned special-purpose machinery for vector calculation is used. However, this special-purpose machinery depends on time-serial digital processing and does not essentially solve the problem of long processing time. In addition, a vector processor is built into such special-purpose machinery. The entire system consequently becomes very costly.
In the conventional image recognition apparatus, the structural analysis by global feature extraction is primarily used. It is therefore difficult to sufficiently recognize an input image with many deformations such as handwritten characters.
Furthermore, although the top-down type image recognition apparatus aims at increasing the recognition rate of handwritten characters with such deformations, a satisfactory apparatus is not yet available in practical applications. Therefore, the above-mentioned problems associated with recognition of images such as handwritten characters are left unsolved.
Development of special hardware is an indispensable requirement for realizing a practical apparatus of this type. This presents many problems, including an economic problem as well as a problem associated with processing time. In order to provide a practical model, the number of calculations must be reduced and approximation processing must be performed instead. However, in this case, analysis precision for handwritten characters with many deformations is undesirably degraded.