The present invention relates to an attribute decision method for deciding the type to which a pattern contained in an input image belongs, by using a signal processor comprising a signal input device, such as a TV camera, and a computer.
In recent years, accurate type classification, failure classification, and the like have become important issues in the test processes of industrial products. There is a strong demand to automate these processes, and image processing apparatuses comprising an image input device, such as a TV camera, and a computer are widely used. One example is a character recognition device that uses a TV camera to automatically identify serial numbers or the like inscribed on products.
An example of the conventional character recognition device using a TV camera is described below. FIG. 15 outlines the construction of the device. Illuminating light is applied from a light source 15 to a surface of an object 11 via a half mirror 14, while an image of the surface of the object 11 is picked up by a TV camera 12 via the half mirror 14. The TV camera 12, which is provided with a CCD sensor, obtains a gray level signal for each pixel, and this signal is processed in digitized form in a recognition processing unit 13. Needless to say, the recognition processing unit 13 is equipped with a memory unit for storing image data and with storage for the programs that execute the character recognition process.
FIG. 16 shows the construction of the recognition processing unit. An image signal (a) outputted by the TV camera 12 is inputted to an A/D (Analog-to-Digital) converter (conversion circuit) 21, and a digital signal (b) outputted by the A/D converter 21 is stored in an image memory 22. The image memory 22 outputs image data (c), which is then converted into a binary image (d) by a binarization circuit 23. The binarization circuit 23 binarizes the image data 27 so that character portions become "black" and background portions become "white", as shown in FIG. 17. The binary image (d) is inputted to a character cut-out circuit 24, which outputs a cut-out image (e). The character cut-out circuit 24 detects a circumscribing rectangle 28 for each character, as shown in FIG. 18, to separate a character string into individual characters. Characters are often separated on the basis of projection data projected onto the horizontal and vertical axes. The cut-out image (e), with the individual characters separated, is inputted to a normalization circuit 25, which outputs mesh pattern data (f). In the normalization circuit 25, the pixels within the circumscribing rectangle of each character are normalized into a pattern of an appropriate mesh size. The transformation from a plurality of pixels into the corresponding mesh cell uses their average, maximum, minimum, median, most frequent value, or the like. FIG. 19 illustrates how an image 29 resulting from the character-by-character separation is transformed into a mesh pattern 30 with a mesh size of 5 (lateral) × 9 (longitudinal). The mesh pattern data (f) is inputted to a character decision circuit 26, which outputs a character decision result (g).
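The binarization, projection-based cut-out, and mesh normalization stages described above can be sketched in software as follows. This is only an illustrative sketch, not the circuitry of FIG. 16: the function names, the fixed threshold of 128, the assumption that characters are darker than the background, and the choice of the cell average (rather than the maximum, median, and so on) are all assumptions made for the example.

```python
from statistics import mean

def binarize(gray, threshold=128):
    """Map gray levels to 1 (character) or 0 (background).
    Assumes characters are darker than the background (hypothetical)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def runs(profile):
    """Return (start, end) pairs of consecutive non-zero profile entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

def cut_out(binary):
    """Find one circumscribing rectangle (x0, y0, x1, y1) per character
    in a single text line, using vertical and horizontal projections."""
    column_profile = [sum(col) for col in zip(*binary)]
    boxes = []
    for x0, x1 in runs(column_profile):
        row_profile = [sum(row[x0:x1]) for row in binary]
        row_runs = runs(row_profile)
        boxes.append((x0, row_runs[0][0], x1, row_runs[-1][1]))
    return boxes

def normalize(binary, box, cols=5, rows=9):
    """Reduce the pixels inside `box` to a rows x cols mesh of 0/1 values.
    Each mesh cell takes the thresholded average of the pixels it covers."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    mesh = []
    for r in range(rows):
        ya = y0 + r * h // rows
        yb = max(y0 + (r + 1) * h // rows, ya + 1)
        mesh_row = []
        for c in range(cols):
            xa = x0 + c * w // cols
            xb = max(x0 + (c + 1) * w // cols, xa + 1)
            cell = [binary[y][x] for y in range(ya, yb) for x in range(xa, xb)]
            mesh_row.append(1 if mean(cell) >= 0.5 else 0)
        mesh.append(mesh_row)
    return mesh
```

Flattening the 9 rows of the resulting mesh gives the 45 binary values that a decision stage would consume.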
The characters most often used in industrial applications are the alphanumeric characters "0" to "9" and "A" to "Z" and the hyphen "-", 37 characters in total. The character decision circuit 26 outputs, as its decision result, the character to which the mesh pattern data is closest. FIG. 20 illustrates standard pattern data of the aforementioned 37 characters.
Neural networks (hereinafter referred to as NNs) are often used as the character decision circuit 26. NNs are available in various types of organizations (reference literature: Iinuma (Ed.), "Neuro Computers", Gijutsu Hyoronsha, September 1990); the organization of a perceptron type NN, which is commonly applied in actual cases, is shown in FIG. 21. This NN comprises three layers, namely an input layer, an intermediate layer, and an output layer, and each layer is composed of a multiplicity of non-linear elements called neurons. In the input layer, there are 45 (5 lateral × 9 longitudinal) neurons, each of which can assume a value of "0", showing a background portion, or "1", showing a character portion. In the output layer, there are 37 neurons corresponding to the characters to be decided: the alphanumeric characters "0" through "9" and "A" through "Z", and the hyphen "-". In the present example, the input layer is connected to the intermediate layer, and the intermediate layer to the output layer, by connections between their neurons, and a weight (ω) is defined on each connection. Each neuron sums, over all of its connections, the products of the output (y) of the connected neuron and the weight (ω) defined on that connection, and subjects the sum to nonlinear function processing, thus yielding an output value of 0.0 to 1.0. The NN can thus be given various characteristics depending on the values of the weights (ω). The weights are determined by giving actual data, checking whether or not the neurons of the output layer produce the expected output, and repeatedly correcting the weights according to the error of the output. Backpropagation is often used as the method for this correction (D. E. Rumelhart, et al.: Learning Representations by Back-Propagating Errors, Nature, Vol. 323, pp. 533-536 (1986)).
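The weighted-sum-and-nonlinearity computation performed by each neuron can be illustrated with a minimal forward pass. The sketch below makes several assumptions that the text does not state: the logistic sigmoid as the nonlinear function (it yields values between 0.0 and 1.0), a bias term per neuron, random initial weights, and the names `make_network` and `forward`. The intermediate layer size of 45 matches the example network used in this description.

```python
import math
import random

def sigmoid(x):
    """Logistic function; output lies strictly between 0.0 and 1.0."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """Each neuron sums (weight * input) over its connections, adds a
    bias (an assumption), and applies the nonlinear function."""
    return [
        sigmoid(sum(w * y for w, y in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def make_network(n_in=45, n_hidden=45, n_out=37, seed=0):
    """Create untrained weights; sizes follow the 45-45-37 example."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "w1": matrix(n_hidden, n_in), "b1": [0.0] * n_hidden,
        "w2": matrix(n_out, n_hidden), "b2": [0.0] * n_out,
    }

def forward(net, mesh_bits):
    """Propagate a flattened 5 x 9 mesh (45 values of 0 or 1) to 37 outputs."""
    hidden = layer(mesh_bits, net["w1"], net["b1"])
    return layer(hidden, net["w2"], net["b2"])
```

With untrained weights the 37 outputs are not meaningful; they only become usable for decision after the weights have been corrected against the standard pattern data.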
That is, standard pattern data is given to the input layer, and the expected output of the output layer is set so that the neuron corresponding to the character is 1.0 while the other neurons are 0.0; the weights (ω) are then corrected repeatedly according to the resulting error until they are determined. In the example, the correction process is repeated until the error between the value of every neuron of the output layer and its expected output value becomes 0.1 or less. In a decision process performed on the NN after the correction process is complete, as shown in FIG. 22, when an "A" is inputted to the input layer, the output of the neuron corresponding to the "A" in the output layer becomes large compared with the other neurons. Ideally, the output value of the neuron corresponding to the "A" becomes close to 1.0 while the output values of the other neurons become close to 0.0. In the example, the character decision conditions are set as follows:
1) If the output value of a neuron in the output layer is 0.7 or more (a threshold determined experimentally), its corresponding character is taken as the decided character (decision condition (1));
2) If the largest output value is 0.4 or more (a threshold determined experimentally), and the difference between the largest and the second largest output values of the neurons in the output layer is 0.3 or more, the character corresponding to the largest value is taken as the decided character (decision condition (2)); and
3) If neither of these conditions is satisfied by the outputs of the neurons in the output layer, the result is taken as "?", undecidable.
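The three decision conditions above can be expressed compactly as a function. This is a sketch under stated assumptions: the name `decide` and its parameter names are hypothetical, the largest output is used when checking condition (1), and the difference is rounded before comparison only to avoid floating-point artifacts near the 0.3 boundary.

```python
def decide(outputs, labels, t1=0.7, t2=0.4, margin=0.3):
    """Return the decided character, or "?" if undecidable.

    outputs: output-layer values (0.0 to 1.0), at least two of them
    labels:  the character assigned to each output neuron
    """
    ranked = sorted(zip(outputs, labels), reverse=True)
    (first, label), (second, _) = ranked[0], ranked[1]
    # Decision condition (1): a sufficiently large output.
    if first >= t1:
        return label
    # Decision condition (2): moderately large and clearly separated.
    if first >= t2 and round(first - second, 9) >= margin:
        return label
    # Otherwise the result is undecidable.
    return "?"
```

For instance, outputs of 0.63 for "2" and 0.12 for "Z" satisfy condition (2), while outputs of 0.58 for "2" and 0.48 for "C" differ by only 0.10 and give "?".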
However, the above conventional method using an NN has had a problem that the attribute decision (character decision) could not be accomplished at the demanded reliability. This is explained below based on an actual example.
The example was carried out with the standard pattern data shown in FIG. 20 by training a network comprising an input layer of 45 neurons (a binary (0, 1) pattern of 5 lateral × 9 longitudinal), an intermediate layer of 45 neurons, and an output layer of 37 neurons (alphanumeric "0" to "9" and "A" to "Z", and hyphen "-").
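For the 37-output network of this example, the expected outputs and the stopping test of the learning (error of 0.1 or less for every output neuron) might be written as below. The sketch assumes that "error" means the absolute difference per neuron, and the ordering of the 37 classes as well as the names `CLASSES`, `expected_output`, and `training_done` are hypothetical.

```python
import string

# 10 digits + 26 capital letters + hyphen = 37 classes (ordering assumed).
CLASSES = string.digits + string.ascii_uppercase + "-"

def expected_output(char):
    """Expected output vector: 1.0 for the neuron of `char`, 0.0 elsewhere."""
    return [1.0 if c == char else 0.0 for c in CLASSES]

def max_error(outputs, targets):
    """Largest absolute error over the output neurons (assumed metric)."""
    return max(abs(o - t) for o, t in zip(outputs, targets))

def training_done(all_outputs, all_targets, tolerance=0.1):
    """True when every neuron is within `tolerance` for every pattern."""
    return all(
        max_error(o, t) <= tolerance
        for o, t in zip(all_outputs, all_targets)
    )
```

A weight-correction loop such as backpropagation would call `training_done` after each pass over the standard patterns and stop once it returns true.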
FIGS. 23 to 26 show the character patterns used for the test of character decision, comprising 30 patterns for each of four characters, "2", "9", "A", and "F". In the figures, a black "1" is denoted by "●" and a white "0" is denoted by "_".
Referring to typical results for the character "2", in the data (1) of FIG. 23, the largest output of the neurons in the output layer is for "2", with a value of 0.86, so that the decision condition (1) is satisfied. In the data (2) of FIG. 23, the largest output is for "2", with a value of 0.63, and the second largest output is for "Z", with a value of 0.12, so that their difference is 0.51 and therefore the decision condition (2) is satisfied. However, in the data (4) of FIG. 23, the largest output is for "2", with a value of 0.34, and the second largest output is for "S", with a value of 0.15, so that their difference is 0.19 and therefore the result is undecidable. Similarly, in the data (9) of FIG. 23, the largest output is for "2", with a value of 0.58, and the second largest output is for "C", with a value of 0.48, so that their difference is 0.10 and therefore the result is undecidable.
Referring to typical results for the character "9", in the data (1) of FIG. 24, the largest output of the neurons in the output layer is for "9", with a value of 0.90, so that the decision condition (1) is satisfied. In the data (3) of FIG. 24, the largest output is for "9", with a value of 0.37, and the second largest output is for "5", with a value of 0.07, so that their difference is 0.30 and therefore the decision condition (2) is satisfied. However, in the data (2) of FIG. 24, the largest output is for "9", with a value of 0.27, and the second largest output is for "3", with a value of 0.07, so that their difference is 0.20 and therefore the result is undecidable. Similarly, in the data (8) of FIG. 24, the largest output is for "9", with a value of 0.38, and the second largest output is for "S", with a value of 0.18, so that their difference is 0.20 and therefore the result is undecidable.
Referring to typical results for the character "A", in the data (1) of FIG. 25, the largest output of the neurons in the output layer is for "A", with a value of 0.91, so that the decision condition (1) is satisfied. In the data (3) of FIG. 25, the largest output is for "A", with a value of 0.66, and the second largest output is for "4", with a value of 0.11, so that their difference is 0.55 and therefore the decision condition (2) is satisfied. However, in the data (22) of FIG. 25, the largest output is for "A", with a value of 0.22, and the second largest output is for "M", with a value of 0.08, so that their difference is 0.14 and therefore the result is undecidable. Similarly, in the data (26) of FIG. 25, the largest output is for "Q", with a value of 0.52, and the second largest output is for "A", with a value of 0.38, so that their difference is 0.14 and therefore the result is undecidable.
Referring to typical results for the character "F", in the data (1) of FIG. 26, the largest output of the neurons in the output layer is for "F", with a value of 0.91, so that the decision condition (1) is satisfied. In the data (3) of FIG. 26, the largest output is for "F", with a value of 0.65, and the second largest output is for "P", with a value of 0.27, so that their difference is 0.38 and therefore the decision condition (2) is satisfied. However, in the data (5) of FIG. 26, the largest output is for "K", with a value of 0.12, and the second largest output is for "F", with a value of 0.09, so that their difference is 0.03 and therefore the result is undecidable. Similarly, in the data (6) of FIG. 26, the largest output is for "K", with a value of 0.11, and the second largest output is for "F", with a value of 0.09, so that their difference is 0.02 and therefore the result is undecidable.
In the above example, although the test characters include some that depart far from the standard pattern data, a human can still discriminate them one way or another. Over all 120 characters, the results of the example were 82/120 (68%) correct responses, 5/120 (4%) erroneous responses, and 33/120 (28%) undecidable responses. As these results show, the proportion of erroneous and undecidable responses is problematically large.
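The percentages quoted above follow directly from the counts, as the short check below shows. The helper name `percent` is hypothetical, and integer arithmetic with round-half-up is used so that the result does not depend on floating-point rounding.

```python
def percent(count, total):
    """Percentage rounded to the nearest integer (half rounds up)."""
    return (count * 100 + total // 2) // total

# Counts reported for the 120 test characters (4 characters x 30 patterns).
results = {"correct": 82, "erroneous": 5, "undecidable": 33}
total = sum(results.values())
shares = {name: percent(n, total) for name, n in results.items()}
```

Here `total` is 120 and `shares` holds 68, 4, and 28 for the correct, erroneous, and undecidable responses respectively.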
Moreover, even if efforts are made to change the organization of the NN with a view to improving its ability, an optimum method is difficult to find because the internal structure is effectively a black box: the NN organizes itself, with the weights (ω) being determined by giving actual data and the expected outputs corresponding thereto.