1. Field of the Invention
The present invention relates to an inspection apparatus which extracts a characteristic value from inputted measurement data of an inspection target to make a determination of a state of the inspection target based on the extracted characteristic value.
2. Description of the Related Art
A number of rotary machines in which drive-system components such as a motor are incorporated are used in automobiles and home electric appliances. For example, in automobiles, the rotary machines are mounted on an engine, a power steering, a power seat, a transmission, and many other parts. In home electric appliances, the rotary machines are mounted in a refrigerator, an air conditioner, a washing machine, and various other products. When the rotary machine is operated, a sound is generated by rotation of the motor.
The sound generated by the rotation of the motor includes a sound which is inevitably generated by the normal operation and a sound generated by a defect. The abnormal noises associated with the defect include those caused by bearing anomaly, internal abnormal contact, unbalance, and intrusion of a foreign body. Specifically, the abnormal noises include those due to lack of gearing, which are generated once per turn of the gear, bite of a foreign body, spot flaw, and momentary rubbing between a rotating portion and a stationary portion of the motor during rotation. Humans can hear sound ranging from about 20 Hz to about 20 kHz, and find a sound of about 15 kHz uncomfortable. A sound in which such a specific frequency component is generated is therefore also perceived as abnormal noise. Obviously, though, the abnormal noises are not limited to the frequency of about 15 kHz.
Not only is the sound associated with the defect uncomfortable, but the sound may also be a precursor to another failure. Therefore, the presence or absence of the abnormal noises is detected for the purpose of quality assurance for each product. In a manufacturing plant, an examiner usually performs “sensory inspection” using the five senses, usually hearing and touch. Specifically, the examiner hears the sound or confirms vibration by touch. The sensory inspection is defined by sensory inspection terminology JIS Z8144.
Skilled performance is required in the sensory inspection with the five senses of the examiner. The result of the sensory inspection heavily depends on individual examiners and varies with time. Furthermore, it is difficult to convert the result of the sensory inspection into data or a numerical value, which results in a difficulty in managing the sensory inspection. In order to solve the problem, an abnormal noise inspection apparatus is used as the inspection apparatus, which inspects the anomaly of the product including drive-system components. The purpose of the abnormal noise inspection apparatus is to conduct stable inspection with quantitative and clear criteria.
In the conventional abnormal noise inspection apparatus, a high-performance discrimination algorithm is produced and improved in order that the over-detection rate is reduced while the occurrence of undetected errors is eliminated. As used herein, the “undetected error” shall mean that a defective product (abnormal product) is discriminated as an acceptable product (normal product). It is necessary to surely prevent the undetected error, because otherwise the defective product will be shipped. The “over-detection” shall mean that the acceptable product is discriminated as the defective product. In the over-detection, the acceptable product will not be shipped and instead will be scrapped, which means that the acceptable product is wasted and the yield is decreased. Therefore, the number of characteristic values used is increased, and the number of samples necessary for producing a better discrimination rule is increased accordingly.
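The two error types defined above can be illustrated with a minimal sketch. The "ok"/"ng" label encoding, the function name inspection_rates, and the sample data are assumptions made only for this illustration and do not appear in the cited art.

```python
def inspection_rates(truth, predicted):
    """Return (undetected_error_rate, over_detection_rate).

    truth and predicted are equal-length sequences of "ok" (acceptable)
    and "ng" (defective) labels; this encoding is hypothetical.
    """
    # Inspection results for the products that are truly defective / acceptable.
    defective = [p for t, p in zip(truth, predicted) if t == "ng"]
    acceptable = [p for t, p in zip(truth, predicted) if t == "ok"]

    # Undetected error: a defective product discriminated as acceptable.
    undetected = sum(1 for p in defective if p == "ok")
    # Over-detection: an acceptable product discriminated as defective.
    over_detected = sum(1 for p in acceptable if p == "ng")

    undetected_rate = undetected / len(defective) if defective else 0.0
    over_rate = over_detected / len(acceptable) if acceptable else 0.0
    return undetected_rate, over_rate


truth = ["ok", "ok", "ok", "ok", "ng", "ng"]
predicted = ["ok", "ng", "ok", "ok", "ng", "ok"]
undetected_rate, over_rate = inspection_rates(truth, predicted)
print(undetected_rate, over_rate)
```

In this made-up data set, one of the two defective products would be shipped and one of the four acceptable products would be scrapped.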
Recently, consumers have become stricter about the quality of industrial products. In the era of high-mix low-volume production, not only must the quality of the product be ensured, but a production line must also be established as soon as possible. That is, it is not sufficient only to achieve higher accuracy of the abnormal noise inspection algorithm. There are two needs at a production site in order to ship higher quality products.
The first is to automate the inspection. Usually, in the inspection for measuring a size or a weight of the product in a production process, a management criterion is determined for each characteristic of the product to manage the quality. For example, in an inspection apparatus in which sensory inspection, such as solder appearance inspection of a printed wiring board or abnormal noise inspection of an automobile engine, is automated, a plurality of quality characteristics are extracted from an image or a waveform. Then, a discrimination model makes a determination in a comprehensive manner.
The second is vertical start-up. Usually, the mass production line is started up after a mass production trial. In the mass production trial, which follows research and design, a product is produced by the same production means as in the mass production to check whether or not any problem exists in the processes. In automatically producing a discrimination model for an automatic inspection apparatus, the modeling cannot be performed unless sufficient data is collected. Therefore, the inspection criterion cannot be fixed until the mass production is started. In order to realize the vertical start-up of the production line, it is necessary that the inspection criterion used in the mass production phase be determined in the mass production trial phase so that stable inspection starts simultaneously with the start of the mass production.
In the sensory inspection, the discrimination is performed in a comprehensive manner on the quality characteristics such as volume and pitch of the sound, an appearance color, and a shape. Pattern recognition is effectively used in the sensory inspection automation system. In the pattern recognition, a plurality of characteristic values indicating the quality characteristics are extracted from the data obtained by a sensor such as a microphone and a camera, and the discrimination is made by a discrimination function. Generally, in the pattern recognition, it is necessary that a sufficient number of learning samples be prepared to determine the discrimination function.
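The flow from sensor data to characteristic values to a discrimination function can be sketched as follows. This is only an illustration under stated assumptions: the three characteristic values (RMS, peak, crest factor), the single-threshold discrimination function, its limit of 3.0, and the synthetic waveforms are all hypothetical and are not taken from any particular inspection apparatus.

```python
import math


def characteristic_values(waveform):
    """Extract a few characteristic values from a sampled waveform."""
    n = len(waveform)
    rms = math.sqrt(sum(v * v for v in waveform) / n)  # overall loudness
    peak = max(abs(v) for v in waveform)               # largest excursion
    crest_factor = peak / rms if rms > 0 else 0.0      # impulsiveness
    return {"rms": rms, "peak": peak, "crest_factor": crest_factor}


def discriminate(values, crest_limit=3.0):
    """Hypothetical discrimination function: one threshold on the crest factor."""
    return "acceptable" if values["crest_factor"] <= crest_limit else "defective"


# Two periods of a sine wave stand in for a normal operating sound.
normal_wave = [math.sin(2 * math.pi * k / 32) for k in range(64)]
# The same wave with one impulsive spike stands in for an abnormal noise.
spiky_wave = list(normal_wave)
spiky_wave[0] = 5.0

normal_result = discriminate(characteristic_values(normal_wave))
spiky_result = discriminate(characteristic_values(spiky_wave))
print(normal_result, spiky_result)
```

In practice the discrimination function would be learned from samples rather than fixed by hand, which is why a sufficient number of learning samples is needed.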
Next, a product inspection by the pattern recognition will be described.
FIG. 24 shows a procedure of the pattern recognition. The pattern recognition is a technique of determining (discriminating) a group to which the data belongs based on the pattern of the characteristic value extracted from the data. Therefore, in the pattern recognition, it is necessary that the discrimination function on the pattern space be previously automatically generated (learned) from the data that has been already observed or measured.
The pattern recognition technique can be classified into four types according to distribution expression and distribution symmetry.
The distribution expression can be classified into “parametric discrimination” and “non-parametric discrimination”. The distribution is expressed by a statistical parameter in the parametric discrimination while the distribution is not expressed by the statistical parameter in the non-parametric discrimination.
The distribution symmetry can be classified into “two-class discrimination model” and “one-class discrimination model”. In the two-class discrimination model, it is assumed that the distribution symmetry holds. In the one-class discrimination model, it is not assumed that the distribution symmetry holds.
Specifically, during the learning phase, in the parametric discrimination, a parameter is estimated for regulating a shape of a probability density distribution (for example, average and dispersion) followed by data belonging to each group for a plurality of groups (for example, normal and abnormal) formed by the pieces of data that have been already observed. When new data is observed in the discrimination phase, a degree of attribution to each group is determined using the estimated parameter, and the group to which the data belongs is determined. The parametric discrimination is an effective technique only in the case where it can be assumed that the data follows the probability density distribution (for example, normal distribution) whose shape can be regulated by the parameter.
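As a minimal sketch of the parametric discrimination described above, the following assumes a single characteristic value and a normal distribution for each group. The function names, the group names, and the sample values are hypothetical; the mean and variance play the role of the estimated parameters, and the log density serves as the degree of attribution.

```python
import math
from statistics import mean, pvariance


def fit_gaussian(values):
    """Learning phase: estimate the parameters (mean, variance) of one group."""
    return mean(values), pvariance(values)


def log_density(x, params):
    """Degree of attribution of x to a group, as a normal log density."""
    mu, var = params
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def classify(x, models):
    """Discrimination phase: pick the group with the highest degree of attribution."""
    return max(models, key=lambda name: log_density(x, models[name]))


# Hypothetical, already observed characteristic values of each group.
normal = [1.0, 1.2, 0.9, 1.1, 0.8]
abnormal = [3.0, 3.5, 2.5, 4.0, 3.2]
models = {"normal": fit_gaussian(normal), "abnormal": fit_gaussian(abnormal)}

print(classify(1.05, models), classify(3.1, models))
```

The sketch is meaningful only as long as the normality assumption holds, which is the limitation noted in the paragraph above.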
During the learning phase, in the non-parametric discrimination, all the pieces of data that have been already observed, or the part of the data contributing to the discrimination, are retained in each group. Alternatively, in the non-parametric discrimination, the density distribution is directly determined from the data without using the statistical parameter. When new data is observed during the discrimination phase, the degree of attribution to each group is determined from the retained data, or from similarity or a distance to the distribution, and the group to which the new data should belong is determined. The non-parametric discrimination is an effective technique even if it cannot be assumed that the data follows the probability density distribution whose shape can be regulated by the parameter.
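A nearest neighbor discriminator is one simple instance of the non-parametric discrimination described above. In the following sketch, the retained samples, the group names, and the function name are hypothetical; no distribution parameter is estimated, and the group is determined only from the distance to the retained data.

```python
def nearest_neighbor_label(x, retained):
    """Return the group of the retained sample closest to x.

    retained is a list of (feature_vector, group) pairs kept from the
    learning phase.
    """
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    _, group = min(retained, key=lambda item: squared_distance(x, item[0]))
    return group


# Learning phase: the observed data is simply retained with its group.
retained = [
    ((1.0, 1.0), "normal"),
    ((1.2, 0.9), "normal"),
    ((3.0, 3.1), "abnormal"),
    ((2.8, 3.3), "abnormal"),
]

# Discrimination phase: new data is assigned to the group of its nearest sample.
print(nearest_neighbor_label((1.1, 1.0), retained))
print(nearest_neighbor_label((2.9, 3.0), retained))
```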
On the other hand, during the learning phase, in the two-class discrimination, the discrimination function is learned using the samples of the two classes (for example, acceptable product and defective product) to be discriminated. During the discrimination phase, the degree of attribution of the unknown sample to each class is determined by the discrimination function to compare and evaluate to which class the data is more likely to belong.
During the learning phase, in the one-class discrimination, the density estimation is performed using only the one-class learning sample. During the discrimination phase, the degree of attribution of the unknown sample is determined by the discrimination function based on the density. Then, threshold determination is made in such a manner that the unknown sample is determined to belong to the class when the degree of attribution of the unknown sample is not lower than a predetermined value and the unknown sample is determined not to belong to the class when the degree of attribution of the unknown sample is lower than the predetermined value.
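The one-class discrimination with threshold determination can be sketched with a Gaussian kernel density estimate of a single characteristic value. The bandwidth of 0.2, the threshold of 0.1, the sample values, and the function names are assumptions chosen only for illustration.

```python
import math


def kernel_density(x, samples, bandwidth):
    """Density at x estimated from one-class learning samples (Gaussian kernel)."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    )


def belongs_to_class(x, samples, bandwidth, threshold):
    """Threshold determination: the sample belongs to the class when its
    degree of attribution (the density) is not lower than the threshold."""
    return kernel_density(x, samples, bandwidth) >= threshold


# Only acceptable-product samples are used for learning (hypothetical values).
acceptable = [9.8, 10.0, 10.1, 9.9, 10.2]

print(belongs_to_class(10.0, acceptable, bandwidth=0.2, threshold=0.1))
print(belongs_to_class(12.0, acceptable, bandwidth=0.2, threshold=0.1))
```

No defective-product sample is needed here; a value far from all acceptable samples is rejected simply because the estimated density there is low.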
For example, the pattern recognition has the following four categories:
(1) Parametric two-class discrimination: Bayes discrimination and discrimination analysis;
(2) Non-parametric two-class discrimination: nearest neighbor discriminator (NN discriminator) and support vector machine (SVM);
(3) Parametric one-class discrimination: Mahalanobis-Taguchi system (MTS); and
(4) Non-parametric one-class discrimination: histogram method, nearest neighbor estimation, one-class SVM, Parzen window method, RBF (Radial Basis Function) network, kernel density estimation, and bootstrap method.
The acceptable product is homogeneous, while the defective product has a wide variety. Therefore, the usual two-class discrimination in which it is assumed that the distribution has the symmetry on the feature space of each class is not suitable for the discrimination between the acceptable product and the defective product. The number of defective product samples which can be collected in the product inspection is extremely small compared with the acceptable product sample. Therefore, the one-class discrimination in which only the acceptable product distribution is considered is effectively used in the discrimination between the acceptable product and the defective product.
It is necessary that the inspection be started simultaneously with the start of the mass production. That is, it is necessary that the discrimination function for discriminating the acceptable product from the defective product be determined from the restricted number of samples obtained before the mass production. Sufficient acceptable product samples, however, are also not obtained before the start of the mass production. In the parametric discrimination in which the statistical estimation is required, the satisfactory performance cannot be ensured with the small number of samples. Therefore, the non-parametric discrimination in which the statistical estimation is not required is effectively used in the case where the discrimination function is determined from the restricted number of samples.
Thus, the non-parametric one-class discrimination is effectively used in the pattern recognition to be applied to the product inspection.
The following techniques are cited herein as examples of the conventional inspection apparatuses.
In the parametric discrimination, it is impossible to perform the learning with the small number of samples, or it is difficult to ensure the discrimination performance with the small number of samples. For example, in MTS, the learning cannot be performed, because of multicollinearity, when the number of learning samples is not more than the number of feature points. Even if the number of learning samples is more than the number of feature points, sometimes the discrimination performance cannot be ensured because the small number of samples is not sufficient to assure the accuracy of the statistical estimation. Therefore, in order to ensure the performance, empirically, it is necessary that the number of samples be approximately three times larger than the number of feature points. Hiroshi Tazaki, Kazuto Kasuya, and Hiroshi Nakajima, “Progressive discrimination model update method for automatic inspection,” 32nd Intelligent System Symposium Proceedings, pp. 243-246 (2005) discloses a method in which the performance is ensured by the use of the non-parametric discrimination or by the simultaneous use of the non-parametric discrimination in the case where the number of samples is small.
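The reason the learning fails when the number of samples does not exceed the number of features can be checked numerically: the sample covariance matrix of n samples has rank at most n - 1, so with n not more than the number of features it is singular, and the inverse needed for a Mahalanobis distance does not exist. The following sketch uses made-up data, and the helper names covariance and rank are hypothetical.

```python
def covariance(samples):
    """Sample covariance matrix (features x features) of row-wise samples."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def rank(matrix, tol=1e-9):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        pivot = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue  # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            f = m[i][c] / m[r][c]
            for j in range(c, cols):
                m[i][j] -= f * m[r][j]
        r += 1
        if r == rows:
            break
    return r


# Three features with three samples: the covariance matrix is singular.
few = [[1.0, 2.0, 0.0], [2.0, 1.0, 1.0], [3.0, 4.0, 2.0]]
# Three features with five samples: the covariance matrix has full rank here.
many = few + [[4.0, 3.0, 0.0], [0.0, 0.0, 3.0]]

print(rank(covariance(few)), rank(covariance(many)))
```

Full rank alone does not guarantee good discrimination performance, which is consistent with the empirical guideline of roughly three times as many samples as features.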
Nello Cristianini, John Shawe-Taylor, “An Introduction to Support Vector Machines: And Other Kernel-Based Learning Methods,” Cambridge University Press (2000) discloses a method in which, in the one-class SVM, a parameter is adjusted to minimize the number of support vectors by utilizing the property that the ratio of the support vectors is an upper limit of the error rate evaluated by a leave-one-out method. However, the method does not solve the problem that the acceptable area is possibly divided into a plurality of areas, because the shape of the area is not evaluated.
Asa Ben-Hur, David Horn, Hava T. Siegelmann, Vladimir Vapnik, “Support Vector Clustering,” Journal of Machine Learning Research 2, pp. 125-137 (2001) discloses a method in which the learning samples belonging to the same area are clustered by determining, for all combinations of the learning samples discriminated as the acceptable product, whether or not a line segment connecting two of the learning samples deviates from the acceptable area. As shown in FIG. 25, the samples belonging to the same cluster can be known by producing a matrix (FIGS. 25B and 25D) in which the presence or absence of the deviation is expressed by zero or one. FIG. 25B shows the matrix in the single acceptable area (FIG. 25A) and FIG. 25D shows the matrix in the two acceptable areas (FIG. 25C).
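The line segment test and the resulting zero/one matrix can be sketched as follows. This is an illustration only, not the implementation of the cited paper: the acceptable areas are hypothetical discs rather than areas learned by an SVM, each segment is checked at a fixed number of interpolated points, and the convention that one means "no deviation" is an assumption.

```python
def segment_stays_inside(a, b, inside, steps=20):
    """True if every checked point on the segment a-b lies in the acceptable area."""
    return all(
        inside(tuple(ai + (bi - ai) * k / steps for ai, bi in zip(a, b)))
        for k in range(steps + 1)
    )


def adjacency_matrix(samples, inside, steps=20):
    """Matrix entry is 1 when the segment between two samples does not deviate."""
    n = len(samples)
    return [
        [1 if segment_stays_inside(samples[i], samples[j], inside, steps) else 0
         for j in range(n)]
        for i in range(n)
    ]


def count_clusters(adj):
    """Number of connected components of the graph given by the matrix."""
    n, seen, clusters = len(adj), set(), 0
    for start in range(n):
        if start in seen:
            continue
        clusters += 1
        stack = [start]
        while stack:
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            stack.extend(j for j in range(n) if adj[i][j] and j not in seen)
    return clusters


def single_area(p):
    """Hypothetical single acceptable area: one disc of radius 3."""
    return p[0] ** 2 + p[1] ** 2 <= 3.0 ** 2


def two_areas(p):
    """Hypothetical divided acceptable area: two discs of radius 1."""
    return any((p[0] - cx) ** 2 + p[1] ** 2 <= 1.0 for cx in (-2.0, 2.0))


samples = [(-2.0, 0.0), (-1.5, 0.2), (1.5, -0.2), (2.0, 0.0)]
print(count_clusters(adjacency_matrix(samples, single_area)))
print(adjacency_matrix(samples, two_areas))
```

With the divided area, the matrix becomes block diagonal, and the two blocks correspond to the two clusters of learning samples.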
Basically, a quality characteristic of a product has a variation (caused by a variation of component and material or a fluctuation of manufacturing apparatus) around the center of a target value. Therefore, it is believed that an area (true acceptable area) where the acceptable product is generated forms a single area centering on the target value (FIG. 26A).
The discrimination function obtained by the learning from the limited samples actually forms the acceptable area (learned acceptable area) which is different from the true acceptable area. As a difference between the actual acceptable area and the true acceptable area is decreased, the discrimination performance becomes better (FIG. 26B).
Usually, in the non-parametric discrimination, the acceptable area is determined based on density of the learning samples. When a sparse portion exists in the learning samples, the acceptable area is possibly divided into a plurality of areas (FIG. 27A). In the case where the number of the learning samples is small, the learning samples may become sparse even in a portion in which the learning samples originally have high density. That is, there is a risk of largely lowering the discrimination performance.
When the acceptable area is single (FIG. 27B), it is thought that the actual acceptable area is brought closer to the true acceptable area when compared with the case in which the acceptable area is divided. In such cases, it is expected that the discrimination performance is improved. Therefore, after the learning, it is determined whether or not the discrimination function forms the single acceptable area, and the parameter may be adjusted such that the acceptable area becomes single.
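The idea of checking after the learning whether the acceptable area is single, and adjusting a parameter until it is, can be sketched for one characteristic value. Everything specific here is an assumption for illustration: the density is a Gaussian kernel estimate, the adjusted parameter is its bandwidth, the areas are counted on a fixed grid, and the sample values, threshold, and growth factor are made up. With a fixed threshold, too large a bandwidth could also make the area vanish, which is why the loop is bounded.

```python
import math


def density(x, samples, bandwidth):
    """Gaussian kernel density estimate at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    )


def count_acceptable_areas(samples, bandwidth, threshold, lo, hi, steps=400):
    """Count contiguous grid intervals where the density reaches the threshold."""
    areas, previous_inside = 0, False
    for k in range(steps + 1):
        x = lo + (hi - lo) * k / steps
        inside = density(x, samples, bandwidth) >= threshold
        if inside and not previous_inside:
            areas += 1  # entered a new acceptable area
        previous_inside = inside
    return areas


def widen_until_single(samples, bandwidth, threshold, lo, hi,
                       factor=1.5, max_iter=20):
    """Increase the bandwidth until exactly one acceptable area remains."""
    for _ in range(max_iter):
        if count_acceptable_areas(samples, bandwidth, threshold, lo, hi) == 1:
            return bandwidth
        bandwidth *= factor
    return None  # no suitable bandwidth found within max_iter


# Few learning samples with a sparse portion around 10.0 (hypothetical values).
samples = [9.0, 9.1, 9.2, 10.8, 10.9, 11.0]

areas_before = count_acceptable_areas(samples, 0.2, 0.05, 7.0, 13.0)
adjusted = widen_until_single(samples, 0.2, 0.05, 7.0, 13.0)
print(areas_before, adjusted)
```

With the initial bandwidth the sparse portion splits the acceptable area in two; after two widening steps the area becomes single.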