The present invention relates to an image inspection/recognition method for making a visual inspection of objects for defects, for instance, or visually sorting or recognizing them through the use of their images. The invention also pertains to an apparatus for generating reference data for use in the visual inspection or recognition of objects.
A description will be given first of typical inspection methods heretofore employed.
Referring to FIG. 1, a conventional inspection method which utilizes binarization processing will be described. Assume that a projected image of an object under inspection is such as shown in FIG. 1A, and consider its visual inspection to determine the severity of a chip or dilation of a central rectangular region 11a of the image.
In FIG. 1A, the intensity of the image increases in the order of regions 11b, 11c, 11d and 11a. The intensity variation on a broken line 12 passing through the centers of the regions 11a and 11d in the image is indicated by the curve 15 in FIG. 1B, which is the sum of a step-like variation indicated by the curve 13 and a variation indicated by the curve 14; the curve 15 is binarized (patterned) by dividing the individual intensity values into those below and those above a threshold value 16. As shown in FIG. 1C, the threshold value 16 for this binarization is set to the intensity value at which the number of pixels is minimum between intensity histogram curves 17a and 17d of the regions 11a and 11d. Usually, an ideal step-like pattern such as indicated by the curve 13 is not obtained; in practice, a pattern such as indicated by the curve 15 is obtained, which contains intensity variations caused by a spot on the surface of the object or a change in the condition of its illumination, and this pattern is subjected to binarization processing through the use of the threshold value 16. The binarization processing is carried out over the entire area of the noted rectangular region 11a, and the resulting binary intensity distribution pattern is checked to see if the object under inspection is good. It is desirable to set the threshold value 16 in anticipation of the intensity variation pattern 14, but this is difficult in many cases. If the threshold value 16 is set to an inappropriate value, variation components 19, such as the diagonally shaded areas in FIG. 1D, are generated under the direct influence of a spot or a change in the condition of illumination, and because of these variation components the configuration of the binarized pattern of the rectangular region 11a does not match its actual configuration. Consequently, defects of the object such as chips and dilations cannot be detected and judged reliably.
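The threshold selection described above, and its sensitivity to uneven illumination, can be illustrated with a minimal Python sketch. The function names, the scan-line values and the tie-breaking rule (taking the least-populated intensity level nearest the midpoint of the two histogram peaks) are assumptions made for illustration only and are not taken from FIG. 1.

```python
def histogram(pixels, levels=256):
    """Count how many pixels take each intensity value."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts


def valley_threshold(counts, peak_lo, peak_hi):
    """Intensity with the fewest pixels between two known histogram peaks.

    Ties are broken in favour of the level nearest the midpoint (an assumption).
    """
    mid = (peak_lo + peak_hi) // 2
    return min(range(peak_lo, peak_hi + 1), key=lambda v: (counts[v], abs(v - mid)))


def binarize(row, threshold):
    """Map intensities above the threshold to 1 and all others to 0."""
    return [1 if p > threshold else 0 for p in row]


# Hypothetical scan line: dark surround (40) and a bright central region (120).
ideal = [40] * 3 + [120] * 3 + [40] * 3
threshold = valley_threshold(histogram(ideal), 40, 120)

# The same line with an additive illumination ramp superimposed on it.
shading = [10 * i for i in range(len(ideal))]
observed = [a + b for a, b in zip(ideal, shading)]

print(binarize(ideal, threshold))     # the step-like pattern is recovered
print(binarize(observed, threshold))  # the bright region appears dilated
```

In this example the threshold is appropriate for the ideal pattern, yet the dark pixels on the brightly lit side of the shaded line exceed it, so the binarized region no longer matches the actual configuration.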
Turning next to FIG. 2, a description will be given of another conventional visual inspection method, which employs an intensity-normalized correlation scheme. The intensity distribution $M_{kl}$ (where $k = 1, \ldots, K$ and $l = 1, \ldots, L$) is calculated over the entire area of a standard window 22 in a standard image 21 of the object to be inspected, shown in FIG. 2A, and the distribution is registered as an intensity pattern, as depicted in FIG. 2B. A region 24 in an inspection image 23 shown in FIG. 2C, which corresponds to the standard window 22, is scanned from its top left-hand corner to its bottom right-hand corner, and the following calculation is made for each scanning point $(i, j)$ to compute a correlation factor $\gamma_{ij}$ (where $-1 \le \gamma_{ij} \le 1$). ##EQU1## where $M_0$ is the mean intensity value in the standard window 22 and $T_0$ is the mean intensity value in the region 24 in the inspection image.
The maximum value $\gamma_{\max}$ is detected from the correlation factors $\gamma_{ij}$ obtained for the individual scanning points; the closer the maximum value $\gamma_{\max}$ is to unity, the more likely the object under inspection is to be judged good. The correlation factor is unaffected by a uniform offset of the intensity but varies with a spot, a change in the condition of illumination, or a similar uneven intensity change, as indicated by the diagonally shaded areas 27 enclosed between a standard intensity pattern 25 and an inspected intensity pattern 26 in FIG. 2D; that is, the correlation factor changes greatly even for a good object, resulting in the object under inspection being misjudged.
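The correlation search can be sketched in Python as follows. Since the equation itself is not reproduced in this text, the sketch assumes the usual zero-mean normalized cross-correlation, which is consistent with the stated range of the correlation factor and with the mean values defined above; the function names and the small arrays are hypothetical.

```python
import math


def correlation(template, image, i, j):
    """Zero-mean normalized correlation of the template with the window at (i, j)."""
    K, L = len(template), len(template[0])
    m = [template[k][l] for k in range(K) for l in range(L)]
    t = [image[i + k][j + l] for k in range(K) for l in range(L)]
    m0 = sum(m) / len(m)  # mean intensity of the standard window
    t0 = sum(t) / len(t)  # mean intensity of the inspected window
    num = sum((a - m0) * (b - t0) for a, b in zip(m, t))
    den = math.sqrt(sum((a - m0) ** 2 for a in m) * sum((b - t0) ** 2 for b in t))
    return num / den if den else 0.0


def best_match(template, image):
    """Scan every window position and return the position and value of the maximum."""
    K, L = len(template), len(template[0])
    scores = {(i, j): correlation(template, image, i, j)
              for i in range(len(image) - K + 1)
              for j in range(len(image[0]) - L + 1)}
    pos = max(scores, key=scores.get)
    return pos, scores[pos]


template = [[10, 20],
            [30, 40]]
# The template pattern placed at row 1, column 1 with a uniform offset of +5.
image = [[0, 0, 0],
         [0, 15, 25],
         [0, 35, 45]]
position, gamma_max = best_match(template, image)
print(position, gamma_max)
```

The uniform offset does not lower the maximum, whereas an uneven change within the window (for instance, a single brightened pixel) would.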
To implement the above-described image or visual inspection methods, three choices are important: what image features are to be extracted (an image feature extraction method), which image features are to be used (an image feature selection method), and how the selected image features are to be used (a matching method).
Various image feature extraction methods are disclosed in Japanese Pat. Laid-Open No. 226785/88 entitled "Image Feature Extracting Apparatus." According to this prior art literature, the object image is processed to extract therefrom a region feature image which reflects the properties of its noted region, such as the intensity statistics, complexity and extent of the image, and a contour feature image which reflects the properties of its contour, such as edges and their inflectivity. The above-mentioned noted region in the region feature image is a localized region to be processed by localized filtering, not a region obtained by segmentation.
The intensity statistics is a feature value of each pixel that is defined by the variance or dispersion of the intensity values within a certain extracting field radius from the pixel. The complexity is a feature value of each pixel that is defined by the mean value of the total sum of edges accumulated on respective radial scanning lines from the pixel within a certain extracting field radius. In this instance, a weight $wt(r) = a/r^k$ (where $a$ and $k$ are constants and $r$ is the address of each scanning line) is applied to increase the output sharpness and, with a view to calculating the true edge density, not a simple accumulated number of edges but the number of times each scanning line crosses edges is accumulated. The extent is a quantity corresponding to the area of the region and is a feature value of each pixel that is defined by the mean value of the lengths of radial scanning lines from the pixel, within a certain extracting field radius, to the edges which the scanning lines first meet. The edge inflectivity is calculated as the difference (a square norm) between the direction from each pixel toward the edge within a certain extracting field radius and the direction of the edge from the neighboring pixel. On account of an integration effect, this feature is highly immune to a break in the edge caused by an external disturbance.
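Of the features described above, the intensity statistics is the simplest to illustrate. The following Python sketch computes the variance of the intensity values within an extracting field radius of a pixel; the function name, the circular neighbourhood test and the sample image are assumptions for illustration, not the implementation of the cited publication.

```python
def local_variance(image, y, x, radius):
    """Variance of the intensities lying within `radius` of pixel (y, x)."""
    values = [image[v][u]
              for v in range(len(image))
              for u in range(len(image[0]))
              if (v - y) ** 2 + (u - x) ** 2 <= radius ** 2]
    mean = sum(values) / len(values)
    return sum((p - mean) ** 2 for p in values) / len(values)


flat = [[10, 10, 10],
        [10, 10, 10],
        [10, 10, 10]]
spotted = [[10, 10, 10],
           [10, 19, 10],
           [10, 10, 10]]
print(local_variance(flat, 1, 1, 1))     # uniform neighbourhood: no dispersion
print(local_variance(spotted, 1, 1, 1))  # a bright spot raises the feature value
```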
The region feature image and the contour feature image are image features complementary to each other: the region feature alone cannot sufficiently represent the configuration of the object, and the contour feature alone cannot sufficiently represent differences in texture. Moreover, the region feature image and the contour feature image differ in sensitivity to noise. The region feature value does not scatter over a wide range even if noise causes some breaks in the contour. Applying such image features of different properties to object inspection and recognition amounts to representing the image from various angles, and the use of many such complementary feature images will ensure high reliability in the inspection and recognition of objects.
The use of the image feature extraction method and the matching method has already been proposed in Japanese Pat. Laid-Open No. 175885/92 entitled "Image Feature Selection Method in Generalized Hough Transform." FIG. 3 shows a flowchart of the procedure involved in the proposed image feature selection method.
The procedure starts with step S1, in which standard samples are selected for the respective categories, that is, good and no good objects in inspection, or a plurality of objects to be distinguished from each other in sorting and recognition; feature images of different properties are extracted from the sample images and used as standard feature images. Then, feature points are selected for each standard feature image. For example, when the feature image is an edge image, each pixel representing an edge (a feature point) is selected from the edge image. When the feature image is an intensity image, each pixel (a feature point) which takes a given quantized intensity value (a feature value) is selected from the intensity image. In step S2, a reference table is prepared for each feature of each category in correspondence with the selected feature points. The reference table, identified by 27 in FIG. 4, uses each feature value $f_i$ as an index to hold the position of each pixel having that feature value (i.e. the feature point in the feature image) as a polar coordinate vector $(r_i, \alpha_i)$, where $i = 1, \ldots, N_p$ ($N_p$ being the number of feature points), with the origin at a reference point. In principle the reference point need not be any particular point, but it is known in the art that when the reference point is set at the position of the center of gravity of an object, a transform error in the generalized Hough transform is minimized.
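The preparation of the reference table in step S2 can be sketched in Python as follows. The names are hypothetical, and two assumptions are made: the reference point is taken as the center of gravity of the feature points, and each polar vector is stored pointing from the feature point toward the reference point, so that it can be added directly to a pixel position during the later address translation.

```python
import math
from collections import defaultdict


def build_reference_table(feature_image):
    """Index polar vectors (r, alpha) of the feature points by their feature value."""
    points = [(x, y, f)
              for y, row in enumerate(feature_image)
              for x, f in enumerate(row) if f]
    # Reference point: center of gravity of the feature points (an assumption).
    xr = sum(p[0] for p in points) / len(points)
    yr = sum(p[1] for p in points) / len(points)
    table = defaultdict(list)
    for x, y, f in points:
        dx, dy = xr - x, yr - y  # from the feature point toward the reference point
        table[f].append((math.hypot(dx, dy), math.atan2(dy, dx)))
    return (xr, yr), dict(table)


# Hypothetical edge image: the outline of a 3 x 3 square, edge value 1.
edge_image = [[1, 1, 1],
              [1, 0, 1],
              [1, 1, 1]]
reference_point, table = build_reference_table(edge_image)
print(reference_point, len(table[1]))
```

For an edge image the table has a single index (the value 1); for a quantized intensity image it would hold one list of polar vectors per intensity level.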
In step S3, a feature image of each feature is extracted, for each category, from M samples with various external disturbances, and the feature image and the reference table prepared in step S2 are used to conduct the generalized Hough transform.
Referring to FIG. 4, the principle of the generalized Hough transform will be described taking an edge as an example. An edge image (a standard feature image) 28 of a standard image represents the contour points of an object. Hence, each feature point of the edge image is a contour point; its feature value, that is, its edge value, is "1," and the edge value at any point other than a contour point is "0." Consequently, in step S2, only the value "1" at the edge points is stored as an index in the index column for the feature value $f_i$ of the reference table 27, and the polar-coordinate address $(r_i, \alpha_i)$ of each pixel (the feature point) $p_i$ of a contour image (an edge image) 29 of the edge image 28 of the standard image, where $i = 1, \ldots, N_p$ ($N_p$ being the number of contour points, that is, the number of feature points), is stored in an address column of the reference table 27. Next, a description will be given of how an unknown input image 31 is processed using the reference table 27 in step S3. The parameters to be calculated in this case are the position and pose (rotation angle) of the object in the input image 31. The position of the object is the position, in the input image 31, of the reference point (the origin of the polar coordinates) already determined in the standard feature image 28. A feature image of the same kind as the standard feature image, that is, an edge image 32, is produced for the input image 31, and only when the edge value of a pixel (whose orthogonal coordinates are $(x, y)$) of the edge image 32 is "1," which is the feature value stored in the reference table 27, the polar-coordinate addresses $(r_i, \alpha_i)$ stored in the reference table 27 in correspondence with that feature value are read out one by one to perform the following address translation:

$$x_G = x + r_i \cos(\alpha_i + \theta)$$
$$y_G = y + r_i \sin(\alpha_i + \theta)$$
where $\theta$ is the rotation angle of the object. For brevity, assume for now that the rotation angle $\theta$ of the object is known in advance. The coordinates $(x_G, y_G)$ calculated by this address translation give a position which is likely to be the reference point of the object. As shown in FIG. 5, each address $(x_G, y_G)$ obtained for one pixel position $(x, y)$ where the edge value "1" was found casts one vote, that is, increments by one the value at the position $(x_G, y_G)$ on an accumulator array 34, which is a parameter space. That is, all the addresses stored in the reference table 27 are sequentially read out therefrom in correspondence with, for example, the point A in an unknown edge feature image 32 and then converted, and sequentially voting the converted addresses $(x_G, y_G)$ at the corresponding positions on the accumulator array 34 means that a locus (of the value "1") 35a of positions likely to be the reference point is laid down on the accumulator array 34. By performing similar processing for the points B and C in the unknown edge feature image 32, loci 35b and 35c are obtained. The value at each position $(x_G, y_G)$ on the accumulator array 34 is a voted (accumulated) value; hence, the value becomes 2 at a position where one of the loci 35b and 35c intersects the locus 35a and becomes 3 at a position where both of them intersect it. Assuming that the relative position of the edge image of the unknown edge feature image 32 to the edge image 29 of the standard edge feature image 28 is held unchanged at least at the points A, B and C, the three loci 35a, 35b and 35c intersect at one point 36.
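The voting procedure for a known rotation angle can be sketched in Python as follows. The function names, the five-point L-shaped contour, the choice of reference point and the 8 x 8 accumulator are hypothetical; the polar vectors are assumed to point from each feature point toward the reference point, in agreement with the address translation given above.

```python
import math


def make_table(points, ref):
    """Polar vectors (r, alpha) from each standard feature point to the reference point."""
    return [(math.hypot(ref[0] - x, ref[1] - y), math.atan2(ref[1] - y, ref[0] - x))
            for x, y in points]


def vote(points, table, theta, width, height):
    """Cast one vote at (x_G, y_G) for every pair of input point and table entry."""
    acc = [[0] * width for _ in range(height)]
    for x, y in points:
        for r, alpha in table:
            xg = round(x + r * math.cos(alpha + theta))
            yg = round(y + r * math.sin(alpha + theta))
            if 0 <= xg < width and 0 <= yg < height:
                acc[yg][xg] += 1
    return acc


def peak(acc):
    """Position and value of the largest accumulated vote."""
    v, x, y = max((v, x, y) for y, row in enumerate(acc) for x, v in enumerate(row))
    return (x, y), v


# Standard contour (an L shape) with its reference point chosen at (1, 1).
standard = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)]
table = make_table(standard, ref=(1, 1))

# Unknown input: the same contour displaced by (+3, +2), rotation angle 0.
observed = [(x + 3, y + 2) for x, y in standard]
position, votes = peak(vote(observed, table, 0.0, 8, 8))
print(position, votes)
```

Each input point lays down one locus of candidate reference points; only at the true reference point do all five loci coincide, so the peak value equals the number of feature points.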
By performing this processing for all edge points on the unknown edge feature image 32, not only for the above-mentioned three points, the loci of positions likely to be the reference point intersect at the same point on the accumulator array 34 $N_p$ times if the object of the input image 31 is exactly identical with the object of the standard image 21. The number of times the loci intersect, that is, the accumulated voting distribution, is obtained as indicated by the curve 37 in FIG. 4, for instance; calculating the position of the maximum point and the maximum peak value (maximum frequency value) of the distribution amounts to calculating the position of the reference point in the input image 31 and its similarity to the standard image 21. When the rotation angle $\theta$ is unknown, only the angle $\theta$ is varied at a pitch $\Delta\theta$ over its range, and similar processing is performed for each rotation angle $\theta$, using the accumulator array 34 as a three-dimensional array $(x_G, y_G, \theta)$ instead of the two-dimensional array $(x_G, y_G)$. In this way, the position $(x_G^*, y_G^*, \theta^*)$ where the accumulated voting distribution takes its maximum peak value is calculated. The angle $\theta^*$ is the desired rotation angle of the object. For example, in the case of an object inspection, the variation range of the rotation angle $\theta$ of the object is predicted to be, for instance, within $\pm 10^\circ$, taking into account the accuracy of a jig used in the fabrication of the object; in this case, the rotation angle $\theta$ needs only to be varied little by little within the predicted range.
At the address $(x_G^*, y_G^*, \theta^*)$ where the voted value becomes maximum on the accumulator array 34, which is the parameter space, the parameters $x_G^*$ and $y_G^*$ indicate the position of the reference point and $\theta^*$ indicates the rotation angle of the object, and the maximum voted value (the maximum frequency) is a quantity representing the closeness of the input image to the standard one.
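The search over an unknown rotation angle can be sketched in Python as follows. The function name, the five feature points, the 5-degree pitch and the 64 x 64 accumulator are hypothetical, and for simplicity the input feature points are given real-valued coordinates rather than pixel addresses. One two-dimensional slice of the accumulator is processed per candidate angle, which is equivalent to searching a three-dimensional array for its maximum.

```python
import math


def hough_peak(points, table, angles_deg, width, height):
    """Return (votes, (x_G, y_G, theta in degrees)) of the best-supported cell."""
    best = (0, None)
    for deg in angles_deg:
        theta = math.radians(deg)
        acc = [[0] * width for _ in range(height)]  # one (x_G, y_G) slice per angle
        for x, y in points:
            for r, alpha in table:
                xg = round(x + r * math.cos(alpha + theta))
                yg = round(y + r * math.sin(alpha + theta))
                if 0 <= xg < width and 0 <= yg < height:
                    acc[yg][xg] += 1
                    if acc[yg][xg] > best[0]:
                        best = (acc[yg][xg], (xg, yg, deg))
    return best


# Standard feature points, given relative to a reference point at the origin.
standard = [(20, 0), (0, 20), (-20, 0), (0, -10), (10, 10)]
table = [(math.hypot(x, y), math.atan2(-y, -x)) for x, y in standard]

# Unknown input: the object rotated by 5 degrees with its reference point at (30, 30).
t = math.radians(5)
observed = [(30 + x * math.cos(t) - y * math.sin(t),
             30 + x * math.sin(t) + y * math.cos(t)) for x, y in standard]

# Candidate angles within +/-10 degrees at a pitch of 5 degrees.
votes, (xg, yg, deg) = hough_peak(observed, table, range(-10, 11, 5), 64, 64)
print(votes, xg, yg, deg)
```

Restricting the candidate angles to the predicted range keeps the amount of voting proportional to the number of angle steps rather than to the full circle.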
In step S3 in FIG. 3, the maximum frequency value is calculated for each sample in each category; then an intra-category maximum frequency value distribution is generated, and an inter-category maximum frequency value distribution is produced, for example, for image samples of a no good object with respect to a good object. Then, the distance between the intra-category maximum frequency value distribution and the inter-category maximum frequency value distribution is calculated for each feature in each category. The distance thus calculated is used to determine the weight of each feature in each category. In step S4, the features in the respective categories are sorted by weight to determine the ranking of the features. In step S5, the number of reference tables combined, in descending order of feature ranking, is increased for each category, that is, the number of features that are combined is increased, and each combination is evaluated as to the extent to which the categories can be distinguished from each other. In step S6, the resolution of the input image is changed, and steps S1 through S5 are repeated until the error rate becomes sufficiently small, that is, until good and no good objects can be distinguished, or until the number of indistinguishable categories is reduced to zero.
By such processing as described above, a plurality of kinds of feature images obtainable from the input image are subjected to the generalized Hough transform, using the reference tables of the respective feature images. A distance, which depends on the relationship between the intra-category and inter-category variances of the frequency distribution of the generalized Hough transform output on the accumulator array, is used as the weight of each feature. The features are combined in decreasing order of weight and the distance between the categories is evaluated, and the features are combined one after another until the weighted distance exceeds a certain value or until the relationship of the input image to the standard one is no longer improved. Thus, the aforementioned Japanese Pat. Laid-Open proposes the weighting of feature sets and the selection of combinations of the feature sets.
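The weighting and ranking of features can be sketched in Python as follows. The exact distance used in the cited publication is not given in this text; the sketch assumes a Fisher-type ratio (squared difference of the means divided by the sum of the variances) merely as one distance that depends on the intra-category and inter-category distributions. The function names and the maximum frequency values are hypothetical.

```python
from statistics import mean, pvariance


def separation(intra, inter):
    """Assumed distance between the intra- and inter-category distributions."""
    spread = pvariance(intra) + pvariance(inter)
    gap = (mean(intra) - mean(inter)) ** 2
    return gap / spread if spread else float("inf")


def rank_features(samples):
    """Weight each feature by its separation and sort in descending order of weight."""
    weights = {name: separation(intra, inter)
               for name, (intra, inter) in samples.items()}
    return sorted(weights, key=weights.get, reverse=True), weights


# Maximum frequency values per feature: (good samples, no good samples).
samples = {
    "edge": ([90, 94, 92], [60, 64, 62]),
    "intensity": ([80, 90, 70], [75, 65, 85]),
}
ranking, weights = rank_features(samples)
print(ranking)
```

Under this assumed distance, a feature whose maximum frequency values for good and no good samples overlap receives a small weight and is combined last.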
As regards the feature point selecting method, however, the prior art literature does no more than propose the selection of feature points remotest from each other in each feature image obtainable from the standard image. With this feature point selecting method, when some of the feature points of the standard image to be selected are spotted with dust or other noise, the inspection ability is impaired because the standard image is incomplete. Moreover, even if an image is chosen as a standard image in accordance with a manual, there is no assurance that it is truly a standard image. To overcome this shortcoming, according to the prior art, weights are automatically assigned to the feature points selected from a plurality of good-object image sequences, and at the time of feature set selection and matching, a greater influence is exerted upon the feature points of large weights or high reliability, whereas the influence on the feature points of small weights or low reliability is reduced.
FIG. 6 shows examples of images of inspection patterns. As compared with an ideal pattern depicted in FIG. 6A, the actual good object images have noise components superimposed thereon by dust and spots; moreover, as shown in FIGS. 6B through 6D, for example, the actual images are somewhat displaced or rotated relative to the ideal pattern. An image of a no good object shown in FIG. 6E is somewhat flat as compared with the ideal pattern depicted in FIG. 6A; it, too, has noise superimposed thereon and is displaced or rotated relative to the ideal pattern. The good and no good objects must be distinguished under such circumstances. This problem also arises in sorting and recognizing objects through the use of their images.
It is therefore an object of the present invention to provide an inspection/recognition method which obviates the above-described defects of the prior art and ensures distinguishing between good and no good objects or permits correctly sorting or recognizing objects regardless of intensity variations of good objects or standard sample images which are caused by a change in the environment of illumination or in the reflection characteristic of the sample surface or dust and spots on the objects.