This invention relates to image processing devices used in automated production lines and the like, which are especially suited for effecting template matching of grey level images to align the images or to extract predetermined patterns, for detecting movement vectors of moving objects, for recognizing or identifying the objects, or for detecting the parallaxes of continuously moving objects.
A conventional method for effecting template matching of grey level images is described, for example, in a book by K. Tezuka et al., "Digital Image Processing Engineering," Nikkan Kogyou Shinbunsha, Tokyo, 1985, in a paragraph entitled: "Matching by Correlation," Chapter 5, Section 2: "Template Matching," p. 107. The method is based on the determination of the correlation between the grey level images, and hence has the advantage that a good matching can be accomplished even for the images which do not lend themselves easily to bi-level quantization.
FIG. 31 is a block diagram showing the structure of a conventional template matching device described in the above-mentioned book. The device includes: a reference pattern storage 81, an object image storage 82, a similarity calculator 83, a score map 84, and a controller 85.
FIG. 32 is a flowchart showing the matching procedure followed by the conventional template matching device of FIG. 31. When the procedure is started at step S91, the object image is input by means of a camera, etc., and stored in the object image storage 82. It is assumed that the reference pattern is stored beforehand in the reference pattern storage 81. At step S92, the superposition displacement vector (a,b) is set. Next at step S93, the reference pattern is superposed upon the object image at the displacement (a,b), and the similarity between the two images, i.e., the reference pattern and the object image, is calculated by the similarity calculator 83 by means of the following equation:

M(a,b) = Σi Σj I(a+i, b+j)·R(i,j)   (1)

where M(a,b) is the similarity at displacement (a,b), a and b are the components of the displacement vector (a,b) in the directions of the indexes i and j, respectively, I(i,j) is the object image, and R(i,j) is the reference pattern, wherein the summation is taken over the two indexes i and j.
At step S94, the similarity calculated at step S93 is stored in the score map 84. The similarity is the score at the displacement (a,b). At step S95, it is judged whether or not the displacement vector (a,b) has traversed a predetermined range (e.g., the range defined by: a1 ≤ a ≤ a2, b1 ≤ b ≤ b2, where a1, a2, b1, and b2 are predetermined constants). If the judgment is affirmative at step S95, the procedure terminates. If, on the other hand, the judgment is negative at step S95 and there still remain displacements (a,b) in the predetermined range, the execution returns to step S92 to repeat the steps S92 through S95.
The entries of the two-dimensional score map 84 are thus filled with the scores at respective displacements (a,b) within the predetermined range. Then, the controller 85 searches the score map 84 for the maximum score. The displacement (a,b) at which the score is at the maximum is the position at which the object image I(i,j) is best aligned with the reference pattern R(i,j).
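The procedure of steps S91 through S95 together with the maximum search may be sketched as follows (a minimal illustration in Python; the image, pattern, and search range in the usage example are hypothetical, not part of the conventional device):

```python
def match_template(image, pattern, a_range, b_range):
    """Fill a score map M(a, b) = sum_ij image[i+a][j+b] * pattern[i][j]
    over the predetermined range and return the displacement (a, b)
    at which the score is at the maximum, together with the map."""
    score_map = {}
    for a in a_range:
        for b in b_range:
            # Superpose the reference pattern at displacement (a, b)
            # and accumulate the correlation score.
            score = 0
            for i in range(len(pattern)):
                for j in range(len(pattern[0])):
                    score += image[i + a][j + b] * pattern[i][j]
            score_map[(a, b)] = score
    best = max(score_map, key=score_map.get)
    return best, score_map
```

Searching the filled score map for the maximum score then yields the displacement at which the object image is best aligned with the reference pattern, as described above.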
The above-mentioned book, Tezuka et al., further describes the regional segmentation method, in a paragraph entitled: "Mode Method," Chapter 4, Section 3, "Regional Segmentation," p. 79. FIG. 33 is a block diagram showing the structure of a conventional regional segmentation device. The regional segmentation device includes an original image storage 2141, a grey level histogram generator 2142, a threshold level determiner 2143, a bi-level quantizer 2144, and a regional segmentation means 2145.
FIG. 34 is a flowchart showing the regional segmentation procedure followed by the conventional regional segmentation device of FIG. 33. It is assumed that when the procedure starts at step S2151, the object image is stored beforehand in the original image storage 2141. At step S2152, the grey level histogram generator 2142 generates a grey level histogram of the original image. FIG. 35 shows an exemplary grey level histogram generated by the grey level histogram generator in the procedure of FIG. 34. The grey level histogram is the plot of the frequency of occurrence (plotted along the ordinate in FIG. 35) of the grey levels of the pixels of the image (plotted along the abscissa in FIG. 35). At step S2153, the threshold level determiner 2143 determines the threshold level on the basis of the grey level histogram obtained at step S2152. If the histogram exhibits two distinct maxima or hills as shown in FIG. 35, the threshold level is set at the minimum (trough) between the two maxima.
At step S2154, the bi-level quantizer 2144 thresholds the grey levels of the respective pixels of the image at the threshold level determined at step S2153. Namely, the bi-level quantizer 2144 determines whether the grey level of each pixel is above or below the threshold level, and converts it into the binary level 1 when the grey level is above the threshold level and into the binary level 0 when the grey level is below the threshold level. The grey levels of the respective pixels are thus subjected to the bi-level quantization and a binary image is obtained. At step S2155, the regional segmentation means 2145 segments the binary image, and the procedure is terminated at step S2156.
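The histogram generation, trough-based threshold determination, and bi-level quantization of steps S2152 through S2154 may be sketched as follows (a minimal Python illustration; the two-highest-bins peak search is a simplified stand-in for the mode method, and the tiny 8-level image in the test is hypothetical):

```python
def mode_method_threshold(pixels, levels=8):
    """Binarize a grey level image by thresholding at the trough
    between the two peaks of its grey level histogram."""
    # Grey level histogram: frequency of occurrence of each level.
    hist = [0] * levels
    for row in pixels:
        for g in row:
            hist[g] += 1
    # Locate the two highest peaks (hills) of the histogram.
    p1 = max(range(levels), key=lambda g: hist[g])
    p2 = max((g for g in range(levels) if g != p1), key=lambda g: hist[g])
    lo, hi = sorted((p1, p2))
    # The threshold is the minimum (trough) between the two peaks.
    threshold = min(range(lo, hi + 1), key=lambda g: hist[g])
    # Bi-level quantization: 1 above the threshold, 0 otherwise.
    return [[1 if g > threshold else 0 for g in row] for row in pixels]
```

The binary image so obtained can then be handed to the regional segmentation step.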
The above book further describes the contour segmentation method, in a paragraph entitled: "Polyhedrons and their Linear Drawings," Chapter 8, Section 3, "Interpretation of Linear Drawings," p. 176. FIG. 36 is a block diagram showing the structure of a conventional contour segmentation device. The device includes an original image storage 2171, a contour extractor 2172, a rectilinear approximator 2173, a contour segmentation means 2174, and a vertices dictionary 2176.
FIG. 37 is a flowchart showing the contour segmentation procedure followed by the conventional contour segmentation device of FIG. 36. First at step S2181, the original image is input by means of a camera, for example, and stored in the original image storage 2171. Next at step S2182, the contours of the objects represented in the image are extracted. The term "contours" as used here includes not only the outlines of the objects, but also the lines defined by the boundary between two surfaces of the objects meeting at an angle (e.g., the edges of a polyhedron) and the regions or strips of the image at which the grey level of the pixels changes abruptly. At step S2183, the extracted contours are approximated by the rectilinear approximator 2173 by a plurality of rectilinear lines, and the set of the approximating rectilinear lines thus obtained is stored therein.
Next, the operation of the contour segmentation means 2174 is described in detail. At step S2184, the contour segmentation means 2174 labels the lines of known contours with marks. The marks are sequential ID (identity) numbers, etc. The purpose of the contour segmentation device of FIG. 36 is to label all the line elements with respective marks.
At step S2185, an arbitrary vertex is selected and an actual (three-dimensional) shape of an object is assumed upon the vertex. Here the vertices dictionary 2176 is consulted. FIG. 38a is a diagram showing the vertices and the edges meeting thereat which may be formed by an object in the physical three-dimensional space, where the lines representing the edges meet obliquely in the image. FIG. 38b is a diagram similar to that of FIG. 38a, but showing the case where the lines representing the edges meet at right angles in the image. The plus sign (+) at an edge represents that the edge is a convex edge. The minus sign (-) represents that the edge is a concave edge. The vertices dictionary 2176 stores diagrams or representations such as those shown in FIGS. 38a and 38b. The vertices dictionary 2176 includes representations of all possible combinations of the vertices and the edges meeting thereat of a physically realizable three-dimensional object. Thus, the shapes of the edges (i.e., whether the edges are convex or concave) meeting at the arbitrarily selected vertex are determined at step S2185. (The vertices whose shapes are thus determined are referred to as determinate vertices.) The shapes of the edges are conserved along the lines. Thus, this condition determines the shapes of those edges meeting at a new vertex which extend from a determinate vertex. These shapes of the determinate edges at the new vertex form the precondition for the complete determination of the shape of the new vertex. As described above, the shape of an edge is represented by the plus or the minus sign, and the shape of a vertex is represented by the set of plus and minus signs of the edges meeting thereat. At step S2186, it is judged whether or not all the vertices have been examined. The determination of the shapes of the vertices is repeated until the judgment at step S2186 becomes affirmative.
At step S2187, it is judged whether or not the shapes of all the vertices have been determined without contradiction. If a contradiction exists in the selection of the shapes of the vertices, the procedure returns to the stage at which the contradiction occurred, and re-selects the shapes. Thus it is determined whether it is possible to define the shapes of all the vertices without contradiction. The contour segmentation means 2174 tries to determine all the shapes of the vertices, selecting a shape of each vertex from among the shapes of the vertices registered in the vertices dictionary 2176, wherein each edge should be assigned either a plus or a minus sign. If the judgment is affirmative at step S2187 (namely, if all the lines are assigned a plus or a minus sign without contradiction), the contour segmentation is completed and the procedure of FIG. 37 terminates at step S2188. Then the contour segmentation device outputs the labelled contours 2175. The labelled contours 2175 consist of contours (or descriptions of rectilinear line elements) labelled with marks.
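The consistency search of steps S2185 through S2187 may be sketched as follows (a minimal Python illustration; the dictionary format, the toy drawing in the test, and the backtracking search are simplified assumptions, not the full junction catalogue or procedure of the book):

```python
def label_edges(edges, vertex_dict):
    """edges: list of edge names, each to receive a '+' (convex) or
    '-' (concave) sign. vertex_dict: {vertex: list of allowed sign
    combinations, each a {edge: sign} dict over the edges meeting
    there}. Returns one contradiction-free assignment, or None."""
    def consistent(assign):
        # Every vertex must still admit at least one dictionary entry
        # agreeing with all signs assigned so far to its edges.
        for entries in vertex_dict.values():
            if not any(all(assign.get(e, s) == s for e, s in entry.items())
                       for entry in entries):
                return False
        return True

    def search(i, assign):
        if i == len(edges):
            return dict(assign)
        for sign in ('+', '-'):
            assign[edges[i]] = sign
            if consistent(assign):
                result = search(i + 1, assign)
                if result is not None:
                    return result
            # Contradiction: backtrack and re-select the shape.
            del assign[edges[i]]
        return None

    return search(0, {})
```

When the search succeeds, every line element carries a plus or minus sign without contradiction, corresponding to the completed contour segmentation.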
Next, a conventional image processing device provided with a movement vector extractor means is described following M. Taniuchi, "Robot Vision," Kabushiki Kaisha Shokodo, 1990, in which Section 8.3: "Correspondence between Images," p. 212, describes such an image processing device. FIG. 39 is a block diagram showing the conventional image processing device provided with a movement vector extractor. The image processing device includes: a pair of original image memories 3101, 3102, a pair of contour extractors 3103, 3104, a pair of segmentation extractor means 3105, 3106, a pair of short line division means 3107, 3108, a segmentation matching means 3109, a short line matching means 310A, and a movement vector calculator means 310B.
FIG. 40 is a flowchart showing the movement vector calculation procedure of the conventional image processing device. FIGS. 41a and 41b show two successive images from which the movement vector is to be extracted. The image F2 3122 of FIG. 41b is taken a short interval of time after the image F1 3121 of FIG. 41a. The trees 3123, 3124, 3125, 3126 are assumed to be stationary. The figure of an animal 3127, 3128 at the center of the image is moving.
It is assumed that the image F1 3121 is stored in the original image memory 3101, and the image F2 3122 is stored in the original image memory 3102. At step S3112 after the start at step S3111, the movement regions are extracted from the two original images. This is effected as follows. The respective images are first divided into a plurality of small regions, and then, using the correlation method, etc., the correspondence between the regions of approximately equal grey levels is determined. In this example, the region containing a moving object (presumably the figure of a hippopotamus) is extracted from each image. At step S3113, the contour portion of the image F1 3121 is extracted by the contour extractor 3103 and then is segmented by the segmentation extractor means 3105. Similarly, at step S3114, the contour portion of the image F2 3122 is extracted by the contour extractor 3104 and then is segmented by the segmentation extractor means 3106.
Next the segmentation procedure is described. In the case of this image processing device, the contour segmentation is effected as follows. The contour, forming a boundary of a region, etc., is divided into a plurality of line segments delimited by end points and T-junction points of the contour. The contour is thus described as a set of these line segments. At step S3115, the matching of the movement regions, which are segmented by means of the segmentation extractor means 3105 and 3106, is effected by the segmentation matching means 3109. FIG. 42 is a diagram showing the segmentation matching procedure. The diagram is cited from Taniuchi mentioned above. The partial contour 3131 of the image F1 is drawn adjacent to the partial contour 3132 of the image F2. The two partial contours substantially correspond to each other. The part of the respective contours delimited by the two small squares (□) is a line segment. FIG. 42 shows that the segment Ak in the image F1 corresponds to the segment A'm in the image F2.
Next at step S3116, the line segments of the F1 region are each divided into short lines by means of the short line division means 3107. Similarly, at step S3117, the line segments of the F2 region are each divided into short lines by means of the short line division means 3108. Thus each contour is described by a set of short lines. At step S3118, the two sets of short lines corresponding to the contours of the images F1 and F2, respectively, are matched with each other by means of the short line matching means 310A. Thus, the correspondence between the movement regions of the two images is established at the level of short lines. For example, in FIG. 42, the short lines L1, L2, and L3 of the contour 3131 of the image F1 correspond to the short lines L1', L2', and L3' of the contour 3132 of the image F2. Finally at step S3119, the movement vector calculator means 310B calculates the movement vector of the moving object between the two images F1 and F2.
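The final step S3119 may be sketched as follows (a minimal Python illustration; estimating the movement vector as the mean displacement of the matched short-line midpoints is an illustrative assumption, and the coordinates in the test are hypothetical):

```python
def movement_vector(short_lines_f1, short_lines_f2):
    """Each short line is a pair of end points ((x0, y0), (x1, y1));
    the i-th short line of F1 is assumed already matched to the i-th
    short line of F2 by the short line matching step. Returns the
    mean displacement (dx, dy) of the matched midpoints."""
    def midpoint(line):
        (x0, y0), (x1, y1) = line
        return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

    dx = dy = 0.0
    for l1, l2 in zip(short_lines_f1, short_lines_f2):
        (mx1, my1), (mx2, my2) = midpoint(l1), midpoint(l2)
        dx += mx2 - mx1
        dy += my2 - my1
    n = len(short_lines_f1)
    return (dx / n, dy / n)
```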
Further, a conventional method of image recognition procedure is described in the above-mentioned Taniuchi, at Section 4.3.2 "Method of Regional Division," p. 79, Section 4.3.3 "Edge Detection and the Method of Region," p. 82, Section 5.1.1 "Several Fundamental Characteristics," p. 91, Section 6.1.3 "Pattern Recognition using Global Characteristics," p. 109, and Section 6.2 "Recognition also using the Relation between Regions," p. 117 thereof.
Next, the conventional image recognition procedure as set forth in Taniuchi is described briefly, and then the conventional image segmentation method as taught by Taniuchi is described in detail.
FIG. 43 is a block diagram showing a conventional image recognition device using the method described in Taniuchi. The image processing device of FIG. 43 includes: a TV camera 4071 for imaging an object 4072, a region extractor means 4073 for extracting a predetermined region of the image of the object 4072 taken by means of the TV camera 4071, a characteristic extractor means 4074 for calculating the characteristic value of the region extracted by the region extractor means 4073, a characteristic space memory 4075 for storing the characteristic value calculated by the characteristic extractor means 4074, and a classification means 4077 for classifying the object into classes on the basis of the class representative values 4076 and the characteristic values corresponding thereto, thereby determining a result 4078.
FIG. 44 is a flowchart showing the recognition procedure followed by the image processing device of FIG. 43. The purpose of the image processing device of FIG. 43 is to determine the kind of the object and classify it into an appropriate class. The procedure of FIG. 44 is divided into two stages: the preparatory stage and the execution stage. First, the preparatory stage is described.
At step S4081 at the beginning of the preparatory stage, the image of an object 4072 which is to be recognized, or that of an object 4072 belonging to the same category, is taken by means of a TV camera 4071. Next at step S4082, the region extractor means 4073 extracts a region from the image of the object 4072 taken by means of the TV camera 4071. At step S4083, the characteristic extractor means 4074 extracts predetermined characteristics of the region extracted by the region extractor means 4073, and calculates the characteristic value (vector) of the object, which is plotted at a position within the characteristic space memory 4075 corresponding thereto.
FIG. 45 is a diagram schematically representing the characteristic space memory 4075 of the image processing device of FIG. 43. In the example shown in FIG. 45, two kinds of characteristics: "the area of the region (X1)" and "the likeness of the region to the circle (X2)" are used. Both of these two characteristic values are represented by scalar quantities. For each region, the pair of the area X1 of the region and the likeness to the circle X2 is calculated. Thus, the (two-dimensional vector) characteristic value of the object 4072 can be plotted on the two-dimensional characteristic space as shown in FIG. 45. When three or more characteristics are used, a characteristic space having a dimension equal to the number of the characteristics is to be used. Moving the position of the object 4072 successively, or replacing it with another, the steps S4081 through S4083 are repeated, such that a multitude of points are plotted in the characteristic space memory 4075. Generally, these points are divided into several clusters.
At step S4084, the clusters are extracted from the characteristic space. For example, in the case of the example shown in FIG. 45, the multitude of points form three clusters of points. The three clusters are named class 1, 2 and 3, respectively. At step S4085, representative points of the respective classes 1, 2 and 3 are extracted. In the case of the example shown in FIG. 45, the classes 1, 2 and 3 are represented by the respective centers of gravity C1, C2 and C3 thereof, which are the class representative values 4076. The above procedure constitutes the preparatory stage.
Next the execution stage is described. The execution stage is the stage at which the kind of the unknown or unidentified object positioned in front of the TV camera 4071 is determined. First at step S4086, the image of the unidentified object 4072 is taken by the TV camera 4071. Next, at step S4087, first the region extractor means 4073 extracts the region from the image of the object 4072 taken by the TV camera 4071, and then the characteristic extractor means 4074 extracts the characteristics of the extracted region. The kinds of the characteristics used at step S4087 are the same as those used in the preparatory stage. Namely, in the case of this example, the characteristics are the area X1' and the likeness to the circle X2'. (The parameters X1 and X2 are primed to distinguish them from those at the preparatory stage.)
Next at step S4088, the classification means 4077 determines the closeness of the characteristic point (X1', X2') of the object 4072 to the respective classes 1, 2 and 3. Namely, the classification means 4077 calculates the distances d1, d2 and d3 from the characteristic point (X1', X2') to the respective representative points C1, C2 and C3 of the classes 1, 2 and 3. Further at step S4089, the classification means 4077 determines the class to (the representative point of) which the distance di is the shortest. The closest class is determined as the class to which the unidentified object belongs. For example, in the case of the example shown in FIG. 45, the distance d1 to the representative point C1 of class 1 is the shortest. Thus the classification means 4077 determines that the object 4072 belongs to the class 1. The procedure is thus terminated at step S408A.
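The classification of steps S4088 and S4089 may be sketched as follows (a minimal Python illustration; the representative points and characteristic points in the test are hypothetical values standing in for C1, C2, C3 and (X1', X2')):

```python
import math

def classify(point, representatives):
    """point: the characteristic point (X1', X2') of the unidentified
    object. representatives: {class_name: representative point}.
    Returns the class whose representative point is nearest."""
    def distance(p, q):
        # Euclidean distance d_i in the characteristic space.
        return math.hypot(p[0] - q[0], p[1] - q[1])
    return min(representatives,
               key=lambda c: distance(point, representatives[c]))
```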
Next, the operation of another conventional image processing device for effecting image recognition is described. FIG. 46 is a flowchart showing the graph matching procedure used by a conventional image processing device for the image recognition. First at step S4101 in FIG. 46, model graphs representing the models used in the recognition procedure are generated inside the computer. FIG. 47 is a diagram showing an exemplary model graph. In FIG. 47, the object is divided into regions and the relations among the respective regions are represented by means of the graph. The node A 4111, the node B 4112, and the node C 4113 represent the regions of an object. The arrowed edges connecting these nodes represent the relations among the regions.
Next at step S4102, the image of the object to be recognized is input. At step S4103, the region extractor means extracts regions from the image. Next at step S4104, the mutual relations among the regions are represented by means of a graph. In FIG. 47, the node a 4114, the node b 4115, and the node c 4116 represent the regions of the object, and the arrowed edges connecting the nodes represent the relations among the regions. At step S4105, the object is recognized (i.e., identified or classified) by matching the two graphs: the model graph and the graph of the object (input graph). Namely, in the case of the example of FIG. 47, the arrowed edges connecting the nodes of the model graph are collated or matched with the arrowed edges connecting the nodes of the input graph. The correspondence between the matched edges is represented by hooked edges 4117, 4118, and 4119. The correspondence is evaluated by a predetermined evaluation index function. The model to which the highest evaluated correspondence can be established is determined as the one representing the object. The object is thus recognized as that represented by the model graph.
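The graph matching of step S4105 may be sketched as follows (a minimal Python illustration; scoring a correspondence by the number of model edges preserved in the input graph is a deliberately simple stand-in for the evaluation index function, and the graphs in the test are hypothetical):

```python
from itertools import permutations

def match_graphs(model_nodes, model_edges, input_nodes, input_edges):
    """Edges are directed (u, v) pairs. Every one-to-one assignment
    of model nodes to input nodes is tried, and the correspondence
    preserving the most model edges is returned with its score."""
    best, best_score = None, -1
    for perm in permutations(input_nodes, len(model_nodes)):
        mapping = dict(zip(model_nodes, perm))
        # Count model edges whose images are edges of the input graph.
        score = sum((mapping[u], mapping[v]) in input_edges
                    for u, v in model_edges)
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score
```

The model graph attaining the highest score over all models would then be taken as the one representing the object.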
As described above, the conventional image processing device extracts regions from the image, which are used in the image recognition procedure. Next, the operation of the region extractor means is described in detail.
FIG. 48 is a flowchart showing the details of the region extraction procedure. At step S4121, a pixel within the image is selected as the current pixel P1i. FIG. 49 is a diagram representing the region extraction procedure. In the image plane 4131 at (a), the pixels are represented by circles, triangles, and crosses.
At step S4122, the characteristic value vector (Xi, Yi, Zi) at the current pixel P1i is plotted in the characteristic space. In the example shown, the characteristic value vector consists of three scalar quantities Xi, Yi, and Zi, and hence the characteristic space is three-dimensional. The number of the components of the characteristic value vector may also be two, or more than three. The dimension of the characteristic space is equal to the number of the components of the characteristic value vector. In FIG. 49, the pixels represented by circles are plotted to a first cluster 4134 in the characteristic space at (b); the pixels represented by triangles are plotted to a second cluster 4135 in the characteristic space; and the pixels represented by crosses are plotted to a third cluster 4136 in the characteristic space.
At step S4123 it is judged whether or not all the pixels have already been plotted. If the judgment is negative, the execution proceeds to step S4124 where the next current pixel P1i is selected, and the steps S4122 and S4123 are repeated. When all the pixels are plotted and the judgment at step S4123 finally becomes affirmative, the pixels of the image plane at (a), represented by the circles, the triangles, and the crosses, respectively, are mapped to the clusters 4134, 4135, and 4136, respectively, in the characteristic space.
At step S4125, the points plotted in the characteristic space are divided into clusters. In the case of the example shown in FIG. 49, the points are divided into three clusters 4134, 4135, and 4136. The first cluster 4134 is assigned the value 1; the second cluster 4135 is assigned the value 2; the third cluster 4136 is assigned the value 3. At step S4126, these values are registered upon the image plane at (c) by means of the inverse mapping. As a result, the image plane is divided into three regions the pixels of which are assigned the values 1, 2, and 3, respectively. The procedure terminates at step S4127.
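The mapping, clustering, and inverse mapping of steps S4122 through S4126 may be sketched as follows (a minimal Python illustration; assigning each pixel to the nearest of a set of given cluster centres is a simplification standing in for the clustering of step S4125, and the centres and feature image in the test are hypothetical):

```python
def extract_regions(feature_image, centres):
    """feature_image[i][j] is the characteristic value vector of pixel
    (i, j); centres[k] is the centre of cluster k in the characteristic
    space. Returns the label image whose pixels carry the values
    1, 2, ... of their clusters (the inverse mapping)."""
    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return [[1 + min(range(len(centres)),
                     key=lambda k: sqdist(v, centres[k]))
             for v in row]
            for row in feature_image]
```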
The above conventional image processing devices have the following disadvantages.
First, in the case of the image processing device which effects template matching using equation (1) above, the calculation of the correlation takes much time. Further, if the calculation by means of equation (1) is implemented in hardware, the implementing circuit becomes large-scale and complicated.
Further, in the case of the image processing device of FIG. 33 by which the binary threshold level is determined from the grey level histogram of the image, it is necessary that the grey level histogram exhibit two maxima or peaks. If the grey level histogram exhibits no maximum, or exhibits three or more maxima, it is difficult to determine an appropriate threshold level for obtaining a clear binary (bi-level) image. Hence the resulting binary image is difficult to segment.
Further, the image processing device of FIG. 36 using the vertices dictionary for segmenting the contours of an image has the following disadvantage. The contour points, or the line elements generated from the contour points, are segmented using the vertices dictionary. Thus, the vertices dictionary and the means for judging the consistency of the vertices within the image with respect to the vertices dictionary are indispensable. The contour segmentation procedure thus tends to become complex and takes much time to complete. Further, if the procedure is implemented by hardware, the circuit implementing the procedure becomes large and complex.
In the case of the conventional image processing device of FIG. 39 for extracting the movement, it is assumed that the moving region is detected before the extraction of the movement vector. However, if the movement region is detected by means of the correlation method, for example, the detection is not fully reliable unless the brightness and the form of the objects remain substantially unchanged between the successive image frames. Further, it takes much time to calculate the movement vector from the contour information. The image processing device is thus not suited for extracting the continuous movement of an object from three or more successive image frames.
Further, in the case of the above conventional image processing device, the movement is detected using the contours. Thus, if there exists a plurality of objects other than the target object, an accurate division of the contours into short lines is difficult, due to the existence of spurious contours resulting from the other objects.
Furthermore, in the case of the conventional image processing device for recognizing objects, the recognition is effected on the basis of the characteristic values obtained for the respective regions. Thus, if the division or segmentation of the regions of the objects to be recognized is not identical with the regional information of the computer model, the reliability of the recognition is reduced drastically.
Furthermore, with respect to the method of the division of regions, the image is divided into regions using characteristic values such as the brightness and the color, upon the assumption that most of the pixels upon a surface exhibit similar characteristic values. However, in the case where the color information cannot be used due to the restriction upon the size of the device, the brightness information (the grey level information) of each pixel is the most important characteristic value available. Thus, if the brightness changes within the surface or if there are variations in the intensity of illumination, the division into regions cannot be effected reliably. As a result, the recognition of the objects becomes difficult.