1. Field of the Invention
The present invention relates to an image clustering apparatus for processing such as region division and object tracing in images, as required for high-efficiency image coding, noise reduction, combining and editing, and the like.
2. Related Art of the Invention
Concerning the image clustering technique of dividing the region by finding sets of similar image data from images, a first prior art is disclosed by Nobuaki Izumi, Hiroyuki Morikawa, and Hiroshi Harashima in "A study on segmentation technique combining color information and position information," General Meeting of Japan Society of Electronic Information and Communications, spring 1991, Vol. 7, p. 392, Lecture No. D-680.
It is intended to cluster and divide the image by the K-mean clustering that a vector quantizer uses to create reference vectors from image data.
Processing of the K-mean clustering technique is carried out by the repetition of the following three steps.
Step 1) n class means are properly determined.
Step 2) Each element of sample sets is assigned to the class having the class mean of the shortest distance.
Step 3) In each class, the mean of the assigned sample sets is calculated to obtain a new class mean.
In this example, the feature vector combines the three-dimensional RGB color vector with the horizontal and vertical positions of the pixel on the image to make up a five-dimensional vector, and the distance is calculated as shown in formula (1). ##EQU1## where $r, g, b, x, y$ are elements of the sample, and $\bar{r}, \bar{g}, \bar{b}, \bar{x}, \bar{y}$ are the RGB color elements and the horizontal and vertical positions of the class mean.
In image segmentation by conventional clustering used before the first prior art, the spatial continuity on the image was not taken into consideration, and the divisions obtained involved "salt and pepper" noise (for example, Haralick, R. and Shapiro, L.: "SURVEY: Image Segmentation Techniques," Computer Vision, Graphics, and Image Processing, Vol. 29, pp. 100-132, 1985). By contrast, in the first prior art, because the horizontal and vertical positions are added to the clustering, neighboring pixels on the image are collected in the same cluster, and relatively continuous regions are obtained.
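The three steps above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: it assumes that formula (1) is a weighted sum of the squared color difference and the squared position difference, and the names `weighted_sq_distance`, `k_mean`, `k_s` and `k_l` are hypothetical.

```python
import numpy as np

def weighted_sq_distance(samples, mean, k_s=1.0, k_l=1.0):
    """Assumed form of formula (1): k_s * color term + k_l * position term."""
    diff = samples - mean                        # shape (N, 5): r, g, b, x, y
    color = np.sum(diff[:, :3] ** 2, axis=1)
    position = np.sum(diff[:, 3:] ** 2, axis=1)
    return k_s * color + k_l * position

def k_mean(samples, n_classes, k_s=1.0, k_l=1.0, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: choose initial class means (here: randomly picked distinct samples).
    idx = rng.choice(len(samples), n_classes, replace=False)
    means = samples[idx].astype(float)
    for _ in range(n_iter):
        # Step 2: assign each sample to the class whose mean is nearest.
        dists = np.stack(
            [weighted_sq_distance(samples, m, k_s, k_l) for m in means], axis=1
        )
        labels = np.argmin(dists, axis=1)
        # Step 3: recompute each class mean from the samples assigned to it.
        for i in range(n_classes):
            if np.any(labels == i):
                means[i] = samples[labels == i].mean(axis=0)
    return means, labels
```

Note that every pass needs the complete sample set in memory, and that the result depends on the hand-chosen weights `k_s` and `k_l`.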
A second prior art similar to the first prior art was disclosed by Crisman, J. and Thorpe, C. in "UNSCARF, A Color Vision System for the Detection of Unstructured Roads," Proc. 1991 IEEE Int. Conf. on Robotics and Automation, pp. 2496-2501 (1991). In this paper, formula (2) is used as the distance function:
$$(x - u_i)^t \, C_i'^{-1} \, (x - u_i) \qquad (2)$$
where $x$ is a feature vector, namely the column vector $(r, g, b, x, y)^t$, and $u_i$ is the mean of the i-th class data. $(\cdot)^t$ denotes matrix transposition. $C_i'$ is the covariance matrix normalized so that its determinant is 1; as shown in formula (3), it is normalized by the fifth root of the determinant $\|C_i\|$. If the inverse of the covariance matrix were used in formula (2) without normalization, all samples might be assigned to one large class. Normalization is used to avoid this.
$$C_i' = \frac{C_i}{\|C_i\|^{1/5}} \qquad (3)$$
The covariance matrix can be directly determined by using assigned samples, in the same manner as when determining the class mean, at step 3 of the K-mean method. In the second prior art, too, the same effect as in the first prior art can be obtained.
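The normalization described above (dividing the covariance matrix by the d-th root of its determinant so that the determinant becomes 1, with d = 5 here) can be sketched as follows. The function names are hypothetical, and the sketch assumes a non-singular covariance matrix.

```python
import numpy as np

def normalized_covariance(cov):
    """Scale cov so that its determinant becomes 1 (d-th root of the determinant)."""
    d = cov.shape[0]
    return cov / np.linalg.det(cov) ** (1.0 / d)

def normalized_distance(x, mean, cov):
    """Distance in the form of formula (2), using the normalized covariance."""
    diff = x - mean
    return float(diff @ np.linalg.solve(normalized_covariance(cov), diff))
```

Because the scale of the covariance is divided out, this distance no longer depends on how widely a class is spread, only on the shape of its distribution.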
The present invention is intended to solve the following problems.
1) In the first prior art, the result of clustering differs depending on the constants $K_s$ and $K_l$ in formula (1). These constants should be determined automatically depending on luminance changes. Moreover, it seems difficult to set the coefficients when the sample vectors include other pixel features, such as differential values or luminance values of infrared waveforms. In the second prior art, normalizing the covariance matrix automates the weighting in the distance calculation between the sample vector and the class mean. However, when the normalized covariance matrix is used, formula (2) does not represent the statistically normalized distance (Mahalanobis distance), and it does not seem to reflect the physical meaning of the object. For an insulating object illuminated by a single light source, it is known that the luminance is distributed on a plane in a multidimensional spectral space (for example, RGB space) (for example, Healey, G.: "Segmenting Images Using Normalized Color," IEEE Transactions on Systems, Man, and Cybernetics, Vol. 22, No. 1, pp. 64-75, January, 1992). If the variance of the distance from this plane is intrinsic to the object and the imaging system, the corrected covariance matrix is inappropriate.
2) In both the first and second prior arts, K-mean clustering or a similar method (ISODATA) is employed for determining cluster data. This requires memory for holding the samples, and obtaining the cluster data requires repeating steps 1 to 3; hence it is difficult to apply to an image with slowly changing data.
3) In the first and second prior arts, there is no teaching information that defines the desired result of image clustering.
However, in many applications of image clustering, the boundary of the object is given in advance for a certain image, and corresponding boundaries are then determined for similar images. In clustering by the K-mean method, no method is known for dynamically changing the class data in response to a "negative" sample, that is, a sample vector whose assignment is found to be wrong.
Concerning 2), the self-organizing map developed by Kohonen is a known technique for sequentially changing class data when obtaining the class data of vectors having similar features from sets of feature vectors (for example, Kohonen, T: "The Self-Organizing Map," Proceedings of the IEEE, Vol. 78, No. 9, 1990). More specifically, processing elements called neurons each hold class data and calculate a distance, and the class data of the neuron whose calculated distance is smallest is changed according to a rule called the delta rule. As a result, each neuron comes to possess cluster data. Evidently, the distance calculation in this self-organizing technique can be performed in parallel, as in K-mean clustering. The advantages of self-organizing processing are that class data are obtained sequentially and that no memory is required for holding sample data.
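A minimal sketch of such a sequential update is shown below. It keeps only the winner-take-all delta rule and omits the neighborhood function of a full self-organizing map; the name `som_step` and the learning rate `alpha` are assumptions made for illustration.

```python
import numpy as np

def som_step(means, x, alpha=0.1):
    """Present one sample x and update only the winning neuron's class mean."""
    # Each neuron computes its own squared Euclidean distance to the sample.
    dists = np.sum((means - x) ** 2, axis=1)
    winner = int(np.argmin(dists))
    # Delta rule: move the winning class mean a fraction alpha toward the sample.
    means[winner] += alpha * (x - means[winner])
    return winner
```

Each sample is used once and then discarded, which is why no sample memory is needed.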
The self-organizing process of Kohonen updates only the class mean, using the Euclidean distance as the clustering measure. Therefore, it cannot be applied for the purpose of incorporating the covariance as class data. Relating to 3), similarly, learning vector quantization is reported in the cited reference of Kohonen.
If there is a set of sample vectors for learning and a sample vector is assigned to the wrong class data, assignment to undesired class data may be reduced by moving the class mean away from the negative sample vector. Here too, however, Kohonen's learning vector quantization updates only the class mean, using the Euclidean distance as the clustering measure. Therefore, it cannot be applied for the purpose of obtaining the covariance as class data.
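The idea of moving the class mean away from a negative sample can be sketched as below, in the style of the basic learning vector quantization rule. The function name, the `class_labels` array and the learning rate `alpha` are hypothetical.

```python
import numpy as np

def lvq_step(means, class_labels, x, x_label, alpha=0.1):
    """Attract the nearest class mean if its label matches, repel it otherwise."""
    dists = np.sum((means - x) ** 2, axis=1)
    winner = int(np.argmin(dists))
    sign = 1.0 if class_labels[winner] == x_label else -1.0
    means[winner] += sign * alpha * (x - means[winner])
    return winner
```

As in the self-organizing case, only the class mean is adapted; no covariance is learned.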
Summary
To solve the above problems, it is a primary object of this invention to provide an image clustering apparatus that determines a class mean and the covariance within the class as distribution parameters of pixels expressing the statistical properties of the object, and that clusters the image stably at high speed. It is another object of the invention to provide an image clustering apparatus that determines the class mean and the covariance within the class so as to reflect the structure of teaching information, if there is any teaching information about the segmentation of an image.
An image clustering apparatus of the first embodiment of the present invention concerns the case where there is no teaching information about the segmentation of an image. Such an apparatus comprises:
(a) a frame memory for storing an image composed of coded pixels,
(b) reading means for randomly reading out values of pixels about horizontal position and vertical position on the image from the frame memory, and generating a sample vector containing coupling of the read out pixels' values and the corresponding horizontal and vertical positions,
(c) a memory for holding a plurality of sets of covariance matrix and mean vector data of the sample vector as class data,
(d) likelihood calculating means for calculating a likelihood of the sample vector being included in the plural sets of class data based on a distance between sample classes which is a sum of:
a distance obtained by normalizing a difference of the sample vector and the mean vector by the covariance matrix, and a magnitude of the covariance matrix,
(e) maximum likelihood class selecting means for selecting a set in which the distance between sample classes among combinations of the class data is minimal, and
(f) class data changing means for changing the mean vector and covariance matrix composing the class data in a direction which reduces the distance between sample classes, by using the difference vector of the sample vector and mean vector.
An image clustering apparatus of the second embodiment of the present invention concerns the case where there is teaching information about the segmentation of an image. Such an apparatus comprises:
(a) a frame memory for storing an image composed of coded pixels,
(b) reading means for randomly reading out values of pixels about horizontal position and vertical position on the image from the frame memory, and generating a sample vector containing coupling of the read out pixels' values and the corresponding horizontal and vertical positions,
(c) a memory for holding a plurality of sets of covariance matrix and mean vector data of the sample vector as class data,
(d) likelihood calculating means for calculating a likelihood of the sample vector being included in the plural sets of class data based on a distance between sample classes which is a sum of:
a distance obtained by normalizing a difference of the sample vector and the mean vector by the covariance matrix, and a magnitude of the covariance matrix,
(e) maximum likelihood class selecting means for selecting a set in which the distance between sample classes among combinations of the class data is minimal,
(f) teaching means for judging the correctness of the class data that minimizes the normalized distance among combinations of the class data, and
(g) class data changing means for:
changing the mean vector and covariance matrix composing the class data after input of the sample vector in a direction which reduces the distance between sample classes, when judged to be correct by the teaching means, and
changing the mean vector and covariance matrix composing the class data in a direction to increase the distance between sample classes by using the difference vector of the sample vector and mean vector, when judged to be incorrect by the teaching means.
According to the first embodiment, the reading means randomly reads out the pixel values at a horizontal position and vertical position on the image from the frame memory, so that convergence into biased class data is avoided. The reading means also generates sample vectors containing a coupling of a feature value of the pixel and the corresponding horizontal and vertical positions. Because the sample vector couples the horizontal and vertical positions, neighboring pixels on the image tend to belong to the same class. For each sample vector read out, the likelihood calculating means then calculates the likelihood of inclusion in each of the plural class data.
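The behavior of the reading means can be sketched as follows, assuming the frame memory is held as a NumPy array of shape (height, width) or (height, width, channels). The function name and the ordering of the vector elements (pixel values, then x, then y) are assumptions for illustration.

```python
import numpy as np

def read_sample_vector(frame, rng):
    """Pick one pixel at random and couple its values with its (x, y) position."""
    height, width = frame.shape[:2]
    y = int(rng.integers(height))    # random vertical position
    x = int(rng.integers(width))     # random horizontal position
    values = np.atleast_1d(frame[y, x]).astype(float)
    return np.concatenate([values, [float(x), float(y)]])
```

For an RGB frame this yields a five-dimensional sample vector; drawing positions at random rather than in raster order is what keeps successive samples from all coming from one part of the image.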
The likelihood is evaluated to be high when the sum of the distance normalized by the covariance matrix and the magnitude of the covariance matrix is small. Adding the magnitude of the covariance matrix to the normalized distance avoids the state in which all sample vectors are attributed to a single class of large variance.
Next, the maximum likelihood class selecting means selects the class of the highest likelihood, and for this class the class data changing means sequentially changes the values of the mean vector and covariance, using the difference vector between the sample vector and the mean vector, in a direction that decreases the distance between sample classes.
This action is explained below by using numerical expressions.
Suppose that the feature vector $x$ is the coupling of the luminance vector of the image and the two-dimensional position information vector ($d$ being the number of dimensions, $d \ge 3$). The likelihood of $x$ belonging to a certain class is evaluated by formula (4).
$$f(x \mid u_i, \Sigma_i) = (x - u_i)^t \, \Sigma_i^{-1} \, (x - u_i) + \ln |\Sigma_i| \qquad (4)$$
where $\Sigma_i$ is the covariance matrix and $u_i$ is the class mean. Formula (4) is an example of a form that adds the distance of the difference between the sample vector and the mean vector, normalized by the covariance matrix, to the magnitude of the covariance matrix. It is obtained by taking the logarithm of the probability density function under the assumption that the difference vector of $x$ and $u_i$ is normally distributed, removing the constant portion, and changing the sign. Since the position information is included in the sample vector $x$, the pixels that reduce the value of formula (4) are limited to the vicinity of the class mean on the image. Because the logarithmic term of the covariance matrix is added as the second term on the right side of formula (4), the problem of a class absorbing samples that differ greatly from its mean is avoided.
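A minimal sketch of the distance in formula (4) follows. The function name is hypothetical, and the covariance matrix is assumed to be positive definite so that its inverse and log-determinant exist.

```python
import numpy as np

def sample_class_distance(x, mean, cov):
    """Formula (4): Mahalanobis term plus the log-determinant of the covariance."""
    diff = x - mean
    mahalanobis = float(diff @ np.linalg.solve(cov, diff))
    _, logdet = np.linalg.slogdet(cov)   # ln|cov|, computed stably
    return mahalanobis + float(logdet)
```

As a usage example, for the three-dimensional sample `x = (1, 0, 0)` and two classes that both have zero mean, a class with identity covariance gives a smaller value than a class with covariance 100 times the identity, even though the first term alone would favor the broad class. This is the effect of the logarithmic term described above.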
Let the elements of $\Sigma_i$ in formula (4) be $\sigma_{pq}(i)$ ($1 \le p, q \le d$), the elements of the inverse matrix $\Sigma_i^{-1}$ be $\sigma'_{pq}(i)$ ($1 \le p, q \le d$), the p-th component of $x$ be $x_p$, and the p-th component of $u_i$ be $u_p(i)$. The gradient of formula (4) with respect to each element of the inverse covariance matrix is then obtained as formulas (5) and (6).
Because $\sigma'_{pq}(i) = \sigma'_{qp}(i)$ and $\sigma_{pq}(i) = \sigma_{qp}(i)$, a Kronecker delta term $\delta_{pq}$ is present. ##EQU3##
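As a hedged sketch based on standard matrix calculus, and not a reproduction of formulas (5) and (6), the gradients of the distance $f$ in formula (4) can be written as below. The shorthand $d_p = x_p - u_p(i)$ is introduced here for illustration, and the symmetric entries $\sigma'_{pq}(i)$ and $\sigma'_{qp}(i)$ are treated as a single variable, which is where the Kronecker delta factor comes from.

```latex
\begin{align*}
\frac{\partial f}{\partial u_p(i)}
  &= -2 \sum_{q=1}^{d} \sigma'_{pq}(i)\, d_q, \\
\frac{\partial f}{\partial \sigma'_{pq}(i)}
  &= (2 - \delta_{pq}) \bigl( d_p d_q - \sigma_{pq}(i) \bigr).
\end{align*}
```

The second line uses $\ln|\Sigma_i| = -\ln|\Sigma_i^{-1}|$. Moving against these gradients pulls the mean toward the sample and drives $\sigma_{pq}(i)$ toward $d_p d_q$.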
The relation of formula (7) holds between the element $\sigma_{pq}(i)$ of the covariance matrix and the element $\sigma'_{pq}(i)$ of the inverse covariance matrix. ##EQU4## where $\Sigma_{pq}(i)$ denotes the cofactor of the pq component of $\Sigma_i$.
Therefore, $\partial \Sigma_{pq}(i) / \partial \Sigma_{pq}$ is a real symmetric matrix of order $d-2$.
Seeing also that the covariance matrix formula is positive semi-definite, the value of formula (7) is always negative semi-definite. Hence, there exists a delta rule that varies the class covariance in the direction that decreases the distance shown in formula (4), by making use of the difference vector between the sample vector and the mean vector.
Ignoring the direction of inclination, formulas (8) and (9) are obtained.
$$\Delta u_i = \alpha (x - u_i), \quad 0.0 \le \alpha < 1.0 \qquad (8)$$
$$\Delta \Sigma_i = -\beta \left( \Sigma_i - (x - u_i)(x - u_i)^t \right), \quad 0.0 \le \beta < 1.0 \qquad (9)$$
By adding $\Delta u_i$ and $\Delta \Sigma_i$ to the class mean and covariance matrix of the class $i$ regarded as closest to the sample vector, the distance to that sample is shortened, and clustering with a small error is achieved.
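The sequential update of formulas (8) and (9), combined with selection of the closest class by formula (4), can be sketched as follows. This is an illustration under stated assumptions rather than the apparatus itself: the function names and default values of `alpha` and `beta` are hypothetical, and the class data are held as Python lists of NumPy arrays.

```python
import numpy as np

def sample_class_distance(x, mean, cov):
    """Formula (4): normalized distance plus ln|cov|."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return float(diff @ np.linalg.solve(cov, diff)) + float(logdet)

def update_class(mean, cov, x, alpha=0.05, beta=0.05):
    """Formulas (8) and (9): add the increments to the mean and covariance."""
    diff = x - mean
    new_mean = mean + alpha * diff
    new_cov = cov - beta * (cov - np.outer(diff, diff))
    return new_mean, new_cov

def cluster_step(means, covs, x, alpha=0.05, beta=0.05):
    """Select the class with the smallest distance and update only that class."""
    dists = [sample_class_distance(x, m, c) for m, c in zip(means, covs)]
    winner = int(np.argmin(dists))
    means[winner], covs[winner] = update_class(
        means[winner], covs[winner], x, alpha, beta
    )
    return winner
```

Since formula (9) forms a convex combination of the old covariance and the outer product of the difference vector whenever `beta` is below 1, the updated matrix stays positive semi-definite.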
In the case that a teaching signal is included in clustering, according to the second embodiment of the present invention, there is teaching means for judging the correctness of the class data that minimizes the normalized distance among combinations of class data and for producing the result as output. When the teaching means judges the class data minimizing the distance between sample classes to be correct, $\Delta u_i$ and $\Delta \Sigma_i$ are calculated, for example, from the difference vector between the sample vector and the mean vector, and by adding $\Delta u_i$ and $\Delta \Sigma_i$ to the class mean and covariance matrix of the class $i$ regarded as closest to the sample vector, the distance to that sample is shortened. Conversely, if the class data is judged to be wrong by the teaching means, subtracting $\Delta u_i$ and $\Delta \Sigma_i$ extends the distance to that sample. Thus, clustering error can be reduced.
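A sketch of this supervised variant is given below: the same increments are added when the teaching means judges the selected class correct and subtracted when it judges the class wrong. The function name, the boolean `correct` argument and the default rates are assumptions for illustration.

```python
import numpy as np

def supervised_update(mean, cov, x, correct, alpha=0.05, beta=0.05):
    """Add the increments if the class is judged correct, subtract them otherwise."""
    diff = x - mean
    delta_mean = alpha * diff
    delta_cov = -beta * (cov - np.outer(diff, diff))
    sign = 1.0 if correct else -1.0
    return mean + sign * delta_mean, cov + sign * delta_cov
```

One caveat of this sketch: subtracting the covariance increment can, for a sample far from the mean, make the matrix lose positive definiteness. The text above does not describe a safeguard, so a practical version would probably need one, for example a small `beta` or a check on the updated matrix.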