The invention relates generally to image processing and more specifically to selecting clusters of items associated with particular bins using a fuzzy histogram technique.
A number of clustering algorithms are described in the book "Clustering Algorithms" by J. A. Hartigan. The majority of prior art clustering algorithms follow one of the following two approaches:
In the first approach, a single cluster is formed of all of the data, and then the cluster currently under consideration is split in some way into two or more clusters, with each resulting cluster being recursively considered and possibly split, if it does not satisfy some criterion. The key source of variation in the many algorithms that follow the recursive-split approach is the way of splitting.
In the second approach, each item in the full data set initially forms its own cluster, and nearby clusters are then merged with each other until no remaining merge would improve the clustering. In some applications there is an advantage to retaining the order in which clusters were merged, as this gives a hierarchical clustering of the data. A refinement which combines the two dominant approaches is to alternate between splitting and merging.
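By way of illustration only, the merging approach may be sketched as follows for one-dimensional data. The function name `merge_clusters`, the use of cluster means as the measure of nearness, and the stopping threshold `max_gap` are assumptions made for this sketch and are not taken from any particular prior art algorithm.

```python
def merge_clusters(points, max_gap):
    """Greedy bottom-up merging of 1-D points.

    Returns (clusters, merge_order), where merge_order records each pair
    of clusters in the order they were merged (a simple hierarchy).
    """
    # Every item starts in its own cluster.
    clusters = [[p] for p in sorted(points)]
    merge_order = []
    while len(clusters) > 1:
        # Clusters stay contiguous in sorted 1-D data, so their means are
        # ordered and the closest pair of means is always adjacent.
        means = [sum(c) / len(c) for c in clusters]
        gaps = [means[k + 1] - means[k] for k in range(len(means) - 1)]
        k = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[k] > max_gap:
            break  # assumed stopping criterion: no sufficiently close pair remains
        merge_order.append((clusters[k], clusters[k + 1]))
        clusters[k:k + 2] = [clusters[k] + clusters[k + 1]]
    return clusters, merge_order


clusters, order = merge_clusters([1.0, 1.2, 5.0, 5.1, 9.0], max_gap=1.0)
```

Each pass recomputes every gap between adjacent clusters, so the work grows faster than linearly with the number of clusters, which is the kind of cost the background attributes to merging techniques.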
The method of the present invention is faster than either splitting or merging based techniques, since it takes time proportional to the number of items to be clustered, plus time proportional to the number of clusters found, while prior methods require time proportional to the number of data items plus time proportional to a function greater than linear in the number of clusters found.
In many applications it may be used on its own; in applications where the properties of a particular prior art algorithm are desired, the present method may be used as a pre-process, followed by a reduced amount of refinement by the prior art algorithm.
Various techniques for cluster analysis have hereinbefore been devised as illustrated by the following disclosures, which may be relevant to certain aspects of the present invention.
U.S. Pat. No. 4,858,141 to Hart et al., issued Aug. 15, 1989, discloses a cluster analysis technique employed to group changes in measurements into certain categories in order to identify individual appliances. In particular, the purpose of the cluster analysis is to determine which changes in the measurements are commonly observed. Frequently occurring changes can result from switching an appliance on and off. Therefore, the number of clusters found indicates the number of appliances, and the number of changes in a cluster indicates how frequently the appliance is used. The pairing of on and off transitions enables an algorithm to determine the energy consumption of the individual appliances. The characteristic changes in the measurement associated with each cluster can therefore be used to identify the nature of the individual appliance.
U.S. Pat. No. 5,621,861 to Hayashi et al. issued Apr. 15, 1999, discloses a method of learning data required to execute a neural network learning procedure. The learning procedure includes the steps of supplying an original set of learning sample data to an input layer of a neural network and measuring a first value of a recognition index that is obtained by the neural network, dividing the original set of learning data into a plurality of different subgroups, and judging respective values of a recognition index obtained from each of the subgroups. One subgroup is then selected for use in the learning procedure, namely a subgroup meeting the criterion of providing a value of the recognition index that is at least equal to the first value of the recognition index. The original set of learning samples is divided into the plurality of subgroups using a cluster analysis of the entire original learning sample data; the subgroups are respectively applied to a neural network as learning data, and the respective values of the recognition index obtained thereby for the neural network are judged.
U.S. Pat. No. 5,179,643 to Homma et al., issued Jan. 2, 1993, discloses a method and system for extracting a characteristic from information handled by a computer and displaying the information in a manner that clarifies the characteristic. A cluster analysis is used to analyze the relationship among information items.
U.S. Pat. No. 5,389,936 to Alcock, issued Jan. 14, 1995, discloses a method of analyzing clusters of bearings A, B, C, D taken of distant sources by an array of direction finding stations. A combination of bearings is taken, one from each of the stations. Each bearing of the combination is taken in turn as a spoke directed at a source. The triangulation process generates a bar of intersection points along the spoke for each bearing of the combination. The number of overlaps between pairs of bars along a spoke is totaled to give a spoke score. The spoke scores of all spokes of the combination are summed to form a fixed confidence score for a source which may be associated with a cluster.
U.S. Pat. No. 5,644,232 to Smith, issued Jul. 1, 1997, discloses a method and apparatus for medical applications. A cluster analysis can be used, for example, to show how similar a viable tumor in a lung injury is to a viable tumor in a hepatic metastasis. The cluster analysis can be performed so as to give a numerical estimate of similarity.
U.S. Pat. No. 5,644,232 to Smith, issued Jul. 1, 1997 discloses a cluster analysis technique that can assist or replace objective judgement of trained operators when using an MRI apparatus. In particular, Smith discloses cluster analysis used in conjunction with calculations or judgements regarding a similarity with respect to stored libraries of signatures.
U.S. Pat. No. 4,937,747 to Koller, issued Jun. 26, 1990, discloses a method of cluster analysis for determining subpopulations in a dataset. In particular, a data set comprising depth-related log responses is classified into disjoint clusters, and various measures of agreement and disagreement are evaluated.
All of the above cited references are incorporated by reference for their teachings.
In order to achieve the foregoing and other objects, and to overcome the shortcomings discussed above, a method for choosing clusters in a data set is presented. Included in this method is the receiving of item data including coordinates of a metric space, the dividing of the metric space into a plurality of bins, and the associating of a distance from at least a particular coordinate to each of the item data. The method further involves the inserting of each of the item data into a bin within the distance of the item data so as to generate a histogram and using the histogram to obtain one or more clusters.
An approach is presented in which all the bins within a given radius of each item of a set of items are determined. Each item of the set is then entered into each bin within the given radius, and a count associated with each bin is incremented as an item is entered. A histogram is developed to store the counts associated with each bin.
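By way of illustration only, the counting procedure described above may be sketched as follows. The sketch assumes a two-dimensional space divided into square bins, and it treats a bin as being "within the given radius" of an item when the bin center lies within that radius; the function name `fuzzy_histogram` and the parameters `bin_size` and `radius` are hypothetical and chosen for this example.

```python
import math
from collections import defaultdict


def fuzzy_histogram(items, bin_size, radius):
    """Count, for each bin, the items lying within `radius` of the bin center."""
    counts = defaultdict(int)
    # Number of bins the radius can reach in each direction from the item's bin.
    reach = math.ceil(radius / bin_size)
    for x, y in items:
        bx, by = math.floor(x / bin_size), math.floor(y / bin_size)
        for i in range(bx - reach, bx + reach + 1):
            for j in range(by - reach, by + reach + 1):
                # Assumed membership rule: compare against the bin center.
                cx, cy = (i + 0.5) * bin_size, (j + 0.5) * bin_size
                if math.hypot(cx - x, cy - y) <= radius:
                    counts[(i, j)] += 1  # enter the item and increment the count
    return dict(counts)


hist = fuzzy_histogram([(0.5, 0.5), (1.5, 0.5)], bin_size=1.0, radius=1.2)
```

Because each item touches only the fixed number of bins within its radius, the work in this sketch grows in proportion to the number of items.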
Another approach includes the steps of receiving item data including coordinates of a metric space, dividing the metric space into a plurality of bins, and associating a distance to each of the item data. The approach further includes inserting each of the item data into a bin within the distance of the item data and from this step both generating a histogram and using the histogram to determine a cluster.
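By way of illustration only, one possible way of using such a histogram to determine clusters is to select the bins whose counts are local peaks. This selection rule is an assumption made for the sketch and is not stated in the summary above; the function name `peak_bins` and the threshold `min_count` are likewise hypothetical, and the histogram is assumed to be a mapping from two-dimensional bin indices to counts.

```python
def peak_bins(counts, min_count=2):
    """Return bins whose count is at least min_count and is not exceeded
    by the count of any of the 8 neighboring bins."""
    peaks = []
    for (i, j), c in counts.items():
        if c < min_count:
            continue  # too few items to be treated as a cluster
        neighbors = (
            counts.get((i + di, j + dj), 0)  # bins absent from the mapping count as empty
            for di in (-1, 0, 1)
            for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)
        )
        if all(c >= n for n in neighbors):
            peaks.append((i, j))
    return sorted(peaks)


example = {(0, 0): 3, (1, 0): 2, (0, 1): 1, (5, 5): 4, (5, 6): 1}
print(peak_bins(example))  # [(0, 0), (5, 5)]
```

Under this rule, adjacent bins with equal counts are all reported as peaks, so a practical embodiment might add a tie-breaking step.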
The methods described above can be carried out in a programmed microcomputer which would, for example, receive item coordinate information of a particular space and associate distance information from one or more coordinates to each of the item data.
Other objects, features, and advantages according to the present invention will become apparent from the following detailed description of illustrative embodiments when read in connection with the accompanying drawings, in which corresponding components are identified by the same reference numerals.