1. Field of the Invention
The present invention relates to the field of data processing. More particularly, the present invention relates to a method and apparatus for clustering data points in a data set having mixed attributes or features.
2. Description of the Related Art
Conventional data clustering techniques perform well when all of the data points of a data set contain the same type of attributes or features. That is, all data points of the data set have only one type of attribute, such as categorical, binary or real data (numeric, continuous) attributes. Conventional data clustering techniques, such as the k-medians, k-prototype and k-means algorithms, break down when a data set has mixed-mode attributes, such as attributes that are a combination of categorical, binary and/or real data attributes.
The k-medians clustering algorithm is designed for clustering data having categorical attributes, but not for data having mixed attributes. The k-prototype algorithm does not handle mixed attributes directly, but uses a tuneable parameter for combining attribute types. Nevertheless, the k-prototype algorithm produces less optimal results for data having mixed attributes than for data having purely categorical attributes.
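To illustrate how a tuneable parameter can combine attribute types, the following is a minimal sketch of a dissimilarity measure in the form commonly associated with the k-prototype algorithm: a squared Euclidean term over the real attributes plus a weighted count of categorical mismatches. The function name `mixed_dissimilarity`, the parameter name `gamma`, and the sample values are assumptions made for illustration only and are not taken from the text above.

```python
def mixed_dissimilarity(x_real, x_cat, proto_real, proto_cat, gamma):
    """Dissimilarity between a data point and a cluster prototype.

    x_real, proto_real: sequences of real (numeric) attribute values.
    x_cat, proto_cat:   sequences of categorical attribute values.
    gamma:              tuneable weight (assumed name) on the categorical term.
    """
    # Squared Euclidean distance over the real attributes.
    numeric_term = sum((a - b) ** 2 for a, b in zip(x_real, proto_real))
    # Simple matching: count the categorical attributes that differ.
    mismatch_count = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric_term + gamma * mismatch_count


# Hypothetical point and prototype with two real and two categorical attributes.
d_low = mixed_dissimilarity([1.0, 2.0], ["a", "x"], [0.0, 0.0], ["a", "y"], gamma=0.5)
d_high = mixed_dissimilarity([1.0, 2.0], ["a", "x"], [0.0, 0.0], ["a", "y"], gamma=2.0)
```

In this sketch, a larger `gamma` makes a categorical mismatch count for more relative to the numeric distance, so the clustering result depends on how that weight is tuned.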
H. Ralambondarainy discloses a data clustering technique for converting data having categorical attributes to 1-of-p representations that are then combined with data of the same data set having real attributes. The combined 1-of-p representations and real attributes are used directly in a clustering algorithm, such as the k-means algorithm.
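The conversion described above can be sketched as follows. This is a minimal illustration, assuming each categorical attribute has a known, ordered list of possible values; the function names `one_of_p` and `encode_point` and the sample data are hypothetical and do not come from the cited disclosure.

```python
def one_of_p(value, categories):
    """Return a 1-of-p vector: 1.0 at the position of `value`, 0.0 elsewhere."""
    return [1.0 if value == c else 0.0 for c in categories]


def encode_point(real_attrs, cat_attrs, cat_domains):
    """Concatenate the real attributes with the 1-of-p code of each categorical attribute."""
    vector = list(real_attrs)
    for value, domain in zip(cat_attrs, cat_domains):
        vector.extend(one_of_p(value, domain))
    return vector


# Hypothetical data set: one real attribute and one categorical attribute (a color).
cat_domains = [["red", "green", "blue"]]
points = [
    ([25.0], ["red"]),
    ([40.0], ["blue"]),
]
encoded = [encode_point(real, cat, cat_domains) for real, cat in points]
```

Each encoded vector is purely numeric, so it can be passed to a clustering algorithm that expects real-valued input, such as the k-means algorithm.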
The other conventional techniques for clustering categorical attributes are either hierarchical algorithms or conceptual clustering algorithms. The hierarchical algorithms are O(n²), where n is the number of data points, and, consequently, are too computationally intensive for large data sets. The conceptual clustering algorithms are not particularly useful for numeric attributes, particularly when the data is noisy.
What is needed is an efficient way to cluster data points having mixed attributes, whether the attributes are categorical, binary and/or real data attributes.