1. Field of the Invention
The present invention relates to the field of data clustering and, in particular, to a system and method for hybrid hierarchical segmentation.
2. Description of the Related Art
Clustering is the process of dividing a set of observations into smaller subsets. Some conventional approaches to clustering are so computationally expensive that they cannot practically be applied to large datasets, such as datasets with over 150 million observations, each associated with 500 or more variables. Other conventional approaches rely on sampling the dataset a pre-determined number of times and generating clusters from the samples. These sampling-based approaches suffer from a lack of replicability, since the resultant clusters are highly susceptible to bias introduced by the initial sampling and the number of samples. Still other clustering approaches rely on segmenting the data based on business rules. These rule-based approaches likewise suffer from bias introduced by the original selection of the business rules.
As the foregoing illustrates, there is a need in the art for a mathematical clustering technique that is replicable and can be applied to large datasets.