In the information age, individuals and organizations increasingly store, manage, and analyze large amounts of data. Sometimes an organization may wish to discover relationships between a number of data samples and/or to classify these data samples in a systematic and meaningful way.
Clustering techniques may automatically group a set of data samples by their similarity and/or coherence across a number of dimensions. Such techniques may find application in a wide array of scientific, technological, and other research endeavors.
Variations in clustering techniques may produce significantly different results. For example, the choice of a distance function (i.e., a function that specifies how similar or dissimilar any two data samples are) may impact the cluster in which one or more data samples are ultimately placed. Likewise, a data sample may have many identifiable attributes, some of which may improve cluster quality, and some of which may only add noise and/or produce misleading classification results when used for clustering. Traditional clustering technologies have failed to produce a “one-method-fits-all” approach that yields optimal clustering results in every domain.
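By way of illustration only, the following minimal Python sketch shows how the choice of distance function may change the cluster to which a data sample is assigned. The two cluster centers, the sample, and the helper names (`euclidean`, `cosine_distance`, `nearest_center`) are hypothetical and chosen solely for this example; they are not drawn from any particular clustering system.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # One minus the cosine of the angle between two vectors;
    # sensitive to direction, not magnitude.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest_center(sample, centers, distance):
    # Return the name of the center closest to the sample
    # under the supplied distance function.
    return min(centers, key=lambda name: distance(sample, centers[name]))

# Hypothetical cluster centers and data sample (assumed values).
centers = {"A": (1.0, 0.0), "B": (10.0, 10.0)}
sample = (3.0, 3.0)

by_euclidean = nearest_center(sample, centers, euclidean)
by_cosine = nearest_center(sample, centers, cosine_distance)

print("Euclidean assignment:", by_euclidean)
print("Cosine assignment:", by_cosine)
```

In this sketch the sample lies closer to center A in straight-line terms but points in the same direction as center B, so the two distance functions may place the same sample in different clusters.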
Accordingly, the instant disclosure identifies and addresses a need for additional and improved systems and methods for clustering data samples.