Clustering is an important module in the user segmentation solution methodology. At a high level, clustering deals with finding structure in a collection of unlabeled data. It is a statistical technique for identifying interesting distribution patterns and similarities between objects (users) in a data set, and it can be framed as an optimization problem that seeks to group objects (users) based on their proximity to one another. Objects (users) that are most similar are grouped together, and the resulting groups are referred to as clusters. A cluster is therefore a collection of objects (users) that are “similar” to one another and “dissimilar” to the objects (users) belonging to other clusters. Clustering tasks aim to generate clusters that are compact and well separated from one another. Based on approach, clustering can be classified as either supervised or unsupervised. Supervised clustering uses training data or seed data to drive or control cluster formation, whereas unsupervised clustering does not require any seed value or training/learning phase.
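The idea of grouping users by proximity into compact clusters can be sketched with a minimal k-means routine. This is only an illustration under stated assumptions, not the method described here: k-means is one common unsupervised algorithm among many, the two user features (sessions per week, average spend) are hypothetical, and the first k points are used as initial centroids purely to keep the example deterministic.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Minimal k-means: alternate an assignment step and a centroid update step."""
    # Assumption: the first k points act as initial centroids (deterministic, not robust).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each user joins the cluster with the nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

def within_cluster_sse(points, labels, centroids):
    """Compactness measure: sum of squared distances from each user to its centroid."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Hypothetical users described by (sessions per week, average spend).
users = [(1.0, 2.0), (9.0, 8.5), (1.5, 1.8), (8.0, 8.0), (1.2, 2.2), (9.5, 9.0)]
labels, centroids = kmeans(users, k=2)
sse = within_cluster_sse(users, labels, centroids)
```

A lower within-cluster sum of squares indicates more compact clusters. Separation between clusters would need a separate measure, such as the distance between centroids.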
Conventional approaches rely on sampling the dataset a predetermined number of times and generating clusters associated with the samples. These sampling-based approaches suffer from a lack of replicability, since the resulting clusters are highly susceptible to bias arising from the initial sampling and the number of samples. Still other clustering approaches rely on segmenting the data based on business rules. These rule-based approaches also suffer from bias arising from the original selection of the business rules. User segmentation helps group users who exhibit similar requirement characteristics into clusters. Effective segmentation allows an organization to focus on users' requirements in a cost-effective yet exhaustive way. It supports strategic decisions for a particular group of users, helps identify the type of resources that the group requires, and helps estimate the cost of providing the framework needed for the clustering.
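The sensitivity of rule-based segmentation to the rules originally chosen can be illustrated with a small sketch. The user names, spend values, and thresholds below are all hypothetical, and a single-threshold rule is assumed only for simplicity. Real business rules are typically more elaborate.

```python
def segment_by_rule(users, spend_threshold):
    """Assign each user to 'high' or 'low' using a single hand-picked rule."""
    return {name: ("high" if spend >= spend_threshold else "low")
            for name, spend in users.items()}

# Hypothetical users and their monthly spend.
users = {"u1": 20, "u2": 45, "u3": 55, "u4": 90}

# Two analysts pick different, equally plausible thresholds.
segments_a = segment_by_rule(users, spend_threshold=50)
segments_b = segment_by_rule(users, spend_threshold=40)

# Users whose segment depends on which rule was selected.
changed = [name for name in users if segments_a[name] != segments_b[name]]
```

Here the same data yields different segments for user `u2` depending solely on the threshold, which is the kind of selection bias attributed above to rule-based approaches.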