The present disclosure relates to data processing, and more specifically, to methods, systems and computer program products for real-time clustering using multiple representatives from a cluster.
Clustering is a type of analysis in which a set of objects is grouped into clusters based on similar traits or characteristics. Clustering may require a view of all available data. For real-time applications, the data may need to be clustered as it is received. However, comparing the incoming data to all existing data structures (e.g., clusters) to find an appropriate cluster may require too much processing time and too many resources to be feasible for real-time applications. In some embodiments, the first data used to form a cluster is designated as the representative of the cluster, and all future data is compared against that representative for clustering purposes.
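By way of non-limiting illustration, the single-representative approach described above may be sketched as follows. The sketch is a minimal example under stated assumptions and is not the disclosed implementation: the class name `RepresentativeClusterer`, the use of Euclidean distance, and the fixed `threshold` parameter are hypothetical choices introduced only for this example. Each incoming data point is compared only against one representative per cluster (the first point that formed the cluster) rather than against every previously received point.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


class RepresentativeClusterer:
    """Illustrative streaming clusterer: one representative per cluster.

    Assumptions (not from the source): points are numeric tuples,
    similarity is Euclidean distance, and `threshold` is a fixed cutoff.
    """

    def __init__(self, threshold):
        self.threshold = threshold
        self.representatives = []  # representatives[i] is the first point of cluster i
        self.members = []          # members[i] lists every point assigned to cluster i

    def add(self, point):
        """Assign `point` to a cluster as it arrives and return the cluster index."""
        # Compare only against each cluster's representative, not all stored data.
        for cluster_id, rep in enumerate(self.representatives):
            if dist(point, rep) <= self.threshold:
                self.members[cluster_id].append(point)
                return cluster_id
        # No representative is close enough: the point forms a new cluster
        # and is designated as that cluster's representative.
        self.representatives.append(point)
        self.members.append([point])
        return len(self.representatives) - 1


clusterer = RepresentativeClusterer(threshold=1.0)
stream = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1)]
assignments = [clusterer.add(p) for p in stream]
print(assignments)  # [0, 0, 1, 1]
```

In this sketch, the cost of assigning a point grows with the number of clusters rather than with the total number of points received, which is the property that makes the comparison tractable as data arrives.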