The present invention relates to digital data processing and more particularly to grouping users of a computer application or system into clusters.
Users are grouped into clusters for a variety of purposes. To achieve user personalization, for example, one well-known technique, collaborative filtering, involves clustering users and recommending to a user items in which other users in the user's cluster have expressed interest. Conventionally, a user may be taken to have expressed interest in an item in various ways, e.g., by clicking on it, purchasing it, or adding it to a shopping cart. The recommendation can take a variety of forms, e.g., presenting items to the user as part of search results, showing news stories the user may want to read, identifying items the user may want to purchase, and so on.
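By way of illustration only, cluster-based recommendation of the kind described above might be sketched as follows. The function name `recommend`, the `clusters` and `interests` mappings, and the ranking by number of interested cluster members are assumptions made for this example; they are not prescribed by the text.

```python
from collections import Counter


def recommend(user, clusters, interests, top_n=3):
    """Suggest items that other users in the same cluster expressed interest in.

    clusters:  hypothetical mapping of user -> cluster identifier
    interests: hypothetical mapping of user -> set of items (clicked, purchased, ...)
    """
    cluster_id = clusters[user]
    already_seen = interests.get(user, set())
    counts = Counter()
    for other, other_cluster in clusters.items():
        if other == user or other_cluster != cluster_id:
            continue
        for item in interests.get(other, set()):
            if item not in already_seen:
                counts[item] += 1
    # Assumption: rank by how many cluster members showed interest,
    # breaking ties alphabetically so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


clusters = {"alice": 0, "bob": 0, "carol": 0, "dave": 1}
interests = {
    "alice": {"book", "lamp"},
    "bob": {"book", "pen", "desk"},
    "carol": {"pen", "lamp"},
    "dave": {"phone"},
}
print(recommend("alice", clusters, interests))  # ['pen', 'desk']
```

In this sketch a user who is alone in a cluster receives no recommendations, which reflects the dependence of the technique on the quality of the clustering.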
One way to achieve user clustering is to define a distance measure between two users and then cluster the users using well-known clustering algorithms such as k-means or hierarchical agglomerative clustering (HAC). However, such techniques have shortcomings. For example, HAC has a running time of O(n²), which is prohibitive when n is in the hundreds of millions; and the k-means algorithm requires representing the mean of data points, which is not possible when the data points are sets.
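As a non-limiting sketch of the shortcomings noted above, the following example represents each user as a set of items and computes a distance for every pair of users. The Jaccard distance is used here purely as an assumed example of a distance measure on sets; the text does not name a particular measure. The helper names `jaccard_distance`, `pairwise_distances`, and `pair_count` are likewise hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Assumed example measure: 1 - |a ∩ b| / |a ∪ b| for two sets."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def pairwise_distances(interests):
    """Distance for every unordered pair of users: n * (n - 1) / 2 entries."""
    users = sorted(interests)
    return {
        (u, v): jaccard_distance(interests[u], interests[v])
        for u, v in combinations(users, 2)
    }


def pair_count(n):
    """Number of unordered user pairs a quadratic-time method must consider."""
    return n * (n - 1) // 2


interests = {"u1": {1, 2, 3}, "u2": {2, 3, 4}, "u3": {7, 8}}
distances = pairwise_distances(interests)
print(distances[("u1", "u2")])   # 0.5
print(pair_count(100_000_000))   # 4999999950000000
```

A distance between two sets is well defined in this sketch, whereas an arithmetic mean of sets is not, which is why a centroid-based method such as k-means does not apply directly. The pair count for one hundred million users indicates why a quadratic running time is impractical at that scale.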