A social networking service usually has a large user base. Users communicate and share content with each other, and as a result many groups are formed. Because different users have different interests, the groups they form have different preferences, such as a “basketball” group, a “housing estate” group, and a “yoga” group. It is quite difficult for a user to find, within massive amounts of data, another user having a similar interest or a group having a similar preference. Therefore, a clustering method is needed that can automatically categorize users having the same interest or groups having a similar topic.
In a traditional clustering method for categorizing users or groups, each piece of user information or group information is represented as a space vector by using a 0/1 representation method. That is, for the feature information corresponding to each piece of user information or group information, if a segmented word exists in the feature information, the vector value representing that segmented word is set to 1; otherwise, the vector value representing that segmented word is set to 0. The dimension of the space vector is the total number of words in all features. Clustering analysis is then performed based on the space vector of the feature information by using a Vector Space Model (VSM) of a classifier.
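The 0/1 representation described above can be illustrated with a minimal Python sketch. The function names `build_vocabulary` and `to_binary_vector`, the sample words, and the choice to count each distinct word once are assumptions made for illustration only; they are not taken from any particular implementation.

```python
def build_vocabulary(segmented_items):
    """Map each distinct segmented word to a vector position (hypothetical helper)."""
    vocab = {}
    for words in segmented_items:
        for word in words:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def to_binary_vector(words, vocab):
    """Return a 0/1 vector: 1 where the word occurs in the feature information, else 0."""
    vector = [0] * len(vocab)
    for word in words:
        index = vocab.get(word)
        if index is not None:
            vector[index] = 1
    return vector


# Hypothetical, already-segmented feature information for two groups.
groups = [
    ["basketball", "weekend", "game"],
    ["yoga", "weekend", "class"],
]

vocab = build_vocabulary(groups)
vectors = [to_binary_vector(words, vocab) for words in groups]
```

In this sketch every vector has as many entries as there are words in the vocabulary, so the vector length grows with the vocabulary even though each item uses only a few of those words.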
Because there are hundreds of millions of pieces of user information and group information, and the dimension of each space vector is very large, the time complexity and space complexity of the computation are very high, and the processing efficiency and performance of the VSM may be severely affected.
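A rough back-of-the-envelope estimate shows the scale of the problem. The figures below (100 million items, a 100,000-word vocabulary, one byte per vector entry) are hypothetical values chosen only for illustration, and the pairwise count assumes a clustering approach that compares every pair of items, which may not hold for every method.

```python
def dense_storage_bytes(num_items, vocab_size, bytes_per_entry=1):
    """Bytes needed to store one dense 0/1 vector per item."""
    return num_items * vocab_size * bytes_per_entry


def pairwise_comparisons(num_items):
    """Number of distinct item pairs, assuming every pair is compared once."""
    return num_items * (num_items - 1) // 2


# Hypothetical scale, not measured data.
num_items = 100_000_000
vocab_size = 100_000

total_bytes = dense_storage_bytes(num_items, vocab_size)
total_pairs = pairwise_comparisons(num_items)

print(f"dense storage: {total_bytes / 1e12:.1f} TB")
print(f"pairwise comparisons: {total_pairs:.2e}")
```

Under these assumed figures the dense vectors alone would occupy about 10 TB, and each pairwise comparison would still have to traverse a vector with 100,000 entries.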