1. Field of the Invention
This invention relates to electronic communications and, in particular, to computer-implemented systems and methods for supporting large social networks.
2. Description of the Related Art
The sharing of information among users of social networks has developed into a worldwide phenomenon, supported by a variety of social network facilities. Millions of text, picture, video and audio communications are sent and received daily among users of such networks.
The various relationships among users, or profiles, on such networks are typically represented by social graphs. As the use of social networks has become more widespread, these social graphs have become structures of immense size and complexity.
In social network applications, it is often desirable to determine certain relationships between profiles and to group certain profiles together. Not only can it be helpful to determine which profiles are directly connected to one another, it can also be beneficial to determine which profiles are indirectly connected through no more than a certain number of intermediate vertices. For example, it may be important for purposes of allowing access between two social network users to determine that they are fairly closely connected, e.g., one has a friend whose friend is also a friend of the other. Still further, it may be desirable to group certain profiles together in a manner that is meaningful to a particular profile (or user). Some relationships are more meaningful to a user than others. For example, a person who works in an office may want to know which other people in the office are friends of his own office friends, but he may be less interested in people outside the office who are friends of his office friends. Even though both types of people have the same degree of separation, only the people in the office may appropriately be considered implicit neighbors. Likewise, someone in an office might like to have a simple way to automatically send information to all of his friends who also work in his office. With large social networks, determining such neighbors on a social graph can be an extremely difficult problem that requires significant processor overhead.
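The notion of two profiles being connected within a limited number of intermediate vertices can be illustrated with a breadth-first search that stops at a hop limit. The following is a minimal sketch only, not the method of the invention; the function name `within_hops`, the adjacency-dictionary representation, and the sample profile names are all hypothetical and chosen for illustration.

```python
from collections import deque


def within_hops(graph, source, target, max_hops):
    """Return True if target is reachable from source in at most max_hops edges.

    graph is assumed to be a dict mapping each profile to a set of its
    directly connected profiles (an undirected social graph).
    """
    if source == target:
        return True
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return True
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return False


# Hypothetical chain: alice has a friend (bob) whose friend (carol)
# is also a friend of dave.
friends = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob", "dave"},
    "dave": {"carol"},
}

print(within_hops(friends, "alice", "dave", 3))  # True
print(within_hops(friends, "alice", "dave", 2))  # False
```

A search of this kind treats every profile at a given distance alike, so it cannot by itself distinguish the office colleagues from the outsiders in the example above, and the number of profiles it visits can grow rapidly with the hop limit on a large graph.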
Historically, there has been relatively little work done on clustering users, or nodes, in a social graph. Known clustering techniques from web graphs and citation graphs address different problems and are not necessarily well suited to addressing social graphs. One specific challenge is that noise, in the form of trivial nodes that do not fall into any particular cluster, tends to reduce scalability in attempts to identify and process clusters. Eigenvalue decomposition methods that might address noisy data typically require a bound on the number of clusters to be processed and therefore do not lend themselves well to scalable solutions, at least without approximations or message passing among different threads that would impose undesirable overhead in the processing.
Other techniques, such as clustering based on Jaccard similarity or variations of TF-IDF (term frequency-inverse document frequency), are not desirable approaches either, because the problems they attempt to solve do not share certain characteristics with social graphs. For instance, document similarity processing relies on certain similarities in neighbors that are not commonly present in social graphs. Likewise, similarity of documents is invariant to the size of the documents themselves and depends on ratios, such that two documents of 3 words each with 2 words overlapping will be analyzed the same as two documents of 30 words each with 20 words overlapping.
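The size invariance described above can be checked directly for Jaccard similarity, which is the size of the intersection of two sets divided by the size of their union. The following is a minimal sketch; the helper name `jaccard` and the placeholder word sets are hypothetical and serve only to reproduce the 3-word and 30-word cases from the text.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Two 3-word documents sharing 2 words.
small_a = {"w0", "w1", "a"}
small_b = {"w0", "w1", "b"}

# Two 30-word documents sharing 20 words.
shared = {f"s{i}" for i in range(20)}
large_a = shared | {f"a{i}" for i in range(10)}
large_b = shared | {f"b{i}" for i in range(10)}

print(jaccard(small_a, small_b))  # 2 / 4 = 0.5
print(jaccard(large_a, large_b))  # 20 / 40 = 0.5
```

Both pairs score 0.5 because only the ratio of shared to total words matters, whereas in a social graph 20 shared connections would ordinarily be stronger evidence of a relationship than 2.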
As a result of these challenges, there remains a need for a system and method that can more efficiently determine such near neighbors on a social graph to allow continued usability of extremely large social network systems without requiring inordinately large processor systems and the accompanying energy use of such systems. With such a system and method, certain groupings of a user's friends can be made automatically to allow communications and general information sharing with the most relevant subset of the user's connections.