Nowadays, in the era of big data and the Internet, large graphs and large networks, such as social networks, the Internet, e-commerce networks, and communications networks, are common ways of representing data and information. Graph-based applications include search and recommendation. An example of search is the Google search engine. Examples of recommendation include friend recommendation on Facebook, job recommendation on LinkedIn, film recommendation on Netflix, product recommendation on eBay and Amazon, and message recommendation on Twitter. Generally, both search and recommendation are performed based on a similarity between nodes in a graph.
For example, a social network is an important platform for sharing information between friends. A user with more friends shares information and communicates more frequently. Therefore, an important function for maintaining a social network is to recommend friends according to a similarity between nodes.
For another example, in a churn analysis of Huawei, assuming that a customer A switches from a service of China Unicom to a service of China Mobile, China Unicom needs to identify the customer most “similar” to the customer A, treat that customer as one who may potentially churn, and focus on that customer.
One method for measuring a similarity between nodes is to collect various attributes, such as age, occupation, income, and hobbies, of all nodes, and then measure a similarity between the nodes according to similarities between the attributes. However, such a method requires a large amount of customer information to be collected, which imposes a high requirement on storage, and such a method may also involve personal privacy information of customers.
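For illustration only, the attribute-based measurement may be sketched as follows. The source does not specify how the per-attribute similarities are computed or combined, so everything in the sketch is an assumption: the function names, the relative-difference score for numeric attributes, the exact-match score for occupation, the Jaccard score for hobbies, and the unweighted average.

```python
def numeric_closeness(x, y):
    """Return 1.0 when x == y and smaller values as x and y diverge.

    Assumes non-negative inputs such as age or income.
    """
    largest = max(abs(x), abs(y))
    return 1.0 if largest == 0 else 1.0 - abs(x - y) / largest


def jaccard(s, t):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = s | t
    return 1.0 if not union else len(s & t) / len(union)


def attribute_similarity(p, q):
    """Average the per-attribute similarities of two node profiles (assumed scheme)."""
    parts = [
        numeric_closeness(p["age"], q["age"]),
        numeric_closeness(p["income"], q["income"]),
        1.0 if p["occupation"] == q["occupation"] else 0.0,
        jaccard(p["hobbies"], q["hobbies"]),
    ]
    return sum(parts) / len(parts)


# Hypothetical profiles; every node needs a full record like these.
alice = {"age": 30, "income": 50000, "occupation": "engineer",
         "hobbies": {"chess", "running"}}
bob = {"age": 40, "income": 50000, "occupation": "teacher",
       "hobbies": {"chess"}}
score = attribute_similarity(alice, bob)
```

The sketch also shows the drawback of such a method: a complete profile, including personal information, has to be collected and stored for every node before any similarity can be computed.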
Another method for effectively measuring a similarity between nodes is SimRank. Currently, SimRank has been widely applied to various scenarios, for example, recommendation systems, information search, link prediction, citation networks, and student course networks. However, in a SimRank-based similarity measurement method in the prior art, calculation is performed directly according to the definition of SimRank. Consequently, time complexity and space complexity are high, and the method is not applicable to a large network.
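For illustration only, calculation directly according to the definition may be sketched as follows. The sketch assumes the commonly cited recursive definition of SimRank, in which a node has a similarity of 1 with itself, and the similarity between two different nodes is a decay factor multiplied by the average similarity between their in-neighbors. The function name, the parameter names, the decay value of 0.8, and the example graph are assumptions and are not taken from the source.

```python
from itertools import product


def simrank_naive(in_neighbors, decay=0.8, iterations=5):
    """Iterate the SimRank definition over every pair of nodes.

    in_neighbors maps each node to the list of nodes that point to it.
    Returns a dict mapping (a, b) to the similarity after the last iteration.
    """
    nodes = list(in_neighbors)
    # Initial scores: 1 for a node with itself, 0 for every other pair.
    sim = {(a, b): 1.0 if a == b else 0.0
           for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            in_a, in_b = in_neighbors[a], in_neighbors[b]
            if not in_a or not in_b:
                # A node without in-neighbors is similar only to itself.
                new_sim[(a, b)] = 0.0
                continue
            # Sum the previous scores over all pairs of in-neighbors.
            total = sum(sim[(i, j)] for i in in_a for j in in_b)
            new_sim[(a, b)] = decay * total / (len(in_a) * len(in_b))
        sim = new_sim
    return sim


# Hypothetical graph: node "u" points to both "a" and "b".
graph = {"u": [], "a": ["u"], "b": ["u"]}
scores = simrank_naive(graph)
```

In this sketch, a score is stored for every pair of nodes, and each iteration visits every pair and sums over all pairs of their in-neighbors. For a graph with n nodes and an average in-degree of d, this amounts to approximately n² stored scores and approximately n²d² operations per iteration, which is consistent with the high time complexity and space complexity noted above.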