With the continued development of information and communication technologies (ICT), the volume of data generated on the Internet is growing explosively. A large amount of valuable information can be obtained by applying data mining and machine learning to this data. The research object of data mining and machine learning is usually a set of objects and the relationships between those objects (for example, a social network). Such research objects can be expressed as graphs in the mathematical sense. A graph describes relationships among objects. Intuitively, a graph consists of dots and lines connecting the dots. The dots are referred to as vertices of the graph, and the lines connecting the dots are referred to as edges.
Therefore, data mining and machine learning algorithms can be converted into operations on a graph, which is referred to as graph computation. To operate on a graph, a data structure needs to be selected to represent the graph. Currently, there are mainly two representations of a graph: an adjacency table (also called an adjacency list) and an adjacency matrix. In an adjacency table, an object is used to represent a vertex, and a pointer or reference is used to represent an edge. This kind of data structure is poorly suited to concurrent processing of a graph. An adjacency matrix, referred to as a matrix herein, is a two-dimensional matrix storing adjacencies between vertices. A graph can be processed concurrently using the matrix data structure. In addition, when data is stored in a matrix, the volume of stored data is relatively small.
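For illustration only, the two representations can be sketched as follows. This is a minimal sketch, assuming a directed, unweighted graph whose vertices are numbered from 0. The function names `build_adjacency_table` and `build_adjacency_matrix` and the sample edge list are hypothetical and are not taken from any existing system.

```python
def build_adjacency_table(num_vertices, edges):
    """Adjacency table: each vertex holds references to its out-neighbours."""
    table = [[] for _ in range(num_vertices)]
    for src, dst in edges:
        table[src].append(dst)
    return table


def build_adjacency_matrix(num_vertices, edges):
    """Adjacency matrix: entry [i][j] is 1 if there is an edge from i to j."""
    matrix = [[0] * num_vertices for _ in range(num_vertices)]
    for src, dst in edges:
        matrix[src][dst] = 1
    return matrix


# Hypothetical sample graph with three vertices and four directed edges.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
adj_table = build_adjacency_table(3, edges)
adj_matrix = build_adjacency_matrix(3, edges)
```

In the matrix form, every row has the same fixed length and can be handed to a separate worker without following per-vertex references, which is one way to read the statement that the matrix is better suited to concurrent processing.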
Theoretically, matrix computation in graph computation may include a matrix-vector multiplication operation and a matrix-matrix multiplication operation. An existing matrix-vector multiplication operation, such as generalized iterated matrix-vector multiplication (GIMV), performs a pairwise combine operation on the elements in a row of a matrix and the elements in a vector and, only after all pairwise combine operations on the elements in the row and the elements in the vector are completed, performs a global combine operation on the results of the pairwise combine operations. Consequently, intermediate memory space of the same size as the matrix needs to be occupied during computation, which places a higher requirement on system hardware. In addition, when matrix-vector multiplication is used in a distributed environment, the system needs to transmit a large amount of data, and as a result a large amount of time is consumed for computation.
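The two-phase behaviour described above can be sketched as follows. This is a simplified, single-machine illustration and not a reproduction of any particular GIMV implementation. The names `gimv_two_phase`, `combine2`, and `combine_all` are assumptions made for this example, and ordinary multiplication and summation are used as the pairwise and global combine operations.

```python
def gimv_two_phase(matrix, vector, combine2, combine_all):
    """Generalized matrix-vector multiplication in two separate phases.

    combine2 and combine_all are caller-supplied (hypothetical) operations.
    """
    # Phase 1: pairwise combine of every matrix element with the matching
    # vector element. All results are kept, so this intermediate structure
    # has the same number of entries as the matrix itself.
    intermediate = [
        [combine2(m, v) for m, v in zip(row, vector)]
        for row in matrix
    ]
    # Phase 2: global combine of the pairwise results of each row.
    result = [combine_all(row_values) for row_values in intermediate]
    return result, intermediate


# Ordinary matrix-vector multiplication as a special case:
# pairwise combine = multiplication, global combine = summation.
matrix = [[0, 1, 1], [0, 0, 1], [1, 0, 0]]
vector = [1, 2, 3]
result, intermediate = gimv_two_phase(matrix, vector, lambda m, v: m * v, sum)
```

Because the global combine starts only after the pairwise phase has finished, `intermediate` holds one value per matrix element, which corresponds to the matrix-sized intermediate memory noted above.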