In recent years, with the development of new technologies such as triple play and new media such as networks, a large amount of graph data has been generated. Graph data has two distinct characteristics. One is that the scale of a graph is large, with the number of vertices exceeding one million, for example, Web information, Extensible Markup Language (XML) data with a graph structure, and social networks. The other is nondeterminacy of the graph structure, for example, a connection between vertices is nondeterministic. Nondeterminacy of graph data is common. For example, for personal mobile devices in a wireless ad-hoc network, there is no fixed network structure for the connections between the devices. In addition, due to factors such as the power of the mobile devices, a connection between two devices is not always reliable but exists only with a certain probability. For another example, proteins of different types may interact with each other to form complex graph data. Because of measurement errors, an interaction between two types of proteins that is observed in an experiment actually exists only with a certain probability. A nondeterministic graph model is fundamentally different from previous graph data models. A nondeterministic graph is a combination of probability events in which each vertex and each edge is used as a basic probability event, as shown in FIG. 1.
Currently, many highly scalable query algorithms and index mechanisms have been proposed for large-scale graph data. However, these query algorithms and index mechanisms are designed for deterministic graph structures and cannot be directly applied to a nondeterministic graph.
On the one hand, a graph query is usually based on the graph structure, and basic graph queries may be roughly classified into three types: (a) queries based on a path in a graph, for example, reachability queries and shortest path/distance queries; (b) queries based on a vertex in a graph, for example, nearest neighbor queries; and (c) queries based on a sub-graph, for example, sub-graph matching and frequent sub-graph mining. On the other hand, in a nondeterministic database, each tuple is marked with a probability that the tuple is real, or an attribute of a tuple is expressed as a probability distribution function. A possible world model of nondeterministic data has been proposed for the nondeterministic database. By using this possible world model, which is the core of query processing of nondeterministic data, a large quantity of graph instances with a deterministic structure can be derived from a nondeterministic graph. These graph instances are distributed according to corresponding probabilities, and the sum of the probabilities is 1, as shown in FIG. 2.
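The derivation of deterministic graph instances described above can be illustrated with a minimal sketch. It assumes, for brevity, that all vertices are certain and that each edge exists independently with its own probability; the graph `edge_probs` and the function name `possible_worlds` are hypothetical and are not taken from the described system.

```python
from itertools import product

# Hypothetical nondeterministic graph (assumption): every vertex is certain and
# each edge exists independently with the probability given here.
edge_probs = {("a", "b"): 0.9, ("b", "c"): 0.5, ("a", "c"): 0.2}


def possible_worlds(edge_probs):
    """Yield (edges_present, probability) for every deterministic graph instance."""
    edges = list(edge_probs)
    # Each edge is either present or absent, so there are 2**len(edges) instances.
    for mask in product([False, True], repeat=len(edges)):
        prob = 1.0
        present = []
        for edge, keep in zip(edges, mask):
            p = edge_probs[edge]
            prob *= p if keep else (1.0 - p)
            if keep:
                present.append(edge)
        yield frozenset(present), prob


worlds = list(possible_worlds(edge_probs))
total = sum(prob for _, prob in worlds)  # the instance probabilities sum to 1
```

With three nondeterministic edges, the sketch produces eight graph instances, and the number of instances doubles with every additional edge.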
A nondeterministic graph is constituted by a large quantity of small graphs, and each small graph has at most hundreds of vertices or thousands of edges. For a nondeterministic graph query to have a probability guarantee that holds under the possible world model, graph searching needs to be performed on the exponential number of graph instances derived from the nondeterministic graph. This incurs an unacceptable overhead, and the cost of query processing is very high.
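The cost described above can be made concrete with a minimal sketch of a naive reachability query that searches every graph instance. It assumes directed edges that exist independently of each other and vertices that are certain; the names `reachable`, `reach_probability`, and `example` are hypothetical. This is the brute-force baseline whose overhead motivates the problem, not a proposed solution.

```python
from itertools import product


def reachable(edges, source, target):
    """Depth-first search over one deterministic instance (directed edges)."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def reach_probability(edge_probs, source, target):
    """Sum the probabilities of all instances in which target is reachable.

    Runs one graph search per instance, that is, 2**len(edge_probs) searches.
    """
    edges = list(edge_probs)
    total = 0.0
    for mask in product([False, True], repeat=len(edges)):
        prob = 1.0
        present = []
        for edge, keep in zip(edges, mask):
            p = edge_probs[edge]
            prob *= p if keep else (1.0 - p)
            if keep:
                present.append(edge)
        if prob > 0.0 and reachable(present, source, target):
            total += prob
    return total


# Hypothetical three-edge example (assumption, not from the source).
example = {("a", "b"): 0.9, ("b", "c"): 0.5, ("a", "c"): 0.2}
p_reach = reach_probability(example, "a", "c")
```

For this example the result is approximately 0.56, which equals 1 - (1 - 0.2) × (1 - 0.9 × 0.5). Even a small graph with a few dozen nondeterministic edges would require billions of searches under this approach.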
Therefore, how to implement quick query of a nondeterministic graph is a technical problem to be solved.