Herebelow, numerals contained in brackets—[ ]—are keyed to the list of references found towards the end of the present disclosure.
Historically, keyword searches over tree- and graph-structured data have attracted much attention. Generally, such a simple, user-friendly query interface does not require users to master a complex query language or understand the underlying data schema. Further, much graph-structured data has no clear, well-structured schema, so many existing query languages are not applicable. All in all, the implementation of efficient ranked keyword searches, especially on node-labeled graphs, has been elusive.
Ranked keyword searches on schemaless graph-structured data pose many unique technical challenges. First, techniques developed for XML, which take advantage of the hierarchical property of trees, no longer apply. Second, the lack of any schema precludes many optimization opportunities at compile time and makes efficient runtime search much more critical.
In light of the above, conventional efforts suffer from several drawbacks. The first observation is that existing keyword search algorithms on general graphs do not take full advantage of indexing. Their only use of indexes is for identifying the set of nodes containing keywords; finding substructures connecting these nodes relies solely on graph traversal. For a system that is supposed to support a large workload of keyword queries, it seems natural to exploit indexes that provide graph connectivity information to speed up searches. Lack of this feature can be attributed in part to the difficulty in indexing connectivity for general graphs, because a naive index would have an unacceptably high (quadratic) storage requirement.
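By way of a purely illustrative and non-restrictive example, the conventional arrangement just described may be sketched as follows. The sketch assumes a small undirected, node-labeled graph held as adjacency lists; the function names, the toy graph and the use of hop count as distance are all hypothetical and are not taken from any cited reference. It shows a keyword index used only to locate matching nodes, a breadth-first traversal used to connect them, and a naive all-pairs connectivity table whose size grows quadratically with the number of nodes.

```python
from collections import deque


def build_keyword_index(labels):
    """Inverted index: keyword -> set of nodes whose label contains it."""
    index = {}
    for node, text in labels.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(node)
    return index


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node, by plain traversal."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def naive_connectivity_index(adj):
    """All-pairs distance table: one traversal per node, up to n * n entries."""
    return {u: bfs_distances(adj, u) for u in adj}


# Hypothetical toy graph: a path a - b - c - d with one label per node.
adj = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}
labels = {
    "a": "graph index",
    "b": "query",
    "c": "ranking",
    "d": "keyword search",
}

# Step 1: the index only identifies the nodes that contain each keyword.
keyword_index = build_keyword_index(labels)

# Step 2: connecting those nodes still requires traversing the graph.
dist_from_a = bfs_distances(adj, "a")

# Naive alternative: precompute every pairwise distance (quadratic storage).
table = naive_connectivity_index(adj)
entry_count = sum(len(row) for row in table.values())
```

In this sketch the table holds one entry per ordered pair of mutually reachable nodes, so a connected graph of n nodes yields n * n entries, which is the storage cost that makes such a naive index impractical for large graphs.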
Another observation is that existing algorithms employ heuristic graph search strategies that lack strong performance guarantees and may lead to poor performance on certain graphs.
In view of the foregoing, a need has been recognized in connection with improving upon the shortcomings and difficulties of conventional efforts.