1. Field of the Invention
The present invention relates to information search and retrieval, and more particularly to graph search.
2. Description of the Related Art
Search engines are now indispensable in daily life for finding information on the Web. However, search results often include unnecessary results that differ from those expected, a consequence of searching by character string matching without consideration of semantic factors. This tendency is growing as the volume of Web data increases explosively.
The Semantic Web has been studied with the objective of solving the above problem. In the Semantic Web, primary information is extracted from a Web page as structured data referred to as metadata. This metadata is structured so that a computer can easily understand its semantic contents. In this sense, metadata is analogous to a database such as a relational database (RDB). Hence, unlike an information search by a search engine, the Semantic Web enables a search that takes semantic contents into consideration, thereby improving retrieval precision.
Metadata is data that has a graph (network) structure referred to as Resource Description Framework (RDF), and is written in Extensible Markup Language (XML), which has a hierarchical structure. RDF searches have already been put into practice on commercial database systems.
With respect to RDF searches, some prototype systems have been released, including Jena (see “Jena Semantic Web Framework [Online] [searched on Jan. 28, 2008], Internet”) and RDFStore (see “RDFStore Perl/C RDF Storage AND API [Online] [searched on Jan. 28, 2008], Internet”). These prototype systems generally employ a method of regarding a graph as clusters of triple data sets and storing the triple data sets in an RDB.
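The triple data set approach can be illustrated with a minimal sketch. It assumes that each edge of the graph is written as one (subject, predicate, object) row in a single relational table; the table name `triple`, the column names, the sample data, and the helper `store_triples` are hypothetical and are not taken from Jena or RDFStore.

```python
import sqlite3

# Hypothetical example graph: each edge is one (subject, predicate, object) triple.
triples = [
    ("doc1", "author", "alice"),
    ("doc1", "topic", "search"),
    ("alice", "memberOf", "lab1"),
]

def store_triples(rows):
    """Store the triples in an in-memory relational table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

conn = store_triples(triples)
# One row per edge of the graph.
count = conn.execute("SELECT COUNT(*) FROM triple").fetchone()[0]
print(count)
```

In this layout the graph structure is implicit: two edges are connected only when the object value of one row equals the subject value of another, so connectivity has to be recovered at query time.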
According to the triple data set method disclosed in the above references, an operation called join is used frequently to search for a subgraph matching an inquiry graph. Join is a primary operation used frequently in an RDB; however, combined with sorting, join also imposes a heavy processing load. Join therefore poses the problem of a slower search speed, which leads to a longer search time. Because an explosive increase in data can be expected in the future, the problem of a longer calculation time is not expected to resolve itself. Moreover, if all the clusters are searched, clusters that do not match the inquiry graph are also searched, further lowering the search speed and lengthening the search time.
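The role of join in this method can be sketched as follows. The example assumes a single hypothetical `triple` table and a two-edge inquiry graph; the schema, data, and query are illustrative only and do not reproduce the query plan of any particular system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("doc1", "author", "alice"),
        ("doc2", "author", "bob"),
        ("alice", "memberOf", "lab1"),
        ("bob", "memberOf", "lab2"),
    ],
)

# Inquiry graph: ?doc --author--> ?person --memberOf--> "lab1"
# The two edges are matched by joining the triple table with itself
# on the shared node (?person). Each further edge adds another join.
query = """
    SELECT t1.subject, t1.object
    FROM triple AS t1
    JOIN triple AS t2 ON t1.object = t2.subject
    WHERE t1.predicate = 'author'
      AND t2.predicate = 'memberOf'
      AND t2.object = 'lab1'
"""
matches = conn.execute(query).fetchall()
print(matches)
```

In this naive formulation, an inquiry graph with n edges needs n - 1 self-joins over the full triple table, which is why the cost grows with both the size of the inquiry graph and the volume of stored data.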