The volume of information in many fields has increased explosively in recent years. In some fields, data sizes are growing from the order of gigabytes to the order of terabytes, making it difficult to extract desired data from such an enormous volume. Data is typically stored in a hierarchical structure for ease of management.
For example, to manage books in a library, hierarchies such as titles, authors, publishers, publication dates, and lending information are used, enabling extraction of the management information for a given book.
When desired data is to be extracted at high speed from an electronic document that consists of an enormous amount of data and has a hierarchical structure, hierarchical retrieval and keyword retrieval are conventionally performed using separate automatons (see, e.g., Japanese Patent Application Laid-Open Publication No. 2005-070911).
For example, when a hierarchical retrieval automaton hits a hierarchical condition, a keyword retrieval automaton executes keyword retrieval. If, on the other hand, a symbol representing a hierarchy appears during execution of the keyword retrieval, the hierarchical retrieval automaton executes retrieval with respect to the hierarchical condition. These processes are carried out repeatedly.
According to the above technique, however, when the electronic document being searched contains frequently appearing symbols representing hierarchies, as an XML document does, switchover between the retrieving process using the hierarchical retrieval automaton and the retrieving process using the keyword retrieval automaton becomes highly frequent. The resulting overhead poses a problem in that overall retrieval speed drops.
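The switchover overhead described above can be illustrated with a minimal sketch. This is a toy model, not the method of the cited publication: the function name `alternating_search`, its parameters, and the switch counter are hypothetical, a regular expression stands in for the hierarchical retrieval automaton, and `str.find` stands in for the keyword retrieval automaton. The sketch assumes simple tags without attributes.

```python
import re

# Hypothetical stand-in for the hierarchical retrieval automaton:
# recognizes simple opening and closing tags without attributes.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")

def alternating_search(document, target_path, keyword):
    """Toy model of the two-automaton scheme.

    Returns (hits, switches): character offsets of keyword hits found
    while the open-element stack equals target_path, and the number of
    hand-offs between the two scanning modes.
    """
    stack = []      # currently open elements
    hits = []       # offsets of keyword hits under target_path
    switches = 0    # hand-offs between hierarchy and keyword scanning
    pos = 0
    while pos < len(document):
        m = TAG.search(document, pos)
        end = m.start() if m else len(document)
        if stack == target_path:
            # Hierarchical condition hit: hand off to keyword scanning.
            switches += 1
            i = document.find(keyword, pos, end)
            while i != -1:
                hits.append(i)
                i = document.find(keyword, i + 1, end)
            if m:
                # A hierarchy symbol appeared: hand control back.
                switches += 1
        if m is None:
            break
        if m.group(1):                      # closing tag
            if stack and stack[-1] == m.group(2):
                stack.pop()
        else:                               # opening tag
            stack.append(m.group(2))
        pos = m.end()
    return hits, switches
```

In this model every text run inside the target hierarchy costs two hand-offs, so a document with many small elements spends a growing share of its time switching rather than scanning.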
When multiple retrieval equations, each specifying a retrieval range, are given in order to carry out information retrieval simultaneously, all of the retrieval equations are built into a single automaton. Consequently, each time a keyword is hit, it must be determined whether the current hierarchy satisfies the hierarchical condition corresponding to that keyword. This brings about a problem in that retrieval speed drops when keywords are hit at a high frequency.
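The per-hit hierarchy check can be sketched in the same way. The following is an illustrative model under stated assumptions, not the actual conventional implementation: the function name `single_pass_search` and the `(path, keyword)` form of a retrieval equation are hypothetical, a single compiled regular expression stands in for the one combined automaton, and at least one keyword is assumed to be given.

```python
import re

def single_pass_search(document, queries):
    """Toy model of the single-automaton scheme.

    queries: non-empty list of (path, keyword) pairs, a hypothetical
    simplification of retrieval equations.
    Returns (results, checks): the accepted (keyword, offset) pairs and
    the number of hierarchy checks performed, one per keyword hit.
    """
    # Map each keyword to the set of hierarchies in which it counts.
    conditions = {}
    for path, keyword in queries:
        conditions.setdefault(keyword, set()).add(tuple(path))

    # One combined pattern for tags and all keywords (longest first).
    keywords = sorted(conditions, key=len, reverse=True)
    token = re.compile(
        r"<(/?)([A-Za-z_][\w.-]*)>|" + "|".join(re.escape(k) for k in keywords)
    )

    stack, results, checks = [], [], 0
    for m in token.finditer(document):
        if m.group(2) is not None:          # a tag was matched
            if m.group(1):                  # closing tag
                if stack and stack[-1] == m.group(2):
                    stack.pop()
            else:                           # opening tag
                stack.append(m.group(2))
        else:                               # a keyword was hit
            checks += 1                     # hierarchy must be checked every time
            if tuple(stack) in conditions[m.group(0)]:
                results.append((m.group(0), m.start()))
    return results, checks
```

Every keyword hit triggers a check, including hits that fall outside the specified hierarchy and are then discarded, which is why the cost grows with hit frequency rather than with the number of accepted results.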