1. Field of the Invention
The present invention relates to an apparatus and method for information search, and more particularly to an information searching apparatus and method that search documents arranged in a hierarchical structure.
2. Description of the Related Art
Information search services play an essential role in making effective use of the large number of documents available on the Internet and other networks. Web search application programs are therefore installed on the servers of portal sites and similar sites. Running on such a server, a web search application provides search engine capabilities that enable users to search online resources easily.
Some search engines employ a program called a “robot” that continuously crawls the network to collect document information. Search engines of this type extract keywords from the collected documents to construct a database, known as an “index,” which lists the extracted keywords in association with the individual documents containing them. When a search request is received from a client, the search engine consults the index files to find records that match the search keywords specified in the request. If relevant records are found, the search engine sends search results back to the client, including the uniform resource locators (URLs) of the documents found.
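The index described above is commonly realized as an inverted index, in which each keyword maps to the set of documents that contain it. The following is a minimal in-memory sketch, assuming a naive keyword extractor (lowercase runs of letters and digits) and two hypothetical pages at `www.example.com` standing in for documents collected by a robot; the function names `extract_keywords`, `build_index`, and `lookup` are illustrative only and do not describe any particular search engine.

```python
import re
from collections import defaultdict


def extract_keywords(text: str) -> set[str]:
    """Naive keyword extraction (an assumption): lowercase letter/digit runs."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index mapping each keyword to the URLs containing it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for keyword in extract_keywords(text):
            index[keyword].add(url)
    return dict(index)


def lookup(index: dict[str, set[str]], keyword: str) -> list[str]:
    """Return the URLs recorded for one search keyword, in sorted order."""
    return sorted(index.get(keyword.lower(), set()))


# Hypothetical pages standing in for documents collected by a robot.
collected = {
    "http://www.example.com/index.html": "Welcome to Example Corp",
    "http://www.example.com/products.html": "Example Corp products and prices",
}
index = build_index(collected)
print(lookup(index, "products"))  # → ['http://www.example.com/products.html']
```

The search results returned to the client correspond to the URL lists produced by `lookup`; a production index would also store positions, ranking data, and so on.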
Public information available on a website is not necessarily concentrated in a single location. Rather, it is often divided among a plurality of documents written in, for example, the hypertext markup language (HTML) format. Conventional search engines treat each HTML file on a website as a separate unit of search.
FIG. 24 shows an example of a conventional search technique. It is assumed in FIG. 24 that an index memory 91 of a search engine 92 contains three sets of indexes 91a, 91b, and 91c created in advance for three documents that are accessible by their respective URLs, “A.html,” “B.html,” and “C.html” (the domain name is omitted for simplicity). Those documents are available on a website of a company named “FTSU Limited” (a fictitious name used for explanatory purposes). Note that A.html includes the term “FTSU,” and C.html includes the term “officers.”
Suppose now that a user has sent the search string “FTSU and officers” to the search engine 92 in an attempt to find a list of the corporate officers of FTSU Limited. The reserved word “and” in this search string serves as a logical AND operator, which commands the search engine 92 to consult the indexes 91a, 91b, and 91c and retrieve documents containing both search keywords, “FTSU” and “officers.” In other words, the search engine 92 will not retrieve a document unless it contains every specified keyword. Because no single document on the website of FTSU Limited contains both keywords, no match is found in the present example.
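The failure of the AND search in the FIG. 24 example can be reproduced with a small sketch that models each per-document index as a set of keywords. Only “FTSU” in A.html and “officers” in C.html come from the example; the other keywords, including those for B.html, are hypothetical fillers, and `parse_and_query` and `and_search` are illustrative names rather than the interface of the search engine 92.

```python
# Per-document indexes modeled on FIG. 24. Only "ftsu" (A.html) and
# "officers" (C.html) come from the example; other terms are hypothetical.
indexes = {
    "A.html": {"ftsu", "welcome"},
    "B.html": {"products"},
    "C.html": {"officers", "president"},
}


def parse_and_query(query: str) -> list[str]:
    """Treat the reserved word "and" (any case) as the AND operator."""
    return [term for term in query.split() if term.lower() != "and"]


def and_search(indexes: dict[str, set[str]], keywords: list[str]) -> list[str]:
    """Return documents whose index contains every keyword."""
    wanted = {k.lower() for k in keywords}
    return sorted(url for url, terms in indexes.items() if wanted <= terms)


keywords = parse_and_query("FTSU and officers")  # ['FTSU', 'officers']
print(and_search(indexes, keywords))  # → [] (no document has both keywords)
```

Each document is tested in isolation with a subset check, which is exactly why related information split across A.html and C.html never satisfies the query.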
Documents such as A.html and C.html may originally have formed a single document, but they are stored as separate files to meet managerial needs in the operation of the website system. As a consequence, the AND search returns no hits and fails to provide the user with the needed information. The AND search would succeed if the two documents were indexed as a single combined document on the search site. However, combining multiple documents often produces a file too large for a search engine to handle efficiently, and users would experience slow responses when searching and displaying documents. For this reason, combining documents is not a practical approach.
To find more documents related to the topic, users may instead use an OR search, instructing the search engine to find documents containing “FTSU” or “officers” or both. However, the results of such an OR search are likely to include a large number of irrelevant hits.
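An OR search returns every document that contains at least one of the keywords. The sketch below, assuming a hypothetical set of indexed pages from three domains (`ftsu.example`, `news.example`, `shop.example`), shows how the keywords “FTSU” and “officers” from the example above pull in pages from unrelated sites alongside A.html and C.html. The function name `or_search` is illustrative only.

```python
# Hypothetical indexed pages; only the FTSU keywords echo the FIG. 24 example.
indexes = {
    "ftsu.example/A.html": {"ftsu", "welcome"},
    "ftsu.example/C.html": {"officers", "president"},
    "news.example/police.html": {"officers", "police", "city"},
    "shop.example/uniforms.html": {"officers", "uniforms"},
}


def or_search(indexes: dict[str, set[str]], keywords: list[str]) -> list[str]:
    """Return documents whose index contains at least one keyword."""
    wanted = {k.lower() for k in keywords}
    return sorted(url for url, terms in indexes.items() if wanted & terms)


print(or_search(indexes, ["FTSU", "officers"]))
# → ['ftsu.example/A.html', 'ftsu.example/C.html',
#    'news.example/police.html', 'shop.example/uniforms.html']
```

Half of the hits in this small example are irrelevant; on the whole Internet, where the common word “officers” appears on countless pages, the proportion would be far worse.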
A better approach is therefore to refine, or narrow down, the search results. One technique for doing so is to restrict the scope of the search. A domain search allows a user to designate a specific website domain, and the search engine then searches only the documents within that domain. In the present example, if the user knows the domain name of the FTSU Limited website, he/she can specify that site as the search domain and search for documents with the keyword “officers.” The user can thereby reach C.html, the desired search result. Google, for example, offers this type of domain search.
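A domain search can be sketched as a keyword lookup filtered by the host name of each URL. The sketch below assumes a hypothetical domain `ftsu.example` for FTSU Limited and treats any subdomain of the designated domain as part of it; it illustrates the general idea only and does not describe how Google or any other search engine implements the feature. The name `domain_search` is illustrative.

```python
from urllib.parse import urlsplit

# Hypothetical indexed pages; "ftsu.example" stands in for FTSU Limited's domain.
indexes = {
    "http://www.ftsu.example/A.html": {"ftsu", "welcome"},
    "http://www.ftsu.example/C.html": {"officers", "president"},
    "http://news.example/police.html": {"officers", "police"},
}


def domain_search(indexes: dict[str, set[str]], keyword: str, domain: str) -> list[str]:
    """Return documents containing the keyword whose host lies in the domain."""
    keyword = keyword.lower()
    hits = []
    for url, terms in indexes.items():
        host = urlsplit(url).hostname or ""
        # Accept the domain itself or any subdomain of it (an assumption).
        in_domain = host == domain or host.endswith("." + domain)
        if in_domain and keyword in terms:
            hits.append(url)
    return sorted(hits)


print(domain_search(indexes, "officers", "ftsu.example"))
# → ['http://www.ftsu.example/C.html']
```

The restriction succeeds only because the caller supplies `domain`; without knowing it, the user is left with the unrestricted searches described above.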
The above-described domain search technique is, of course, useful only when the target domain name is known. Moreover, many novice users of search sites do not know such refining techniques. They have to repeat search sessions, adding and changing search keywords, until they reach the desired information, and each search succeeds only if a single document includes all of the specified keywords.
It would also be possible to add keywords for each site, but doing so for every site on the Internet would require an enormous amount of work and impose an excessive administrative burden on the company operating the search engine.