With the advent and proliferation of electronic storage of documents, particularly in networked environments, more and more documents are written, exchanged, modified, and stored. Because of the overwhelming volume of documents available to a user, finding a particular document of interest can be very difficult. Therefore, search engines have been developed for locating and retrieving relevant documents. Generally, search engines locate documents through full text searching or through metadata-based searching. In full text mode, a search engine locates all documents within a specified database that contain the search term(s) specified by the user. In contrast, with metadata-based searching, the search engine looks only for occurrences of the user's search term(s) in metadata records about documents in the database.
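By way of illustration only, the difference between the two search modes can be sketched in a few lines of Python. The `Document` class, the `full_text_search` and `metadata_search` functions, and the sample documents are hypothetical names and data introduced for this sketch; they use simple case-insensitive substring matching and do not represent any particular search engine's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """Hypothetical document record: an id, a body, and a metadata record."""
    doc_id: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def full_text_search(docs: List[Document], terms: List[str]) -> List[str]:
    """Return ids of documents whose body contains every search term."""
    wanted = [t.lower() for t in terms]
    return [d.doc_id for d in docs if all(t in d.text.lower() for t in wanted)]


def metadata_search(docs: List[Document], terms: List[str]) -> List[str]:
    """Return ids of documents whose metadata values contain every search term."""
    wanted = [t.lower() for t in terms]
    hits = []
    for d in docs:
        # Only the metadata record is inspected; the body is never read.
        record = " ".join(d.metadata.values()).lower()
        if all(t in record for t in wanted):
            hits.append(d.doc_id)
    return hits


# Made-up sample data for the sketch.
docs = [
    Document(
        "manual",
        "Chapter 1 covers installation. Chapter 7 covers battery replacement.",
        {"title": "Printer manual", "keywords": "installation, setup"},
    ),
    Document(
        "memo",
        "The battery supplier changed last quarter.",
        {"title": "Purchasing memo", "keywords": "suppliers"},
    ),
]

print(full_text_search(docs, ["battery"]))      # ['manual', 'memo']
print(metadata_search(docs, ["battery"]))       # []
print(metadata_search(docs, ["installation"]))  # ['manual']
```

In this sketch, the full text search for "battery" returns both documents, including one that mentions the term only in passing, while the metadata search for the same term returns nothing because neither metadata record mentions it.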
Full text searching tends to be overinclusive and often returns too many irrelevant results. One approach to mitigating the overinclusive nature of full text searching is to use ranking methods, such as Google's PageRank® method. However, even ranked results often contain too many unsuitable hits in the top positions, sometimes as a result of the ongoing manipulation of search rankings.
Metadata-based searching provides fewer and generally more relevant search results, but it requires that the contents of a document be described appropriately with relevant metadata tags. Even when documents are appropriately described, however, metadata-based searching has limitations, because the metadata used to describe a large document might cover only the main themes and topics of the document and not its finer-grained details. Thus, metadata-based searching often is inadequate for locating information in individual parts of a document.