People often need to access information that is recorded in documents. Such documents may range in length from a single page to many volumes. Certainly, the longer the documents are, the more difficult it is to access the specific information desired. As a result, long or complex documents include multiple tables and indices to facilitate hierarchical and keyword searches for the information of interest. These tables and indices are time-consuming to create. Furthermore, although the tables and indices are helpful in locating information, using them is time-consuming since each page referred to in the table or index must be individually examined to find the entry of interest. The foregoing may be repeated numerous times before the substantive entry of interest is found.
The proliferation of computers has revolutionized how information is accessed. Accordingly, computer-readable documents may now be searched using various software routines for terms of interest in the document. In particular, hypertext linking has enabled referencing and cross-referencing of key terms simply by using a pointing device to select a point of interest in a document.
Despite the great advantages that computers offer in accessing and retrieving information, processes for information retrieval can still be improved. For example, creating a hyper-link to a source document involves human intervention to identify the term from which the link will be created and to associate that term with the link to the related information. Further, to create hyper-linked information, all of the documents, or at least those documents containing the links, need to be computer-readable. A non-computer-readable document may be scanned so that it is computer-viewable, but unless the document is computer-readable, such as a text or graphics document, it is generally not possible to associate links with portions of the document.
Similarly, even though a referenced document need not be computer-readable to be accessed from a link, if the reference target is not computer-readable, then a person following the link may be required to manually navigate through the target document to find the information of interest. The task becomes even more complicated if one desires information in both documents and needs to switch back and forth between them. In such cases, to avoid the difficulty of navigating back and forth, a user will typically print the needed documents or parts of the documents. Printing such documents instead of accessing them on the computer undermines one of the objectives of making the documents accessible by computer.
To avoid the complexities of moving back and forth between documents, one possibility is to extract from the documents the information that is expected to be relevant. Unfortunately, extracting only the content from the documents may present other problems. For example, some regulatory agencies require that extracted content be verified as accurately reproducing the content of the original document before it can be used. This verification is a time-consuming and costly process. In addition, extraction of content may obliterate inferential information a user might otherwise obtain from the document. Such inferential information might exist as an interrelationship between parts, as an annotation regarding other parts that might be useful, or in other similar forms. Extracting information expected to be relevant thus may obliterate other useful information.