With the increased use of computers, networks, and the Internet, documents are often searched for certain terms. For example, an individual researching a particular topic may use a search engine on a networked computer, on a stand-alone computer, or over the Internet to find electronic documents (hereinafter documents) that contain a given term. The search engine will return a list of documents that contain the term using computer-based document retrieval technology. Often the documents retrieved for a query are ranked according to how well each particular document matches the queried term(s). The user often has to consider the entire document to determine where a particular search term exists.
Often, memory locations in computers store certain documents in a hierarchical structure. Certain structured computer languages that rely on hierarchical structures, such as eXtensible Markup Language (XML), use tags, or similar devices, that structurally organize data into particular sections or elements. When a structured document is retrieved, the user is often not aware of where the particular search term occurs in the document, and so must spend additional time and effort to find the context for the term. This additional user time and effort may be considerable for extended queries.
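By way of illustration only, the following minimal Python sketch shows how tags organize a document into nested elements and how the position of a search term within that hierarchy could be reported. The sample document, its tag names, and the helper `find_term` are hypothetical and are not taken from any particular system; the sketch inspects only the text directly inside each element and ignores text that trails a child element.

```python
import xml.etree.ElementTree as ET


def find_term(xml_text, term):
    """Return (tag path, text) pairs for elements whose own text contains term."""
    root = ET.fromstring(xml_text)
    hits = []

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        # Case-insensitive match against the text directly inside this element.
        if elem.text and term.lower() in elem.text.lower():
            hits.append((here, elem.text.strip()))
        for child in elem:
            walk(child, here)

    walk(root, "")
    return hits


# Hypothetical structured document used only for this illustration.
sample = """<article>
  <title>Structured retrieval</title>
  <section>
    <heading>Background</heading>
    <para>Passage retrieval returns briefer answers.</para>
  </section>
</article>"""

hits = find_term(sample, "retrieval")
for path, text in hits:
    print(path, "->", text)
```

Reporting the tag path alongside the matching text gives the user the structural context of the term without a review of the entire document.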
Many document retrieval systems treat documents as relatively small, discrete retrieval units that can be queried and returned but cannot be further subdivided. In practice, however, a retrieved document is often too large for a user to analyze in a meaningful manner. Thus, users often have to carefully review entire retrieved documents in digital library computer applications to determine the locations of relevant terms and/or the context of the relevant terms.
Passage retrieval is in principle similar to document retrieval, but involves the additional preliminary stage of extracting passages from documents. One aim of passage retrieval is to return briefer answers to the user. To accomplish this, a document can be decomposed into fixed-length or pre-defined portions, and an index can be built at the passage or paragraph level using, e.g., the term frequency-inverse document frequency (TFIDF) algorithm or a variant of this algorithm. However, this indexing method, on which many document retrieval systems rely, does not maintain semantic relationships among the elements in documents. In addition, this indexing mechanism may result in many discrete retrieved elements in a form that requires considerable computer work to present meaningful text to users.
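The passage-level indexing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular retrieval system: the passage length of eight words, the sample documents, and the function names are hypothetical, and the weighting shown (relative term frequency multiplied by the logarithm of the inverse passage frequency) is only one common TFIDF variant.

```python
import math
import re
from collections import Counter


def split_into_passages(text, passage_len=8):
    """Decompose a document into fixed-length passages of passage_len words."""
    words = re.findall(r"\w+", text.lower())
    return [words[i:i + passage_len] for i in range(0, len(words), passage_len)]


def build_passage_index(documents, passage_len=8):
    """Return the passages and a TFIDF weight table for each passage."""
    passages = []
    for doc_id, text in documents.items():
        for number, words in enumerate(split_into_passages(text, passage_len)):
            passages.append((doc_id, number, words))

    # Count how many passages contain each term (passage frequency).
    passage_freq = Counter()
    for _, _, words in passages:
        passage_freq.update(set(words))

    total = len(passages)
    weights = []
    for _, _, words in passages:
        counts = Counter(words)
        weights.append({
            term: (count / len(words)) * math.log(total / passage_freq[term])
            for term, count in counts.items()
        })
    return passages, weights


def search(query, passages, weights):
    """Rank passages by the summed TFIDF weight of the query terms."""
    terms = re.findall(r"\w+", query.lower())
    scored = []
    for (doc_id, number, words), table in zip(passages, weights):
        score = sum(table.get(term, 0.0) for term in terms)
        if score > 0:
            scored.append((score, doc_id, number, " ".join(words)))
    return sorted(scored, reverse=True)


# Hypothetical documents used only for this illustration.
documents = {
    "a": "XML uses tags to organize data into elements. "
         "Search engines rank documents by how well they match.",
    "b": "Passage retrieval returns briefer answers to the user.",
}
passages, weights = build_passage_index(documents)
results = search("tags", passages, weights)
```

Each result is an isolated run of words identified only by a document identifier and a passage number. Nothing in the index records which section or element the passage came from, which illustrates the loss of relationships among document elements noted above.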