1. Field of the Invention
The present invention relates generally to information search and retrieval, and more particularly to a system, method, and computer program products for snippet-based proximal search.
2. Description of Background
Finding relevant material in a large collection of documents (i.e., search or document retrieval) is a well-known and long-studied problem. The current approach is to submit terms to a keyword index of the document corpus. But such indexes, once built, make limiting assumptions about the granularity of the search task. For inputs, the search index assumes that a few well-chosen words or phrases precisely specify the desired output. For outputs, the search index assumes that the best document is the one that, in its entirety, best matches the input.
These assumptions are not accurate for every search problem. In some cases the desired input is a sentence or paragraph of text, which is not easily forced into a single Boolean query. In such cases, the natural desired output is not necessarily the document whose entire text best matches the input text, but one in which some subsection is very similar to the input text. The content of each document as a whole is therefore less important than the individual sentences and paragraphs that make it up. The search index is, in effect, built at the wrong level of detail to provide this information.
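The difference in granularity described above can be illustrated with a minimal sketch. It is not the method of the invention; it only assumes, for illustration, that similarity is measured as cosine similarity over raw word counts and that a snippet is a single sentence. The function names (tokens, cosine, snippets, best_snippet) and the sample text are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    # Lowercase alphanumeric word tokens; an illustrative tokenizer only.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between the word-count vectors of two texts.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def snippets(doc):
    # Assumption: one snippet per sentence, split after ., ! or ?.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def best_snippet(query, doc):
    # Score every snippet against the query and keep the highest-scoring one.
    scored = [(cosine(query, s), s) for s in snippets(doc)]
    return max(scored, key=lambda pair: pair[0]) if scored else (0.0, "")


doc = ("Cats sleep most of the day. "
       "The index stores one entry per sentence of each document. "
       "Dogs bark at night.")
query = "index stores one entry per sentence"

whole_doc_score = cosine(query, doc)           # document-level match
snippet_score, snippet = best_snippet(query, doc)  # snippet-level match
print(snippet)
print(snippet_score > whole_doc_score)
```

In this sketch the sentence that closely matches the input scores higher on its own than the document does as a whole, because the unrelated sentences dilute the document-level score.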