Document research involves identifying relevant subject matter or concepts within a document or set of documents. Search engines, for example, use “key” words or phrases as search arguments to locate text passages containing those words or phrases. A passage that contains the search argument may or may not be relevant, however. Finding relevant subject matter involves not just the instance of the word or phrase, but the context in which it is found. The preceding and succeeding words that surround a keyword in a passage influence the meaning or effect of its use.
Sometimes the search for context, as opposed to an instance of a keyword, can be narrowed by using additional descriptive terms. Almost all search engines use Boolean operators to combine the words on either side of the operator into a logical set. For example, the operator “AND” implies the set of all instances of word number one used in conjunction with word number two; the operator “OR”, by contrast, implies the set of all instances of word number one combined with the set of all instances of word number two. In mathematical language, the first set is an intersection and the second is a union.
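The set interpretation of “AND” and “OR” can be illustrated with a minimal Python sketch. It is not taken from any particular search engine; the helper names (`build_index`, `search_and`, `search_or`) and the sample passages are hypothetical and chosen only for illustration, and the sketch assumes that a match is recorded per passage.

```python
import re
from typing import Dict, List, Set


def build_index(passages: List[str]) -> Dict[str, Set[int]]:
    """Map each lowercase word to the set of passage numbers that contain it."""
    index: Dict[str, Set[int]] = {}
    for number, text in enumerate(passages):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(number)
    return index


def search_and(index: Dict[str, Set[int]], first: str, second: str) -> Set[int]:
    """Intersection: passages that contain both words."""
    return index.get(first.lower(), set()) & index.get(second.lower(), set())


def search_or(index: Dict[str, Set[int]], first: str, second: str) -> Set[int]:
    """Union: passages that contain either word."""
    return index.get(first.lower(), set()) | index.get(second.lower(), set())


# Hypothetical sample passages, numbered 0, 1 and 2.
passages = [
    "The valve controls fluid flow.",
    "A spring biases the valve closed.",
    "The spring is made of steel.",
]
index = build_index(passages)
and_hits = search_and(index, "valve", "spring")  # only passage 1 has both words
or_hits = search_or(index, "valve", "spring")    # passages 0, 1 and 2
```

In this sketch the “AND” result can never be larger than the “OR” result, which is why adding descriptive terms with “AND” narrows a search.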
Wildcards, indicated by some symbol like “*” or “$”, can be used to substitute for letters, prefixes or endings, thereby picking up the alternative forms in which a word might appear. Proximity indicators, such as “ADJ”, “NEAR”, “WITH” and “SAME”, are used together with Boolean operators to indicate how far apart two words may appear in a text passage. This gives the document researcher a means for assessing context. Two words used in the same sentence, or in the same paragraph, can indicate a contextual nexus.
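Wildcard and proximity matching can be sketched in a similar way. The following Python example is an illustration under stated assumptions, not the behavior of any specific search engine: it treats “*” as standing for zero or more letters or digits, and it models a “NEAR”-style test as a maximum distance in word positions. The function names and the `max_gap` parameter are hypothetical.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def wildcard_to_regex(term: str) -> re.Pattern:
    """Turn a term such as 'highlight*' into a compiled regular expression.

    Assumption: '*' stands for zero or more letters or digits.
    """
    escaped = re.escape(term.lower()).replace(r"\*", "[a-z0-9]*")
    return re.compile(escaped)


def near(text: str, first: str, second: str, max_gap: int) -> bool:
    """Return True if two different words matching the two terms lie
    within max_gap word positions of each other."""
    tokens = tokenize(text)
    first_re = wildcard_to_regex(first)
    second_re = wildcard_to_regex(second)
    first_pos = [i for i, t in enumerate(tokens) if first_re.fullmatch(t)]
    second_pos = [i for i, t in enumerate(tokens) if second_re.fullmatch(t)]
    return any(
        i != j and abs(i - j) <= max_gap
        for i in first_pos
        for j in second_pos
    )


# Hypothetical sample sentence.
sentence = "The system highlights related words in a single color."
close_pair = near(sentence, "highlight*", "word*", 3)  # two positions apart
far_pair = near(sentence, "highlight*", "color", 3)    # six positions apart
```

Restricting the test to a single sentence or paragraph, as some proximity indicators do, would only require splitting the text into those units before calling `near`.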
In the current state of the art, finding contextual meaning involves reading whole passages or entire documents where keywords are located. Since the quality of document research is defined in the negative as not missing any relevant passages in a field of inquiry, the researcher can ill afford to simply spot-read. Search engines can find the keywords, but it is the reading task that determines both the quality of a properly conducted search exercise and the time spent on it. Any artifice that reduces reading time without compromising quality is therefore highly desirable for productivity reasons.
U.S. Published Application No. 20050210042 to Goedken shows methods and apparatus to search and analyze prior art. Goedken shows the benefit of assigning a single color to a group of conceptually related words, and then highlighting those words in the text of a patent document. Goedken also recognizes the benefit of counting elements for reporting purposes (see FIG. 14a). Goedken, however, does not show a system for rapidly displaying the text of a document alongside an indexed, color-coded chart that allows quick navigation and quickly shows the user the prevalence of various concepts within the document. These are important shortcomings because the patent researcher requires a system for acquiring an initial understanding of a document in 1-2 seconds. The patent researcher must view thousands of documents in a typical search, and if the initial document inquiry takes more than a few seconds, then a patent search can become economically infeasible.
U.S. Published Application No. 20060156222 to Chi shows a method for automatically performing conceptual highlighting in electronic text. Chi has also noticed that conceptually related words can be grouped together and highlighted in the same color. However, Chi has not provided additional features that enable rapid initial understanding of a document. For example, Chi does not teach methods of removing passages of no relevance to the reader's interest. In addition, Chi does not show methods of removing all but the most relevant passages. Moreover, Chi does not show a method of providing rapid understanding (1-2 seconds) of a document, such that a researcher can quickly decide whether or not to start reading a document.
U.S. Pat. No. 7,194,693 to Cragun shows an apparatus and method for automatically highlighting text in an electronic document. However, highlighting is determined by user preferences and scroll speed. Cragun does not show the features allowing rapid, staged understanding of a document that are required by a researcher wrestling with large numbers of long documents.
U.S. Pat. No. 6,823,331 to Abu-Hakima shows a concept identification system and method for use in reducing and/or representing text content of an electronic document. Although Abu-Hakima provides for counting and ranking, there are no tools for rapid understanding of the document once it is presented.
U.S. Published Application No. 20090276694 to Henry shows a System and Method for Document Display. Like the present invention, Henry recognizes the usefulness of presenting reference characters along with names on or near the figures to which they relate. However, Henry has not taught a search system in which the reference characters are rapidly located for the searcher and presented for quick navigation through the document. Moreover, Henry retrieves characters from drawings, whereas the present invention contains a method for hunting patent text for reference characters.
U.S. Published Application 20040113916 to Ungar shows a perceptual-based color selection for text highlighting. The text color choice is based upon factors such as the total amount of highlighted display.
Several problems still exist in the prior art. First, most search systems rely on a researcher to limit a document set using a combination of keywords and classifications. But since a researcher is looking for multiple concepts simultaneously, limiting a search with a set of keywords will inevitably miss references showing the concepts that were not part of the immediate search. This problem is exacerbated when a searcher is looking for ten or more concepts simultaneously. Clearly, a better system would involve reviewing large sets of documents for all concepts simultaneously. However, the labor involved in reading large sets of long documents makes this approach impractical. Therefore, a system is required that enables rapid manual review of large sets of lengthy documents for multiple concepts simultaneously.
Embodiments of the present invention address many of the shortfalls in the prior art while presenting what will hereinafter become apparent as a pioneering document analysis technology.