The core function of text-search-and-retrieval software is to find those documents in a set that contain the items specified in a query. Having found the documents that match a query, however, existing text-search-and-retrieval systems are poor at determining which of the found documents have “meaning” to the user. Various methods of “ranking” are touted by existing text-search-and-retrieval software, such as counting the number of times a particular word appears in a document, or attaching more “weight” to unique terms. But such methods rely upon statistics, rather than the logical connotation of the words in the documents.
This inability to rank text-based search results adequately is particularly troubling in the legal research field. If traditional legal research is viewed as proceeding “from the top down,” computerized legal research proceeds “from the bottom up.” In traditional research, the attorney first consults legal encyclopedias and digests to find discussions of general legal topics, then narrows the research to more discrete areas, hoping that the digest authors will provide citations to case law that they have deemed relevant. Using computerized legal research, the attorney enters a Boolean search expression that he thinks will generate cases that are relevant to the research topic, and retrieves the full text of every document that matches his search expression. However, unlike finding the same documents via traditional legal research, wherein the authors of encyclopedias and digests have listed only cases that the authors have evaluated and found valuable, existing search and retrieval software does not provide the researcher with any content-based evaluation of the found documents.
One method of “valuing” legal documents found in a search is to evaluate the citations made to these documents. American jurisprudence relies upon precedent. Judges must base their decisions upon principles set out in earlier cases, and must provide precise citations to the earlier decisions upon which they are relying. Consequently, prior art legal search engines have analyzed these citations in order to value documents and to rank search results. For instance, the CaseFinder service provided by the assignee of the present invention displays citations to other cases as hyperlinks, where one click takes the user to the text of the cited case. CaseFinder uses this hyperlinking capability to provide a better search result ranking mechanism. The extent to which a document has been cited by later documents enhances the “value” of that document.
CaseFinder shows the user a list of later cases that cite the found document. CaseFinder can arrange the documents in the list by using two kinds of relevance ranking that rely upon citations. In the first type, CaseFinder ranks documents according to the number of times they have been cited by other documents. The second type of ranking is similar, but in this instance, CaseFinder first assigns a “weight” to a citing document by determining how many times it has been cited. CaseFinder then uses the calculated weight of the citing documents, rather than a simple count of the citing documents, to rank the cited document. Unfortunately, even the CaseFinder system does not provide a perfect ranking system for legal text searchers.
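The two citation-based rankings described above can be illustrated with a minimal sketch. It is not CaseFinder's actual implementation: the function names, the representation of citations as (citing, cited) pairs, and the choice to sum the weights of the citing documents are all assumptions made for illustration. The text does not say how a citing document that has never itself been cited is treated; here it simply contributes a weight of zero.

```python
from collections import defaultdict


def build_citers(citations):
    """Map each cited document to the set of documents that cite it.

    `citations` is an iterable of (citing_id, cited_id) pairs
    (a hypothetical input format chosen for this sketch).
    """
    citers = defaultdict(set)
    for citing, cited in citations:
        if citing != cited:  # ignore self-citations
            citers[cited].add(citing)
    return citers


def rank_by_count(doc_ids, citers):
    """First type: order documents by how many documents cite them."""
    return sorted(doc_ids, key=lambda d: (-len(citers.get(d, ())), d))


def rank_by_weighted_count(doc_ids, citers):
    """Second type: weight each citing document by its own citation count,
    then score the cited document by the sum of those weights (assumed)."""
    def score(doc):
        return sum(len(citers.get(c, ())) for c in citers.get(doc, ()))
    return sorted(doc_ids, key=lambda d: (-score(d), d))


# Illustrative data: A is cited by two uncited cases; B is cited by one
# case (E) that is itself cited three times.
example = [("C", "A"), ("D", "A"), ("E", "B"),
           ("F", "E"), ("G", "E"), ("H", "E")]
citers = build_citers(example)
print(rank_by_count(["A", "B"], citers))           # ['A', 'B']
print(rank_by_weighted_count(["A", "B"], citers))  # ['B', 'A']
```

In this example the two rankings disagree: the simple count favors the document with more citing cases, while the weighted form favors the document cited by a case that is itself frequently cited.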
In addition to the issue of ranking documents that are found by a search, prior art search engines and prior art techniques lack other important aids to legal researchers. For instance, attorneys performing research may come upon a phrase in a document that seems to be of particular importance to the research task. The researcher may wish to incorporate the phrase by quoting it, or by incorporating the meaning of the phrase in his or her report. However, for the report to be credible, it is important for the researcher to know if others in the field have deemed the phrase significant. Currently, there is no method or device for rapidly making such determinations.
One method of accomplishing this is to examine later documents for quotations of material from an earlier (or “source”) document. The quotation of material from an earlier work evidences the judgment of the author of a later work that the quoted words, and/or the work that contains them, are significant. By locating and displaying words and phrases in an earlier work that have been quoted by subsequent works, and by revealing the identity of the subsequent works, the invention assists researchers in determining the value of particular words and phrases in earlier documents.
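The general idea of locating quoted passages can be sketched as follows. This is only an illustration under stated assumptions, not the method of the invention: it assumes quotations in later documents are delimited by quotation marks, that a quotation matches the source verbatim once case and whitespace are normalized, and that very short quotations are ignored. The names `find_quoted_passages` and `min_words` are hypothetical.

```python
import re

# Text between a pair of straight or curly double quotation marks.
QUOTE_RE = re.compile(r'["“]([^"“”]+)["”]')


def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def find_quoted_passages(source_text, later_docs, min_words=3):
    """Return {passage: [doc_id, ...]} for passages of the source document
    that appear as quotations in later documents.

    `later_docs` maps a document identifier to its full text.
    """
    source_norm = normalize(source_text)
    found = {}
    for doc_id, text in later_docs.items():
        for match in QUOTE_RE.finditer(text):
            # Drop punctuation that style rules place inside the quotes.
            passage = normalize(match.group(1)).strip(" .,;:")
            if len(passage.split()) < min_words:
                continue
            if passage in source_norm:
                quoting_docs = found.setdefault(passage, [])
                if doc_id not in quoting_docs:
                    quoting_docs.append(doc_id)
    return found


source = ("The court held that a party may not benefit from its own wrong. "
          "Costs are awarded.")
later = {
    "case_2": 'As the court said, "a party may not benefit from its own '
              'wrong," and we agree.',
    "case_3": 'We apply the "reasonable person" standard.',
}
print(find_quoted_passages(source, later))
# {'a party may not benefit from its own wrong': ['case_2']}
```

The returned mapping identifies both the quoted words in the earlier work and the later works that quoted them. A practical system would also need to handle altered quotations (ellipses, bracketed substitutions) and block quotations, which this sketch does not attempt.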
When a later decision quotes language from an earlier decision, the act of quotation itself constitutes a value judgment resulting from intellectual analysis. However, no existing digital research system recognizes the significance of quotations. What is needed is a text-based indexing and searching system that recognizes, as an indicator of the value of a source document, the fact that items of text in the source document were included as quotations by a later document.