This invention relates generally to a system and method for processing a plurality of textual documents and in particular to a system and method for identifying characteristics within the textual documents and for extracting relationships between the textual documents based on the identified characteristics.
The explosion in the number of textual documents being generated has made it increasingly important to generate an electronic version of the documents to enable automated processing to extract data, to determine information about the textual documents, and to identify relationships with other textual documents in a database. This is especially true with very large databases which may contain hundreds of thousands of textual documents, such as a database containing legal cases and other legal material.
A legal database may contain a large number of legal cases. A legal case in this document refers to an individual written decision issued in the course of a litigation. These decisions usually contain citations to and quotations from other documents, including other legal cases, to establish past practice and justify the result (establish precedent) of the decision. Citations are written in distinctive styles which include special abbreviations and punctuation that facilitate their identification. Also, quotations are usually set off in quotation marks. It is useful to identify these citations and quotations.
In some circumstances, the citations and quotations in a legal case may be identified by automatically parsing through the text of a legal case to identify candidates based upon punctuation and other characteristics. However, the punctuation which sets off a quotation may be used for other purposes, and the abbreviations and formats which characterize a citation are not necessarily unique. To ensure accurate identification, a citation or quotation must be verified. A citation has a predefined format. For example, a written decision of a California Court of Appeals case in 1993 in a lawsuit between Ms. Pleasant and Mr. Celli may have a citation such as Pleasant v. Celli, 18 Cal. App. 4th 841, 22 Cal. Rptr. 2d 663 (1993). The first portion of the citation indicates the two parties' last names and their positions in the case. For example, Ms. Pleasant is the first name listed. Since this is an appellate case, she is the appellant and Mr. Celli is the appellee. The next portion of the citation (e.g., 18 Cal. App. 4th 841) indicates where a copy of the written decision may be located. The first number indicates the volume number of the case reporter in which the decision is located, and the text portion (i.e., Cal. App. 4th) indicates the name of the reporter and the edition of the reporter. The number following the reporter name indicates the starting page number of the decision. Thus, both the citations and the quotations in a written reported decision have well-defined formats which may be automatically identified.
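The citation format described above can be illustrated with a minimal sketch in Python. The patterns and the names CASE_RE, CITE_RE, ReporterCite, Citation, and parse_citations are hypothetical and chosen only for illustration; they handle the simple single-word party names of the example and are not the parser of the invention. A candidate is kept only if every reporter reference in it has the volume, reporter name, and starting page structure, which loosely mirrors the need to verify a candidate before accepting it.

```python
import re
from typing import List, NamedTuple


class ReporterCite(NamedTuple):
    volume: int     # volume number of the case reporter
    reporter: str   # reporter name and edition, e.g. "Cal. App. 4th"
    page: int       # starting page of the decision


class Citation(NamedTuple):
    first_party: str
    second_party: str
    cites: List[ReporterCite]
    year: int


# Hypothetical pattern for a candidate: "<Party> v. <Party>, <cites> (<year>)".
# It assumes single-word party names, as in the example above.
CASE_RE = re.compile(
    r"(?P<first>[A-Z][\w.'-]*)\s+v\.\s+(?P<second>[A-Z][\w.'-]*),\s+"
    r"(?P<cites>.+?)\s+\((?P<year>\d{4})\)"
)

# Hypothetical pattern for one reporter reference: "<volume> <reporter> <page>".
CITE_RE = re.compile(r"(?P<volume>\d+)\s+(?P<reporter>.+?)\s+(?P<page>\d+)")


def parse_citations(text: str) -> List[Citation]:
    """Return the citation candidates in text whose parts all verify."""
    found = []
    for m in CASE_RE.finditer(text):
        cites = []
        for part in m.group("cites").split(", "):
            pm = CITE_RE.fullmatch(part.strip())
            if pm is None:
                # One malformed reference rejects the whole candidate.
                cites = []
                break
            cites.append(
                ReporterCite(
                    int(pm.group("volume")),
                    pm.group("reporter"),
                    int(pm.group("page")),
                )
            )
        if cites:
            found.append(
                Citation(
                    m.group("first"),
                    m.group("second"),
                    cites,
                    int(m.group("year")),
                )
            )
    return found


if __name__ == "__main__":
    sample = "See Pleasant v. Celli, 18 Cal. App. 4th 841, 22 Cal. Rptr. 2d 663 (1993)."
    for citation in parse_citations(sample):
        print(citation)
```

A practical system would need a far richer grammar (multi-word party names, pinpoint pages, court identifiers), so the sketch shows only why the format lends itself to automatic identification.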
Once the citations and quotations in a legal case are identified and any relationships between these citations and quotations and any prior legal cases are determined, this information may be used for a variety of purposes. For example, this information may be used for both legal case verification purposes and legal case collocation purposes. Verification is a process whose end result is a determination that the legal case currently being reviewed is still good law (i.e., it has not been overruled or limited by some later case due to different reasoning). A case that is not good law may not be persuasive since the reasoning of the case is no longer valid. Thus, the process of verification ensures that the case being used to support an argument is still good law and the reasoning of that case is still valid. Collocation, on the other hand, is a process whose end result may be a list of other legal cases, legal materials or textual documents which cover similar issues to the case currently being reviewed, or a list of cases covering a particular subject matter, such as intellectual property. For example, a user of the legal database may have located a case which is of interest to him and he would then like to identify other cases that are related to the case of interest based on the subject matter. Thus, it is desirable for a user of a legal database to be able to perform both verification and collocation using a single integrated system.
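The distinction between verification and collocation can be sketched as two small queries over citation data. The record layout, the treatment labels ("followed", "overruled", "limited"), the subject labels, and the case names below are all assumptions made for illustration; the sketch is not the method of the invention.

```python
from typing import Dict, List, NamedTuple, Set


class Treatment(NamedTuple):
    citing: str   # the later case that cites
    cited: str    # the earlier case being cited
    kind: str     # assumed labels: "followed", "overruled", "limited"


# Assumed set of treatments that mean a case is no longer good law.
NEGATIVE = {"overruled", "limited"}


def is_good_law(case: str, treatments: List[Treatment]) -> bool:
    """Verification: True if no later case overruled or limited the case."""
    return not any(t.cited == case and t.kind in NEGATIVE for t in treatments)


def collocate(case: str, subjects: Dict[str, Set[str]]) -> List[str]:
    """Collocation: other cases sharing at least one subject with the case."""
    own = subjects.get(case, set())
    return sorted(
        other for other, labels in subjects.items()
        if other != case and labels & own
    )


if __name__ == "__main__":
    # Hypothetical cases and labels, for illustration only.
    treatments = [
        Treatment("Case C", "Case A", "followed"),
        Treatment("Case D", "Case B", "overruled"),
    ]
    subjects = {
        "Case A": {"intellectual property"},
        "Case B": {"contracts"},
        "Case C": {"intellectual property", "contracts"},
    }
    print(is_good_law("Case A", treatments))
    print(is_good_law("Case B", treatments))
    print(collocate("Case A", subjects))
```

Both queries read from the same underlying citation and subject data, which is why a single integrated system can serve both purposes.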
Prior to electronic databases, legal cases were published in several different formats, each of which has some advantages and disadvantages. A register publication gathers information at the source of the new law and presents the new law in roughly chronological order. An example of this type of publication is a reporter volume that contains the new legal cases for a particular court in chronological order. These register publications may be rapidly published and provide a good statement of the new law, but they cannot provide collocation (i.e., they cannot provide a researcher with information about other cases which may be related to the subject matter of the instant case). It is also difficult to verify a legal case with a register publication. Another type of legal publication is a code publication, which attempts to gather together the law that applies to a particular subject matter so that a researcher may determine the current state of the law from a single source. A code publication may permit a case to be verified since outdated law is removed as a code publication is updated. An example of a code publication is an annotated statutory code publication which gathers information from statutes and legal cases about a particular subject matter. Legal cases are not published in a code format. These code publications may provide a researcher with the ability to collocate legal cases and verify a legal case, but the information in these code publications, due to the amount of time required to compile them, describes only the state of the law sometime in the recent past. Therefore, for rapidly changing areas of the law, a code publication may not be accurate and therefore cannot meet the verification or collocation needs of the researcher.
Numerous efforts have been made to codify case law, but case law tends to resist codification for several reasons. First, there is a tremendously large number of legal cases, so it is often very difficult and time-consuming to attempt to classify the legal cases by subject matter. In addition, legal cases have a complex hierarchy of authority and control (i.e., whether a case from a particular court controls or influences the outcome of another court), which leads to differences between various courts within various regions of the United States. It is also very difficult to determine the effects that a newly decided legal case may have on the current state of the law. The attempts to codify the case law have included treatises (which cover the current state of the law for a particular subject matter) or restatements of the law in particular subject areas.
In order to adequately perform legal research, legal researchers need to be able to determine the current state of the case law, and then access the case law in its register form. Access to the case law itself, including free-text access, cannot easily provide the current state of the law. Thus, a researcher may use an index to the case law to find the case law. These indexes include a controlled-vocabulary, manually created intellectual index to case law, which requires extraordinary amounts of manpower to implement. Instead of these options, a citation index (also known as a case law citator) may be used.
There are several conventional case law citators. One citator provides both the required verification and collocation functions, but has several disadvantages. In particular, this citator produces citation chains for a particular legal case which list how the legal case was treated by later legal cases, such as indicating that the particular legal case was overruled or followed. This citator is not current since it typically waits for a legal case to be in the general case law prior to generating the case law citator information. This conventional system does an adequate job of verification, but does not really provide an adequate collocation function. In particular, it is tedious to locate other case law which relates to a particular case by following the citation strings, retrieving the reporter with a legal case, and then manually reviewing the legal case.
Therefore, it is desirable to provide a system and method which seamlessly integrates general subject matter access to case law with citation information, and which provides a researcher with the desired verification function as well as the collocation function. Thus, the researcher may use a single integrated system which provides the researcher with all of the information that the researcher needs in a single location. The invention provides such a system, as described below.