COPYRIGHT NOTICE: A portion of the disclosure (including all Appendices) of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the U.S. Patent and Trademark Office patent file or records, but the copyright owner reserves all other copyright rights whatsoever.
1. Field of the Invention
The present invention relates to computer-assisted legal research (CALR). More specifically, the invention relates to systems and methods that identify and distinguish facts and legal discussion in the text of court opinions.
2. Related Art
Few patents and a very limited body of research literature are devoted to analysis and indexing of court decisions and case law. One reason for this phenomenon may be that the complexity of the current body of legal data overwhelms computing applications. Some applications, including artificial intelligence applications, were too ambitious and failed to follow the scientific approach of "divide and conquer": decompose a large problem into smaller ones and tackle the smaller and easier problems one at a time.
The present invention is directed to a computing method to address one of these smaller problems: identifying and distinguishing the facts and the legal discussion in a court's legal opinion. This invention is fundamental to the improvement of future CALR.
Factual analysis is the very first task in the legal research process. A thorough analysis of facts leads to a solid formulation of legal issues to be researched. Facts are the dynamic side of law, in contrast to the relatively stable authoritative legal doctrines.
Most legal research and controversy concerns facts, not law; cases are most often distinguished on the facts. The rules stated by courts are tied to specific fact situations and cannot be considered independently of the facts. The rules must be examined in relation to the facts. In this sense, the facts of a legal issue control the direction of CALR.
Applicants are aware of no patent related to distinguishing fact from legal discussion in case law documents. Most of the patents that are at all related to legal documents are in the field of information retrieval, and these patents generally do not include differentiation of facts from legal discussions in their indexing of the documents (see U.S. Pat. Nos. 5,544,352; 5,771,378; 5,832,494). Some of the patents emphasize the usage of legal concepts, not facts, in the form of headnotes, classification codes, and legal lexicons (see U.S. Pat. Nos. 5,265,065; 5,418,948; 5,488,725).
In research literature, the FLAIR project (Smith 93 and 97) attempted to separate legal content from fact, but focused heavily on legal concepts. In FLAIR, a legal lexicon is carefully constructed manually by legal experts. In this lexicon, legal concept terms are hierarchically organized by their conceptual relationships, and synonyms and alternative word forms for each lexicon term are included for completeness. FLAIR defines facts as follows: "Fact words are every words in the database other than concepts, statute citations, cited cases, and noise words. Fact phrases are fact words that appear next to each other with or without noise words in between." In other words, there is no specific process that specializes in identifying the facts themselves; facts are merely derivatives of look-ups from the concept lexicon. Also, FLAIR's notion of fact includes only words and phrases, and does not provide for entire passages in a court decision.
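The quoted definition can be illustrated with a minimal Python sketch. This is only one reading of the definition, not FLAIR's actual implementation: the function name, the token-level treatment of citations, and the choice to keep interior noise words inside a phrase are all assumptions made for illustration.

```python
def fact_phrases(tokens, concept_terms, citation_tokens, noise_words):
    """Group adjacent fact words into phrases, per one reading of the quoted definition.

    A token is a fact word if it is not a concept term, citation token, or noise word.
    Noise words between two fact words are kept inside the phrase (an assumption);
    concept terms and citation tokens end the current phrase.
    """
    phrases, current, pending_noise = [], [], []
    for token in tokens:
        low = token.lower()
        if low in concept_terms or low in citation_tokens:
            # A concept or citation token breaks the run of fact words.
            if current:
                phrases.append(" ".join(current))
            current, pending_noise = [], []
        elif low in noise_words:
            # Hold noise words until we know another fact word follows.
            if current:
                pending_noise.append(token)
        else:
            current.extend(pending_noise)
            pending_noise = []
            current.append(token)
    if current:
        phrases.append(" ".join(current))
    return phrases


# Hypothetical lexicon and noise-word list, for illustration only.
sentence = "the driver of the truck breached the duty of care".split()
print(fact_phrases(sentence, {"breached", "duty", "care"}, set(), {"the", "of"}))
```

The sketch makes the limitation visible: the fact phrases are whatever remains after the lexicon look-up, so their quality depends entirely on the coverage of the concept lexicon.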
A few other research projects share the lexicon approach adopted in FLAIR, now referred to as "conceptual legal information retrieval" (Hafner 87; Bing 87; Dick 87). These research techniques are generally domain-specific, small-scale applications.
Some research techniques that do process facts apply case-based reasoning (CBR) technologies to legal data (Rissland 87, 93, 95; Daniels 97; Ashley 90). CBR represents a branch of artificial intelligence research and emphasizes knowledge acquisition and knowledge representation using a device known as a "case frame", or "frame". To populate their "frames", the CBR researchers analyze sample case law documents to extract, condense, and categorize facts and other relevant information into pre-defined frames. The quality of the extraction of facts, then, is limited to the quality of the design of the frames themselves; a fact that is important in one CBR frame is not necessarily important in another. This manual extraction and processing is neither repeatable nor scalable: a CBR project usually employs only a few dozen to a couple of hundred case law documents on a very narrow legal subject, like personal bankruptcy or contributory negligence.
A broader approach than CBR is the application of artificial intelligence (AI) to legal reasoning. In any of these computerized AI applications, facts, as in the CBR applications, play a crucial role in automatic inference. In the earlier research, the assumption is that facts are already available to help legal reasoning (Meldman 77; Tyree 81). The same assumption is made in the theoretical works (Levi 42; Gardner 87; Alexy 89; Rissland 90). How these facts are obtained was not the concern in these works. After about 1980, some researchers started creating small fact data banks for their experiments in order to build empirical evidence of the effectiveness of their proposed models (Nitta 95; Pannu 95). But their approach to gathering facts from court decisions was ad hoc, and has no real potential for processing the millions of decisions found in modern commercial legal databases.
A relevant research work is the SALOMON project in Belgium (Moens 97). SALOMON performs detailed analysis on criminal case decisions to programmatically identify the semantic text segments and summarize the contents within each segment. A Belgian criminal case is typically made up of nine logical segments: the superscription with the name of the court and date, identification of the victim, identification of the accused, alleged offences, transition formulation, opinion of the court, legal foundations, verdict, and conclusion. SALOMON focuses on identifying three of these nine segments: alleged offences, opinion, and legal foundations. The locating of alleged offences in a Belgian criminal case is roughly equivalent to the locating of the facts, a focus of the present invention.
SALOMON's identification of these three segments in a decision relies on "word patterns" and the sequence of the segments. For example, the legal foundation segment follows an opinion segment, and might be introduced with the word pattern "On these grounds." It is unclear from the reported study how many word patterns are employed in the analysis and how the patterns are generated. It seems that the patterns are created manually and are specific to Belgian criminal cases. This approach is not too dissimilar from the lexicon approach used in FLAIR.
In addition, SALOMON assumes that only the text units, such as paragraphs, that appear in an alleged offense segment are related to the facts in the case. In reality, facts can appear in any part of a court decision. Even when there is a section devoted to facts, as in many U.S. and U.K. criminal cases, the facts are also embedded in the reasoning, arguments, and ruling, throughout the opinion. SALOMON makes no attempt to recognize these scattered "applied" facts. In fact, it eliminates them during its summarization process after the structure of a court decision is determined through the word pattern analysis.
The process of summarization in SALOMON consists of consolidating important content texts in each of the three determined segments. It is realized by clustering the paragraphs in one segment and extracting the important keywords from a few large clusters, which are taken to represent important topics in the case, based on the assumption of repetitive human usage of words. The condensed texts and the extracted keywords serve as the final summary.
To summarize, Applicants are not aware of known systems that perform legal document analysis in the manner done by the present invention. Research literature discloses methods of gathering facts from court decisions, but these methods cannot be adequately scaled to handle substantial engineering applications. It is to meet these demands, among others, that the present invention is directed.
Alexy, R., A Theory of Legal Argumentation. Clarendon Press, Oxford, 1989.
Ashley, K. D., Modeling Legal Argument: Reasoning with Cases and Hypotheticals, MIT Press, Cambridge, Mass., 1990.
Bing, J., "Designing text retrieval systems for 'conceptual search,'" Proceedings of 1st International Conference on AI and Law, Boston, pp.43-51, 1987.
Daniels, J. J. and Rissland, E. L., "Finding legally relevant passage in case opinions." Proceedings of 6th International Conference on AI and Law, Melbourne, pp.39-46, 1997.
Dick, J., "Conceptual retrieval and case law," Proceedings of 1st International Conference on AI and Law, Boston, pp.106-115, 1987.
Gardner, A. L., An Artificial Intelligence Approach to Legal Reasoning, MIT Press, 1987.
Hafner, C. D., "Conceptual organization of caselaw knowledge base," Proceedings of 1st International Conference on AI and Law, Boston, pp.35-42, 1987.
Levi, E. H., An Introduction to Legal Reasoning, University of Chicago Press, 1941.
Meldman, J. A., "A structural model for computer-aided legal analysis", Rutgers Journal of Computers and the Law, Vol.6, pp.27-71, 1977.
Moens, M. F. et al., "Abstracting of legal cases: The SALOMON experience," Proceedings of 6th International Conference on AI and Law, Melbourne, pp.114-122, 1997.
Nitta, K. et al., "New HELIC-II: A software tool for legal reasoning," Proceedings of 5th International Conference on AI and Law, College Park, Md., pp.287-296, 1995.
Pannu, A. S., "Using genetic algorithms to inductively reason with cases in the legal domain," Proceedings of 5th International Conference on AI and Law, College Park, Md., pp.175-184, 1995.
Rissland, E. L. and Ashley, K. D., "A case-based system for trade secrets law," Proceedings of 1st International Conference on AI and Law, Boston, pp.60-66, 1987.
Rissland, E. L., "Artificial intelligence and law: stepping stones to a model of legal reasoning," Yale Law Journal, Vol.99, pp.1957-1981, 1990.
Rissland, E. L. et al., "BankXX: A program to generate argument through case-based search," Proceedings of 4th International Conference on AI and Law, Amsterdam, pp.117-124, 1993.
Rissland, E. L. and Daniels, J. J., "A hybrid CBR-IR approach to legal information retrieval," Proceedings of 5th International Conference on AI and Law, College Park, Md., pp.52-61, 1995.
Smith, J. C. and Gelbart, D., "FLEXICON: An evolution of a statistical ranking model adopted for intelligent legal text management," Proceedings of The 4th International Conference on Artificial Intelligence and Law, Amsterdam, pp.142-151, 1993.
Smith, J. C., "The use of lexicons in information retrieval in legal databases," Proceedings of The 6th International Conference on Artificial Intelligence and Law, Melbourne, pp.29-38, 1997.
Tyree, A. L., "Fact content analysis of caselaw: methods and limitations," Jurimetrics Journal, Fall 1981, pp.1-33, 1981.
Hosmer, D. W.; Lemeshow, S., Applied Logistic Regression, Wiley and Sons, 1989.
Mitchell, T. M., Machine Learning, McGraw-Hill, p. 183, 1997.
The inventive system and method involve two processes: training a machine-based learning algorithm, and processing case law documents to identify and distinguish fact paragraphs and legal discussion paragraphs.
Two factors determine a successful analysis of case law document texts: the abstract learning "features" that facilitate machine learning, and the learning capacity of a selected learning algorithm given a set of these features. The invention provides such a set of features, and allows employment of any of a number of learning algorithms to successfully identify and distinguish fact and discussion.
In addition, a scalable commercial application requires automatic gathering of large quantities of training data from case law documents. The present invention provides a solution to this problem as well.
Thus, the invention provides a computer-implemented method of gathering large quantities of training data from case law documents, especially suitable for use as input to a learning algorithm that is used in a subsequent process of recognizing and distinguishing fact passages and discussion passages in additional case law documents. The method has steps of: partitioning text in the documents by headings in the documents, comparing the headings in the documents to fact headings in a fact heading list and to discussion headings in a discussion heading list, filtering from the documents the headings and text that is associated with the headings, and storing (on persistent storage in a manner adapted for input into the learning algorithm) fact training data and discussion training data that are based on the filtered headings and the associated text.
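The steps of this training-data method can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the heading lists, the rule that a heading is a short line with no lower-case letters, the one-paragraph-per-line input, and the JSON Lines storage format are all hypothetical choices made for the example.

```python
import json
import re

# Hypothetical heading lists; the actual lists are not enumerated in this passage.
FACT_HEADINGS = {"facts", "background", "factual background", "statement of facts"}
DISCUSSION_HEADINGS = {"discussion", "analysis", "legal analysis", "conclusions of law"}


def normalize_heading(line):
    """Lower-case a heading and strip outline numbering such as 'I.' or 'A.'."""
    text = re.sub(r"^\s*(?:[ivxlc]+|[a-z]|\d+)[.)]\s+", "", line.strip().lower())
    return re.sub(r"[^a-z ]", "", text).strip()


def is_heading(line):
    """Assumption: a heading is a short line containing letters but no lower-case ones."""
    s = line.strip()
    return 0 < len(s) <= 60 and s == s.upper() and any(c.isalpha() for c in s)


def partition_by_headings(text):
    """Split opinion text into (heading, paragraphs) sections; one paragraph per line."""
    sections, heading, paragraphs = [], None, []
    for line in text.splitlines():
        if not line.strip():
            continue
        if is_heading(line):
            if heading is not None or paragraphs:
                sections.append((heading, paragraphs))
            heading, paragraphs = line.strip(), []
        else:
            paragraphs.append(line.strip())
    if heading is not None or paragraphs:
        sections.append((heading, paragraphs))
    return sections


def gather_training_data(documents):
    """Keep only text under fact or discussion headings, labeled accordingly."""
    records = []
    for text in documents:
        for heading, paragraphs in partition_by_headings(text):
            key = normalize_heading(heading) if heading else ""
            if key in FACT_HEADINGS:
                label = "fact"
            elif key in DISCUSSION_HEADINGS:
                label = "discussion"
            else:
                continue  # filter out sections under unlisted headings
            records.extend({"label": label, "text": p} for p in paragraphs)
    return records


def store_training_data(records, path):
    """Persist labeled paragraphs as JSON Lines for later input to a learner."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
```

Because the labels come from the headings that courts already write, no manual annotation is needed, which is what allows the gathering step to scale to large document collections.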
The invention further provides a method of extracting features that are independent of specific machine learning algorithms needed to accurately classify case law text passages as fact passages or as discussion passages. The method has steps of determining a relative position of the text passages in an opinion segment in the case law text, parsing the text passages into text chunks, comparing the text chunks to predetermined feature entities for possible matched feature entities, and associating the relative position and matched feature entities with the text passages for use by one of the learning algorithms.
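A minimal Python sketch of these feature-extraction steps follows. The feature entities shown (dates, party references, a few legal terms), the punctuation-based chunking, and the definition of relative position as a passage's index scaled to the range 0 to 1 are all assumptions chosen for illustration; this passage does not specify the actual feature entities or chunking rules.

```python
import re
from collections import Counter

# Hypothetical feature entities, expressed as regular expressions.
FEATURE_ENTITIES = {
    "date": re.compile(
        r"\b(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December)\s+\d{1,2}\b"
    ),
    "party": re.compile(r"\b(?:plaintiff|defendant|appellant|appellee)\b", re.IGNORECASE),
    "legal_term": re.compile(r"\b(?:court|statute|held|pursuant)\b", re.IGNORECASE),
}


def parse_chunks(passage):
    """Assumption: a text chunk is a run of text between punctuation marks."""
    return [c.strip() for c in re.split(r"[,;:.!?]+", passage) if c.strip()]


def match_entities(chunks):
    """Count how often each feature entity matches across the chunks."""
    counts = Counter()
    for chunk in chunks:
        for name, pattern in FEATURE_ENTITIES.items():
            counts[name] += len(pattern.findall(chunk))
    return {name: n for name, n in counts.items() if n}


def extract_features(passages):
    """Associate each passage with its relative position and matched entities."""
    last = max(len(passages) - 1, 1)  # avoid division by zero for a single passage
    return [
        {
            "relative_position": index / last,  # 0.0 = first passage, 1.0 = last
            "entities": match_entities(parse_chunks(passage)),
            "text": passage,
        }
        for index, passage in enumerate(passages)
    ]
```

The output is a plain list of dictionaries, so the same features can be handed to any learning algorithm without change, which reflects the algorithm-independence described above.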
The invention also provides apparatus for performing the methods, as well as computer-readable memories that (when used in conjunction with a computer) can carry out the methods.
Other objects, features and advantages of the present invention will be apparent to those skilled in the art upon a reading of this specification including the accompanying drawings.