1. Field of the Invention
The present invention is related to the field of binary classification and, more particularly, to a computer-automated system and method for the binary classification of text units constituting rules of law in case law documents.
2. Description of the Related Art
When disagreements arise about the proper interpretation of statutes, administrative regulations, and constitutions, the higher courts of our land clarify their meaning by applying established judicial criteria. A written description of this application is known as the court's opinion. In order to understand a particular statute or provision of the Constitution, one has to see how the courts have interpreted it, i.e., one needs to read the courts' opinions.
Every case law opinion describes the nature of the dispute and the basis for the court's decision. Courts apply the basic methods of legal reasoning that are taught in all law schools and are used in the practice of law. Most case law documents begin with an introduction that sets forth the facts and procedural history of the case. The court then identifies the issues in dispute, followed by a statement of the prevailing law pertaining to the issue, the court's decision on the issue, and the court's rationale for its decision. Finally, there is a statement of the court's overall disposition which either affirms or reverses the judgment of the lower court.
In order to apply the case as precedent, one must determine the significance of the court's decision for future litigants as well as identify the general principles of law that are likely to be applied in future cases. The holding is a statement that the law is to be interpreted in a certain way when a given set of facts exists.
Most written court opinions devote considerable space to justifying the court's decisions. In the rationale, the court usually follows established patterns of legal reasoning and reviews the relevant provisions of the constitutions, statutes, and case law and then relates the thought processes used to arrive at the court's judgment.
A 'rule of law' is a general statement of the law and its application under a given set of circumstances that is intended to guide conduct and may be applied to subsequent situations having analogous circumstances. Rules of law are found in the rationales that courts use to support their decisions, and the holding is often considered a rule of law.
In the prior art, ascertaining the rule or rules of law in any given decision required an individual to manually read through the text of court decisions. This is time consuming and requires the reviewing individual to read a lot of superfluous material in the effort to glean what are often just a few, pithy rules of law. Therefore, a need exists for a way to automate document review while still accurately identifying the rules of law.
Distinguishing a rule of law from text that does not constitute a rule of law requires binary classification. In the prior art, there are many statistical and machine learning approaches to binary classification. Examples of statistical approaches include Bayes' rule, k-nearest neighbor, projection pursuit regression, discriminant analysis, and regression analysis. Examples of machine learning approaches include Naive Bayes, neural networks, and regression trees.
These approaches can be grouped into two broad classes based on the type of classification being done. When a set of observations is given with the aim of establishing the existence of classes or clusters in the data, this is known as unsupervised learning or clustering. When it is known for certain that there are N classes, and the aim is to establish a rule whereby new observations can be classified into one of the existing classes, then this is known as supervised learning. With supervised learning, a rule for classifying new observations is established using known, correctly classified data.
Rules can be established using many of the supervised techniques mentioned above. One such technique is logistic regression, a statistical regression procedure that may be used to establish an equation for classifying new observations.
In general, regression analysis is the analysis of the relationship between one variable and another set of variables. The relationship is expressed as an equation. Using the equation it is possible to predict a response, or dependent, variable from a function of regressor variables and parameters. Regressor variables are sometimes referred to as independent variables, predictors, explanatory variables, factors, features, or carriers.
Standard regression analysis, or linear regression, is not recommended for the present invention because of the dichotomous nature of the response variable, which indicates that a unit of text is either a rule of law (ROL) or not a rule of law (~ROL). This is because R², which linear regression uses to evaluate the effectiveness of the regression, is not suitable when the response variable is dichotomous. The present invention uses logistic regression because it evaluates the effectiveness of the regression with the maximum likelihood estimation procedure, which works with a dichotomous response variable.
The training process of logistic regression operates by choosing a hyperplane to separate the classes as well as possible, but the criterion for a good separation, or goodness of fit, is not the same as for other regression methods, such as linear regression. For logistic regression, the criterion for a good separation is the maximum of a conditional likelihood. Logistic regression is identical, in theory, to linear regression for normal distributions with equal covariances, and also for independent binary features. So, the greatest differences between the two are to be expected when the data depart from these two cases, for example when the features have very non-normal distributions with very dissimilar covariances.
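As an illustration of the maximum likelihood criterion described above, the following minimal sketch fits a one-feature logistic model by gradient ascent on the log-likelihood. It is not the procedure used by any particular statistical package; the function names, the learning rate, the number of epochs, and the toy data are all assumptions made for this example.

```python
import math

def sigmoid(t: float) -> float:
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-t))

def log_likelihood(xs, ys, b0, b1):
    """Conditional log-likelihood of binary labels ys given feature values xs."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by gradient ascent (hypothetical settings)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # gradient of the log-likelihood
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Toy, deliberately non-separable data (assumed for illustration only).
xs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
```

The fitted coefficients define the separating boundary at the point where b0 + b1*x equals zero, and the log-likelihood at the fitted values is higher than at the starting values of zero.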
Several well known statistical packages contain a procedure for logistic regression, e.g., the SAS package has a logistic procedure, and SPSS has one called LOGISTIC REGRESSION.
Binomial distributions may be compared using what is known as a Z value. In statistics the so-called binomial distribution describes the possible number of times that a particular event will occur in a sequence of observations. The event is coded binary, i.e., it may or may not occur. The binomial distribution is used when a researcher is interested in the occurrence of an event instead of, for example, its magnitude. For instance, in a clinical trial, a patient may survive or die. The researcher studies the number of survivors, and not how long the patient survives after treatment. Another example is whether a person is overweight. The binomial distribution describes the number of overweight persons, and not the extent to which they are overweight.
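The binomial distribution described above can be illustrated with a short sketch. The trial size of 10 patients and the survival probability of 0.8 are hypothetical values chosen only for the example.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability that an event with probability p occurs exactly k times in n observations."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical clinical trial: 10 patients, assumed survival probability 0.8.
# probabilities[k] is the probability that exactly k patients survive.
probabilities = [binomial_pmf(k, 10, 0.8) for k in range(11)]
```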
There are many practical problems involved in the comparison of two binomial parameters. For example, social scientists may wish to compare the proportions of women taking advantage of prenatal health services for two communities that represent different socioeconomic backgrounds. Or, a director of marketing may wish to compare the public awareness of a new product recently launched with that of a competitor's product.
Two binomial parameters can be compared using the Z statistic, where:
$$Z = \frac{P_0 - P_1}{\sqrt{\mathit{TP}\,(1 - \mathit{TP})\left(\frac{1}{T_0} + \frac{1}{T_1}\right)}}$$
where $P_x$ is the probability of binomial parameter $x$ (where $x$ is either binomial parameter 0 or 1); $\mathit{TP}$ is the combined probability of the two binomial parameters; and $T_x$ is the sample size taken from the population(s) in order to estimate the two probabilities $P_0$ and $P_1$.
The same formula can be used to compare a binomial parameter from two different distributions. In this case, $P_x$ is the probability of the binomial parameter in distribution $x$, where $x$ is either distribution 0 or 1; $\mathit{TP}$ is the probability of the binomial parameter regardless of the distribution from which it came; and $T_x$ is the sample size taken from distribution $x$, where $x$ is either distribution 0 or 1.
A word in text creates a binomial distribution, i.e., the word either is in the text or it is not. Therefore, the above formula can be used to compare a word that appears in two distributions.
Furthermore, the above formula indicates that words with large Z values (either large positive or large negative values) have a higher probability of being in one distribution over the other. This implies that Z values can be used to a) automatically suggest words for a query, i.e., term suggestion, in an information retrieval system like Smart, and b) calculate an effective feature for a binary classification system.
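A minimal sketch of how the Z formula might be applied to words follows. It assumes that each distribution is a list of tokenized text units, that the probability of a word in a distribution is the fraction of that distribution's text units containing the word, and that the sample size is the number of text units. These counting choices, the function names, and the sample data are assumptions made for illustration and are not taken from the description above.

```python
import math
from collections import Counter

def z_value(p0, p1, t0, t1):
    """Z statistic comparing proportions p0 and p1 estimated from sample sizes t0 and t1."""
    tp = (p0 * t0 + p1 * t1) / (t0 + t1)  # combined probability over both samples
    denom = math.sqrt(tp * (1 - tp) * (1 / t0 + 1 / t1))
    return 0.0 if denom == 0 else (p0 - p1) / denom

def word_z_values(units0, units1):
    """Z value for every word seen in either list of tokenized text units."""
    t0, t1 = len(units0), len(units1)
    # Count, for each word, the number of text units that contain it (assumed definition).
    df0 = Counter(w for unit in units0 for w in set(unit))
    df1 = Counter(w for unit in units1 for w in set(unit))
    return {w: z_value(df0[w] / t0, df1[w] / t1, t0, t1) for w in set(df0) | set(df1)}

# Hypothetical tokenized text units for distribution 0 and distribution 1.
units0 = [["court", "must", "apply"], ["party", "must", "show"], ["statute", "requires", "notice"]]
units1 = [["plaintiff", "filed", "suit"], ["court", "denied", "motion"], ["defendant", "appealed"]]
z = word_z_values(units0, units1)
```

In this toy data, "must" receives a positive Z value because it appears only in distribution 0, "plaintiff" receives a negative value because it appears only in distribution 1, and "court" receives zero because it is equally frequent in both.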
The T-test is a statistical test that has been used to select terms (words) that are suggestive of a particular topic (P) of a set of documents. The T-test can be used to compare a topic (P) set of documents with a set of documents (R) randomly selected from many different topics. The interval between the occurrences of words can be selected as the basis for statistical analysis. Underlying this test is the assumption that topical (P) single words should appear more frequently and more regularly, i.e., at approximately even intervals, in the topic (P) set of documents. Therefore, terms that had this property, i.e., that appeared more frequently and more regularly in the topic (P) set of documents than in the (R) set of documents, would be the ones most suggestive of the topic P.
The formula for the T statistic is:
$$T = \frac{\sqrt{n}\,(X - \bar{X})}{s}$$
where $n$ is the number of intervals of a particular word, W, in the topic (P) set of documents; $X$ is the mean interval of the word W in the R set of documents; $\bar{X}$ is the mean interval in the P set of documents; and $s$ is the variation or standard deviation of the intervals of the word in the P set of documents.
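The T statistic can be sketched as follows. The sketch assumes that an interval is the distance, in token positions, between successive occurrences of the word in a single token sequence, and that s is the sample standard deviation of the intervals in the P set. Both choices, along with the function names and the toy sequences, are assumptions made for illustration.

```python
import math
from statistics import mean, stdev

def intervals(tokens, word):
    """Gaps, in token positions, between successive occurrences of word."""
    positions = [i for i, t in enumerate(tokens) if t == word]
    return [b - a for a, b in zip(positions, positions[1:])]

def t_statistic(p_tokens, r_tokens, word):
    """T = sqrt(n) * (mean interval in R - mean interval in P) / s, or None if undefined."""
    p_gaps = intervals(p_tokens, word)
    r_gaps = intervals(r_tokens, word)
    if len(p_gaps) < 2 or not r_gaps:
        return None  # too few occurrences to estimate the statistic
    s = stdev(p_gaps)  # sample standard deviation of intervals in the P set
    if s == 0:
        return None  # perfectly regular intervals: the ratio is undefined
    return math.sqrt(len(p_gaps)) * (mean(r_gaps) - mean(p_gaps)) / s

# Hypothetical token sequences: "a" recurs often in the P set and rarely in the R set.
p_tokens = ["a", "x", "a", "x", "x", "a", "x", "a"]
r_tokens = ["a"] + ["x"] * 9 + ["a"]
t = t_statistic(p_tokens, r_tokens, "a")
```

A large positive value indicates that the word recurs at shorter and more regular intervals in the P set than in the R set.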
The T-test method of finding words suggestive of a particular topic (P) uses the interval between the occurrences of words while the Z value method relies on the difference in the number of times a word appears in a set of topic related documents and a set of documents from many different topic areas.
This invention is a system and method for binary classification of text units such as sentences, paragraphs and documents. Because the classification is binary, a text unit is classified as one of two classes. The preferred embodiment is a system and method for the classification of text units as either a rule of law (ROL) or not a rule of law (~ROL).
During a training phase of the system and method of the present invention, an initialized knowledge base and a collection of labeled or pre-classified text units are used to build a trained knowledge base. The trained knowledge base contains an equation, a threshold, and a plurality of statistical values called Z values. This trained knowledge base is used to classify text units within the input text of any case law document as either ROL or ~ROL.
A Z value, which is the most effective tool in the classification process, is generated for each term or token in the input text, as hereinafter defined. The Z values are used to calculate the average Z value for each text unit. The average Z value, and possibly other features, is then input to the equation, which calculates a score for each text unit. Each calculated score is then compared to the threshold to classify each text unit as either ROL or ~ROL.
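The average Z value of a text unit might be computed as in the sketch below. Treating tokens that have no stored Z value as contributing zero is an assumption made here, as are the function name and the sample values.

```python
def average_z(tokens, z_values):
    """Mean Z value over the tokens of one text unit.

    Tokens without a stored Z value contribute 0.0 (an assumption of this sketch).
    """
    if not tokens:
        return 0.0
    return sum(z_values.get(t, 0.0) for t in tokens) / len(tokens)

# Hypothetical Z values from a trained knowledge base.
z_values = {"must": 1.5, "court": 0.0}
feature = average_z(["must", "court", "unknown"], z_values)
```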
The trained knowledge base is generated by inputting a training set of text units. In the training set, each text unit is already classified as either a ROL text unit or a ~ROL text unit. The inputted training set is partitioned into two subsets on a random basis. The two subsets represent a regression set and a calibration set. A Z value is generated for each term or token in the regression set. Then, these Z values are used to calculate the average Z value for each text unit of the regression set. Using these average Z values, and possibly other features, a linear equation is created for calculating the score for each text unit. The threshold against which each score is evaluated is selected using the generated Z values, the linear equation and the calibration set.
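Two of the training steps above, the random partition and the threshold selection, can be sketched as follows. The description does not state the split proportion or the criterion for choosing the threshold, so the even split and the choice of the threshold that maximizes accuracy on the calibration set are assumptions of this sketch, as are all function names and sample values.

```python
import random

def split_training_set(units, regression_fraction=0.5, seed=0):
    """Randomly partition labeled text units into a regression set and a calibration set."""
    shuffled = list(units)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * regression_fraction)
    return shuffled[:cut], shuffled[cut:]

def select_threshold(scores, labels):
    """Choose the threshold that maximizes accuracy on calibration scores (assumed criterion).

    labels use 1 for ROL and 0 for ~ROL; a unit is predicted ROL when score > threshold.
    """
    distinct = sorted(set(scores))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct scores")
    # Candidate thresholds are the midpoints between neighboring distinct scores.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best_threshold, best_accuracy = candidates[0], -1.0
    for t in candidates:
        correct = sum((s > t) == bool(y) for s, y in zip(scores, labels))
        accuracy = correct / len(scores)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = t, accuracy
    return best_threshold

regression_set, calibration_set = split_training_set(list(range(10)))
threshold = select_threshold([0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1])
```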
Using the trained knowledge base, the present invention further comprises a method of finding and marking ROL text units in an input case law document having text that has not been previously classified. Upon input of the case law document, a portion of the document is extracted. In the preferred embodiment, this portion is the court's majority opinion. The majority opinion is partitioned into text units, and features are generated for each text unit. Features are characteristics that are representative of text units in a particular class and are helpful in distinguishing ROL text units from ~ROL text units.
Applying the linear equation and a sigmoid function to each text unit, a score is generated for each text unit. The scores are compared to a threshold, and text units having scores greater than the threshold are selected and marked as ROL text units. The document may then be output with the ROL text units marked.
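The scoring and marking step can be sketched as below. The weights, intercept, threshold, and feature values are hypothetical placeholders for the contents of a trained knowledge base, and the function names are assumptions of this example.

```python
import math

def score(features, weights, intercept):
    """Apply the linear equation to the features, then the sigmoid function."""
    linear = intercept + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-linear))

def mark_rol(units, feature_rows, weights, intercept, threshold):
    """Pair each text unit with True if its score exceeds the threshold (marked ROL)."""
    return [
        (unit, score(features, weights, intercept) > threshold)
        for unit, features in zip(units, feature_rows)
    ]

# Hypothetical text units, each with a single feature (its average Z value).
units = ["A party must show good cause.", "The plaintiff filed suit in 1998."]
feature_rows = [[1.0], [-1.0]]
marked = mark_rol(units, feature_rows, weights=[2.0], intercept=0.0, threshold=0.5)
```

Because the sigmoid function is monotonic, comparing the sigmoid output to a threshold is equivalent to comparing the linear score to a correspondingly transformed threshold; the sigmoid simply maps scores into the range from 0 to 1.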
Accordingly, it is an object of the present invention to provide a computer-automated system and method for finding rules of law in case law documents.
Another object of the invention is a computer-automated system and method for calculating a feature known as the average Z value which can be used to distinguish text units from two general classes.
A further object of the invention is a computer-automated system and method for calculating features and tokens that are effective for distinguishing rule of law text units from other text units within a case law document.
A still further object of the invention is a computer-automated system and method for selecting terms that are suggestive of a particular topic.
It is yet another object of the invention to provide a computerized system and method that will enable portions of case law documents to be categorized in an automated manner.
These and other objects of the invention, as well as many of the intended advantages thereof, will become more readily apparent when reference is made to the following description taken in conjunction with the accompanying drawings.