1. Field of the Invention
The present invention relates to means and methods of accurately classifying a new record and, specifically, to the combination of measures needed to evaluate the confidence of a classification assignment.
2. Description of the Prior Art
Several inventions have been made that are tangentially related to the present invention. They employ either a quantitative confidence factor or a qualitative confidence factor, often depending upon the type of data to be classified. Specifically, the quantitative factor is the probability that the class given by an algorithm is indeed the correct class. The qualitative factor uses evidence from a set of pre-classified data to determine that the assigned classification is indeed correct, but does not employ a probability measure.
For example, U.S. Pat. No. 6,421,640 discloses a method employed for speech recognition. This method linearly combines a plurality of secondary confidence measures to arrive at a primary confidence measure. By minimizing cross-entropy measures, the method learns the parameters of the secondary confidence measures.
U.S. Pat. No. 5,251,131 discloses a method of finding confidence that employs a k-nearest-neighbor (KNN) classifier to provide a distance measure. The confidence measure is then the ratio of the best distance measure to the sum of the distance scores of the top two classes.
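By way of illustration only, the top-two ratio described above may be sketched as follows. The sketch is not taken from the cited patent. The function name top_two_confidence and the input format are hypothetical, and it is assumed that each class has already been given a non-negative distance-derived score for which a larger value indicates a better match.

```python
def top_two_confidence(class_scores):
    """Return (best_class, confidence) from per-class scores.

    Assumption (not from the cited patent): class_scores maps each class
    label to a non-negative score where a larger value is a better match.
    The confidence is the best score divided by the sum of the two best
    scores.
    """
    if len(class_scores) < 2:
        raise ValueError("at least two classes are required")

    # Rank classes from best (largest score) to worst.
    ranked = sorted(class_scores.items(), key=lambda item: item[1], reverse=True)
    (best_label, best_score), (_, second_score) = ranked[0], ranked[1]

    total = best_score + second_score
    if total <= 0:
        raise ValueError("the two best scores must have a positive sum")

    return best_label, best_score / total


# Hypothetical scores for three classes.
label, confidence = top_two_confidence({"A": 3.0, "B": 1.0, "C": 0.5})
```

Under these assumptions the ratio lies between 0.5, when the two best classes are tied, and 1.0, when the runner-up has a score of zero.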
U.S. Pat. No. 6,192,360 employs the Mutual Information (MI) criterion for feature selection.
Each of the aforementioned approaches considered either the text attributes or the nominal attributes of a record, but not both. Where the record to be classified contains both types of attributes, considering only one type and ignoring the other necessarily reduces the accuracy attainable by any classification tool. If, however, a classification method could assign a classification and assess its confidence using measurements that consider both the text attributes and the nominal attributes of the subject record in a meaningful way, then that method's ability to estimate the confidence of the classification of the subject record would be greater.
What was needed was a way to provide confidence estimation of a classification assignment that takes into account both text attributes and nominal attributes.