Applicant classification is performed for a plurality of reasons, such as credit scoring, college applicant selection, scoring of compliance examinations and the like. In such activities, data mining is often performed to extract relevant data from information submitted by a plurality of applicants. After the data is extracted, a classifier can be used to provide an ordered classification of states. In particular, a classifier that generates a monotonic classification of states is desirable. A monotonic dataset is one that exhibits consistency such that, for all i, j, if Xi≦Xj then ci≦cj, where Xi and Xj represent feature vectors and ci and cj represent the classes to which Xi and Xj pertain, respectively.
For example, if all measured data for applicant A equals or exceeds the measured data for applicant B and the list of potential outcomes is monotonically classified, applicant A will obtain a rating or score at least as high as that of applicant B. An exemplary dataset used to determine credit-worthiness might include, for example, three features (represented as a feature vector Xi) for each individual: (i) income (in thousands of dollars), (ii) number of years of home ownership, and (iii) number of years with current employer. Moreover, the dataset might include, for example, three ordered classifications for credit-worthiness (represented as a class ci): (i) low, (ii) medium, and (iii) high. A monotonic dataset may be created for particular feature vectors and classes. For example, an exemplary monotonic dataset may include the following features and resulting classifications: {(X1, c1)=(50, 2, 5; medium), (X2, c2)=(70, 10, 25; high), (X3, c3)=(65, 0, 7; low), (X4, c4)=(40, 30, 35; high)}. For the exemplary feature vectors and classes, whenever Xi≦Xj, ci≦cj as well, such as for (X1, c1) and (X2, c2).
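The monotonicity condition on the exemplary dataset can be verified mechanically. The following sketch (the function names are illustrative, not from the source) compares every pair of feature vectors under the component-wise partial order and checks that comparable pairs have consistently ordered class labels, with the ordered classes low, medium, and high encoded as the integers 0, 1, and 2:

```python
from itertools import combinations

# Exemplary dataset from the text; classes encoded as low=0, medium=1, high=2.
dataset = [
    ((50, 2, 5), 1),    # (X1, c1) = (income, years of ownership, years employed; medium)
    ((70, 10, 25), 2),  # (X2, c2) = high
    ((65, 0, 7), 0),    # (X3, c3) = low
    ((40, 30, 35), 2),  # (X4, c4) = high
]

def dominates(x, y):
    """Component-wise partial order: True when x <= y in every feature."""
    return all(a <= b for a, b in zip(x, y))

def is_monotonic(data):
    """True when every comparable pair of feature vectors has
    consistently ordered class labels (Xi <= Xj implies ci <= cj)."""
    for (xi, ci), (xj, cj) in combinations(data, 2):
        if dominates(xi, xj) and not ci <= cj:
            return False
        if dominates(xj, xi) and not cj <= ci:
            return False
    return True

print(is_monotonic(dataset))  # True: the exemplary dataset is monotonic
```

Note that not every pair is comparable: X1=(50, 2, 5) and X4=(40, 30, 35) are incomparable under the component-wise order, so their classes impose no constraint on each other.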
However, conventional statistical classifiers, such as neural networks or classification trees, do not guarantee monotonicity, even when trained on data that is monotonic. Violation of monotonicity can expose reviewers who use such classifiers to liability for inexplicable scoring decisions.
Conventional monotonic classifiers have been generated using a learning set of feature vectors and their true classes (i.e., L={(X1, c1), . . . , (Xn, cn)}), which are not required to be partially ordered. Such monotonic classifiers may estimate an increasing classifier f such that f(X) is an estimate of the class c of X. The function f must itself be increasing to provide monotonic classification. One method for creating a monotonic classifier is to examine all increasing classifiers in order to determine which classifier has the least estimated error on the learning set. Methods of building monotonic classification trees are disclosed in R. Potharst and A. J. Feelders, “Classification Trees for Problems with Monotonicity Constraints,” SIGKDD Explorations, 4:1-10 (June 2002); R. Potharst, et al., “Monotone Decision Trees,” Erasmus University, Rotterdam, The Netherlands (August 1997); A. Ben-David, “Monotonicity Maintenance in Information-Theoretic Machine Learning Algorithms,” Machine Learning 19:29-43 (April 1995); and J. Sill and Y. S. Abu-Mostafa, “Monotonicity Hints,” Neural Information Processing Systems Foundation 9:634-40 (1997).
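The exhaustive approach described above can be sketched for a deliberately tiny domain. The example below (an illustrative construction, not from the cited references) enumerates every function from the binary domain {0,1}² to the binary classes {0,1}, keeps only the increasing ones, and selects the one with least error on a small learning set. Even here the enumeration works only because the domain has four points; the number of increasing functions grows extremely rapidly with the number of features, which is why exhaustive search is generally impractical:

```python
from itertools import product

# Tiny domain: all points of {0,1}^2.
domain = list(product([0, 1], repeat=2))

def dominates(x, y):
    """Component-wise partial order on feature vectors."""
    return all(a <= b for a, b in zip(x, y))

def is_increasing(f):
    """f is a dict mapping each domain point to a class; increasing means
    x <= y (component-wise) implies f(x) <= f(y)."""
    return all(f[x] <= f[y] for x in domain for y in domain if dominates(x, y))

# Learning set L = {(Xi, ci)}; the last label is deliberately inconsistent,
# so no increasing classifier can fit L perfectly.
L = [((0, 0), 0), ((0, 1), 1), ((1, 0), 0), ((1, 1), 0)]

# Exhaustively enumerate all 2^4 functions, keep the increasing ones,
# and track the least estimated error on the learning set.
best, best_err = None, len(L) + 1
for values in product([0, 1], repeat=len(domain)):
    f = dict(zip(domain, values))
    if not is_increasing(f):
        continue
    err = sum(f[x] != c for x, c in L)
    if err < best_err:
        best, best_err = f, err

print(best_err)  # least error achievable by any increasing classifier on L
```

Because the learning set itself violates monotonicity ((0,1) is labeled 1 but the dominating point (1,1) is labeled 0), the best increasing classifier must misclassify at least one example, and the search finds one with exactly one error.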
One problem with determining an appropriate increasing classifier is that an exhaustive search might be required. What is needed is a classification method that generates a monotonic classifier, but does not require an exhaustive search of increasing classifiers.
A need exists for a monotonic classifier that uses a conventional classification method not requiring specialized algorithms.
A further need exists for a monotonic classifier that reduces the time needed to perform the calculations used to reach its classification determination.
The present disclosure is directed to solving one or more of the above-listed problems.