The present invention relates to a data classification apparatus and to an automated method of data classification that provide a universal measure of confidence in the predicted classification for any unknown input. In particular, but not exclusively, the present invention is suitable for pattern recognition, e.g. optical character recognition.
In order to automate data classification such as pattern recognition, the apparatus, usually in the form of a computer, must be capable of learning from known examples and extrapolating to predict a classification for new unknown examples. Various techniques have been developed over the years to enable computers to perform this function including, inter alia, discriminant analysis, neural networks, genetic algorithms and support vector machines. These techniques usually originate in one of two fields: machine learning and statistics.
Learning machines developed in the theory of machine learning often perform very well in a wide range of applications without requiring any parametric statistical assumptions about the source of data (unlike traditional statistical techniques); the only assumption made is the independent and identically distributed (iid) assumption, namely that the examples are generated from the same probability distribution independently of each other. A new approach to machine learning is described in U.S. Pat. No. 5,640,492, where mathematical optimisation techniques are used for classifying new examples. The advantage of the learning machine described in U.S. Pat. No. 5,640,492 is that it can be used for solving extremely high-dimensional problems which are infeasible for the previously known learning machines.
A typical drawback of such techniques is that they do not provide any measure of confidence in the predicted classification output by the apparatus. A typical user of such data classification apparatus can only hope that the accuracy of the results from previous analyses using benchmark datasets is representative of the results to be obtained from the analysis of future datasets.
Other options for the user who wants to associate a measure of confidence with new unclassified examples include performing experiments on a validation set, using one of the known cross-validation procedures, and applying one of the theoretical results about the future performance of different learning machines given their past performance. None of these confidence estimation procedures, though, provides any practicable means for assessing the confidence of the predicted classification for an individual new example. Known confidence estimation procedures that do address the problem of assessing the confidence of a predicted classification for an individual new example are ad hoc and do not admit interpretation in rigorous terms of mathematical probability theory.
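By way of illustration only, the limitation noted above for cross-validation can be sketched as follows. The sketch assumes a simple one-nearest-neighbour classifier and a small set of made-up one-dimensional examples; the function names `nearest_neighbour_predict` and `cross_validated_accuracy` and the data are hypothetical and do not form part of any known procedure referred to above. It shows that cross-validation yields a single aggregate accuracy figure, which is then the only confidence information available for every new example alike.

```python
# Illustrative sketch only: the classifier, function names and toy data
# are assumptions made for this example, not taken from the source text.

def nearest_neighbour_predict(train, x):
    """Return the label of the training example whose feature is closest to x."""
    return min(train, key=lambda ex: abs(ex[0] - x))[1]

def cross_validated_accuracy(examples, k):
    """Estimate accuracy by k-fold cross-validation.

    The result is one number summarising performance over all held-out
    examples; it carries no information about any individual prediction.
    """
    # Split the examples into k interleaved folds.
    folds = [examples[i::k] for i in range(k)]
    correct = 0
    for i, held_out in enumerate(folds):
        # Train on every fold except the one currently held out.
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        correct += sum(nearest_neighbour_predict(train, x) == y
                       for x, y in held_out)
    return correct / len(examples)

# Hypothetical toy data: (feature, label) pairs.
examples = [(0.1, "a"), (0.2, "a"), (0.3, "a"), (0.4, "a"),
            (0.6, "b"), (0.7, "b"), (0.8, "b"), (0.9, "b")]

accuracy = cross_validated_accuracy(examples, k=4)

# Two new examples: one near the class boundary, one far from it.
borderline = nearest_neighbour_predict(examples, 0.52)
clear_cut = nearest_neighbour_predict(examples, 0.95)
```

Here `borderline` and `clear_cut` receive the same predicted label, and the only confidence figure on offer for either of them is the shared value of `accuracy`, even though intuitively the example at 0.52 is much less certain than the example at 0.95.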
Confidence estimation is a well-studied area of both parametric and non-parametric statistics. In some parts of statistics the goal is the classification of future examples rather than the estimation of parameters of the model, which is relevant to the need addressed by this invention. In statistics, however, only confidence estimation procedures suitable for low-dimensional problems have been developed. Hence, to date, mathematically rigorous confidence assessment has not been employed in high-dimensional data classification.