The present invention relates to a method and apparatus for interpretation of data classifier outputs and a system incorporating the same.
Trainable data classifiers (for example Neural Networks) can learn to classify given input vectors into the required output group with a high degree of accuracy. However, a known limitation of such data classifiers is that they do not provide any explanation or reason as to why a particular decision has been made. This "black box" nature of the decision making process is a disadvantage when human users want to be able to understand the decision before acting on it.
Such data classifiers can be split into two main groups: those which have a supervised training period and those which are trained in an unsupervised manner. Those trained in a supervised manner (i.e. supervisedly trained data classifiers) include, for example, Multi Layer Perceptrons (MLPs).
In order for a supervisedly trained data classifier (e.g. Neural Network) to be trained, a training set of examples has to be provided. The examples contain associated input/output vector pairs where the input vector is what the data classifier will see when performing its classification task, and the output vector is the desired response for that input vector. The data classifier is then trained over this set of input/output pairs and learns to associate the required output with the given inputs. The data classifier thereby learns how to classify the different regions of the input space in line with the problem represented in the training set. When the data classifier is subsequently given an input vector to classify, it produces an output vector dependent upon the previously learned region of the input space that the input vector occupies.
In the case of some very simple classification problems, the "reasoning" made by the data classifier may, perhaps, be intuitively guessed by a user. However, Neural Networks are typically used to solve problems without a well bounded problem space and which have no solution obvious to humans. If the rules defining such a problem were clear then a rule-based system would probably provide a more suitable classifier than a supervisedly trained data classifier such as a Neural Network. A typical data classifier application involves complex, high-dimensional data where the rules between input vectors and correct classification are not at all obvious. In such situations the supervisedly trained data classifier becomes a complex, accurate decision maker but unfortunately offers the human end user no help in understanding the decision that it has reached. In many situations the end user nevertheless wishes to understand, at least to some degree, why a given supervisedly trained data classifier has reached a decision before the user can act upon that decision with confidence.
In the past, much work has been directed to extracting rules from Neural Networks, where people have attempted to convert the weights contained within the Neural Network topology into if-then-else type rules [Andrews, R., Diederich, J., and Tickle, A. (1995): "A survey and critique of techniques for extracting rules from trained artificial neural networks" in Knowledge Based Systems, 8(6), pp. 373-389]. This work has had only limited success and the rules generated have not been clear, concise, nor readily understandable. Work has also been performed which concentrates on the problem as a rule inversion problem; given a subset Y of the output space, find the reciprocal image of Y by the function f computed by the Neural Network [Maire, F. (1995): "Rule-extraction by back-propagation of polyhedra" in Neural Networks, 12(4-5), pp. 717-725. Pub. Elsevier/Pergamon, ISSN 0893-6080]. This method back-propagates regions from the output layer back to the input layer. Unfortunately, whilst this method is theoretically sound, the output from this method is once again not readily understandable to the user, and so does not solve the problem of helping the user to understand the reason for a Neural Network's decision.
Other methods which have been tried in the past divide each individual value in the input vector into different categories (percentile bins). This technique is described in, for example, U.S. Pat. No. 5,745,654. Each percentile bin has an associated explanation describing the meaning of the individual input value, rather than of the input vector as a whole. A reason is then associated with the output vector, selected as being the reason associated with the most significant input variable in the input vector. This method does not take into consideration the facts that the data classifier classifies on the input vector as a whole and that relationships between input variables are often significant. It also requires some definition of the relative significance of the component variables of an input vector, which is not always meaningful.
The invention seeks to provide an improved method and apparatus for interpreting outputs from supervisedly trained data classifiers.
The present invention provides to a user a textual (or other representation of a) reason for a decision made by a supervisedly trained data classifier (e.g. Neural Network). The reason may be presented only when it is required, in a manner that does not hinder the performance of the data classifier, is easily understood by the end user of the data classifier, and is scaleable to large, high dimensional data sets.
According to a first aspect of the present invention there is provided a method of operating a supervisedly trained data classifier, comprising the steps of: generating an output vector responsive to provision of an input vector; associating a reason with said classifier output vector responsive to a comparison between said classifier input vector and a previously stored association between a training vector used to train said classifier and said reason.
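The first aspect can be illustrated with a minimal sketch. It assumes, purely for illustration, that the comparison selects the stored training vector nearest to the input vector; the names `REASON_STORE`, `classify`, `associate_reason` and `classify_with_reason`, the stand-in classifier and the reason texts are all hypothetical and are not taken from the specification.

```python
import math

# Hypothetical store of previously recorded associations:
# (input component of a training vector, reason text).
REASON_STORE = [
    ((0.0, 0.0), "low call volume, low cost"),
    ((10.0, 0.5), "high call volume, low cost"),
    ((9.0, 9.0), "high call volume, high cost"),
]


def classify(input_vector):
    """Stand-in for a trained classifier (assumption): flags large totals."""
    return (1.0,) if sum(input_vector) > 12.0 else (0.0,)


def associate_reason(input_vector, store=REASON_STORE):
    """Pick the reason stored with the training vector closest to the input."""
    _, reason = min(store, key=lambda pair: math.dist(input_vector, pair[0]))
    return reason


def classify_with_reason(input_vector):
    """Generate the output vector, then associate a reason with it."""
    output_vector = classify(input_vector)
    return output_vector, associate_reason(input_vector)
```

The reason lookup here is kept separate from the classifier call, so it could be skipped whenever no explanation is requested.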
Advantageously, the method of operation facilitates later interpretation of the classifier outputs by a user, and is scaleable to large, high dimensional data sets.
Preferably, the method additionally comprises the step of: presenting to a user information indicative of said output vector, of said reason, and of their association.
Advantageously, the association enables the user to interpret the classifier outputs more rapidly and more directly.
Preferably, the method additionally comprises the step of: associating with said reason a measure of a degree of confidence with which said reason is associated with said input vector.
Preferably, the method additionally comprises the step of: presenting to said user information indicative of said measure of a degree of confidence.
Preferably, said degree of confidence is calculated responsive to a comparison between said training vector and said input vector.
Preferably, said degree of confidence is calculated as a distance between said input vector and an input vector component of said training vector.
Preferably, said distance is a Euclidean distance.
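By way of illustration only, the Euclidean distance measure might be computed as sketched below. The function names and the representation of a training vector as an (input component, desired output) pair are assumptions. Under this reading, a smaller distance indicates a closer match and hence a higher degree of confidence.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def confidence_measure(input_vector, training_vector):
    """Distance from the input vector to the input component of a training vector.

    training_vector is assumed to be an (input component, desired output) pair.
    """
    training_input, _desired_output = training_vector
    return euclidean_distance(input_vector, training_input)
```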
Advantageously, these measures are simple to calculate and provide a good and intuitively easy to understand measure of confidence.
In a preferred embodiment, a plurality of reasons may be associated with said classifier output vector responsive to comparisons between said classifier input vector and a plurality of previously stored associations between training vectors used to train said classifier and said reasons.
Preferably, the method additionally comprises the step of: associating with each said reason a measure of a degree of confidence with which said reason is associated with said input vector.
Preferably, the method additionally comprises the step of: presenting to said user information indicative of said measure of a degree of confidence.
Advantageously, this allows the user to identify and to concentrate interpretation effort on reasons allocated a high degree of confidence.
Preferably, said information is presented sorted according to said measures of confidence.
Advantageously, this allows the user to identify the reason allocated the highest degree of confidence more readily, thereby speeding the user's interpretation of the presented information.
Preferably, each said reason is presented selectively responsive to a comparison between said measure of degree of confidence associated with said reason and a threshold criterion.
Advantageously, this allows the amount of information presented to a user to be limited, so that the user is not swamped with large numbers of potential reasons, some of which may have only been allocated a small degree of confidence.
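The handling of a plurality of reasons may be sketched as follows. The sketch assumes that the measure of confidence is a Euclidean distance (smaller meaning more confident) and that the threshold criterion is a maximum permitted distance; the function name `ranked_reasons` and the parameter `max_distance` are hypothetical.

```python
import math


def ranked_reasons(input_vector, store, max_distance):
    """Return (distance, reason) pairs, most confident first.

    store is a list of (training input vector, reason) pairs. Reasons whose
    distance exceeds max_distance are not presented at all.
    """
    scored = [(math.dist(input_vector, vec), reason) for vec, reason in store]
    kept = [(distance, reason) for distance, reason in scored
            if distance <= max_distance]
    return sorted(kept, key=lambda item: item[0])
```

A caller might then display the returned list in order, so that the reason with the smallest distance appears first.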
The invention also provides a method of operating a supervisedly trainable data classifier, comprising the steps of: associating a reason with at least one training vector; training said data classifier using said training vector; providing an input vector to said data classifier whereby to generate an output vector; associating said reason with said output vector responsive to a comparison between said input vector and said at least one training vector.
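The four steps above can be sketched end to end. A single perceptron is used here only as a very small stand-in for a supervisedly trainable data classifier, and the output is a single class label rather than a full output vector; the training set, the reason texts and all function names are hypothetical.

```python
import math

# Step 1: associate a reason with each training vector.
# Hypothetical entries: (input vector, desired output, reason).
TRAINING_SET = [
    ((0.0, 0.0), 0, "no premium-rate or international calls"),
    ((0.0, 1.0), 0, "international calls only"),
    ((1.0, 0.0), 0, "premium-rate calls only"),
    ((1.0, 1.0), 1, "premium-rate and international calls together"),
]


def predict(weights, bias, x):
    """Threshold unit: output 1 when the weighted sum is positive."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train(training_set, epochs=10):
    """Step 2: train the classifier with the perceptron learning rule."""
    n = len(training_set[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, target, _reason in training_set:
            error = target - predict(weights, bias, x)
            weights = [w + error * xi for w, xi in zip(weights, x)]
            bias += error
    return weights, bias


def classify_with_reason(weights, bias, x, training_set):
    """Steps 3 and 4: generate an output, then attach the reason stored
    with the training vector nearest to the input vector."""
    output = predict(weights, bias, x)
    _, _, reason = min(training_set, key=lambda ex: math.dist(x, ex[0]))
    return output, reason
```

Note that the reason is found by comparing input vectors only, so the trained weights are never inspected when producing the explanation.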
In a preferred embodiment, said data classifier comprises a neural network.
According to a further aspect of the present invention there is provided a data classifier system, comprising: a supervisedly trained data classifier arranged to provide an output vector responsive to receipt of an input vector; a store containing an association between a reason and a training vector previously used to train said classifier; a data processing subsystem arranged to associate said reason with an output vector received from said data classifier, responsive to a comparison between said input vector and said training vector.
Preferably, the data classifier system additionally comprises: a computer display arranged to present an indication of said reason and an indication of said output vector to a user.
Preferably, the data classifier system additionally comprises: a data processing subsystem arranged to calculate a measure of a degree of confidence with which said reason is associated with said input vector.
Preferably, said display is arranged to present an indication of said degree of confidence to said user.
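One possible arrangement of the system components is sketched below under stated assumptions: the store is modelled as a list of (training input vector, reason) pairs, the data processing subsystem as a method that selects the nearest stored training vector, and the display as a formatted text line. The class and method names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = Tuple[float, ...]


@dataclass
class ReasonStore:
    """Store of associations between training vectors and reasons."""
    entries: List[Tuple[Vector, str]]


@dataclass
class DataClassifierSystem:
    classifier: Callable[[Vector], Vector]  # the trained data classifier
    store: ReasonStore

    def explain(self, input_vector: Vector):
        """Data processing subsystem: output vector, reason and distance."""
        output_vector = self.classifier(input_vector)
        distance, reason = min(
            (math.dist(input_vector, vec), reason)
            for vec, reason in self.store.entries
        )
        return output_vector, reason, distance

    def display(self, input_vector: Vector) -> str:
        """Text that a computer display might present to the user."""
        output_vector, reason, distance = self.explain(input_vector)
        return f"output={output_vector} reason={reason!r} distance={distance:.2f}"
```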
According to a further aspect of the present invention there is provided an anomaly detection system comprising a data classifier system according to the present invention.
According to a further aspect of the present invention there is provided an account fraud detection system comprising a data classifier according to the present invention.
According to a further aspect of the present invention there is provided a telecommunications account fraud detection system comprising a data classifier according to the present invention.
The invention also provides for a system for the purposes of digital signal processing which comprises one or more instances of apparatus embodying the present invention, together with other additional apparatus.
According to a further aspect of the present invention there is provided computer software in a machine-readable medium arranged to perform the steps of: receiving an input vector; providing an output vector indicative of a classification of said input vector; associating a reason with said classifier output vector responsive to a comparison between said classifier input vector and a previously stored association between a training vector used to train said classifier and said reason.
Preferably, the computer software is additionally arranged to perform the step of: associating with said output vector a measure of a degree of confidence with which said reason is associated with said input vector.
Preferably, the computer software is additionally arranged to perform the steps of: associating a reason with at least one training vector for a data classifier; training said data classifier using said training vector; providing an input vector to said data classifier whereby to generate an output vector; associating said reason with said output vector responsive to a comparison between said input vector and said at least one training vector.
The preferred features may be combined as appropriate, as would be apparent to a skilled person, and may be combined with any of the aspects of the invention.