1. Field of the Invention
This invention relates to a method and apparatus for interpreting information, and in particular information relating to a communications network.
2. Description of the Prior Art
In the telecommunications field, large amounts of data are available, for example about customer behaviour and telephone usage. This data contains potentially useful information for many purposes such as detection of fraud, marketing, billing, maintenance planning and fault detection. However, the data must first be analysed in order to extract features that can easily be used for a given task. This task of extracting useful features from the data is often difficult because the user does not know which type of features to look for. For example, the information may be in the form of call detail records (CDRs). A CDR is a log of an individual telephone call which contains information such as the length of the telephone call, the customer account number, the type of call and many other pieces of information. Over a given time period many CDRs will be recorded, each containing many different pieces of information. When faced with this mass of information it can be difficult to know what features to extract for a particular problem.
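By way of illustration only, a CDR and one simple form of feature extraction may be sketched as follows. The field names, the example call types and the helper `summarise_by_account` are assumptions made for this sketch; an actual CDR would typically contain many more fields.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallDetailRecord:
    """Hypothetical, heavily reduced CDR holding three of the fields named above."""
    account_number: str  # customer account number
    duration_s: float    # length of the telephone call, in seconds
    call_type: str       # type of call (values here are illustrative)


def summarise_by_account(records):
    """Extract two simple features per account: call count and mean call length."""
    durations = defaultdict(list)
    for record in records:
        durations[record.account_number].append(record.duration_s)
    return {
        account: {"calls": len(d), "mean_duration_s": sum(d) / len(d)}
        for account, d in durations.items()
    }


records = [
    CallDetailRecord("A1", 60.0, "local"),
    CallDetailRecord("A1", 180.0, "international"),
    CallDetailRecord("B2", 30.0, "local"),
]
summary = summarise_by_account(records)
```

Call count and mean call length are only two of many possible features; whether they are useful for a given task is exactly what the user often cannot know in advance.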
One possibility is to use a data classifier which searches for a set of classes and class descriptions that are most likely to explain a given data set. Several types of such data classifiers are known, for example Bayesian classifiers, neural network classifiers and rule-based classifiers. For a given task, a classifier is typically trained on a series of examples for that task. After the classifier has been trained, new examples are presented to it for classification. The classifier can be trained using either a supervised method or an unsupervised method. In a supervised method, the training examples that are used are known examples. That is, the user knows which classes these training examples should be classified into, and this information is also provided to the classifier during the training phase. For unsupervised training, no information about the desired classes for the training examples is provided.
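The difference between the two training methods may be sketched as follows. This is a minimal illustration only: a nearest-centroid classifier stands in for the supervised case and a one-dimensional k-means procedure for the unsupervised case, neither of which is one of the classifier types named above. The function names and the call-length data are assumptions made for this sketch.

```python
def train_supervised(examples, labels):
    """Nearest-centroid training: the desired class of each example is supplied."""
    sums, counts = {}, {}
    for x, y in zip(examples, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def train_unsupervised(examples, k=2, iterations=10):
    """1-D k-means (k >= 2): no class labels are supplied to the training."""
    ordered = sorted(examples)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in examples:
            nearest = min(range(k), key=lambda i: abs(x - centroids[i]))
            groups[nearest].append(x)
        # An empty group keeps its previous centroid.
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    return dict(enumerate(centroids))


def classify(model, x):
    """Return the class whose centroid lies closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


call_lengths = [30, 45, 60, 900, 1000, 1100]  # illustrative values, in seconds
known_classes = ["short", "short", "short", "long", "long", "long"]

supervised_model = train_supervised(call_lengths, known_classes)
unsupervised_model = train_unsupervised(call_lengths)
```

In this sketch the supervised model classifies a new example as "short" or "long", whereas the unsupervised model can only return a class index such as 0 or 1, which by itself carries no meaning in the problem domain.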
One problem is that the output of classifiers is often difficult to interpret. This is especially the case when unsupervised training has been used. The classifier output specifies which of a certain number of classes each input has been placed into. The user is given no explanation of what the classes mean in terms of the particular task or problem domain. Neither is the user provided with any information about why a particular input has been classified in the way that it has.
Previously, users have needed to carry out complex analyses of the classifier in order to obtain these kinds of explanations. For example, known examples can be input to the classifier and the outputs compared with the expected outputs. However, this requires that known examples be available, which is often not the case. Even when known examples can be obtained, obtaining them is often a lengthy and expensive procedure.
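One simple form of such a comparison may be sketched as follows, assuming that known examples with expected classes are available. The helper names, the class indices and the labels "normal" and "fraud" are assumptions made for this sketch.

```python
from collections import Counter


def cross_tabulate(predicted, expected):
    """Count how often each classifier output coincides with each expected class."""
    return Counter(zip(predicted, expected))


def dominant_label_per_class(predicted, expected):
    """For each classifier output class, report the most frequent expected class."""
    best = {}
    for (cls, label), n in cross_tabulate(predicted, expected).items():
        if cls not in best or n > best[cls][1]:
            best[cls] = (label, n)
    return {cls: label for cls, (label, _) in best.items()}


# Classifier outputs for five known examples, and the classes the user expected.
predicted = [0, 0, 1, 1, 1]
expected = ["normal", "normal", "fraud", "fraud", "normal"]

table = cross_tabulate(predicted, expected)
meaning = dominant_label_per_class(predicted, expected)
```

Such a table gives the user only a rough indication of what each output class represents, and it cannot be produced at all when no known examples exist.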
A further problem is that, because these kinds of explanations are not available, the user's confidence in the system is reduced. This means that the user is less likely to use the system, which reduces its value. Also, errors and mistakes are hard to detect. For example, if erroneous data is entered by mistake, a resulting error in the output could easily go undetected. Similarly, if the training examples are not representative of the example population for the particular task, errors will be produced that are hard to find.
It is accordingly an object of the present invention to provide an apparatus and method for interpreting information relating to a communications network which overcomes or at least mitigates one or more of the problems noted above.