The use of classification models for classifying a variety of elements is known. The input data from which the classification models are built, also called the training set, comprises multiple examples or records, each having multiple attributes or features. The objective of classification is to analyze the input data and to develop an accurate description or model for each class using the features present in the data. The class descriptions are used to classify future data for which the class labels are unknown. Several classification models have been proposed over the years, e.g. neural networks, statistical models, decision trees, and genetic models.
One problem in constructing classification models is that the training set may contain a significant amount of class noise, which makes it difficult to build accurate classification models from it. Further, this noise is not systematic: there is no methodology that always distinguishes misclassified examples from correctly classified but exceptional ones. Expert knowledge can be applied to reject examples which are obviously misclassified, implausible, or unsuitable for some other general reason. However, other techniques must be applied to reject examples that are somewhat plausible but erroneously classified.
One problem associated with classification models is the volume of the data. The volume of data recorded, even over a short time period, may begin to reach the limits of storage normally allocated to learning tasks. Further, standard learning techniques have two significant limitations with regard to processing large numbers of training examples. First, they are batch algorithms, which discard the results of previous learning and reprocess the ever-growing set of training examples. A training set growing to millions of examples might therefore need to be retained indefinitely (in practice, until storage runs out and earlier data must be purged), even though the training examples serve no purpose other than learning. Second, most learning algorithms are limited to processing examples that fit into main memory. As the number of examples grows past this limit, the algorithms slow to an unacceptable degree. It is therefore desirable to learn from this data incrementally, on a short cycle, and then discard most or all of it. Most standard incremental learning methods either store most or all of the examples they process within the implementation structures of the classification models they construct, or sacrifice an unacceptable degree of classification accuracy. The few learning algorithms which avoid storing training examples and maintain good accuracy instead learn a set of classification models incrementally from subsets of the training examples and then integrate them into a final composite classification model. However, by blending multiple classification models, these approaches make it difficult to explain why a particular classification prediction was made.
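By way of illustration only, the idea of learning on a short cycle and then discarding the examples can be sketched as follows. The sketch uses a simple count-based (naive Bayes style) learner, which is an assumption chosen for brevity and is not a technique named in this description; the class name `IncrementalCounts`, the feature `noise`, and the labels `cable` and `office` are likewise hypothetical.

```python
from collections import Counter


class IncrementalCounts:
    """Retains only per-class counts; raw training examples are never stored."""

    def __init__(self):
        self.class_counts = Counter()
        # Maps (label, feature_name) -> Counter of observed feature values.
        self.value_counts = {}

    def learn_batch(self, batch):
        """Fold one batch of (features, label) pairs into the counts.

        The caller may discard the batch afterwards.
        """
        for features, label in batch:
            self.class_counts[label] += 1
            for name, value in features.items():
                self.value_counts.setdefault((label, name), Counter())[value] += 1

    def predict(self, features):
        """Return the label with the highest smoothed count-based score."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -1.0
        for label, n in self.class_counts.items():
            score = n / total
            for name, value in features.items():
                seen = self.value_counts.get((label, name), Counter())[value]
                # Add-one smoothing; the "+ 2" denominator is an arbitrary choice.
                score *= (seen + 1) / (n + 2)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = IncrementalCounts()
model.learn_batch([({"noise": "high"}, "cable"), ({"noise": "high"}, "cable")])
model.learn_batch([({"noise": "low"}, "office")])  # earlier batch is not needed
print(model.predict({"noise": "high"}))
print(model.predict({"noise": "low"}))
```

Only the counts survive between cycles, so storage does not grow with the number of examples processed. A learner this simple would not by itself address the accuracy and explanation concerns raised above.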
Another problem associated with classification models involves limited analysis expertise. Economic considerations and the availability of domain or analysis expertise in any operating company dictate that only a limited number of individuals will be available to collect and analyze training examples and to manage the process of converting them into executable knowledge. However, such knowledge is still required to be produced in a timely fashion.
Still another problem associated with classification models involves varying test conditions. Learning must occur over the long term and so handle a large volume of data, but at the same time it must not average or smooth away short term or cyclic variations in the training set. The conditions under which features of a new unlabeled example were acquired must be considered in the classification process. Only those aspects of the classification model which can reliably make predictions about examples acquired under those conditions should be consulted.
Additionally, the output of the process utilizing the classification models must be accompanied by an explanation of why a particular classification model was chosen. This explanation may be evaluated by an analysis expert during the classification model construction process. Not all learning algorithms construct models which provide comprehensible explanations of their predictions. There must be some explanation about how inferred conditions were used in making a prediction.
Multiple classification models may already exist within the blackboard architecture of the system. These may include expert system models, case-based models, previous machine-learned (decision-tree) models, and so on. Along with these classification models is an arbitration expert system, which includes algorithms for evaluating the decisions provided by each of the classification models. The latest classification model must be integrated into the blackboard in such a way as to allow the arbitration expert system to choose between the predictions of the latest and the previous models.
Given a number of classification models and their predictions for a particular unlabeled example, one approach used is a form of voting, picking the prediction with the largest number of votes. Another approach is to learn a new model whose training set is based on predictions of the base models. It would be desirable to coalesce the predictions of the base models by learning the relationships between those predictions and the correct prediction, to identify a specific classification model as the one responsible for producing a final prediction and to include a simple explanation of why the prediction was made, and to revise the selected model as the models change over time.
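The voting approach mentioned above can be sketched, under stated assumptions, as a simple count over the base-model predictions. The function name `majority_vote` and the model and label names are hypothetical and serve only as illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Return (label, votes) for the most frequently predicted label.

    `predictions` maps a model name to that model's predicted label.
    Ties are resolved in favor of the label encountered first.
    """
    counts = Counter(predictions.values())
    label, votes = counts.most_common(1)[0]
    return label, votes


# Hypothetical predictions from three base models for one unlabeled example.
print(majority_vote({
    "expert_system": "cable",
    "case_based": "cable",
    "decision_tree": "office",
}))
```

The result carries a label and a vote count, but it does not single out one model as responsible for the prediction, which is the limitation noted in the preceding paragraph.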
With the foregoing background in mind, it is an object of the present invention to address the problems of constructing classification models in general and, in a specific embodiment, for subscriber loop fault localization. The present invention includes a mechanism for applying expert knowledge and machine-learning routines to a continuous stream of information. At appropriate intervals, the method produces new fault localization knowledge in the form of decision-tree based classification and dependability models. Such knowledge is used to enhance the existing classification knowledge already available in the apparatus for fault segmentation. The presently disclosed method functions in the context of a system which employs a plurality of classification models. Some of these models may be expert system based, others case-based, and still others decision-tree based. Expert system and case-based models change over time through the addition of new rules or cases, respectively. With each cycle through the process, another model is added to those already available. Each of these classification models has a particular sub-domain in which it is the most reliable, and hence the best choice to use.
The present method comprises learning a set of dependability models, one for each classification model, that characterize the situations in which each of the classification models is able to make correct predictions. For future unlabeled examples, these dependability models are consulted to select the most appropriate classification model, and the prediction of that model is then accepted.
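The selection step described above can be sketched as follows. This is a minimal illustration under several assumptions: the classification and dependability models are represented as plain Python callables rather than learned decision trees, the dependability model returns a numeric score, and the training set for a dependability model is assumed to be built by relabeling each example according to whether the classification model predicted it correctly. All function, feature, and label names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Example = Dict[str, Any]
Classifier = Callable[[Example], str]
Dependability = Callable[[Example], float]


def dependability_training_set(
    classifier: Classifier, labeled: List[Tuple[Example, str]]
) -> List[Tuple[Example, bool]]:
    """Relabel each example by whether `classifier` predicted it correctly
    (an assumed way of preparing data for learning a dependability model)."""
    return [(ex, classifier(ex) == label) for ex, label in labeled]


def select_and_predict(
    example: Example,
    classifiers: Dict[str, Classifier],
    dependability: Dict[str, Dependability],
) -> Tuple[str, str]:
    """Consult each dependability model, pick the classification model it
    rates highest for this example, and return (model name, prediction)."""
    best = max(classifiers, key=lambda name: dependability[name](example))
    return best, classifiers[best](example)


# Hypothetical stand-ins for learned models.
classifiers = {
    "expert_rules": lambda ex: "cable" if ex["noise"] == "high" else "office",
    "decision_tree": lambda ex: "drop" if ex["loop_km"] > 3 else "office",
}
dependability = {
    "expert_rules": lambda ex: 0.9 if ex["weather"] == "dry" else 0.4,
    "decision_tree": lambda ex: 0.7,
}

wet = {"noise": "high", "loop_km": 5.0, "weather": "wet"}
dry = dict(wet, weather="dry")
print(select_and_predict(wet, classifiers, dependability))
print(select_and_predict(dry, classifiers, dependability))
```

Because exactly one classification model is selected for each example, the returned model name identifies which model was responsible for the prediction, and the conditions under which the example was acquired (here the hypothetical `weather` feature) can influence that choice.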