The present invention pertains in general to neural networks, and more particularly, to methods for estimating the accuracy of a trained neural network model, for determining the validity of the neural network's prediction, and for training neural networks having missing data in the input pattern and generating information as to the uncertainty in the data, this uncertainty utilized to control the output of the neural network.
A common problem that is encountered in training neural networks for prediction, forecasting, pattern recognition, sensor validation and/or processing problems is that some of the training/testing patterns might be missing, corrupted, and/or incomplete. Prior systems merely discarded data with the result that some areas of the input space may not have been covered during training of the neural network. For example, if the network is utilized to learn the behavior of a chemical plant as a function of the historical sensor and control settings, these sensor readings are typically sampled electronically, entered by hand from gauge readings and/or entered by hand from laboratory results. It is a common occurrence that some or all of these readings may be missing at a given time. It is also common that the various values may be sampled on different time intervals. Additionally, any one value may be "bad" in the sense that after the value is entered, it may be determined by some method that a data item was, in fact, incorrect. Hence, if the data were plotted in a table, the result would be a partially filled-in table with intermittent missing data or "holes", these being reminiscent of the holes in Swiss cheese. These "holes" correspond to "bad" or "missing" data. The "Swiss-cheese" data table described above occurs quite often in real-world problems.
Conventional neural network training and testing methods require complete patterns, and therefore must discard patterns with missing or bad data. The deletion of the bad data in this manner is an inefficient method for training a neural network. For example, suppose that a neural network has ten inputs and ten outputs, and also suppose that one of the inputs or outputs happens to be missing at the desired time for fifty percent or more of the training patterns. Conventional methods would discard these patterns, leading to no training for those patterns during the training mode and no reliable predicted output during the run mode. This is inefficient, considering that for this case more than ninety percent of the information is still there for the patterns that conventional methods would discard. The predicted output corresponding to the affected areas of the input space will be somewhat ambiguous and erroneous. In some situations, there may be as much as a 50% reduction in the overall data after screening bad or missing data. Additionally, experimental results have shown that neural network testing performance generally increases with more training data, such that throwing away bad or incomplete data decreases the overall performance of the neural network.
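The arithmetic of the ten-input, ten-output example can be checked with a short sketch. The function name `discarded_information` and the use of `None` to mark a missing value are illustrative assumptions and do not come from the specification.

```python
def discarded_information(patterns):
    """Measure what a discard-incomplete-patterns policy throws away.

    `patterns` is a list of equal-length lists; `None` marks a missing
    or bad value (an assumed convention for this sketch).
    Returns (fraction of patterns discarded,
             fraction of values inside the discarded patterns that were valid).
    """
    if not patterns:
        raise ValueError("at least one pattern is required")
    # A conventional method drops any pattern with one or more holes.
    dropped = [p for p in patterns if any(v is None for v in p)]
    if not dropped:
        return 0.0, 0.0
    valid = sum(1 for p in dropped for v in p if v is not None)
    total = sum(len(p) for p in dropped)
    return len(dropped) / len(patterns), valid / total


# Four patterns of 20 values each (10 inputs + 10 outputs);
# two of them are missing exactly one value.
complete = [1.0] * 20
one_hole = [None] + [1.0] * 19
example = [complete, one_hole, complete, one_hole]
```

With `example`, half of the patterns are discarded even though 19 of the 20 values in each discarded pattern (95 percent) were usable.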
If a neural network is trained on a smaller amount of data, this decreases the overall confidence that one has in the predicted output. To date, no technique exists for predicting the integrity of the training operation of the network "on the fly" during the run mode. For each input data pattern in the input space, the neural network has a training integrity. If, for example, a large number of good data points existed during the training, a high confidence level would exist when the input data occurred in that region. However, if there were a region of the input space that was sparsely populated with good data, e.g., a large amount of bad data had been thrown out from there, the confidence level in the predicted output of a network would be very low. Although some prior techniques may exist for checking the training of the network, these techniques do not operate in a real-time run mode.
The present invention disclosed and claimed herein comprises a network for estimating the error in the prediction output space of a predictive system model for a prediction input space. The network includes an input for receiving an input vector comprising a plurality of input values that occupy the prediction input space. An output is operable to output an output prediction error vector that occupies an output space corresponding to the prediction output space of the system model. A processing layer maps the input space to the output space through a representation of the prediction error in the system model to provide said output prediction error vector.
In another aspect of the present invention, a data preprocessor is provided. The data preprocessor is operable to receive an unprocessed data input vector that is associated with substantially the same input space as the input vector. The unprocessed data input vector has associated therewith errors in certain portions of the input space. The preprocessor is operable to process the unprocessed data input vector to minimize the errors therein to provide the input vector on an output. The unprocessed data input in one embodiment is comprised of data having portions thereof that are unusable. The data preprocessor is operable to reconcile the unprocessed data to replace the unusable portion with reconciled data. Additionally, the data preprocessor is operable to output an uncertainty value for each value of the reconciled data that is output as the input vector.
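As an illustration only, the reconciliation step might be sketched as follows. This summary does not state how the preprocessor reconciles data, so the linear interpolation between neighboring good samples, the `growth` parameter, and the rule that uncertainty grows with distance from the nearest good sample are all assumptions of this sketch.

```python
def reconcile(series, growth=0.1):
    """Fill unusable samples and report an uncertainty for every value.

    `series` is a list of floats in time order, with `None` marking an
    unusable sample (assumed convention). Returns (values, uncertainties).
    Good samples get uncertainty 0.0; a filled sample gets
    growth * (distance in steps to the nearest good sample).
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("no usable samples to reconcile from")
    values, uncertainties = [], []
    for i, v in enumerate(series):
        if v is not None:
            values.append(v)
            uncertainties.append(0.0)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            # Hole before the first good sample: hold the next good value.
            estimate, distance = series[right], right - i
        elif right is None:
            # Hole after the last good sample: hold the previous good value.
            estimate, distance = series[left], i - left
        else:
            # Interior hole: interpolate linearly between the neighbors.
            w = (i - left) / (right - left)
            estimate = series[left] + w * (series[right] - series[left])
            distance = min(i - left, right - i)
        values.append(estimate)
        uncertainties.append(growth * distance)
    return values, uncertainties
```

For the series `[1.0, None, 3.0]`, the hole is filled with 2.0 and is the only entry with a nonzero uncertainty.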
In a further aspect of the present invention, the system model is comprised of a non-linear model having an input for receiving the input vector within the input space and an output for outputting a predicted output vector. A mapping function is provided that maps the input layer to the output layer for a non-linear model of a system. A control circuit is provided for controlling the prediction output vector such that a change can be effected therein in accordance with predetermined criteria. A plurality of decision thresholds are provided that define predetermined threshold rates for the prediction error output. A decision processor is operable to compare the output prediction error vector with the decision thresholds and operate the output control to effect the predetermined changes whenever a predetermined relationship exists between the decision thresholds and the output prediction error vector.
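A minimal sketch of the decision processor follows, assuming one threshold per output and assuming that the "predetermined change" is the substitution of a fallback value. Neither assumption is stated in this summary, and the names `decide` and `fallback` are hypothetical.

```python
def decide(prediction, predicted_error, thresholds, fallback):
    """Compare each predicted error with its threshold and control the output.

    All four arguments are equal-length lists, one entry per output.
    Where the magnitude of the predicted error exceeds the threshold, the
    fallback value replaces the model prediction (an assumed control
    action for this sketch). Returns (controlled outputs, exceeded flags).
    """
    if not (len(prediction) == len(predicted_error)
            == len(thresholds) == len(fallback)):
        raise ValueError("all vectors must have the same length")
    controlled, flags = [], []
    for y, err, limit, safe in zip(prediction, predicted_error,
                                   thresholds, fallback):
        exceeded = abs(err) > limit
        flags.append(exceeded)
        controlled.append(safe if exceeded else y)
    return controlled, flags
```

For example, `decide([1.0, 2.0], [0.05, 0.5], [0.1, 0.1], [0.0, 0.0])` passes the first output through unchanged and replaces the second with its fallback value.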
In an even further aspect of the present invention, the non-linear representation of the system model is a trained representation that is trained on a finite set of input data within the input space. A validity model is provided that yields a representation of the validity of the predicted output of a system model for a given value in the input space. The validity model includes an input for receiving the input vector within the input space and an output for outputting a validity output vector corresponding to the output space. A processor is operable to generate the validity output vector in response to input of a predetermined value of the input vector and the location of the input vector within the input space. The value of the validity output vector corresponds to the relative amount of training data on which the system model was trained in the region of the input space about the value of the input vector.
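One way to picture a validity measure that tracks the relative amount of nearby training data is a kernel sum over the training inputs. The Gaussian kernel, the `width` parameter, and the reduction to a single scalar rather than a vector with one entry per output are assumptions of this sketch and are not taken from the specification.

```python
import math


def validity(x, training_inputs, width=1.0):
    """Score how densely the training data covers the region around `x`.

    `x` is a list of floats and `training_inputs` is a list of such
    lists. Each training point contributes exp(-d^2 / (2 * width^2)),
    where d is its Euclidean distance to `x`; the contributions are
    averaged, so the score lies in (0, 1] and is higher where more
    training data lies close to `x`.
    """
    if not training_inputs:
        raise ValueError("at least one training input is required")
    total = 0.0
    for point in training_inputs:
        d2 = sum((a - b) ** 2 for a, b in zip(x, point))
        total += math.exp(-d2 / (2.0 * width ** 2))
    return total / len(training_inputs)


# Training data clustered near the origin of a two-dimensional input space.
cluster = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [-0.1, 0.0]]
```

With `cluster`, an input at `[0.0, 0.0]` scores close to 1, while an input at `[5.0, 5.0]` scores close to 0, indicating that the system model saw little or no training data in that region.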
In a yet further aspect of the present invention, the system model is trained by a predetermined training algorithm that utilizes a target output and a set of training data. During training, an uncertainty value is also received, representing the uncertainty of the input data. The training algorithm is modified during training as a function of the uncertainty value.
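The following sketch shows one way a training algorithm could be modified as a function of the uncertainty value. It assumes a single-weight linear model trained by gradient descent, with each update scaled by a confidence factor 1 / (1 + uncertainty). The model, the update rule, and the scaling factor are illustrative assumptions and are not the training algorithm of the specification.

```python
def train(samples, learning_rate=0.1, epochs=200):
    """Fit y = w * x by gradient descent, down-weighting uncertain data.

    `samples` is a list of (x, target, uncertainty) tuples with
    uncertainty >= 0. Each weight update is multiplied by
    1 / (1 + uncertainty), so a sample with zero uncertainty trains at
    the full learning rate and a highly uncertain sample barely moves
    the weight (an assumed rule for this sketch).
    """
    w = 0.0
    for _ in range(epochs):
        for x, target, uncertainty in samples:
            confidence = 1.0 / (1.0 + uncertainty)
            error = target - w * x
            w += learning_rate * confidence * error * x
    return w


# Two trustworthy samples of y = 2x, and a set with one doubtful outlier.
clean = [(1.0, 2.0, 0.0), (2.0, 4.0, 0.0)]
with_outlier = [(1.0, 2.0, 0.0), (1.0, 10.0, 100.0)]
```

On `clean`, the weight converges to 2. On `with_outlier`, the highly uncertain target of 10 pulls the weight only slightly above 2, whereas equal weighting of the two samples would move it toward 6.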