A "neural network" (also called a connectionist model) is generally an electronic system (usually implemented in software but may be a combination of hardware and software) for modeling or simulating the brain. Typically neural networks are specified by describing the four attributes of architecture, propagation rules/equations, activation rules/equations and learning rules/equations.
The architecture attribute specifies the organization of neurons or nodes (the fundamental processing element or unit of a neural network) into clusters, clusters into layers, and layers into the overall neural network. For many neural networks, clusters are not used; hence, the neurons are organized directly into layers. In addition, layers may be arranged in a hierarchy. To that end, the architecture attribute also describes the "permitted" flow of information or signals within the neural network by specifying actual (or rules for) physical (specific) and broadcast (non-specific) connectivity paths between layers, clusters, and/or neurons (nodes).
Propagation rules/equations provide a detailed description, usually mathematical, of information/signal flow for every permitted physical and broadcast connectivity path specified in the architecture. This includes initial conditions and evolution over time.
Activation rules/equations provide a detailed description, usually mathematical, of how each neuron specified in the architecture processes its information (signals) to produce output information (signals). This includes initial conditions and evolution over time. In a so called "winner take all" activation, for a given set of inputs to the network neurons, one and only one neuron outputs a logical one and all other neurons output a zero. In a "many-to-many" activation, several of the network neurons generate a non-zero output.
Through learning, a neural network is trained so that application of a vector or set of inputs produces the desired (or at least consistent) set of outputs. Both output sets as well as inputs sets are referred to as vectors. Learning is usually accomplished by sequentially applying input vectors, while adjusting network internal parameters according to the learning rules/equations. During learning, the network parameters gradually converge to values that enable each input vector to produce the desired output vector. Learning is said to be either "supervised" where there is an external teacher that evaluates behavior of the network and directs weight definitions accordingly, or "unsupervised" where there is no teacher and the network self-organizes.
As such, neural networks trained by supervised training are commonly used to capture input-output functions (i.e., a pattern mapping of input patterns to output patterns). Two such applications include pattern recognition where a classification of patterns is learned and functional approximation where representations of continuous functions are learned. Whenever an input-output functionality is achieved, there is a question of reliability. The reliability of a neural network is determined by two factors: whether or not the network is being applied in a domain of the independent variables where training data were available (extrapolation), and the accuracy of the network mapping for given values of the independent variables (local goodness of fit). Measures of extrapolation and local goodness of fit have not yet been defined in the context of neural networks. Because neural networks do not automatically flag when they are extrapolating from the training data, users may inadvertently use an invalid prediction (network output). In addition, outputs of conventional neural networks do not carry explicit error bars, so the user has only a rough idea of the accuracy of a given prediction (network output).
The simplest treatment of extrapolation involves placing individual upper and lower bounds on the permissible range of each independent variable. This approach will frequently overestimate the region of validity of the network output since it assumes the independent variables have zero covariance. Any correlated distribution of the independent variables will result in underpopulated areas within the hyper-rectangle defined by the upper and lower bounds. Correlations among the independent variables can be accounted for by the Mahalanobis distance which measures the distance from a test point to the centroid of the training data scaled by the variance in the direction to the centroid. See R. O. Duda and P. E. Hart, Pattern Analysis and Scene Classification, Wiley Interscience, New York (1973). Strictly, the Mahalanobis distance is only appropriate if the independent variables follow a multivariate Gaussian distribution, and it is particularly poor at handling skewed and multi-modal distributions.
Another possible definition of extrapolation in multiple dimensions is based on the convex hull of the training data. The convex hull is the minimum volume polyhedron circumscribing the training points. See D. R. Chand and S. S. Kapur, "An Algorithm for Convex Polytopes," Journal of ACM, vol. 17, pgs. 78-86 (1978). However, if there are nonconvexities, regions inside the convex hull may be void of training data.
The second factor affecting neural network reliability is the accuracy of the network when the network is not extrapolating. Conventional techniques for training neural networks use only global measures of goodness of fit, usually in terms of the sum of squared errors (target-network output) over the training examples, a similar error measure on a test set, or percentage misclassification in pattern recognition. However, the accuracy of the network may vary over the input space because the training data are not uniformly distributed or the noise in the dependent variables is not constant. In these cases, local regions of poor fit may not be strongly reflected in the global measures of prediction error.
Accordingly, there is a need for techniques to determine reliability of a neural network. In particular, there is a need for analyzing and treating extrapolation and accuracy of a neural network.