Modern prediction algorithms are systems that have a high degree of machine intelligence. Machine intelligence can be defined, for example, as the ability to emulate or duplicate in data processing devices the sensory processing and decision making capabilities of human beings. Intelligent systems have the ability, for example, to autonomously learn and adapt in uncertain or partially known environments. It is this feature that has facilitated the commercial acceptance of prediction algorithms.
An artificial neural network (ANN) approach to machine intelligence is based upon the study of the human brain and its emergent properties. Artificial neural networks are generally well known. Such artificial neural networks are data processing systems that have been constructed to make use of some of the organizational principles that are believed to be used by the human brain. In a generic neural network or connectionist model, for example, there are three main components: an artificial neuron, a network topology and a learning algorithm or strategy.
The artificial neurons are processing elements where most of the computation is done. The neurons receive inputs from, for example, other neurons or from an environment by means of synapses or interconnections, and pass outputs to other neurons. The processing elements of an artificial neural network are connected together and overall system behaviour is determined by, for example, the structure and strengths of these connections. The neurons are arranged in groups or layers. Multi-layer systems contain, for example, layers of input and output neurons that receive signals from or emit signals to the environment, and neurons which form so-called hidden units, which are organised in one or more so-called hidden layers. The hidden layers perform non-linear mappings and contribute to the complexity of reliably training a system.
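The layered computation described above can be sketched as follows. The network dimensions, weights, and the choice of a logistic sigmoid activation are illustrative assumptions only, not a prescription of any particular system:

```python
import math

def sigmoid(x):
    # Non-linear squashing activation applied by each neuron
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    # Each neuron computes a weighted sum of its inputs plus a bias,
    # then applies the non-linear activation.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hypothetical example: two input signals, a three-neuron hidden layer,
# and a single output neuron.
hidden = layer_forward([0.5, -1.0],
                       [[0.1, 0.4], [-0.2, 0.3], [0.7, -0.6]],
                       [0.0, 0.1, -0.1])
output = layer_forward(hidden, [[0.5, -0.5, 0.2]], [0.0])
```

The hidden layer here performs the non-linear mapping between the environment-facing input and output layers.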
The connections between neurons in different layers propagate signals in one of two ways: feed-forward or feedback. Feed-forward connections only allow information to flow in one direction. Feedback connections allow information to flow in either direction and/or recursively.
Further, each connected pair of neurons in a neural network has an associated adjustable value or weight. A weight represents the connection strength between a pair of interconnected neurons. The collective weights of all neuronal connections in a neural network are stored in a memory, such as, for example, in a weight matrix.
Learning in an artificial neural network can be defined as any change in a network's memory, or weight matrix. Training a neural network is necessary so that the network will produce a desired output for a given input. Basically, there are two kinds of training or learning of such networks, categorized as unsupervised learning and supervised learning. Unsupervised learning, or self-organization, is a process that does not involve an external teacher. Only local information and internal control strategies are relied upon. Examples of unsupervised learning are implementations of Adaptive Resonance Theory and Hopfield networks.
Supervised learning, on the other hand, relies on an external teacher, such as, for example, a training and testing database. A typical supervised learning algorithm is, for example, back propagation. In particular, supervised training consists of feeding a set of input data to an initialized ANN for which an associated set of one-to-one mapped output data is known. The output data computed by the ANN are then compared with the known output data and the error between the ANN's mapping and the known output data is calculated according to, for example, a distance function or metric. This error is then used to calculate a new weight matrix, or memory, and the training and testing steps are repeated until the desired level of fitness or certainty has been reached, i.e., the error or distance function decreases below a defined threshold.
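The supervised loop described above (compute error against known outputs, adjust weights, repeat until the error falls below a threshold) can be sketched with a single linear neuron trained by the delta rule. The training pairs, learning rate, and threshold are hypothetical values chosen for illustration:

```python
# Known one-to-one mapped input/output pairs (target mapping: y = 2x).
pairs = [([0.0], 0.0), ([1.0], 2.0), ([2.0], 4.0)]
w, b, lr = 0.0, 0.0, 0.1   # initial weight, bias, learning rate

def sse(w, b):
    # Error metric: summed squared distance between computed and known outputs
    return sum((w * x[0] + b - y) ** 2 for x, y in pairs)

epochs = 0
while sse(w, b) > 1e-6 and epochs < 10000:
    for x, y in pairs:
        err = (w * x[0] + b) - y    # error against the known output
        w -= lr * err * x[0]        # adjust weight in proportion to the error
        b -= lr * err
    epochs += 1
```

Training stops once the error metric decreases below the defined threshold, i.e., the desired level of fitness has been reached.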
Supervised training normally uses a training algorithm implementing optimization techniques that are applied to change the weights or values so as to provide an accurate mapping. The optimization techniques generally fall within one of two categories, namely stochastic or deterministic techniques.
Stochastic techniques include evolutionary algorithms, which help in avoiding learning instabilities and slowly locate a near-global optimum, i.e. a minimum in the error surface, for the weights.
Deterministic methods, on the other hand, such as the well known gradient descent technique, quickly find a minimum but are susceptible to local minima.
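The susceptibility of gradient descent to local minima can be illustrated on a one-dimensional error surface with two minima. The function chosen here, f(x) = x&#8308; - 3x&#178; + x, and the starting points are assumptions for the sketch; its global minimum lies near x &#8776; -1.30 and a shallower local minimum near x &#8776; 1.13:

```python
def grad(x):
    # Derivative of f(x) = x**4 - 3*x**2 + x, an error surface with a
    # global minimum near x = -1.30 and a local minimum near x = 1.13.
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=2000):
    # Plain gradient descent: step downhill along the negative gradient.
    for _ in range(steps):
        x -= lr * grad(x)
    return x

from_left = descend(-2.0)   # quickly reaches the global minimum
from_right = descend(2.0)   # quickly reaches, but is trapped in, the local minimum
```

Both runs find a minimum quickly, but the result depends entirely on the starting point, which is the limitation the stochastic techniques above mitigate.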
Other kinds of learning techniques may be generally defined as error-correction learning. One type of learning technique adjusts a connection weight matrix in proportion to a difference between desired and computed values of each neuron in the output layer. Another example of error-correction learning is reinforcement learning. This is a technique by which weights are reinforced for properly performed actions and diminished for inappropriate ones. Performance of the output layer is captured in a single scalar error value.
These different types of training techniques are disclosed, for example in U.S. Pat. Nos. 6,269,351, 5,214,746, 5,832,446. Each of these patents purports to focus on the training algorithm of an ANN, and purports to improve same.
On the other hand, U.S. Pat. Nos. 6,212,508 and 6,269,351 purport to refer to the problem of appropriate definition of a training and/or testing data set for an ANN. However, in each of these cases, the only problem considered is the selection of a duly representative training set from among a group of data records, not an optimization of a database from which a training data set can be selected.
In general training data selection is a nontrivial task. An ANN is only as representative of the functional mapping it emulates as the data used to train it. Thus, any features or characteristics of the mapping that are not included (or hinted at) within the training data will not be represented in the ANN. Selection of a good representative sample requires analysis of historical data and much trial and error. A sufficient number of points must be selected from each area of the data set that represents or reveals a new or different aspect, behaviour or property of the mapping. This selection is generally accomplished with some form of stratified random sampling, i.e., by defining the various regions and randomly selecting a certain number of points from each region of interest.
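The stratified random sampling described above (define regions, then randomly select a certain number of points from each region of interest) can be sketched as follows. The data, the region-defining function, and the per-region count are hypothetical:

```python
import random
from collections import defaultdict

def stratified_sample(records, region_of, per_region, seed=0):
    # Group records by the region (stratum) they fall into, then draw a
    # fixed number of points at random from each region of interest.
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[region_of(rec)].append(rec)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(per_region, len(members))))
    return sample

# Hypothetical data set: 100 points, stratified by which range of x they occupy.
data = [(x / 10.0,) for x in range(100)]
region = lambda rec: int(rec[0] // 2)          # regions [0,2), [2,4), ...
training = stratified_sample(data, region, per_region=5)
```

Each region that reveals a different aspect of the mapping thereby contributes a fixed quota of points to the training set.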
Addressing the problem, U.S. Pat. No. 6,269,351 discloses a system and method for selecting a representative training set from a group of data records. Such methods adaptively increase the size of a training dataset during training if a training algorithm fails to reach an intermediate error goal with respect to the entire set of data records. Once an intermediate error goal is reached with respect to the entire data set, a lower error goal is then set and the training algorithm is repeated until the set error goal corresponds to a defined final training state. If not optimally done, the training set can grow very large in order to include the requisite representative data points that capture the inherent mapping rule, necessitating increased complexity and decreasing the number of data points available for a testing set.
As well, U.S. Pat. No. 6,212,508 purports to disclose a process for conditioning the input variables to a neural network. Such method involves the formation of time series from input variables to the network, where such time series are then subdivided into intervals whose length depends on how far back in time the measured variables contained therein extend. Interval lengths are selected to be larger the further the interval extends back in time. By means of convolution using a bell-shaped function, a representative input value for the neural network is obtained from all the measured variables contained in an interval. All input variables obtained in this way are fed to the network simultaneously during training and during operation.
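The interval-based conditioning described above can be sketched as follows. This is not the patented process itself; the Gaussian window (one possible bell-shaped function), the interval lengths, and the measurement series are all assumptions for illustration:

```python
import math

def gaussian_weights(n):
    # Bell-shaped (Gaussian) window centred on the interval, normalised to sum to 1.
    centre = (n - 1) / 2.0
    sigma = max(n / 4.0, 1e-9)
    w = [math.exp(-((i - centre) ** 2) / (2 * sigma ** 2)) for i in range(n)]
    total = sum(w)
    return [x / total for x in w]

def condition_series(series, interval_lengths):
    # Walk backwards in time; each interval is longer the further back it
    # reaches, and is collapsed to a single representative input value by
    # convolving its measurements with the bell-shaped window.
    inputs, pos = [], len(series)
    for length in interval_lengths:
        chunk = series[max(0, pos - length):pos]
        weights = gaussian_weights(len(chunk))
        inputs.append(sum(w * v for w, v in zip(weights, chunk)))
        pos -= length
    return inputs

series = [float(t) for t in range(32)]                # hypothetical measurements
net_inputs = condition_series(series, [2, 4, 8, 16])  # older intervals are longer
```

All four representative values would then be fed to the network simultaneously.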
None of these approaches, however, refers to the problem of optimizing the distribution of records of a common database into separate training subsets and testing subsets. One of the most difficult problems faced when training an Artificial Neural Network (ANN) is establishing the size and quality of the training and testing sets. Most of the time, the available data set is either too small or too complex to simply be divided into two subsets according to some pseudo-random criterion, as is commonly done in known training and testing procedures.
Accordingly, a random distribution of a data set into two or three subsets only makes sense if it is assumed that a simple function represents the overall data set in an optimal way. Generally, however, data are discrete hyper-points of some unknown non-linear function, and this assumption fails.
Furthermore, a pseudo random distribution of all of the available data into a training set and a testing set does not take into account the problem of outliers. The unknown non-linear function can be approximated by a prediction algorithm such as, for example, an ANN.
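The pseudo-random distribution criticised above can be sketched as follows. The pool of records, the two inserted outliers, and the split fraction are hypothetical; the point is that nothing in the split controls where the outliers land:

```python
import random

def pseudo_random_split(records, train_fraction=0.7, seed=1):
    # The common practice: shuffle the pool and cut it at a fixed fraction.
    rng = random.Random(seed)
    pool = list(records)
    rng.shuffle(pool)
    cut = int(len(pool) * train_fraction)
    return pool[:cut], pool[cut:]

# Hypothetical pool: 98 ordinary points plus 2 outliers far from the rest.
pool = [(x * 0.01,) for x in range(98)] + [(50.0,), (75.0,)]
train, test = pseudo_random_split(pool)

# The split gives no guarantee about the outliers: whichever subset
# misses them cannot exercise that region of the unknown mapping.
outliers_in_train = sum(1 for r in train if r[0] > 10.0)
```

A training set that happens to receive no outliers cannot represent their region of the mapping, and a testing set that receives them all will penalise the network for behaviour it was never shown.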
Thus, a need exists for a method and system to optimize a database for the training and testing of prediction algorithms so as to be able to best approximate an unknown nonlinear function or mapping.