The present invention relates to the field of artificial neural networks and, in particular, to a single hidden layer artificial neural network for predicting values in non-linear functional mappings.
Predicting future values of a process based on the previous known values of the process is frequently attempted using artificial neural networks. The neural network does this by modelling the short-term structure of the process.
It is usually a trivial problem to predict the next few values in a linear functional mapping; however, it is much more difficult to predict the next few values in a non-linear functional mapping.
The short-term structure of a non-linear functional mapping can be modelled by expressing the present value of the mapping as a function (the prediction function) of the previous values of the mapping. Once the correct value corresponding to a predicted value becomes known, it is compared with the predicted value to produce an error. The error is a measure of the accuracy of the prediction function at predicting the correct value. To optimise the prediction function, the error is used to modify the prediction function. Thus, the prediction function is constantly being changed to match the short-term structure of the mapping.
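By way of illustration only, the predict, compare and modify cycle described above can be sketched as follows. The sketch assumes a linear prediction function of the two previous values and an LMS-style correction step; the names `predict`, `adapt` and `run_online`, the step size and the sinusoidal test series are hypothetical and are not taken from the specification.

```python
import math

def predict(weights, history):
    """Prediction function: a bias plus a weighted sum of the previous values."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], history))

def adapt(weights, history, error, rate):
    """Use the error to modify the prediction function (LMS-style step, an assumption)."""
    inputs = [1.0] + list(history)
    return [w + rate * error * x for w, x in zip(weights, inputs)]

def run_online(series, order=2, rate=0.2):
    """Predict each value from the `order` values before it, then learn from the error."""
    weights = [0.0] * (order + 1)
    errors = []
    for t in range(order, len(series)):
        history = series[t - order:t]
        predicted = predict(weights, history)
        error = series[t] - predicted  # correct value minus predicted value
        errors.append(error)
        weights = adapt(weights, history, error, rate)
    return weights, errors

# Toy series chosen for illustration: a sampled sinusoid.
series = [math.sin(0.3 * t) for t in range(2000)]
weights, errors = run_online(series)
early = sum(abs(e) for e in errors[:100]) / 100
late = sum(abs(e) for e in errors[-100:]) / 100
print(f"mean |error| over first 100 steps: {early:.4f}")
print(f"mean |error| over last 100 steps:  {late:.6f}")
```

In this sketch the prediction function is never fixed: every newly known value produces an error, and every error changes the weights.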
One example of a prediction function is a polynomial. Polynomials can be used to approximate any non-linear continuous function over a bounded interval to an arbitrary degree of accuracy, even if the non-linear continuous function generates a so-called chaotic series.
In a non-linear chaotic functional mapping the uncertainty of the prediction increases exponentially as the value to be predicted is moved farther from the last known value. This precludes any long-term predictability. However, short-term predictability is still possible based on the short-term structure of the mapping.
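The growth of uncertainty described above can be illustrated with a well-known chaotic mapping. The logistic map with parameter 4 is used here purely as an example and does not appear in the specification; the function names, the starting value and the initial separation are likewise hypothetical.

```python
def logistic(x, r=4.0):
    """One step of the logistic map, a standard example of a chaotic mapping."""
    return r * x * (1.0 - x)

def divergence(x0, delta=1e-10, steps=60):
    """Track the gap between two trajectories that start `delta` apart."""
    a, b = x0, x0 + delta
    gaps = []
    for _ in range(steps):
        gaps.append(abs(a - b))
        a, b = logistic(a), logistic(b)
    return gaps

gaps = divergence(0.2)
for step in (0, 10, 20, 30, 40, 50):
    print(f"step {step:2d}: gap = {gaps[step]:.3e}")
```

A few steps ahead, the two trajectories are still practically indistinguishable, which is why short-term prediction remains feasible; many steps ahead, the initial uncertainty has grown to the scale of the mapping itself.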
Previous attempts to predict the next few values in a non-linear mapping have used feed-forward neural network predictors with an input comprising a function of the previous values of the mapping and a single output. Two main types of feed-forward neural network have been used. The first type is a Multi Layer Perceptron (MLP) neural network, the second is a Radial Basis Function (RBF) neural network.
The primary difference between these two structures is that the MLP structure uses at least two hidden layers, whereas the RBF structure uses only one hidden layer. The nature of the basis or activation functions (the functions which operate on each of the inputs) of these two neural networks is also different. The MLP neural network uses Sigmoidal basis functions (which are non-zero over an infinitely large input space) as shown in FIG. 1a, whereas the RBF neural network uses Gaussian basis functions (which are localized to certain areas of input space) as shown in FIG. 1b. Experience has shown that some non-linear problems can be solved more efficiently using Sigmoidal basis functions, whereas others can be solved more efficiently using Gaussian basis functions.
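The difference in character between the two kinds of basis function can be seen numerically. The following is a minimal sketch assuming the logistic form of the Sigmoidal function and a Gaussian of unit width centred at zero; the specification does not fix either form, and the function names are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: non-zero for every finite input (assumed form)."""
    return 1.0 / (1.0 + math.exp(-x))

def gaussian(x, centre=0.0, width=1.0):
    """Gaussian bump: significant only near its centre (assumed form)."""
    return math.exp(-((x - centre) ** 2) / (2.0 * width ** 2))

for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"x = {x:6.1f}   sigmoid = {sigmoid(x):.6f}   gaussian = {gaussian(x):.6f}")
```

Far from the origin the Sigmoidal function still responds (it saturates towards 0 or 1), whereas the Gaussian response has effectively vanished.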
An RBF network has several advantages and disadvantages compared with an MLP. An RBF network has a linear-in-the-parameters structure which means that it can use standard linear regression techniques to optimise the structure. Linear regression techniques have the relative advantages of ease of analysis and rapid learning characteristics compared with non-linear regression techniques. Non-linear regression techniques are used for non-linear-in-the-parameters structures, such as the MLP structure. Non-linear regression techniques (such as back propagation) are computationally expensive, very slow and can converge to local minimum solutions rather than the global minimum solution.
However, an RBF network has the disadvantage of requiring a prohibitively large number of Gaussian basis functions to cover high dimensional input spaces (a large number of inputs). An RBF network also needs a pre-learning stage to be performed so that the appropriate Gaussian functions (families of Gaussian functions with varying centres and widths) can be selected for a particular application.
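The growth in the number of basis functions can be made concrete with a simple count. The sketch below assumes, for illustration only, that the Gaussian centres are placed on a regular grid with the same number of centres along each input axis; the specification does not state how the centres are placed, and `grid_basis_count` is a hypothetical name.

```python
def grid_basis_count(centres_per_axis, num_inputs):
    """Number of Gaussian basis functions on a regular grid (assumed placement)."""
    return centres_per_axis ** num_inputs

for num_inputs in (1, 2, 5, 10):
    count = grid_basis_count(5, num_inputs)
    print(f"{num_inputs:2d} inputs -> {count} basis functions")
```

Under this assumption the count grows exponentially with the number of inputs, which is the sense in which the requirement becomes prohibitive.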
Thus, a further disadvantage of the RBF network is that it must be tailored to each individual application; whereas the MLP network is suitable for a number of different applications because its learning strategy is more complex.
It is an object of the present invention to obviate or mitigate at least one of the above disadvantages associated with single hidden layer neural networks such as an RBF network.
This is achieved by using a single hidden layer neural network which generates trigonometric activation functions (rather than Gaussian or Sigmoidal activation functions) and linear activation functions of the inputs, and then weights the resultant activation functions produced using standard linear regression techniques.
One advantage of this invention is that it combines the accurate prediction associated with a non-linear basis function with a structure suitable for use with fast, conventional, linear regression techniques. It has some of the advantages associated with RBF networks (speed of response, simplicity) and also some of the advantages associated with the MLP network (adaptability to a number of different applications, no need for an advanced pre-learning stage, suitability for use with multiple inputs).
The present invention also has the advantage of improved non-linear predicting ability by using trigonometric activation functions which have the effect of simulating both Sigmoidal-shaped and Gaussian-shaped functions simultaneously. The conventional linear regression technique automatically adjusts the weightings for each activation function to produce the most appropriate function for modelling the particular functional mapping.
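The effect described above can be explored with a small experiment. The sketch below is an illustration under stated assumptions, not the claimed implementation: it fits a Sigmoidal-shaped target (tanh) and a Gaussian-shaped target over the interval from -2 to 2, once with a constant and a linear term only and once with added sine and cosine terms, using ordinary least squares solved through the normal equations. The targets, the interval and all function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - partial) / m[r][r]
    return x

def least_squares(rows, targets):
    """Ordinary least squares via the normal equations; each row is a list of terms."""
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(ata, atb)

def sum_squared_error(rows, targets, weights):
    """Sum of squared differences between the weighted terms and the targets."""
    return sum((t - sum(w * v for w, v in zip(weights, r))) ** 2
               for r, t in zip(rows, targets))

def linear_terms(x):
    return [1.0, x]

def trig_terms(x):
    return [1.0, x, math.sin(x), math.cos(x)]

xs = [-2.0 + 0.1 * i for i in range(41)]
targets = {
    "sigmoid-shaped": [math.tanh(x) for x in xs],
    "gaussian-shaped": [math.exp(-x * x / 2.0) for x in xs],
}

results = {}
for name, ys in targets.items():
    results[name] = {}
    for label, make_terms in (("linear", linear_terms), ("trig", trig_terms)):
        rows = [make_terms(x) for x in xs]
        weights = least_squares(rows, ys)
        results[name][label] = sum_squared_error(rows, ys, weights)
    print(name, results[name])
```

The same four terms serve both targets; only the weights found by the regression differ, with the sine term carrying most of the Sigmoidal shape and the cosine term most of the Gaussian shape over this interval.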
According to a first aspect of the present invention there is provided a neural network of the radial basis function type having a single hidden layer function generator and an output layer, wherein the function generator receives one or more mapping inputs and generates a plurality of terms from each mapping input, said terms including at least one trigonometric term and being free of Gaussian and Sigmoidal terms.
It will therefore be understood that the neural network comprises: at least one mapping input representing a value of a mapping; a control input representing a value to be predicted in the mapping; a single hidden layer function generator for receiving each mapping input and for generating a plurality of terms from each mapping input, including at least one trigonometric term; an adaptive weight block comprising a plurality of weight elements and a weight controller, where each weight element receives an associated term and multiplies the said associated term by a value received from the weight controller to produce an individually weighted term; an adding block for receiving each individually weighted term and for adding the individually weighted terms to produce a summed term; a comparator for receiving the summed term and the control input and for comparing the summed term with the control input to generate a difference value; an analyser for receiving the difference value, for determining the new value of each individual weight element needed to minimise the difference value, and for conveying the said new value of each individual weight to the weight controller, where the weight controller adjusts the individual weight elements accordingly; and a data output representing the predicted term of the mapping which is connected to the output of the adding block, which is the summed term.
Preferably, the function generator generates at least two trigonometric terms, one a sine term the other a cosine term.
Preferably, where more than one input is used, the hidden layer function generator includes terms resulting from the product of two or more of the inputs.
Preferably, the function generator selects at least one term from the group of terms consisting of: a zero order term, the original mapping inputs, sine functions of the mapping inputs, cosine functions of the mapping inputs, functions equalling the product of a mapping input and a sine function of another mapping input, functions equalling the product of a mapping input and a cosine function of another mapping input, and functions equalling the product of a mapping input and a different mapping input.
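The group of terms listed above can be enumerated as in the following sketch. It generates every member of the group for a given set of mapping inputs, whereas the function generator need only select at least one of them; the name `generate_terms` and the labelling scheme are hypothetical.

```python
import math
from itertools import combinations, permutations

def generate_terms(inputs):
    """Return (label, value) pairs for every term in the listed group."""
    n = len(inputs)
    terms = [("1", 1.0)]                                                  # zero order term
    terms += [(f"x{i}", x) for i, x in enumerate(inputs)]                 # original inputs
    terms += [(f"sin(x{i})", math.sin(x)) for i, x in enumerate(inputs)]  # sine terms
    terms += [(f"cos(x{i})", math.cos(x)) for i, x in enumerate(inputs)]  # cosine terms
    for i, j in permutations(range(n), 2):      # input times sine of another input
        terms.append((f"x{i}*sin(x{j})", inputs[i] * math.sin(inputs[j])))
    for i, j in permutations(range(n), 2):      # input times cosine of another input
        terms.append((f"x{i}*cos(x{j})", inputs[i] * math.cos(inputs[j])))
    for i, j in combinations(range(n), 2):      # input times a different input
        terms.append((f"x{i}*x{j}", inputs[i] * inputs[j]))
    return terms

for label, value in generate_terms([0.5, 2.0]):
    print(f"{label:12s} {value: .6f}")
```

With two mapping inputs the full group contains twelve terms; with a single mapping input the cross terms vanish and four terms remain.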
According to a second aspect of the present invention there is provided a method of predicting a value in a mapping using a single hidden layer neural network, comprising the steps of: receiving at least one mapping input corresponding to a value in a mapping; receiving a control input corresponding to a value to be predicted in the mapping; generating at least one trigonometric term for each mapping input; weighting each term generated by an adaptable weight; summing the weighted terms to produce a sum; comparing the sum with the control input to produce a difference; analysing the difference to determine the optimum value of each adaptable weight to minimise the difference; adjusting the adaptable weight applied to each term in response to the analysis; iteratively repeating a predetermined number of times the above five steps of weighting, summing, comparing, analysing, and adjusting, where for the second and subsequent times the control input is the correct value corresponding to the predicted value of the previous iteration; and presenting the sum to an output.
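The steps of the method can be followed in a short sketch. Several choices here are assumptions made for illustration: a single mapping input, the four terms 1, x, sin(x) and cos(x), a normalised LMS update standing in for the analysing and adjusting steps (the specification refers only to standard linear regression techniques), and a toy non-linear recurrence as the mapping. The names `terms` and `run_method` are hypothetical.

```python
import math

def terms(x):
    """Generate the terms for one mapping input (assumed term set)."""
    return [1.0, x, math.sin(x), math.cos(x)]

def run_method(series, iterations, rate=0.5):
    """Repeat weighting, summing, comparing, analysing and adjusting."""
    weights = [0.0] * 4
    differences = []
    for t in range(iterations):
        phi = terms(series[t])                              # mapping input -> terms
        weighted = [w * p for w, p in zip(weights, phi)]    # weighting
        total = sum(weighted)                               # summing
        difference = series[t + 1] - total                  # comparing with control input
        step = rate * difference / sum(p * p for p in phi)  # analysing (NLMS, an assumption)
        weights = [w + step * p for w, p in zip(weights, phi)]  # adjusting
        differences.append(difference)
    # Presenting the sum: prediction of the value that follows the last known one.
    output = sum(w * p for w, p in zip(weights, terms(series[iterations])))
    return weights, differences, output

# Toy non-linear mapping chosen for illustration: x[t+1] = 1.2 + 2.0 * sin(x[t]).
series = [0.1]
for _ in range(3000):
    series.append(1.2 + 2.0 * math.sin(series[-1]))

weights, differences, output = run_method(series, iterations=3000)
early = sum(abs(d) for d in differences[:20]) / 20
late = sum(abs(d) for d in differences[-20:]) / 20
print(f"mean |difference| over first 20 iterations: {early:.4f}")
print(f"mean |difference| over last 20 iterations:  {late:.6f}")
print(f"predicted next value: {output:.6f}")
```

At each iteration the control input is the value that actually followed the mapping input, so the correct value for one prediction becomes known before the next prediction is made.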
Preferably, any terms which have negligible effect on minimising the difference value are pruned.
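One possible pruning criterion is sketched below. The specification does not define what counts as a negligible effect, so the test used here (the weight magnitude multiplied by the root-mean-square value of the term, compared against a tolerance) is an assumption, as are the function name `prune`, the tolerance and the example weights.

```python
import math

def prune(labels, weights, columns, tolerance=1e-6):
    """Keep only terms whose weighted RMS contribution exceeds `tolerance`."""
    kept = []
    for label, weight, column in zip(labels, weights, columns):
        rms = math.sqrt(sum(v * v for v in column) / len(column))
        if abs(weight) * rms > tolerance:
            kept.append((label, weight))
    return kept

xs = [0.1 * i for i in range(30)]
labels = ["1", "x", "sin(x)", "cos(x)"]
columns = [
    [1.0] * len(xs),
    xs,
    [math.sin(x) for x in xs],
    [math.cos(x) for x in xs],
]
weights = [1.2, 3e-9, 2.0, -4e-10]  # hypothetical fitted weights

kept = prune(labels, weights, columns)
print([label for label, _ in kept])
```

Removing such terms reduces the number of weights that have to be stored and updated without materially changing the summed term.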