The present invention relates to an apparatus for configuring a multi-layered neural network, and various application apparatuses using a multi-layered neural network, such as apparatuses for use in pattern recognition, anticipation, estimation, function approximation, and control.
A method used for pattern recognition, anticipation, estimation, function approximation, or control using a neural network is discussed in "Parallel Distributed Processing", Vol. 1, MIT Press, Cambridge, Mass., 1986, pp. 318 to 362.
A neural network will be described taking pattern recognition as an example. A neural network has neurons connected in a cascaded multi-layer manner. FIG. 2 shows an example of a three-layered neural network. In FIG. 2, reference numerals 1000 and 1001 represent input neurons, 1003 and 1004 represent hidden neurons, and 1002 and 1005 represent bias neurons. The input and bias neurons output their input data without modification, whereas the input/output characteristics of the hidden and output neurons are specified by a function called a sigmoid function, which saturates as shown in FIG. 3. Letting the input and output be $x$ and $Z$, respectively, the output is given by
$$Z = f(x) = \frac{1}{1 + \exp(-x/T)} \qquad (1)$$
where T is a constant defining the slope of the sigmoid function.
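As an illustration of equation (1), the following minimal sketch evaluates the sigmoid function for several values of T. The function name `sigmoid`, the default T = 1, and the assumption T > 0 are illustrative choices and are not taken from the cited document. The two-branch form is only a numerical safeguard and is algebraically identical to equation (1).

```python
import math


def sigmoid(x: float, T: float = 1.0) -> float:
    """Equation (1): Z = 1 / (1 + exp(-x / T)), assuming T > 0."""
    u = x / T
    if u >= 0.0:
        return 1.0 / (1.0 + math.exp(-u))
    # Same value, rewritten so that exp() never receives a large positive argument.
    e = math.exp(u)
    return e / (1.0 + e)


if __name__ == "__main__":
    # The output saturates toward 0 and 1; a smaller T gives a steeper transition.
    for T in (0.5, 1.0, 2.0):
        print(T, [round(sigmoid(x, T), 3) for x in (-4.0, 0.0, 4.0)])
```

Because the output only approaches 0 and 1 asymptotically, any comparison of a neuron output with "0" or "1" needs a tolerance in practice.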
Sequentially number each layer from the input side and let the input and output of the j-th neuron in the i-th layer be $x_j(i)$ and $Z_j(i)$, respectively. Then the input and output of an input neuron are given by ##EQU1## where $n(i)$ is the number of inputs to the i-th layer neurons, and $Z_{n(i)+1}(i)$ is a bias term.
The output of the j-th neuron of the second or third layer is given by ##EQU2## where $Z_{n(i)+1}(i)$ is a bias term.
The neurons of two consecutive layers are completely connected by synapses. Each synapse has a weight specific to it. The output of each neuron is multiplied by a weight and input to the neurons of the next layer. Therefore, the input to the j-th neuron of the second or third layer is given by ##EQU3## where $w_j(i-1) = (w_{j1}(i-1), \ldots, w_{j,n(i-1)+1}(i-1))$ is a weight vector, and $w_{jk}(i-1)$ is the weight of the synapse between the k-th neuron of the (i-1)-th layer and the j-th neuron of the i-th layer.
The output vector of the neurons of the (i-1)-th layer is given by
$$Z(i-1) = (Z_1(i-1), \ldots, Z_{n(i-1)}(i-1))^t$$
where t indicates the transpose of the matrix.
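The layer-by-layer computation described above can be sketched as follows. This is a minimal illustration, not the formulation of the cited document. It assumes that each bias neuron outputs the constant 1 and that the last weight in each row multiplies that bias output. The names `layer_forward` and `forward` and the example weights are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float, T: float = 1.0) -> float:
    """Equation (1)."""
    return 1.0 / (1.0 + math.exp(-x / T))


def layer_forward(weights: Sequence[Sequence[float]],
                  z_prev: Sequence[float], T: float = 1.0) -> List[float]:
    """Outputs of layer i from the outputs z_prev of layer i-1.

    weights[j][k] plays the role of w_jk(i-1); the last column multiplies
    the bias neuron, whose output is ASSUMED to be the constant 1.
    """
    z_with_bias = list(z_prev) + [1.0]
    outputs = []
    for w_j in weights:
        # Input to neuron j: inner product of its weight vector with the
        # output vector of the previous layer (bias included).
        x_j = sum(w * z for w, z in zip(w_j, z_with_bias))
        outputs.append(sigmoid(x_j, T))
    return outputs


def forward(w1, w2, inputs: Sequence[float], T: float = 1.0) -> List[float]:
    """Three-layered network: input -> hidden -> output."""
    z1 = list(inputs)               # input neurons pass data through unchanged
    z2 = layer_forward(w1, z1, T)   # hidden layer
    return layer_forward(w2, z2, T) # output layer


if __name__ == "__main__":
    w1 = [[0.5, -0.3, 0.1], [-0.2, 0.8, 0.0]]   # 2 hidden neurons, 2 inputs + bias
    w2 = [[1.0, -1.0, 0.2], [-1.0, 1.0, -0.2]]  # 2 output neurons, 2 hidden + bias
    print(forward(w1, w2, [1.0, 0.0]))
```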
In the above-described neural network, it is assumed that the input data set is separated into $n(3)$ classes. In this case, the i-th output neuron is assigned to the i-th class, and if a certain input data set causes the i-th output neuron to take "1" and the other output neurons to take "0", then the input data set is discriminated as belonging to the i-th class. In order to enable such discrimination between classes, it is necessary to determine proper weights $w_{ij}(k)$ ($k = 1, 2$). To this end, a training data set consisting of inputs and their desired outputs is used to determine the weights through learning. Let the m training data sets be ##EQU4## then the weights $w_{ij}(k)$ are determined by ##EQU5## where $Z_{jl}(3)$ is the output of the j-th output neuron corresponding to the training data input $x_{il}(1)$. The back propagation algorithm described in the above-cited document is widely used for determining the weights. According to this algorithm, the weights are modified sequentially from the output side to the input side such that the output $Z_{jl}(3)$ for one training data input $x_{il}(1)$, $i = 1, \ldots, n(1)$, approaches the desired output $s_{jl}$, $j = 1, \ldots, n(3)$. Thereafter, the same procedure is repeated for the next and following training data until the following inequality holds ##EQU6## where $\varepsilon$ is a small positive number used for determining convergence.
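The learning procedure described above can be sketched as follows. The error measure and the convergence inequality are not reproduced in this text, so the sketch makes its own assumptions: a sum-of-squares error, T = 1, bias outputs equal to 1, and convergence declared when every output is within `eps` of its desired value. The learning rate `eta`, the random initialization, the epoch limit, and the toy data set are likewise hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))  # equation (1) with T = 1 (assumption)


def forward(w1, w2, x):
    z1 = list(x) + [1.0]  # input layer plus bias (bias output assumed to be 1)
    z2 = [sigmoid(sum(w * z for w, z in zip(row, z1))) for row in w1] + [1.0]
    z3 = [sigmoid(sum(w * z for w, z in zip(row, z2))) for row in w2]
    return z1, z2, z3


def train(data, n_hidden, eta=0.5, eps=0.1, max_epochs=5000, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(data[0][0]), len(data[0][1])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    worst = float("inf")
    for epoch in range(1, max_epochs + 1):
        for x, s in data:  # training data are processed one at a time
            z1, z2, z3 = forward(w1, w2, x)
            # Output-side error terms for the ASSUMED squared error.
            d3 = [(z - t) * z * (1.0 - z) for z, t in zip(z3, s)]
            # Hidden-side error terms, propagated back through w2
            # (computed before w2 is modified).
            d2 = [z2[k] * (1.0 - z2[k]) * sum(d3[j] * w2[j][k] for j in range(n_out))
                  for k in range(n_hidden)]
            # Weights are modified from the output side to the input side.
            for j in range(n_out):
                for k in range(n_hidden + 1):
                    w2[j][k] -= eta * d3[j] * z2[k]
            for k in range(n_hidden):
                for i in range(n_in + 1):
                    w1[k][i] -= eta * d2[k] * z1[i]
        # ASSUMED convergence test: largest deviation from the desired outputs.
        worst = max(abs(z - t) for x, s in data
                    for z, t in zip(forward(w1, w2, x)[2], s))
        if worst < eps:
            break
    return w1, w2, epoch, worst


# Toy two-class problem: class 1 if both inputs are 0, class 2 otherwise.
DATA = [([0.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0]),
        ([1.0, 0.0], [0.0, 1.0]), ([1.0, 1.0], [0.0, 1.0])]
w1, w2, epochs, worst = train(DATA, n_hidden=2)
print("epochs:", epochs, "largest deviation:", worst)
```

The inner loop over individual training data is the sequential processing that makes this style of learning slow for large training sets.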
After the weights of the neural network have been determined in this manner, the outputs of the neural network for input data not yet learned are checked, thereby allowing pattern recognition.
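The discrimination rule described above, in which one output neuron takes "1" and the others take "0", can be sketched as follows. Since sigmoid outputs never reach 0 or 1 exactly, the sketch uses a tolerance `tol`. This tolerance, its default value, and the function name `discriminate` are assumptions introduced for illustration.

```python
from typing import Optional, Sequence


def discriminate(outputs: Sequence[float], tol: float = 0.2) -> Optional[int]:
    """Return the 1-based class index whose output neuron is near 1 while all
    other output neurons are near 0, or None if no class is clearly indicated.

    tol is a hypothetical tolerance and must be below 0.5 so that "near 0"
    and "near 1" cannot overlap.
    """
    if not 0.0 <= tol < 0.5:
        raise ValueError("tol must satisfy 0 <= tol < 0.5")
    near_one = [i for i, z in enumerate(outputs) if z >= 1.0 - tol]
    near_zero = [i for i, z in enumerate(outputs) if z <= tol]
    if len(near_one) == 1 and len(near_zero) == len(outputs) - 1:
        return near_one[0] + 1
    return None


if __name__ == "__main__":
    print(discriminate([0.05, 0.93, 0.10]))  # one clear winner
    print(discriminate([0.60, 0.50, 0.10]))  # ambiguous outputs
```

Returning `None` for ambiguous outputs keeps a doubtful input from being forced into a class.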
This algorithm has the major advantage that a pattern recognition network can be configured through learning of input and output patterns, without preparing a new classification algorithm.
In the case of using a neural network for applications such as prediction or estimation, there is no significant difference from the above-described pattern recognition, except that the neural network output takes an analog value rather than a discrete value.
In configuring such a neural network, particularly in optimizing the number of hidden neurons of a multi-layered neural network, the paper "Back-propagation with Artificial Selection", Technical Report NC89-104, pp. 85 to 90, the Institute of Electronics, Information, and Communication Engineers of Japan, describes optimizing the number of hidden units by dynamically adding and deleting them during learning.
Furthermore, as described in "Analysis of the Hidden Units of Back-Propagation Model by Singular Value Decomposition (SVD)" IJCNN '90-WASH-DC, 1-739 to 1-742, there is known a method of determining the number of hidden neurons by considering the rank of a matrix of weights of synapses between an input layer and a hidden layer.
The former of the above-described two conventional techniques does not describe a critical value for determining whether a neuron is to be deleted (in that document, whether a defective unit is to be deleted). Therefore, learning is necessary each time the most defective neuron is deleted, requiring a calculation quantity similar to that of conventional trial-and-error simulation. An index indicating the convergence of such learning is also not described. Therefore, if a defective neuron is deleted from the minimum network, there arises a problem that the network will not converge.
The latter conventional technique has the restriction that the number of hidden neurons is equal to or smaller than the number of input neurons. This technique therefore has the fatal problem that it is applicable only to neural networks of the type in which information is concentrated into the hidden neurons.
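One way to read this restriction is that the rank of a weight matrix can never exceed its smaller dimension. A rank computed from the input-to-hidden weights is therefore bounded by the number of input neurons, however many hidden neurons are present. The sketch below illustrates this bound. To stay dependency-free it computes the rank by Gaussian elimination rather than by the singular value decomposition used in the cited paper. The function name, the tolerance, and the example weights are hypothetical.

```python
from typing import List, Sequence


def matrix_rank(rows: Sequence[Sequence[float]], tol: float = 1e-9) -> int:
    """Numerical rank by Gaussian elimination with partial pivoting.

    NOTE: this is a stand-in for an SVD-based rank estimate, used here only
    to keep the sketch free of external libraries.
    """
    m: List[List[float]] = [list(r) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # no usable pivot: this column adds nothing to the rank
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


# Hypothetical weights: 5 hidden neurons (rows) fed by 2 input neurons (columns).
W = [[0.3, -1.2], [0.7, 0.4], [-0.5, 0.9], [1.1, 0.2], [0.6, -2.4]]
rank_W = matrix_rank(W)
print("rank:", rank_W, "hidden neurons:", len(W), "input neurons:", len(W[0]))
```

With five hidden neurons and two input neurons, the computed rank cannot exceed two, so a rank-based estimate of the number of hidden neurons cannot exceed the number of input neurons.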
No effective method is known for optimizing the number and contents of input neurons.
Furthermore, when a recognition error or a significant prediction error occurs in using a network that has already been trained, the only countermeasure is to learn again after adding the data that caused the error to the training data set, so that improving the accuracy of recognition, prediction, or the like proceeds on a trial-and-error basis. Still further, the back propagation algorithm poses the problem that learning is very slow because the training data are processed sequentially, one at a time.