This invention relates to a learning process generating neural networks designed for sorting data and built up as a function of the needs of the task to be carried out.
Applications of this invention are in fields that use neural networks, in particular medical diagnosis, shape recognition, and the sorting of objects or data such as spectra and signals.
Neural networks, or neuron networks, are systems that carry out calculations on digital information and are inspired by the behavior of physiological neurons. A neural network must first learn the tasks that will later be required of it. This is done using an examples base, or learning base, which contains a series of known examples. These examples teach the network the tasks that it will subsequently reproduce on unknown information.
A neural network is composed of a set of formal neurons. Each neuron is a calculation unit with at least one input and one output, the output corresponding to the neuron's state. At each instant, the state of each neuron is communicated to the other neurons in the set. Neurons are linked by connections, each of which has a synaptic weight.
The total activation level of a neuron is the sum of the states of its input neurons, each weighted by the synaptic weight of the corresponding connection. At each instant, the neuron uses this activation level to update its output.
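The following minimal sketch shows the activation level and the output update. It assumes binary states coded +1/-1 and a threshold of zero; the text specifies neither. The function names are illustrative only.

```python
from typing import Sequence


def activation_level(input_states: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of the input neurons' states, each weighted by its connection's synaptic weight."""
    if len(input_states) != len(weights):
        raise ValueError("one synaptic weight is needed per input connection")
    return sum(w * s for w, s in zip(weights, input_states))


def updated_state(input_states: Sequence[float], weights: Sequence[float],
                  threshold: float = 0.0) -> int:
    """Binary output state (assumed +1/-1 coding): +1 if the activation exceeds the threshold."""
    return 1 if activation_level(input_states, weights) > threshold else -1


print(activation_level([1, -1, 1], [0.5, 2.0, -1.0]))  # -2.5
print(updated_state([1, -1, 1], [0.5, 2.0, -1.0]))     # -1
```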
In particular, the neural network may be a layered network; in this case, the network comprises one or several layers of neurons each connected to the previous layer, the last layer being the output layer. Each configuration of network inputs produces an internal representation that is used by subsequent layers, for example to sort the input state.
Several learning algorithms currently exist for sorting data or recognizing shapes with neural networks, as explained for example in "Introduction to the theory of neural computation" by HERTZ, KROGH and PALMER (1991), Addison-Wesley.
These algorithms are conventionally used to set the network parameters (connection weight values, number of neurons in the network, etc.) from a number of known examples of the data to be sorted. These examples make up the learning base.
The most frequently used algorithm is the gradient backpropagation algorithm. This type of algorithm is described for example in "A learning scheme for asymmetric threshold networks", in Cognitiva, Y. LE CUN, CESTA-AFCET Ed., pages 559-604 (1985). The algorithm minimizes a cost function associated with the quality of the network output. However, it requires neurons whose states are real numbers, even for typically binary problems such as sorting into two classes. Gradient backpropagation also requires the number of neurons to be fixed beforehand, and no theoretical criterion exists to guide the practitioner in choosing this number.
Other algorithms, called "constructivist" or "adaptive" algorithms, adapt the number of neurons in the network to the task to be carried out. Some of these algorithms use only binary neurons, as described for example in "Learning in feed forward layered neural networks: the tiling algorithm", MEZARD and NADAL, J. PHYS. A22, pages 2191-2203, and in "Learning by activating neurons: a new approach to learning in neural networks", RUJAN and MARCHAND, Complex Systems 3 (1989), page 229. The main disadvantage of these algorithms is that the learning rule applied at each neuron is not efficient. As a result, the networks they produce are too large for the task and generalize poorly.
The performance of an algorithm is measured by its generalization capacity, in other words its capacity to predict the class of data that are not in the learning base. In practice, this capacity is measured by the "generalization error". This is the percentage of data in a test base that the network sorts incorrectly, where the test base contains known examples and is independent of the learning base, and the network parameters were determined by the learning algorithm.
One efficient learning algorithm for a single neuron is described in "Learning with temperature dependent algorithm" by GORDON and GREMPEL, Europhysics Letters, No. 29, pages 257 to 262, January 1995. It is also described in "Minimerror: perceptron learning rule that finds the optimal weights" by GORDON and BERCHIER, ESANN'93, Brussels, M. VERLEYSEN Ed., pages 105-110. These documents describe an algorithm called "Minimerror" that can learn sorting tasks with binary neurons. Its convergence is guaranteed and it performs well numerically, in other words it reaches an optimal generalization capacity.
Furthermore, the Minimerror algorithm has been combined with another constructivist learning rule called "Monoplane". Monoplane is described in "An evolutive architecture coupled with optimal perceptron learning for classification" by TORRES-MORENO, PERETTO and GORDON, in ESANN'95, European symposium on artificial neural networks, Brussels, April 1995, pages 365 to 370. It combines the Minimerror algorithm with a method for generating internal representations.
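The generalization error defined above can be computed as follows. The classifier `sign_of_first` and the four-example test base are hypothetical. They were chosen only to make the arithmetic visible: one wrong example out of four gives 25%.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (input data, known class +1 or -1)


def generalization_error(classify: Callable[[Sequence[float]], int],
                         test_base: Sequence[Example]) -> float:
    """Percentage of test-base examples whose predicted class differs from the known class."""
    if not test_base:
        raise ValueError("the test base must contain at least one example")
    wrong = sum(1 for x, target in test_base if classify(x) != target)
    return 100.0 * wrong / len(test_base)


# Hypothetical classifier and test base, for illustration only.
def sign_of_first(x: Sequence[float]) -> int:
    return 1 if x[0] > 0 else -1


test_base = [([2.0, 1.0], 1), ([-1.0, 0.5], -1), ([0.5, -3.0], -1), ([-2.0, 2.0], -1)]
print(generalization_error(sign_of_first, test_base))  # 25.0
```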
The Monoplane algorithm is of the incremental type: it builds a neural network with one hidden layer by adding neurons as necessary.
Its performance is better than that of the other algorithms described above. When the same examples base is used with a series of different algorithms, Monoplane achieves better results, in other words a lower generalization error.
However, the Minimerror and Monoplane algorithms use neurons with a sigmoidal activation function, which can only carry out linear separations. A network built from this type of neuron can therefore only set up plane boundaries (composed of hyperplanes) between domains of different classes. When the boundaries between classes are curved, such a network can only approximate them piecewise linearly. This introduces some inaccuracy and requires a large number of neurons.
Other types of algorithms pave the space with hyperspheres, each represented by a neuron. These algorithms are described for example in COOPER, REILLY and ELBAUM (1988), Neural networks systems, an introduction for managers, decision-makers and strategists, NESTOR Inc. Providence, R.I. 02906-USA. They usually end up with a very large number of neurons, even when they include pruning operations. These algorithms therefore use too many resources, in other words too many neurons and too many weights, to carry out the task.
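The sketch below illustrates the difference between a plane boundary and a curved one. It uses one common parameterization of a hyperspherical (quadratic) binary neuron: state -1 inside a sphere of center c and radius r, +1 outside, so the boundary is the quadratic surface where the squared distance to c equals r squared. This parameterization is an assumption, not the one defined in the cited works.

On the toy base, a single spherical neuron isolates the central example. No single hyperplane can do so, because the origin is the average of the four surrounding examples. Any hyperplane that puts all four on its positive side therefore also puts the origin there.

```python
from typing import Sequence, Tuple


def spherical_neuron(x: Sequence[float], center: Sequence[float], radius: float) -> int:
    """Assumed hyperspherical binary unit: -1 inside the sphere, +1 outside it."""
    squared_distance = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return 1 if squared_distance > radius ** 2 else -1


# Toy base: one class -1 example at the origin, four class +1 examples around it.
base: Sequence[Tuple[Tuple[float, float], int]] = [
    ((0.0, 0.0), -1), ((2.0, 0.0), 1), ((-2.0, 0.0), 1), ((0.0, 2.0), 1), ((0.0, -2.0), 1),
]
print(all(spherical_neuron(x, (0.0, 0.0), 1.0) == t for x, t in base))  # True
```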
The purpose of the invention is to overcome the disadvantages of the techniques described above. It does this by proposing a learning process generating small neural networks built according to the needs of the task to be carried out and with excellent generalization. This learning process is intended for sorting of objects or data into two classes separated by separation surfaces which may be quadratic, or linear and quadratic.
More precisely, the invention relates to a process for learning from an examples base composed of known input data and targets corresponding to the class of each of these input data, to sort objects into two distinct classes separated by at least one quadratic type or quadratic and linear type separating surface, this process consisting of generating a network of binary type neurons, each comprising parameters describing the separating surface that they determine, this neural network comprising network inputs and a layer of hidden neurons connected to these inputs and to a network output neuron characterized in that it comprises:
A) An initialization step consisting of:
Aa) choosing the type of the first neuron that is connected to inputs;
Ab) learning from the examples base (B0) by this first neuron, in order to determine the descriptive parameters for a first separating surface, for this neuron;
Ac) determining the number of learning errors;
Ad) if this number of errors is zero, learning is finished and the first neuron chosen in Aa) becomes the network output neuron;
Ae) if this number is not zero, the parameters of the first neuron are fixed and this neuron becomes the first neuron (i=1) of a layer of hidden neurons built by:
B) a step in which the hidden layer is built and the network output neuron is determined, consisting of:
B1) adaptation of the layer of hidden neurons as a function of the sorting to be done, consisting of:
B1a) determining new targets for the examples base B0 as a function of the learning errors of the last neuron i learned, the inputs of the examples base B0, paired with these new targets, forming a new examples base Bi;
B1b) incrementing the hidden neuron counter i by one, connecting a new hidden neuron i of a chosen type to the network inputs, and training it to sort the new examples base Bi;
B1c) fixing the parameters of this new neuron i, the states of the hidden neurons corresponding to each input data in the examples base B0 forming an internal representation of this input data;
B2) validating the layer of hidden neurons and determining the network output neuron.
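Steps B1a and B1c can be sketched as below. The text does not state the rule for the new targets, so the sketch assumes the product rule τ(i+1) = τ(i)·σ(i). Under this rule an example gets target +1 where neuron i sorted it correctly and -1 where neuron i erred. The XOR-style targets and the hidden states are hypothetical, and neuron 2 is assumed to learn its new targets exactly.

```python
from typing import List, Sequence, Tuple


def new_targets(targets: Sequence[int], outputs: Sequence[int]) -> List[int]:
    """Assumed rule for step B1a: +1 where the last neuron was right, -1 where it erred."""
    return [t * s for t, s in zip(targets, outputs)]


def internal_representations(hidden_states: Sequence[Sequence[int]]) -> List[Tuple[int, ...]]:
    """Step B1c: hidden_states[i][mu] is the state of hidden neuron i on example mu;
    the internal representation of example mu is the tuple of all hidden states."""
    return list(zip(*hidden_states))


tau1 = [-1, 1, 1, -1]     # targets of B0 (here: XOR of two +/-1 inputs)
sigma1 = [-1, 1, 1, 1]    # states of hidden neuron 1, wrong on the last example only
tau2 = new_targets(tau1, sigma1)
print(tau2)               # [1, 1, 1, -1]
# Assuming hidden neuron 2 reproduces tau2 exactly, its states equal tau2.
print(internal_representations([sigma1, tau2]))  # [(-1, 1), (1, 1), (1, 1), (1, -1)]
```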
According to a first embodiment of the invention, the internal representations of the inputs in the examples base form an internal representations base BRI(i), and step B2) consists of:
B2a) introducing a linear type output neuron, connecting this output neuron to the hidden neurons, teaching it to sort the internal representations base BRI(i), and determining the number of learning errors of the output neuron;
B2b) if this number is zero, considering that the network is built up and includes a layer of i hidden neurons;
B2c) if this number is not zero, treating the learning errors of the output neuron as the errors of the previous neuron in step B1a), eliminating this output neuron, and restarting the processing from step B1a) until the output neuron makes no errors.
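The first embodiment can be sketched end to end on a toy problem, under three assumptions. Inputs and states are coded +1/-1. The product target rule is used for B1a, and after B2c it is applied to the output neuron's errors. A hypothetical brute-force `learn` stands in for Minimerror: it searches weights and bias in {-1, 0, 1} and keeps the unit with the fewest errors. The search is exponential in the number of inputs and only serves to keep the example deterministic and self-contained.

```python
from itertools import product
from typing import List, Sequence, Tuple

# A binary threshold unit: (weights, bias); states are coded +1 / -1.
Neuron = Tuple[Tuple[int, ...], int]


def state(neuron: Neuron, x: Sequence[int]) -> int:
    w, b = neuron
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def n_errors(neuron: Neuron, inputs, targets) -> int:
    return sum(state(neuron, x) != t for x, t in zip(inputs, targets))


def learn(inputs: Sequence[Sequence[int]], targets: Sequence[int]) -> Neuron:
    """HYPOTHETICAL stand-in for Minimerror: exhaustive search over weights and
    bias in {-1, 0, 1}, keeping the first unit with the fewest errors (toy sizes only)."""
    n = len(inputs[0])
    best, best_err = None, len(targets) + 1
    for params in product((-1, 0, 1), repeat=n + 1):
        candidate = (tuple(params[:n]), params[n])
        err = n_errors(candidate, inputs, targets)
        if err < best_err:
            best, best_err = candidate, err
    return best


def build_network(inputs, targets, max_hidden: int = 8):
    """Steps A, B1 and B2a-B2c (first embodiment). Returns (hidden neurons, output neuron)."""
    first = learn(inputs, targets)                     # Aa, Ab
    if n_errors(first, inputs, targets) == 0:          # Ad: first neuron is the output
        return [], first
    hidden: List[Neuron] = [first]                     # Ae: hidden neuron i = 1
    last_states = [state(first, x) for x in inputs]
    while len(hidden) < max_hidden:
        # B1a (assumed rule): +1 where the last neuron was right, -1 where it erred
        targets_i = [t * s for t, s in zip(targets, last_states)]
        hidden.append(learn(inputs, targets_i))        # B1b, B1c: parameters now fixed
        bri = [tuple(state(h, x) for h in hidden) for x in inputs]
        output = learn(bri, targets)                   # B2a: output neuron on BRI(i)
        out_states = [state(output, r) for r in bri]
        if out_states == list(targets):                # B2b: the network is built
            return hidden, output
        last_states = out_states                       # B2c: output errors drive B1a
    raise RuntimeError("no error-free network within max_hidden hidden neurons")


def predict(hidden, output, x) -> int:
    if not hidden:
        return state(output, x)
    return state(output, tuple(state(h, x) for h in hidden))


X = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
XOR = [-1, 1, 1, -1]
hidden, output = build_network(X, XOR)
print(len(hidden), [predict(hidden, output, x) for x in X])  # 2 [-1, 1, 1, -1]
```

On the two-input XOR problem, the first neuron makes at least one error. The second hidden neuron learns the corrected targets exactly, and the linear output neuron then sorts the internal representations without error. On a linearly separable problem such as AND, step Ad ends learning with no hidden layer.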
According to a second embodiment in which the network inputs are binary, the process comprises an intermediate step B3) carried out between steps B1) and B2) and consisting of:
B3a) determining the number of neuron i learning errors;
B3b) if this number is not zero, repeating the processing from step B1), treating the errors of this neuron as the errors of the previous neuron when creating the learning base Bi;
B3c) if this number is zero, considering the layer of hidden neurons built in this way as potentially acceptable and carrying out step B2).
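For binary inputs, step B3 tests the last hidden neuron itself instead of an output neuron. The sketch below makes the same assumptions as before: the product target rule and a hypothetical brute-force stand-in for Minimerror. `build_hidden_layer` is an illustrative name.

```python
from itertools import product
from math import prod
from typing import List, Sequence, Tuple

Neuron = Tuple[Tuple[int, ...], int]  # (weights, bias) of a +1/-1 threshold unit


def state(neuron: Neuron, x: Sequence[int]) -> int:
    w, b = neuron
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def learn(inputs, targets) -> Neuron:
    """HYPOTHETICAL stand-in for Minimerror (brute force over {-1, 0, 1}); toy sizes only."""
    n = len(inputs[0])
    candidates = ((tuple(p[:n]), p[n]) for p in product((-1, 0, 1), repeat=n + 1))
    # min() returns the first candidate with the fewest errors
    return min(candidates,
               key=lambda c: sum(state(c, x) != t for x, t in zip(inputs, targets)))


def build_hidden_layer(inputs, targets, max_hidden: int = 8) -> List[Neuron]:
    """Steps A and B1 with the check B3 (second embodiment, binary inputs).
    A one-element result means step Ad applied: that neuron is the output neuron."""
    current_targets = list(targets)
    hidden: List[Neuron] = []
    while len(hidden) < max_hidden:
        neuron = learn(inputs, current_targets)        # Ab, or B1b for i > 1
        hidden.append(neuron)                          # parameters are now fixed
        states = [state(neuron, x) for x in inputs]
        if states == current_targets:                  # B3c (or Ad when i == 1)
            return hidden
        # B3a/B3b, then B1a with the assumed rule: this neuron's errors get target -1
        current_targets = [t * s for t, s in zip(current_targets, states)]
    raise RuntimeError("hidden layer not completed within max_hidden neurons")


X = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
XOR = [-1, 1, 1, -1]
layer = build_hidden_layer(X, XOR)
print(len(layer))                                      # 2
# Under the product rule, the product of the hidden states reproduces the targets.
print([prod(state(h, x) for h in layer) for x in X])   # [-1, 1, 1, -1]
```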
According to a third embodiment of the invention in which network inputs are binary, the internal representations of inputs in the examples base form an internal representations base BRI(i) and step B2), when it is carried out after step B3c), consists of:
B2d) introducing an output neuron called a pseudo-neuron, connecting this pseudo-neuron to all hidden neurons for which the parameters are fixed, calculating the output from this pseudo-neuron as being approximately the product of the states of the hidden neurons;
B2e) determining if the pseudo-neuron correctly sorts all examples in the internal representations base BRI(i);
B2f) if it does, learning is considered to be finished and the created network comprises a layer of i hidden neurons;
B2g) if not, treating the sorting errors of the output pseudo-neuron as the errors of the previous neuron in step B1a), eliminating this output pseudo-neuron, and restarting the processing at step B1a) until the output pseudo-neuron makes no errors.
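The pseudo-neuron of steps B2d to B2g can be sketched as follows. This sketch takes its output to be exactly the product of the +1/-1 hidden states. Under the assumed product target rule τ(i+1) = τ(i)·σ(i), a last neuron k that reproduces its targets gives σ(k) = τ(k). Since every σ(i)² = 1, it follows that τ(1) = σ(1)·σ(2)·…·σ(k). The two internal representations bases below are hypothetical.

```python
from math import prod
from typing import List, Sequence


def pseudo_neuron(representation: Sequence[int]) -> int:
    """Step B2d, assumed form: output = product of the +1/-1 hidden states."""
    return prod(representation)


def pseudo_neuron_errors(bri: Sequence[Sequence[int]], targets: Sequence[int]) -> List[int]:
    """Step B2e: indices of the BRI(i) examples that the pseudo-neuron sorts incorrectly.
    An empty list corresponds to B2f; otherwise these errors feed back into B1a (B2g)."""
    return [mu for mu, (r, t) in enumerate(zip(bri, targets)) if pseudo_neuron(r) != t]


targets = [-1, 1, 1, -1]
bri_ok = [(-1, 1), (1, 1), (1, 1), (1, -1)]
bri_bad = [(-1, 1), (1, 1), (1, 1), (1, 1)]
print(pseudo_neuron_errors(bri_ok, targets))   # []
print(pseudo_neuron_errors(bri_bad, targets))  # [3]
```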
According to one advantageous embodiment, the first neuron chosen in step Aa) may be a linear type neuron, the other neurons being non-linear.
According to another advantageous embodiment, all hidden neurons are of the quadratic type.