1. Field of Invention
The present invention relates to distributed processing architectures for neural networks.
2. Description of the Background Art
Artificial neural networks are computational systems that provide for distributed representation of information. Typically, a neural network comprises a number of layers of neurons (nodes), including an input layer, a number of hidden layers, and an output layer. Conventionally, all of the neurons in a given layer are fully interconnected with all of the neurons in the next layer. The connection between two neurons provides the output of one neuron as the input to the other. A neuron sums the inputs it receives over its connections, applies a transfer function to the summed input, and generates an output to the next layer of neurons. In a software-implemented neural network, the neurons are represented by data structures in memory that store their identity, transfer function, and connections to other neurons.
Connections between neurons are weighted. The connections between layers of neurons are typically described by a matrix, W[I][J], where I indexes the neurons of the first layer and J indexes the neurons of the second layer, and where w[i][j] is the weight of the connection between the i-th neuron in the first layer and the j-th neuron in the second layer. This type of weight matrix is very convenient for fully connected neural networks because the identity of the neuron in each layer is just its index into the matrix.
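As an illustration of the matrix representation, the following minimal sketch propagates the outputs of one layer to the next using w[i][j]. The logistic transfer function, the function names, and the example weights are assumptions chosen for illustration; the text does not name a particular transfer function.

```python
import math


def logistic(x):
    """One possible transfer function (an assumption; the text names none)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(first_layer_outputs, w, transfer=logistic):
    """Compute second-layer outputs from first-layer outputs.

    w[i][j] is the weight of the connection from neuron i in the first
    layer to neuron j in the second layer.
    """
    n_first = len(w)
    n_second = len(w[0])
    outputs = []
    for j in range(n_second):
        # Sum the weighted inputs arriving at neuron j.
        summed = sum(first_layer_outputs[i] * w[i][j] for i in range(n_first))
        # Apply the transfer function to the summed input.
        outputs.append(transfer(summed))
    return outputs


# Hypothetical 3-neuron layer fully connected to a 2-neuron layer.
w = [[0.5, -1.0],
     [0.25, 2.0],
     [0.0, 1.0]]
x = [1.0, 2.0, 0.0]
y = forward(x, w)
```

Here each neuron is identified only by its row or column index, which is the convenience the matrix form offers for fully connected layers.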
This matrix approach may be used for fully or sparsely connected layers. A sparsely connected network has each neuron in a given layer connected to a relatively small percentage of the neurons in another layer. In that case most of the connection strengths are zero. However, for very large neural networks, the weight matrix occupies too much memory. For example, in a neural network with 9 layers, each having 12,500 neurons, and each layer sparsely connected (at a rate of 0.1%) to its neighbor with feed forward and feedback connections, there would only be 2.5 million connections, but a matrix approach would require 10 GBytes of memory to store the connection matrices for all of the layers.
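The figures in this example can be checked with a short calculation. The sketch below assumes one weight matrix per direction (feed forward and feedback) for each pair of neighboring layers and 4 bytes per stored weight; neither assumption is stated in the text, but together they reproduce the numbers given.

```python
LAYERS = 9
NEURONS_PER_LAYER = 12_500
BYTES_PER_WEIGHT = 4  # assumption: 32-bit weights

# Assumption: each pair of neighboring layers has two matrices,
# one for feed forward and one for feedback connections.
adjacent_pairs = LAYERS - 1
matrices = adjacent_pairs * 2

entries_per_matrix = NEURONS_PER_LAYER ** 2
dense_entries = matrices * entries_per_matrix
dense_bytes = dense_entries * BYTES_PER_WEIGHT

# A 0.1% connection rate means 1 in 1,000 entries is a real connection.
connections = dense_entries // 1000

print(f"matrix entries:   {dense_entries:,}")
print(f"matrix storage:   {dense_bytes / 1e9:.1f} GBytes")
print(f"real connections: {connections:,}")
```

Under these assumptions, 999 of every 1,000 stored matrix entries are zeros.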
Instead of this wasteful practice, the connections between neurons may be stored as a data structure that keeps the connection strengths in either target lists or source lists. For a source list, a data structure for each neuron would list all of the neurons that it receives connections from (the sources) and the weight of each connection. Each of these lists could be in a simple vector structure. For a target list, a given neuron lists all the neurons it sends output data to and the weight of each of these connections. The source list is the standard method because it allows the straightforward calculation of the input intensity of each neuron as the sum of the vector of its connection strengths. On the other hand, the target list data structure allows no such simple calculation. To calculate the input intensity for a given neuron, one would have to search all other neurons for connections and, if one was found and it was active, then its connection strength would have to be accumulated. Because of this difficult and inefficient computation, the target list data structure has not generally been used.
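The difference between the two list structures can be sketched as follows. The dictionary-based layout, the function names, and the four-connection example network are hypothetical; the sketch only contrasts the direct sum available with a source list against the search that a target list requires when computing the input intensity of a single neuron.

```python
# Hypothetical connections as (source, target, weight) triples.
connections = [(0, 2, 0.5), (1, 2, 0.25), (0, 3, 1.0), (1, 3, 0.75)]


def build_source_lists(conns):
    """For each neuron, list the neurons it receives from, with weights."""
    source_lists = {}
    for src, tgt, weight in conns:
        source_lists.setdefault(tgt, []).append((src, weight))
    return source_lists


def build_target_lists(conns):
    """For each neuron, list the neurons it sends to, with weights."""
    target_lists = {}
    for src, tgt, weight in conns:
        target_lists.setdefault(src, []).append((tgt, weight))
    return target_lists


def intensity_from_source_list(neuron, source_lists, active):
    """Sum the strengths of this neuron's own connections from active sources."""
    return sum(w for src, w in source_lists.get(neuron, []) if src in active)


def intensity_from_target_lists(neuron, target_lists, active):
    """Search every neuron's target list for connections to `neuron`."""
    total = 0.0
    for src, targets in target_lists.items():
        for tgt, weight in targets:
            # A connection was found; accumulate it only if its source is active.
            if tgt == neuron and src in active:
                total += weight
    return total


source_lists = build_source_lists(connections)
target_lists = build_target_lists(connections)
active = {0, 1}
by_source = intensity_from_source_list(2, source_lists, active)
by_target = intensity_from_target_lists(2, target_lists, active)
```

Both functions return the same value, but the target-list version visits every list in the network to compute the intensity of one neuron.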
Yet another problem with conventional neural networks is that they do not realistically model the way actual biological neurons control or limit the number of inputs they can accept or outputs they can generate. As a result, conventional neural networks do not behave in a manner directly analogous to their real world counterparts.
The present invention overcomes the limitations of conventional neural network design by providing certain implementations of a new type of neural network architecture called a cortronic neural network. A cortronic neural network comprises a plurality of regions, each region having a plurality of neurons. In one embodiment, neurons in a given region are sparsely connected with neurons in any other region, including the same region. Thus, all of the neurons in a given region will have some connections with a small number of neurons in other regions, without being connected to all of the neurons in another region or the same region. The connections between neurons are represented as target lists, where each neuron is associated with a list of the target neurons to which it provides an input, and a weight for each of these input connections.
The training of a cortronic neural network is characterized by periodic restart competitions, instead of conventional backpropagation. A restart competition is a mechanism for calculating the intensity of inputs to all of the neurons in a region or regions, determining which neurons are “winners” that will now fire (produce a new output), and adjusting the weights between the winning neurons and their active targets.
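The three steps of a restart competition can be sketched as follows. This is only an illustrative reading of the description above: selecting a fixed number of highest-intensity neurons as winners, breaking ties by neuron index, and adding a constant increment to each adjusted weight are all assumptions, as are the function name, parameter names, and example values.

```python
def restart_competition(target_lists, active, num_winners, learning_rate):
    """One restart competition over target lists of the form {src: {tgt: weight}}."""
    # Step 1: calculate input intensities by letting each active neuron
    # add its connection weight to every neuron on its target list.
    intensity = {}
    for src in active:
        for tgt, weight in target_lists.get(src, {}).items():
            intensity[tgt] = intensity.get(tgt, 0.0) + weight

    # Step 2: determine the winners (assumption: the num_winners neurons
    # with the highest intensity, ties broken by lower neuron index).
    ranked = sorted(intensity, key=lambda n: (-intensity[n], n))
    winners = set(ranked[:num_winners])

    # Step 3: adjust the weights between each winner and those of its
    # targets that are active (assumption: a constant additive increment).
    for winner in winners:
        for tgt in target_lists.get(winner, {}):
            if tgt in active:
                target_lists[winner][tgt] += learning_rate

    return intensity, winners


# Hypothetical five-neuron network.
target_lists = {
    0: {2: 0.5, 3: 0.25},
    1: {2: 0.25, 3: 0.25, 4: 0.125},
    2: {0: 0.5, 4: 1.0},
    3: {1: 0.5},
    4: {},
}
intensity, winners = restart_competition(
    target_lists, active={0, 1}, num_winners=1, learning_rate=0.25
)
```

Note that step 1 needs no search: with target lists, intensities are accumulated by iterating only over the active neurons.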
In another aspect of the present invention, the connection weights of both the inputs to a neuron and its outputs to its targets are normalized periodically. Normalizing the strengths of the connections between neurons during learning is based on the biological fact that a real neuron can only drive a finite number of neurons due to its limited chemical and electrical outputs. This fact is usually ignored in conventional learning schemes that adjust weights, such as the backpropagation algorithm for a multi-layer perceptron. Similarly, a neuron has only a finite amount of chemicals that it distributes to its input connections for receiving input signals. This limit is also usually ignored in many neural networks.
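A minimal sketch of such normalization is given below. It assumes that normalizing means rescaling weights so that each neuron's outgoing weights, or each neuron's incoming weights, sum to a fixed budget; the text does not specify the normalization rule, and the function names, the `budget` parameter, and the example weights are hypothetical.

```python
def normalize_outputs(target_lists, budget=1.0):
    """Rescale each neuron's outgoing weights so they sum to `budget`."""
    for targets in target_lists.values():
        total = sum(targets.values())
        if total > 0:
            for tgt in targets:
                targets[tgt] *= budget / total


def normalize_inputs(target_lists, budget=1.0):
    """Rescale each neuron's incoming weights so they sum to `budget`."""
    # With target lists, the incoming total for a neuron must be
    # accumulated across the lists of all of its sources.
    incoming = {}
    for targets in target_lists.values():
        for tgt, weight in targets.items():
            incoming[tgt] = incoming.get(tgt, 0.0) + weight
    for targets in target_lists.values():
        for tgt in targets:
            if incoming[tgt] > 0:
                targets[tgt] *= budget / incoming[tgt]


# Hypothetical target lists of the form {src: {tgt: weight}}.
weights = {0: {2: 1.0, 3: 3.0}, 1: {2: 1.0}}
normalize_outputs(weights)
after_outputs = {src: dict(targets) for src, targets in weights.items()}
normalize_inputs(weights)
```

In this sketch the two passes interact: rescaling the inputs generally disturbs the output sums, so only the most recently applied constraint holds exactly.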
In one embodiment, the operations of a cortronic neural network are distributed in a parallel processing system between an executive computer and a number of distributed computers. The executive computer manages the distributed computers, and orchestrates the restart competitions on the various computers. The distributed computers store one or more regions or portions of regions of the neural network in their local memories, and compute intensity values from the locally active neurons, sharing this information with the executive computer and other distributed computers in order to globally update intensity values across the entire network. The distributed computers also participate in computation of information that is used to renormalize the weights.
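The division of work can be illustrated with a single-process simulation. In the sketch below, each distributed computer holds the target lists of its locally stored neurons and computes partial intensities from its locally active neurons, and the executive combines the partial results into global intensities. The class names, method names, and the choice to aggregate at the executive are assumptions; no actual inter-computer communication is modeled.

```python
class DistributedComputer:
    """Holds target lists {src: {tgt: weight}} for locally stored neurons."""

    def __init__(self, local_target_lists):
        self.local_target_lists = local_target_lists

    def partial_intensities(self, active):
        """Accumulate intensities contributed by locally stored active neurons."""
        partial = {}
        for src, targets in self.local_target_lists.items():
            if src in active:
                for tgt, weight in targets.items():
                    partial[tgt] = partial.get(tgt, 0.0) + weight
        return partial


class Executive:
    """Combines the partial intensities reported by each distributed computer."""

    def __init__(self, computers):
        self.computers = computers

    def global_intensities(self, active):
        total = {}
        for computer in self.computers:
            for tgt, value in computer.partial_intensities(active).items():
                total[tgt] = total.get(tgt, 0.0) + value
        return total


# Hypothetical split: neuron 0 stored on one computer, neuron 1 on another.
computer_a = DistributedComputer({0: {2: 0.5, 3: 0.25}})
computer_b = DistributedComputer({1: {2: 0.25, 4: 0.125}})
executive = Executive([computer_a, computer_b])
totals = executive.global_intensities(active={0, 1})
```

Neuron 2 receives input from neurons stored on both computers, so its intensity is only correct after the partial results are combined.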