The present invention relates to artificial intelligence and more particularly to artificial neural networks.
Artificial intelligence is the study and design of computer systems to exhibit characteristics associated with intelligence, such as language comprehension, problem solving, pattern recognition, learning, and reasoning from incomplete or uncertain information. Since the human brain is an intelligent organ, many researchers have been attempting to achieve artificial intelligence by modeling computer systems after the human brain with artificial neural network technology. Although computer scientists have been studying artificial neural network models for years and have achieved some modest successes, the full potential of artificial neural networks remains unrealized because of inherent limitations in current implementations.
An artificial neural network comprises a group of interconnected processing nodes referred to as neurons. FIG. 10A depicts a neuron 1012 in typical artificial neural network implementations, which receives a number of input signals (e.g. X1, X2, and X3) through weighted connections (W1, W2, and W3), processes the weighted input signals in accordance with an activation function (F), and produces an output signal Y in response. The input signals Xi are the data supplied by a user or the output signals from other neurons within the network. Each input signal Xi is multiplied by the corresponding connection weight Wi, and the output signal Y is typically the result of applying the activation function F to the sum of the weighted input signals, e.g. Y=F(ΣXiWi).
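By way of illustration only, the relation Y=F(ΣXiWi) for a single neuron may be sketched as follows. The function name neuron_output and the choice of the hyperbolic tangent as the activation function F are assumptions made for this example; the description above does not prescribe a particular activation function.

```python
import math


def neuron_output(inputs, weights, activation=math.tanh):
    """Return Y = F(sum_i X_i * W_i) for one conventional neuron.

    `activation` stands in for F; tanh is an assumed choice, not one
    named in the text.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Multiply each input signal by its connection weight and sum the results.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Apply the activation function to the weighted sum.
    return activation(weighted_sum)


# Three input signals X1..X3 arriving over weighted connections W1..W3.
y = neuron_output([1.0, 0.5, -1.0], [0.2, 0.4, 0.1])
```

Here the weighted sum is 0.2 + 0.2 - 0.1 = 0.3, so Y is F applied to 0.3.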
In most implementations, neurons are arranged in a network of two or more layers: an input layer, an output layer, and zero or more hidden layers between the input layer and the output layer. Referring to FIG. 10B, an exemplary artificial neural network 1000 comprises an input layer of neurons 1002, 1004, and 1006, a hidden layer of neurons 1012, 1014, 1016, and 1018, and an output layer of neurons 1022 and 1024.
In the illustrated example, adjacent layers are fully interconnected, and signal processing is “feedforward” in which each neuron within a given layer receives input signals from every neuron in the previous layer and transmits an output signal to every neuron in the next layer. For example, neuron 1012 in the hidden layer receives input signals from all three input layer neurons 1002, 1004, and 1006 and transmits an output signal to both output layer neurons 1022 and 1024. Accordingly, data input vectors [INPUT 1, INPUT 2, INPUT 3], equal in size to the number of neurons in the input layer, are presented to the input layer and are processed in turn by the successive hidden layers (if any) and finally the output layer. The resulting output vector [OUTPUT 1, OUTPUT 2], equal in size to the number of neurons in the output layer, is the data output of the artificial neural network.
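The feedforward processing described above, for a fully interconnected network with three inputs, four hidden neurons, and two output neurons, may be sketched as follows. This is a minimal illustration under stated assumptions: the names layer_forward, feedforward, and random_layer are hypothetical, the tanh activation is an assumed choice, and the pseudorandom weights are placeholders rather than trained values.

```python
import math
import random


def layer_forward(inputs, weight_matrix, activation=math.tanh):
    """Compute the outputs of one fully connected layer.

    weight_matrix[j][i] is the weight of the connection from input i
    to neuron j of this layer.
    """
    return [
        activation(sum(x * w for x, w in zip(inputs, row)))
        for row in weight_matrix
    ]


def feedforward(inputs, layers, activation=math.tanh):
    """Pass an input vector through each layer in turn."""
    signal = inputs
    for weight_matrix in layers:
        signal = layer_forward(signal, weight_matrix, activation)
    return signal


def random_layer(n_out, n_in, rng):
    """Build an n_out x n_in matrix of pseudorandom weights in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


rng = random.Random(0)
# Hidden layer: 4 neurons with 3 inputs each; output layer: 2 neurons with 4 inputs each.
network = [random_layer(4, 3, rng), random_layer(2, 4, rng)]
output = feedforward([0.5, -0.2, 0.9], network)  # [OUTPUT 1, OUTPUT 2]
```

The length of the output vector equals the number of neurons in the last layer, which is two in this sketch.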
Training an artificial neural network refers to the process of setting the connection weights so that the artificial neural network produces a desired output in response to particular inputs. Typically, the operation of the artificial neural network is divided into a training phase and an implementation phase, and the artificial neural network is not ready for use until the training is complete. Although many training techniques have been proposed, they generally fall into one of two types: supervised learning or unsupervised learning. These techniques are typically applied after the connection weights of the artificial neural network have been initialized to pseudorandom values.
With supervised learning, sets of training pairs (data input vectors and their corresponding desired output vectors) are presented to the artificial neural network for processing. When the artificial neural network produces an output vector in response, the output vector is compared with the desired output vector to calculate one or more error signals. These error signals are fed back into an algorithm that determines how each weight should be adjusted. Two common algorithms are backpropagation (e.g. U.S. Pat. Nos. 5,283,855, 5,566,273, and 5,870,728) and genetic algorithms (e.g. U.S. Pat. Nos. 5,140,530, 5,249,259, and 5,832,466). Other approaches have been suggested, such as U.S. Pat. No. 5,640,494 wherein individual connection weight values are perturbed, with the perturbation being rejected or retained based on whether the perturbation improves the output.
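The general idea of perturbing individual connection weights and retaining a perturbation only when it reduces the error may be sketched as follows. This is a generic illustration and is not taken from any of the cited patents. The single linear neuron, the squared-error measure, the step size, and all function names are assumptions chosen to keep the example short.

```python
import random


def predict(weights, inputs):
    """Output of a single linear neuron (an assumed, simplified model)."""
    return sum(w * x for w, x in zip(weights, inputs))


def total_error(weights, training_pairs):
    """Sum of squared differences between actual and desired outputs."""
    return sum((predict(weights, x) - target) ** 2 for x, target in training_pairs)


def train_by_perturbation(weights, training_pairs, steps=2000, scale=0.1, seed=0):
    """Perturb one weight at a time; keep the change only if the error drops."""
    rng = random.Random(seed)
    weights = list(weights)
    best = total_error(weights, training_pairs)
    for _ in range(steps):
        i = rng.randrange(len(weights))
        old = weights[i]
        weights[i] = old + rng.uniform(-scale, scale)
        err = total_error(weights, training_pairs)
        if err < best:
            best = err          # retain the perturbation
        else:
            weights[i] = old    # reject the perturbation
    return weights, best


# Training pairs: (data input vector, desired output).
training_pairs = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
initial_error = total_error([0.0, 0.0], training_pairs)
trained_weights, final_error = train_by_perturbation([0.0, 0.0], training_pairs)
```

Because a perturbation is kept only when it lowers the error, the error in this sketch can never increase from one step to the next.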
Unsupervised learning does not require feedback because input vectors do not have a priori associated output vectors. Instead, the input vectors are chosen to represent fundamental patterns, and the connections of the artificial neural network are adjusted so that the output vector represents a classification value, wherein similar future input vectors will produce the same classification value as output. U.S. Pat. Nos. 5,617,483, 5,729,662, and 5,835,901, for example, disclose various modifications of unsupervised learning.
These conventional types of artificial neural networks have fallen far short of the goal of endowing a machine with recognizable intelligence. Conventional artificial neural networks are essentially statistical correlation algorithms, relying on elaborate mathematical manipulations in their training and data processing phases. As a result, conventional artificial neural networks are basically limited to prepackaged pattern recognition applications.
For example, conventional artificial neural networks do not learn from experience, because they have distinct training and implementation phases. Weights are only modified during the training phase, but the operation of the artificial neural network occurs only in the implementation phase, during which no training is allowed and the artificial neural network is frozen at a current level of competence.
Conventional artificial neural networks are also inefficient in learning new things when subjected to a retraining phase. Whenever a new training pair is to be learned, all of the prior training sets have to be relearned too in order not to lose previously stored capabilities. In addition, conventional artificial neural network models lack a mechanism for easily modifying only relevant neurons or for shielding some neurons from further modification, because changes are either made to the entire artificial neural network or random changes are imposed indiscriminately throughout the artificial neural network.
There is a long-felt need for an artificial neural network that is more brain-like in architecture and operation so that the artificial neural network is capable of learning from experience.
These and other needs are addressed by an adaptive integration network, in which learning occurs during the normal operation of the adaptive integration network, and adaptive learning is promoted by increasing the activity level of the adaptive integration network.
In accordance with one aspect of the present invention, an adaptive integration network includes a plurality of interconnected neurons that are configured to fire when their excitation level, which is responsive to weighted input signals, is greater than or equal to a threshold. When a “presynaptic” neuron fires, transmits a signal to a “postsynaptic” neuron, and causes the postsynaptic neuron also to fire in close temporal proximity, the weight of the connection is strengthened, so that learning occurs during normal operation. In one embodiment, a connection weight strengthening function is employed that asymptotically converges to a line |w′|=|w|, so that higher weights are increased in their absolute values by smaller increments.
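The threshold firing rule and a weight strengthening function of the kind described above may be sketched as follows. This passage does not give the actual strengthening function, so the exponential form used here, |w′| = |w| + a·exp(−b·|w|), and the parameters a and b are purely hypothetical. The form was chosen only because its increment shrinks as |w| grows, so that |w′| approaches the line |w′|=|w|. All function names are likewise assumptions.

```python
import math


def fires(inputs, weights, threshold):
    """A neuron fires when its excitation level meets or exceeds the threshold."""
    excitation = sum(x * w for x, w in zip(inputs, weights))
    return excitation >= threshold


def strengthen(w, a=1.0, b=1.0):
    """Hypothetical strengthening function: |w'| = |w| + a * exp(-b * |w|).

    The increment decays with |w|, so larger weights grow by smaller
    amounts and |w'| approaches |w| asymptotically. The sign of w is kept.
    """
    increment = a * math.exp(-b * abs(w))
    return math.copysign(abs(w) + increment, w)


def update_connection(w, pre_fired, post_fired):
    """Strengthen the connection only when both neurons fired together."""
    return strengthen(w) if (pre_fired and post_fired) else w


small_gain = strengthen(0.5) - 0.5   # increment applied to a low weight
large_gain = strengthen(2.0) - 2.0   # smaller increment applied to a high weight
```

In this sketch a weight of 0.5 receives a larger increment than a weight of 2.0, and a negative weight becomes more negative, so that its absolute value increases.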
According to another aspect of the present invention, the adaptive integration network is further trained by increasing the network activity, which causes the adaptive integration network to explore other possible connection weights, until the adaptive integration network produces the desired output. Various techniques may be employed to temporarily increase the network activity, for example, by lowering the thresholds, scaling the connection weights by a positive factor, or increasing the magnitude of the neural signals or of selected external input signals.
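Two of the techniques mentioned above for temporarily increasing network activity, lowering the thresholds and scaling the connection weights, may be illustrated by counting how many neurons fire for the same input. The sketch below is a simplified illustration: the function names, the parameters threshold_offset and weight_scale, and the sample weights and thresholds are all assumptions, and the sample weights are positive so that a scale factor greater than one raises every excitation level.

```python
def fires(inputs, weights, threshold):
    """A neuron fires when its excitation level meets or exceeds the threshold."""
    return sum(x * w for x, w in zip(inputs, weights)) >= threshold


def count_firing(inputs, neurons, threshold_offset=0.0, weight_scale=1.0):
    """Count firing neurons after lowering thresholds and/or scaling weights.

    `neurons` is a list of (weights, threshold) pairs. `threshold_offset`
    is subtracted from every threshold; `weight_scale` multiplies every weight.
    """
    return sum(
        fires(inputs, [w * weight_scale for w in weights], threshold - threshold_offset)
        for weights, threshold in neurons
    )


inputs = [1.0, 1.0]
neurons = [([0.2, 0.2], 1.0), ([0.4, 0.4], 1.0), ([0.6, 0.6], 1.0)]

baseline = count_firing(inputs, neurons)
lowered = count_firing(inputs, neurons, threshold_offset=0.3)
scaled = count_firing(inputs, neurons, weight_scale=3.0)
```

With these sample values, more neurons fire after the thresholds are lowered than at baseline, and more still after the weights are scaled.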
Still other objects and advantages of the present invention will become readily apparent from the following detailed description, simply by way of illustration of the best mode contemplated of carrying out the invention. As will be realized, the invention is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the invention. Accordingly, the drawing and description are to be regarded as illustrative in nature, and not as restrictive.