1. Field of the Invention
The invention relates in general to the field of neural networks, and in particular to a neural network architecture and method which utilizes self-adjusting connections between nodes to process information.
2. Related Art
Neural networks have been known and used in the prior art in computer applications which require complex and/or extensive processing. Such applications include, e.g., pattern recognition and image and voice processing. In these applications, neural networks have been known to provide greatly increased processing power and speed over conventional computer architectures. Several approaches to neural networking exist and can be distinguished from one another by their different architectures. Specifically, the approaches of the prior art can be distinguished by the numbers of layers and the interconnections within and between them, the learning rules applied to each node in the network, and whether or not the architecture is capable of supervised or unsupervised learning.
A neural network is said to be “supervised” if it requires a formal training phase where output values are “clamped” to a training set. In other words, such networks require a “teacher,” something not necessarily found in nature. Unsupervised networks are desirable precisely for this reason. They are capable of processing data without requiring a preset training set or discrete training phase. Biological neural networks are unsupervised, and any attempt to emulate them should aspire to this capability. Of the following approaches, the Boltzmann/Cauchy and Hidden Markov models are supervised networks and the remainder are unsupervised networks.
At least eight principal types of feedback systems, also called backpropagation models, have been identified in the prior art. The Additive Grossberg model uses one layer with lateral inhibitions. The learning rule is based on a sigmoid curve and updates using a steepest ascent calculation. The Shunting Grossberg model is similar, with an added gain control feature to control learning rates. Adaptive Resonance Theory models use two layers, with on-center/off-surround lateral feedback and sigmoid learning curves. The Discrete Autocorrelator model uses a single layer, recurrent lateral feedback, and a step function learning curve. The Continuous Hopfield model uses a single layer, recurrent lateral feedback, and a sigmoid learning curve. Bi-Directional Associative Memory uses two layers, with each element in the first connected to each element in the second, and a ramp learning curve. Adaptive Bi-Directional Associative Memory uses two layers, with each element in the first connected to each element in the second, and the Cohen-Grossberg memory function. This model also exists in a competitive version. Finally, the Temporal Associative Memory uses two layers, with each element in the first connected to each element in the second, and an exponential step learning function.
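As a rough illustration of the single-layer, recurrent, step-function family described above (the Discrete Autocorrelator), the following minimal sketch stores one bipolar pattern and recovers it from a corrupted copy. The Hebbian outer-product storage rule and the names `store_patterns`, `step`, and `autocorrelator_recall` are assumptions chosen for illustration; they are not taken from this specification.

```python
def store_patterns(patterns):
    """Build a symmetric weight matrix from bipolar (+1/-1) patterns.

    Assumption: Hebbian outer-product storage with a zero diagonal,
    a common textbook choice, not a rule stated in the source text.
    """
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights


def step(x):
    """Step (threshold) activation: +1 for non-negative input, else -1."""
    return 1 if x >= 0 else -1


def autocorrelator_recall(weights, state, sweeps=5):
    """Update each node in turn from the lateral feedback of all others."""
    state = list(state)
    n = len(state)
    for _ in range(sweeps):
        for i in range(n):
            state[i] = step(sum(weights[i][j] * state[j] for j in range(n)))
    return state


stored = [1, -1, 1, -1, 1, -1]
noisy = [1, 1, 1, -1, 1, -1]  # second element flipped
weights = store_patterns([stored])
recovered = autocorrelator_recall(weights, noisy)
```

In this small case the corrupted element is pulled back to its stored value by the lateral feedback from the other five nodes.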
At least eight principal types of feedforward systems have been identified. The Learning Matrix uses two layers, with each element in the first connected to each element in the second, and a modified step learning function. Drive-Reinforcement uses two layers, with each element in the first connected to each in the second, and a ramp learning function. The Sparse Distributed Memory model uses three layers, with random connections from the first to the second layer, and a step learning function. Linear Associative Memory models use two layers, with each element in the first layer connected to each element in the second, and a matrix outer product to calculate learning updates. The Optimal Linear Associative Memory model uses a single layer, with each element connected to each of the others, and a matrix pseudo-inverse learning function. Fuzzy Associative Memory uses two layers, with each element in the first connected to each element in the second, and a step learning function. This particular model can only store one pair of correlates at a time. The Learning Vector Quantizer uses two layers, with each element in the first connected to each in the second, negative lateral connections from each element in the second layer with all the others in the second layer, and positive feedback from each second layer element to itself. This model uses a modified step learning curve, which varies as the inverse of time. The Counterpropagation model uses three layers, with each element in the first connected to each in the second, each element in the second connected to each in the third, and negative lateral connections from each element in the second layer to each of the rest, with positive feedback from each element in the second layer to itself. This also uses a learning curve varying inversely with time.
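The outer-product learning update mentioned above for Linear Associative Memory models can be sketched as follows. This is a minimal illustration under the assumption of orthonormal input keys, for which recall is exact; the function names `outer_product`, `lam_train`, and `lam_recall` and the sample vectors are hypothetical.

```python
def outer_product(y, x):
    """Return the matrix y * x^T as a list of rows."""
    return [[yi * xj for xj in x] for yi in y]


def lam_train(pairs, n_out, n_in):
    """Accumulate one outer product per (input, output) pair."""
    weights = [[0.0] * n_in for _ in range(n_out)]
    for x, y in pairs:
        delta = outer_product(y, x)
        for i in range(n_out):
            for j in range(n_in):
                weights[i][j] += delta[i][j]
    return weights


def lam_recall(weights, x):
    """Feedforward recall: multiply the weight matrix by the input vector."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]


# Illustrative data only: two orthonormal keys and their associated outputs.
pairs = [
    ([1.0, 0.0, 0.0], [1.0, -1.0]),
    ([0.0, 1.0, 0.0], [2.0, 0.5]),
]
lam_weights = lam_train(pairs, n_out=2, n_in=3)
first_output = lam_recall(lam_weights, [1.0, 0.0, 0.0])
second_output = lam_recall(lam_weights, [0.0, 1.0, 0.0])
```

When the keys are not orthogonal, recall from a plain outer-product matrix picks up cross-talk between stored pairs, which is the situation the pseudo-inverse learning function of the Optimal Linear Associative Memory model is meant to address.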
Boltzmann/Cauchy models use random distributions for the learning curve. The use of random distributions to affect learning is advantageous because use of the distributions permits emulation of complex statistical ensembles. Thus, imposing the distributions imposes behavioral characteristics which arise from the complex systems the model networks are intended to emulate. However, the Boltzmann/Cauchy networks are capable only of supervised learning. Moreover, these models have proven to be undesirably slow in many applications.
Hidden Markov models rely on a hybrid architecture, generally of feedforward elements and a recurrent network sub-component, all in parallel. These typically have three layers, but certain embodiments have had as many as five. A fairly typical example employs three layers, a softmax learning rule (i.e., the Boltzmann distribution) and a gradient descent algorithm. Other examples use a three-layer hybrid architecture and a gamma memory function, rather than the usual mixed Gaussian. The gamma distribution is convenient in Bayesian analysis, also common to neural network research, and is the continuous version of the negative-binomial distribution. However, the underlying process for this model is a stationary one. That is, the probability distribution is the same at time t and time t+Δt for all Δt.
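The softmax rule referred to above has the form of a Boltzmann distribution over a set of scores. The following minimal sketch shows that form only; the `temperature` parameter and the function name `softmax` are illustrative assumptions and do not reproduce any particular prior-art implementation.

```python
import math


def softmax(scores, temperature=1.0):
    """Map real-valued scores to probabilities, Boltzmann-style.

    Each probability is proportional to exp(score / temperature).
    The maximum score is subtracted first for numerical stability,
    which does not change the result.
    """
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([1.0, 2.0, 3.0])
uniform = softmax([0.0, 0.0])
```

Higher scores receive higher probability, equal scores receive equal probability, and raising the assumed temperature flattens the distribution.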
Studies of language change and studies of visual and acoustic processing in mammals have been used in the prior art to identify the mechanisms of neural processing for purposes of creating neural network architectures. For example, it has been noted that mammalian visual processing seems to be accomplished by feed-forward mechanisms which amplify successes. Such processing has been modeled by calculating Gaussian expectations and by using measures of mutual information in noisy networks. It has further been noted that such models provide self-organizing feature-detectors.
Similarly, it has been noted in the prior art that acoustic processing in mammals, particularly bats, proceeds in parallel columns of neurons, where feed-forward mechanisms and the separation and convergence of the signal produce sophisticated, topically organized feature detectors.
The invention in its preferred embodiment provides a method and apparatus for using a neural network to process information wherein multiple nodes are arrayed in multiple layers for transforming input arrays from prior layers or the environment into output arrays for subsequent layers or output devices. Learning rules based on reinforcing successful matches to templates, and simultaneously suppressing unsuccessful matches, are applied. Interconnections between nodes are provided in a manner whereby the number and structure of the interconnections are self-adjusted by the learning rules during learning. At least one of the layers is used as a processing layer, and multiple lateral inputs to each node of each processing layer are used to retrieve information.
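One way to picture learning rules that reinforce template matches, suppress failures, and thereby adjust the number of interconnections is the hypothetical sketch below. The update formulas, the pruning threshold, and every name (`update_connections`, `gain`, `decay`, `prune_below`) are assumptions made purely for illustration; the specification does not state these details here.

```python
def update_connections(weights, matches, gain=0.1, decay=0.1, prune_below=0.05):
    """Return a new connection table after one learning step.

    weights: dict mapping (source, target) -> strength in [0, 1]
    matches: dict mapping (source, target) -> True if the template matched

    Assumed rule (not from the source): matched connections move toward 1,
    unmatched ones shrink toward 0, and any connection that falls below
    `prune_below` is dropped, so the connection count changes over time.
    """
    updated = {}
    for edge, strength in weights.items():
        if matches.get(edge, False):
            strength = strength + gain * (1.0 - strength)  # reinforce
        else:
            strength = strength - decay * strength  # suppress
        if strength >= prune_below:
            updated[edge] = strength
    return updated


connections = {("a", "b"): 0.5, ("a", "c"): 0.05}
template_hits = {("a", "b"): True}
connections_after = update_connections(connections, template_hits)
```

Here the matched connection is strengthened while the unmatched one drops below the assumed threshold and is removed, so the structure of the interconnections changes as a side effect of learning.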
The invention provides rapid, unsupervised processing of complex data sets, such as imagery or continuous human speech, and a means to capture successful processing or pattern classification constellations for implementation in other networks. The invention includes application-specific self-adjusting multi-layer architectures that employ template-based learning rules to alter and annotate data arrays. Such application-specific architectures include a textual or oral language parser, a basic file searcher, an advanced file searcher, an advanced file searcher that can search for propositions, a translator, a basic “smart” scanner, an advanced “smart” scanner or oral parser, and a dialect parser for oral language. These applications have a number of common features. Inputs are delivered into a flexible number of channels, determined by the total number of scanned letters, input phonemes, words, or propositional values in the input sample. The arrays in each channel are then combined in patterns that depend on values derived from lookup tables or templates. Feedback from one or two layers higher in the central processing segment of the architecture further alters the arrays, where the learning rules reinforce template matches and also decrement failures to match. The lookup or template-matching steps also alter the arrays to prevent confusion of the data and to augment the information carried forward through the architecture. The output channels transform the arrays into final form as required, and also reset the initial processing weights for the next processing cycle. These outputs take whatever format is desired; they may be printed text, statistical information in digital or graphic form, oral outputs through speakers, data in a register, or inputs to other processing applications.