1. Field of the Invention
The invention relates in general to the field of neural networks, and in particular to a neural network architecture and method which utilizes Bose-Einstein statistics or Polya modeling to capture a process.
2. Related Art
Neural networks have been known and used in the prior art in computer applications which require complex and/or extensive processing. Such applications include, e.g., pattern recognition and image and voice processing. In these applications, neural networks have been known to provide greatly increased processing power and speed over conventional computer architectures. Several approaches to neural networking exist and can be distinguished from one another by their different architectures. Specifically, the approaches of the prior art can be distinguished by the numbers of layers and the interconnections within and between them, the learning rules applied to each node in the network, and whether or not the architecture is capable of supervised or unsupervised learning.
A neural network is said to be "supervised" if it requires a formal training phase where output values are "clamped" to a training set. In other words, such networks require a "teacher," something not necessarily found in nature. Unsupervised networks are desirable precisely for this reason. They are capable of processing data without requiring a preset training set or discrete training phase. Biological neural networks are unsupervised, and any attempt to emulate them should aspire to this capability. Of the following approaches, the Boltzmann/Cauchy and Hidden Markov models are supervised networks and the remainder are unsupervised networks.
At least eight principal types of feedback systems, also called backpropagation models, have been identified in the prior art. The Additive Grossberg model uses one layer with lateral inhibitions. The learning rule is based on a sigmoid curve and updates using a steepest ascent calculation. The Shunting Grossberg model is similar, with an added gain control feature to control learning rates. Adaptive Resonance Theory models use two layers, with on-center/off-surround lateral feedback and sigmoid learning curves. The Discrete Autocorrelator model uses a single layer, recurrent lateral feedback, and a step function learning curve. The Continuous Hopfield model uses a single layer, recurrent lateral feedback, and a sigmoid learning curve. Bi-Directional Associative Memory uses two layers, with each element in the first connected to each element in the second, and a ramp learning curve. Adaptive Bi-Directional Associative Memory uses two layers, with each element in the first connected to each element in the second, and the Cohen-Grossberg memory function. This model also exists in a competitive version. Finally, the Temporal Associative Memory uses two layers, with each element in the first connected to each element in the second, and an exponential step learning function.
At least eight principal types of feedforward systems have been identified. The Learning Matrix uses two layers, with each element in the first connected to each element in the second, and a modified step learning function. Drive-Reinforcement uses two layers, with each element in the first connected to each in the second, and a ramp learning function. The Sparse Distributed Memory model uses three layers, with random connections from the first to the second layer, and a step learning function. Linear Associative Memory models use two layers, with each element in the first layer connected to each element in the second, and a matrix outer product to calculate learning updates. The Optimal Linear Associative Memory model uses a single layer, with each element connected to each of the others, and a matrix pseudo-inverse learning function. Fuzzy Associative Memory uses two layers, with each element in the first connected to each element in the second, and a step learning function. This particular model can only store one pair of correlates at a time. The Learning Vector Quantizer uses two layers, with each element in the first connected to each in the second, negative lateral connections from each element in the second layer with all the others in the second layer, and positive feedback from each second layer element to itself. This model uses a modified step learning curve, which varies as the inverse of time. The Counterpropagation model uses three layers, with each element in the first connected to each in the second, each element in the second connected to each in the third, and negative lateral connections from each element in the second layer to each of the rest, with positive feedback from each element in the second layer to itself. This also uses a learning curve varying inversely with time.
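By way of illustration only, the matrix outer product update mentioned above for Linear Associative Memory models can be sketched as follows. The function names `train_lam` and `recall`, and the example vectors, are assumptions chosen for this sketch and do not appear in the prior art descriptions. Each stored pair adds the outer product of the output vector and the input vector to the weight matrix; recall is a matrix-vector product, which is exact when the stored input vectors are orthonormal.

```python
def train_lam(pairs):
    """Build a weight matrix by summing outer products y * x^T.

    `pairs` is a list of (input_vector, output_vector) tuples.
    Hypothetical helper for illustration; not taken from the source text.
    """
    n_in = len(pairs[0][0])
    n_out = len(pairs[0][1])
    weights = [[0.0] * n_in for _ in range(n_out)]
    for x, y in pairs:
        for i in range(n_out):
            for j in range(n_in):
                # Outer-product (Hebbian-style) update for one stored pair.
                weights[i][j] += y[i] * x[j]
    return weights


def recall(weights, x):
    """Recall an output as the matrix-vector product W x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]


# Two pairs with orthonormal inputs, so recall is exact.
pairs = [
    ([1.0, 0.0, 0.0], [1.0, 2.0]),
    ([0.0, 1.0, 0.0], [3.0, 4.0]),
]
W = train_lam(pairs)
print(recall(W, [1.0, 0.0, 0.0]))  # prints [1.0, 2.0]
print(recall(W, [0.0, 1.0, 0.0]))  # prints [3.0, 4.0]
```

When the stored inputs are not orthogonal, the recalled outputs include cross-talk between stored pairs.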
Boltzmann/Cauchy models use random distributions for the learning curve. The use of random distributions to affect learning is advantageous because use of the distributions permits emulation of complex statistical ensembles. Thus, imposing the distributions imposes behavioral characteristics which arise from complex systems the model networks are intended to emulate. However, the Boltzmann/Cauchy networks are capable only of supervised learning. Moreover, these models have proven to be undesirably slow in many applications.
Hidden Markov models rely on a hybrid architecture, generally of feedforward elements and a recurrent network sub-component, all in parallel. These typically have three layers, but certain embodiments have had as many as five. A fairly typical example employs three layers, a softmax learning rule (i.e., the Boltzmann distribution) and a gradient descent algorithm. Other examples use a three-layer hybrid architecture and a gamma memory function, rather than the usual mixed Gaussian. The gamma distribution is convenient in Bayesian analysis, also common to neural network research, and is the continuous version of the negative-binomial distribution. However, the underlying process for this model is a stationary one. That is, the probability distribution is the same at time t and time t+Δt for all Δt.
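For reference, the softmax rule referred to above has the same functional form as the Boltzmann distribution, in which each probability is proportional to the exponential of a score divided by a temperature. The following is a minimal sketch under stated assumptions: the function name `softmax` and the `temperature` parameter are illustrative choices and are not drawn from any particular prior art model.

```python
import math


def softmax(scores, temperature=1.0):
    """Return probabilities p_i proportional to exp(score_i / temperature).

    The maximum score is subtracted before exponentiating; this leaves the
    result unchanged but avoids overflow for large scores.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.0])
print(probs)                             # three probabilities summing to 1
print(softmax([2.0, 1.0, 0.0], 10.0))    # higher temperature flattens them
```

Because the rule depends only on the current scores, the same scores yield the same distribution at any time step, consistent with the stationarity noted above.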
Studies of language change and studies of visual and acoustic processing in mammals have been used in the prior art to identify the mechanisms of neural processing for purposes of creating neural network architectures. For example, it has been noted that mammalian visual processing seems to be accomplished by feed-forward mechanisms which amplify successes. Such processing has been modeled by calculating Gaussian expectations and by using measures of mutual information in noisy networks. It has further been noted that such models provide self-organizing feature-detectors.
Similarly, it has been noted in the prior art that acoustic processing in mammals, particularly bats, proceeds in parallel columns of neurons, where feed-forward mechanisms and the separation and convergence of the signal produce sophisticated, topically organized feature detectors.
The semantic attractor memory of the invention according to a preferred embodiment uses a neural network architecture and learning rules derived from the study of human language acquisition and change to store, process and retrieve information. The invention provides rapid, unsupervised processing of complex data sets, such as imagery or continuous human speech.
The semantic attractor memory according to a preferred embodiment of the invention is motivated by considerations from human language acquisition and change, as well as the general constraints posed by the structure of the human perceptive apparatus and systems for muscle control. It is based on multiple layer channels, with random connections from one layer to the next; several layers devoted to processing input information; at least one processing layer; several layers devoted to processing outputs; feedback from the outputs back to the processing layer; and inputs from parallel channels, also to the processing layer. With the exception of the feedback loop and central processing layers, the network is feedforward. The learning rules are preferably based on Bose-Einstein statistics, again derived from considerations of human language acquisition.
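As general background on the relationship between Polya modeling and Bose-Einstein statistics, the following sketch simulates a standard Polya urn. It is not the learning rule of the invention, which is not specified in this passage; the function names, the choice of one starting ball per color, and the reinforcement of one ball per draw are assumptions made for illustration. Under those assumptions, every possible vector of draw counts is equally likely, which is the Bose-Einstein occupancy distribution. Unlike a stationary process, the draw probabilities change with the history of earlier draws, so early successes are amplified.

```python
import math
import random


def polya_urn(n_colors, n_draws, rng):
    """Simulate a Polya urn and return the final ball count per color.

    The urn starts with one ball of each color (an assumption of this
    sketch). After each draw, the drawn ball is returned together with one
    extra ball of the same color.
    """
    counts = [1] * n_colors
    for _ in range(n_draws):
        # Draw a color with probability proportional to its current count.
        color = rng.choices(range(n_colors), weights=counts)[0]
        counts[color] += 1
    return counts


def bose_einstein_probability(n_colors, n_draws):
    """Probability of any one occupancy vector under Bose-Einstein statistics.

    There are C(n_draws + n_colors - 1, n_colors - 1) ways to distribute
    n_draws indistinguishable draws over n_colors, all equally likely.
    """
    return 1.0 / math.comb(n_draws + n_colors - 1, n_colors - 1)


rng = random.Random(0)  # fixed seed so the run is repeatable
print(polya_urn(3, 20, rng))
print(bose_einstein_probability(2, 3))  # prints 0.25
```

For two colors and three draws there are four possible count vectors, (0,3), (1,2), (2,1) and (3,0), each with probability 1/4.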