Broadly speaking, an associative memory system is one in which stimulus/response pairs of information are stored in such a way that the introduction of a stimulus pattern results in the recall of a memory-associated response. Memory systems of this type have a very broad range of potential applications including, for example, logical operations management, pattern recognition, and image interpolation.
Traditional associative processes, such as those that are often used in artificial intelligence applications, are dependent on explicitly predefined rule sets that are externally impressed on an associative memory. Expert systems are examples of such traditional architectures. Such expert systems are rules-based paradigms that are managed by an inferential engine. These follow an orthodox von Neumann approach by providing a deterministic software/hardware relationship that follows a series of pre-declared relationships and sequential instructions formatted as predetermined sets of IF-THEN statements. They are inherently limited to those associations that are expressly pre-ordained or are expressly permitted to be logically deduced by pre-established inferential rules. There is no intrinsic adaptive capability in these processes. In consequence, there is no dynamic responsiveness to changing environments or, more generally, any ability to develop a set of input-appropriate responses in the absence of an impressed set of applicable rules specifically intended to deal with a changed, changing, or otherwise unknown environment. Moreover, as with any purely heuristic programming, the more complex the application, the greater the number of rules that are required, and the proportionately longer the processing time required to deal with those rules. There is a general acceptance that these shortcomings limit the practical usefulness of pre-defined-rules-based approaches to associative memory systems.
Neural networks, on the other hand, generate their own rules of association through a learning process that draws on the network's exposure to either supervised or unsupervised input data samples drawn from a statistical universe. These systems have, to various degrees, some ability to make generalizations about that universe as a whole, based on the input sampling.
Neural networks are associative memory systems comprising strategic organizations (architectures) of processing elements. Individually, these elements are each analogous to an individual neuron in a biological system. Individual processing elements have a plurality of inputs, which are functionally analogous to the dendritic processes of a neuron cell. As such, these elements are conditioned in accordance with a paradigm over the course of an ongoing learning process, to dynamically assign and assert a certain "weight", based on the current state of the system's knowledge, to the respective inputs. The associative "weights" form the data that is stored in the associative memory of the system. Digital computer implementations of neural networks typically employ numerical methodologies to realize the desired associative recall of stimulus-appropriate responses through weighted summation of the inputs in a digital computing environment. These virtual networks take advantage of the current commercial availability of von Neumann machines, which, while inherently deterministic, are nevertheless capable of being used to realize the advantages attached to stochastic architectures in neural network hardware implementations.
An early forerunner to modern neural networks, howsoever they may now be implemented, was an actual hardware device that came to be known as the Perceptron. This was a pattern classification system that could identify both abstract and geometric patterns. A grid of photocells was arranged to receive a primary optical stimulus. These photocells were in turn randomly connected to a plurality of associator elements which performed the functions associated with the front end of what is now recognized as the inputs (or notional dendritic processes) of a neural network processing element. When the cumulative electrical inputs from the cells to the associator units exceeded a certain threshold, the associator units triggered response units to produce an output signal.
The Perceptron, regardless of the form of its hardware implementation, proved to have serious inherent limitations. These are concerned with the system's practical inability to learn certain known functions, in particular the logical "XOR" function of Boolean algebra. In order to be able to learn this type of parity function, the Perceptron paradigm would require either an architecture of multiple interconnected layers of weighted processing elements, or alternatively a system having 2 to the N hidden processing elements. The Perceptron could not properly adjust more than one layer of modifiable weights, thus precluding the first alternative. The alternative of using 2 to the N hidden processing units presents three fundamental problems. First, there must be 2 to the N processing units in the system for all possible functions which the system might ever have to learn (which amounts to system design by crystal ball gazing). Second, the number of processing elements required in any such system increases exponentially with the number of inputs required to solve the functions which can be prescribed, and quickly runs into the billions. Third, there is empirical evidence that, with large numbers of hidden processing elements, the system loses the ability to formulate reliable generalizations. With these inherent limitations, it was clear that such networks could not emulate or even approximate the functions or efficiencies of the human brain.
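The XOR limitation noted above can be illustrated with a minimal sketch in Python. It assumes the classical perceptron learning rule applied to a single two-input threshold element; the function names, the learning rate, and the epoch limit are illustrative choices and do not describe the original Perceptron hardware.

```python
def step(x):
    """Threshold activation: the element fires only when its net input is positive."""
    return 1 if x > 0 else 0

def train_perceptron(samples, epochs=100, lr=1.0):
    """Train one threshold element with the perceptron learning rule.

    Returns (weights, bias, converged); converged is True only if a full
    pass over the samples produced no classification errors.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            out = step(w[0] * x1 + w[1] * x2 + b)
            delta = target - out
            if delta != 0:
                # Nudge the single layer of modifiable weights toward the target.
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                b += lr * delta
                errors += 1
        if errors == 0:
            return w, b, True
    return w, b, False

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

_, _, and_converged = train_perceptron(AND)
_, _, xor_converged = train_perceptron(XOR)
```

Under these assumptions, training on AND settles on a separating set of weights, whereas training on XOR can never complete an error-free pass, because no single threshold element separates its two response classes.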
The advent of back propagation paradigms for establishing a weighted associative memory for evaluating new stimuli as they are presented to the inputs of the processing elements represented a major step towards overcoming some of the problems associated with the Perceptron paradigm. For example, back propagation incorporates an error handling mechanism that overcomes at least some of the "linear separability classification" limitations associated with the Perceptron. Back propagation establishes a processing assumption that all processing elements in any given layer of a network architecture introduce errors in the assignment of a response that issues from that layer, to any stimulus received from the preceding layer. The responsibility for that error is then quantified and distributed throughout the weightings of each of the processing element inputs in the previous layer, down to and including the inputs to the network. This learning process is inherently slow, in that several iterations of the back propagation are required before the desired convergence of error terms (i.e., "dilution" of information error) is achieved.
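The distribution of error responsibility described above can be sketched in Python as follows. This is an illustrative reconstruction and not a reference implementation: it assumes a small network with two inputs, two hidden elements, and one output element, sigmoid activations, a squared-error measure, and arbitrary starting weights. The `net` dictionary layout and the `backprop_step` name are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_step(net, x, target, lr=0.5):
    """Apply one back propagation update in place; return the error before it."""
    # Forward pass: stimulus -> hidden layer -> single output element.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["w_hidden"], net["b_hidden"])]
    output = sigmoid(sum(w * h for w, h in zip(net["w_out"], hidden))
                     + net["b_out"])
    # Error term at the output element (squared error times sigmoid slope).
    delta_out = (output - target) * output * (1.0 - output)
    # Share of that error attributed to each hidden element, in proportion
    # to the weight on its connection to the output element.
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(net["w_out"], hidden)]
    # Move every weight a small step against its share of the error.
    for j, h in enumerate(hidden):
        net["w_out"][j] -= lr * delta_out * h
        net["b_hidden"][j] -= lr * delta_hidden[j]
        for i, xi in enumerate(x):
            net["w_hidden"][j][i] -= lr * delta_hidden[j] * xi
    net["b_out"] -= lr * delta_out
    return 0.5 * (output - target) ** 2

# Arbitrary starting weights, chosen only for illustration.
net = {
    "w_hidden": [[0.5, -0.4], [0.3, 0.8]],
    "b_hidden": [0.0, 0.0],
    "w_out": [0.2, -0.3],
    "b_out": 0.0,
}

# Present the same stimulus/response pair repeatedly and record the error.
history = [backprop_step(net, [1.0, 0.0], 1.0) for _ in range(200)]
```

Each call removes only part of the error, so the values in `history` shrink gradually over many presentations, which is consistent with the slow, iterative convergence noted above.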
Current state-of-the-art neural networks can, in general, all be classified as gradient descent models, in which the network data is stored as "weights" in the manner described above.
In operation, these networks work by having weighted scalar input values summed by the processing element, then normalized in order to maintain some degree of stability in the distribution of generated output response values. Typically, normalization involves a thresholding or scaling of the summation product. Variations on the sigmoid function are usually used for this purpose.
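The operation just described can be sketched in Python as a single processing element. The sketch assumes the logistic form of the sigmoid function; the function names, the optional bias term, and the sample values are illustrative only.

```python
import math

def sigmoid(x):
    """Logistic squashing function: maps any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def processing_element(inputs, weights, bias=0.0):
    """Sum the weighted scalar inputs, then normalize the sum with a sigmoid."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)

# Illustrative stimulus and weights; the weighted sum here is -0.8.
output = processing_element([0.5, -1.0, 2.0], [0.8, 0.2, -0.5])
```

However large the weighted sum becomes, the normalized response stays between 0 and 1, which is the stabilizing effect on output values referred to above.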
A number of examples of these subsequent developments in neural network technology have pursued models predicated on natural biological systems. One of the better known was the development of so-called "Hopfield Nets" in the early 1980s. Hopfield's model was amongst the first to clearly represent neuron operation as a specific thresholding operation and illustrated memory as information stored in the interconnections between processing elements, which were cast as a minimum energy function.
One example of a gradient descent network is the matrix algebra based associative memory model that is described in "Neural Networks and Physical Systems with Emergent Collective Computational Abilities", J. J. Hopfield, Proc. Natl. Academy of Science, USA, 1982, Vol. 79, pp. 2554-2558. This model utilizes feedback and non-linear thresholding to force the output pattern to be the stored pattern which most closely matches the input pattern. A major drawback of this model is the large storage and computational effort that is inherently required for the manipulation of an association matrix memory that is used in the model. In essence, this represented a special case of the more general features of the Cohen-Grossberg networks, in which the processing elements took on any real activation value resulting from a sigmoid output threshold function alternating between the minimum and maximum values to define that activation value of the processing element. The response to any external stimulus to one of these networks was shown to converge to an equilibrium based on an energy or Lyapunov function.
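A minimal Python sketch of this kind of matrix-based associative recall follows. It assumes bipolar (+1/-1) patterns, an outer-product storage rule with a zero diagonal, and element-by-element threshold updates; the function names and the two stored patterns are illustrative and are not taken from the cited paper.

```python
def train_hopfield(patterns):
    """Build the association matrix as a sum of pattern outer products."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, probe, max_sweeps=10):
    """Feed the state back through the thresholded matrix until it stops changing."""
    state = list(probe)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(w[i][j] * state[j] for j in range(len(state)))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

def energy(w, state):
    """Energy of a state; stored patterns sit at low-energy equilibria."""
    n = len(state)
    return -0.5 * sum(w[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))

stored_a = [1, 1, 1, 1, -1, -1, -1, -1]
stored_b = [1, -1, 1, -1, 1, -1, 1, -1]
w = train_hopfield([stored_a, stored_b])

probe = [-1, 1, 1, 1, -1, -1, -1, -1]  # stored_a with its first element corrupted
recalled = recall(w, probe)
```

In this sketch the corrupted probe settles back onto `stored_a`, at a lower energy than the probe. The matrix `w` holds one entry for every pair of elements, so its size grows with the square of the pattern length, which reflects the storage burden noted above.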
With the ongoing advancement of neural network technology, networks have been further enhanced by various multi-layer architectures. Weighting of processing element inputs through normalization and competition has continued to mitigate some of the drawbacks that nevertheless continue to be associated with neural networks.
By way of example, and in addition to all the other shortcomings set out above, all of these networks continue to suffer from an inherent form of input information truncation that is in part a legacy of von Neumann architectures. As with any system, information loss results in an increase in error rates, and error remediation in turn requires that compensatory processing strategies be adopted. That approach in its own turn results in increased processing (both learning and response) time by depending on large numbers of computational and sampling iterations (in the hope of "diluting" out the errors by increasing the sample size), with correspondingly increased memory storage space requirements. Moreover, such remediation can at best only diminish the error that is intrinsically introduced by a gradient response regimen. It cannot eradicate it. Accordingly, while normalization in gradient descent networks is essential, it also results in collateral degradation of the informational value of input data. Note too that the making of generalizations based on the erroneous precepts that can follow from information loss limits the reliable application of such networks to linearly non-separable stimuli.
A good example of this kind of remediation problem is associated with a connectionist neural network architecture sometimes referred to as a Boltzmann machine, that utilizes a back propagation paradigm. This type of machine is intended to deal with what one author has labelled "computational gangrene". This problem is implicit in any deterministic approach that is taken to problem solving, in that a mistaken decision in a deterministic path may foreclose on any possibility of downstream remediation, thereby forever cutting off the correct interpretative pathway that leads to the problem's correct, or at least optimal, solution. While neural networks in general go some distance to ameliorating this problem, it continues to exist.
Boltzmann machines are equilibrium-seeking connectionist machines in which processing elements exhibit binary (on-off) behaviour in response to input stimuli. The response of such a processing element in any given circumstance is determined by both the weighted signals passed along by neighbouring processing elements, and also by a probabilistic signal, thus rendering the response stochastic. The behaviour of such a machine can be described in terms of Boltzmann's thermodynamic equations, which allow that even though the response states of individual processing units cannot be predicted, the overall equilibrium response of the network is resolvable. In the meantime, the internal "randomness" of individual processing elements' response states that contributes to the Boltzmann machine's overall "directed or non-random" response can help the network to avoid getting stuck in "locally attractive" but "globally sub-optimal" solutions, and thereby sidesteps some of the risk of computational gangrene that arises in strictly deterministic von Neumann machines. It has been observed, however, that while Boltzmann machines have a greater probability of reaching better solutions than are possible with von Neumann architectures, the existence of "noise" in real life problems poses a problem. Theoretically, a Boltzmann machine will of necessity arrive at the optimal solution to any of a limited number of specific classification problems, provided that it is given an unlimited amount of time for that purpose. The exigency of real time problems rarely permits protracted problem solving exercises of any significant duration, however, and the inability of Boltzmann machines to dependably resolve problems within a reasonable time inherently limits their usefulness.
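The stochastic on-off response of a single processing element can be sketched in Python as follows. The sketch assumes the commonly used logistic firing rule, in which the probability of the "on" state depends on the net weighted input divided by a temperature parameter. The text above does not specify this rule, and the function names, the seed, and the sample values are illustrative.

```python
import math
import random

def p_on(net_input, temperature):
    """Probability that a binary element switches on, given its net weighted input."""
    return 1.0 / (1.0 + math.exp(-net_input / temperature))

def sample_state(net_input, temperature, rng):
    """Draw one stochastic on/off (1/0) response for a processing element."""
    return 1 if rng.random() < p_on(net_input, temperature) else 0

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# At a high temperature the response is close to a coin toss; at a low
# temperature the same input almost always switches the element on.
draws = [sample_state(2.0, 10.0, rng) for _ in range(10000)]
frequency_on = sum(draws) / len(draws)
```

No individual draw can be predicted, yet the long-run frequency of the "on" state tracks `p_on`, which mirrors the observation that the equilibrium behaviour is resolvable even though individual responses are not. Reaching that equilibrium generally requires many such draws, which is consistent with the time cost noted above.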
Accordingly, there remains a need in the art for alternatives to current neural network systems.