The invention pertains to the field of circuits for emulating high order functions performed in brain cortex. More particularly, the invention pertains to circuits which can emulate the learning and recognition functions of the brain.
There has been a long history of efforts by workers in the art to describe and in a few cases develop electronic circuits which emulate higher order brain functions such as memory or learning and perception/recognition. Circuits which have been developed in the past of which the applicants are aware can generally be broken down into several major categories.
All these prior art devices sense an input event and output a pattern of output signals which identifies the event more or less accurately. The first category is perceptrons. Perceptrons generally involve two stages of circuitry. The first stage receives the physical stimulus and generates outputs which characterize the event sensed. The second stage recognizes and identifies the event by generating a pattern of output signals that identifies the event, if the event has been previously learned, by sampling the output signals from the first stage and processing these samples.
Events are learned by strengthening and weakening couplings. That is, the perceptron class of prior art devices learns by changing the connections between the first stage and second stage circuitry. These changes are made in accordance with the perceptron learning rule. This rule is as follows. When an event is sensed, the decision cells form a pattern of outputs. If this pattern is the same as a pattern chosen by the user to represent the input event, then nothing happens. If a cell in the output array fires when it is not supposed to fire for the user selected output pattern, i.e., a false positive indication, then the couplings for that cell are weakened to make it less likely to fire the next time the same event is sensed. If a particular cell does not fire when the input event is present and the cell is supposed to fire to make up the output pattern chosen by the user to represent the event, then the connections for that cell are strengthened. If a particular cell fires only when it is supposed to fire for a given event and does not fire when it is not supposed to fire, then that cell's connections are not changed.
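The learning rule just described can be illustrated with a minimal sketch. It is not taken from any of the cited devices. The function names `fires`, `update_cell` and `train`, the threshold of 0.5, the unit learning rate and the two example events are all assumptions chosen only for illustration.

```python
def fires(weights, inputs, threshold=0.5):
    """A decision cell fires (1) when its weighted input sum exceeds the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0


def update_cell(weights, inputs, target, rate=1.0):
    """Apply the learning rule to one decision cell for one sensed event."""
    output = fires(weights, inputs)
    if output == target:
        return weights                      # correct response: connections unchanged
    # Missed firing: strengthen the active connections.
    # False positive: weaken the active connections.
    sign = 1.0 if target == 1 else -1.0
    return [w + sign * rate * x for w, x in zip(weights, inputs)]


def train(events, n_inputs, max_passes=50):
    """Present every (inputs, desired_pattern) pair until no connection changes."""
    n_cells = len(events[0][1])
    cells = [[0.0] * n_inputs for _ in range(n_cells)]
    for _ in range(max_passes):
        changed = False
        for inputs, desired in events:
            for c in range(n_cells):
                new_weights = update_cell(cells[c], inputs, desired[c])
                if new_weights != cells[c]:
                    cells[c], changed = new_weights, True
        if not changed:                     # output patterns have converged
            break
    return cells


# Hypothetical example: two events whose first-stage outputs overlap on the middle line.
events = [([1, 1, 0], [1, 0]), ([0, 1, 1], [0, 1])]
cells = train(events, n_inputs=3)
outputs = [[fires(w, inputs) for w in cells] for inputs, _ in events]
```

In this sketch the overlapping middle input first causes a false positive in each decision cell, and the weakening step removes it over repeated presentations, after which `outputs` matches the user selected patterns.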
Perceptrons learn by convergence over several presentations of the same event. The learning process is characterized by the changing of cell connections to strengthen or weaken them until the output pattern converges with the desired output pattern selected by the user. This convergence process is repeated for several different input events such that the perceptron eventually becomes able to recognize any of a number of events.
An analysis of perceptrons was carried out in a 1969 monograph by Minsky, M., and Papert, S., Perceptrons, Cambridge, Mass.: MIT Press. It was shown in this monograph that perceptrons have limited powers of recognition and are unable to carry out logic operations needed for higher perceptual functions. It was proposed that intermediate layers of cells between the detection and decision stages would be needed. However, the monograph argued that adequate learning rules would be exceptionally difficult to design for such an arrangement. Partially because of the ideas presented in this 1969 monograph, perceptrons fell into disuse. However, certain modified forms of perceptrons are now returning to the attention of workers skilled in the art.
A second major class of prior art devices is association matrices. An example of these types of devices is described in a 1981 paper by Willshaw, D., "Holography, Associative Memory, and Inductive Generalization", in Hinton, G. E., and Anderson, Parallel Models of Associative Memory, Hillsdale, N.J., Lawrence Erlbaum, 1981, pp. 83-109. Association matrices generally take one basic form. A simple matrix comprised of horizontal lines crossing and contacting an equally sized set of vertical lines is constructed. Each vertical line terminates in a voltage summing device. The horizontal lines are used as inputs and simulate the functions of axons in the cortex structure of the brain. The vertical lines simulate the function of dendrites extending from neurons, and the voltage summing device acts to simulate the function of the neuron cell body.
A second set of inputs to the vertical lines is provided. Each of the inputs in the second set of inputs contacts one vertical line or its voltage summing device. This second set of inputs will be referred to hereafter as the vertical inputs and will be numbered F.sub.1 -F.sub.n. The horizontal lines will be referred to as the horizontal inputs and will be labeled H.sub.1 -H.sub.n. The vertical lines of the matrix which emulate the function of dendrites will be labeled J.sub.1 -J.sub.n. The matrix is defined in two dimensions by the connections between the H inputs and J dendrites. The summing elements may be either linear or nonlinear. If linear, they produce an output which has a voltage or current amplitude which is proportional to the input amplitude on the dendrites. If the summing elements are nonlinear, they produce a binary output which is active only when the amplitude of the signal on the dendrites exceeds a certain threshold. The association matrices taught by Kohonen, T., and Lehtio, P., "Storage and Processing of Information in Distributed Associative Memory Systems", in Hinton and Anderson, supra, pp. 105-143, use linear summing elements, whereas most other workers in the association matrix art use nonlinear summing elements.
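The difference between the two kinds of summing element can be shown with a short sketch. The names `linear_element` and `nonlinear_element` and the contact strengths and thresholds used are hypothetical and serve only as an illustration.

```python
def linear_element(contacts, inputs):
    """Output amplitude proportional to the summed signal on the dendrite."""
    return sum(c * x for c, x in zip(contacts, inputs))


def nonlinear_element(contacts, inputs, threshold):
    """Binary output, active only when the summed signal exceeds the threshold."""
    return 1 if linear_element(contacts, inputs) > threshold else 0


# Hypothetical contact strengths between four H inputs and one dendrite J.
contacts = [0.5, 1.0, 0.0, 2.0]
h_inputs = [1, 1, 0, 1]

amplitude = linear_element(contacts, h_inputs)            # graded output
active = nonlinear_element(contacts, h_inputs, 3.0)       # sum exceeds threshold
inactive = nonlinear_element(contacts, h_inputs, 4.0)     # sum does not exceed threshold
```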
The association matrix learns by increasing the strength of contacts to selected target lines which are selected by the user. For example, the user will activate a pattern of the vertical inputs F selected by him or her to represent a particular event. Assume that the user selected pattern is F elements 1, 3, 7 and 9. These activated F inputs will produce outputs on a one to one basis on their corresponding J target elements 1, 3, 7 and 9. To learn a particular event, the event sensed causes a particular subset of the horizontal inputs or H elements to be activated in response to certain characteristics of the event. Since association matrices generally have a contact between every horizontal input and every target element, the event-activated horizontal inputs H which contact the target elements J which have been activated (J elements 1, 3, 7 and 9) by the user activated vertical inputs F elements 1, 3, 7 and 9 have their contacts strengthened. This process is repeated for a number of events, with the user selecting a different pattern of vertical inputs or F elements to activate for each event to be learned.
To use such an association matrix which has learned a collection of input patterns for association, the F inputs are deactivated, an event is sensed, and the horizontal inputs H which characterize the event are activated by the characteristics of the event. This pattern of active horizontal inputs will, if the strengthening process has been performed properly, cause the output pattern which was selected by the user to represent the particular event sensed to appear again. This pattern is then compared to the known patterns for events which have been learned, and the particular event which was sensed may be identified.
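The learning and recall steps described above can be sketched as follows. This is a simplified illustration, assuming binary contacts that are simply switched on when strengthened and nonlinear summing elements with a fixed threshold. The function names, the matrix size, the H patterns and the threshold value are hypothetical. Element numbers are used directly as list indices.

```python
def new_matrix(n_h, n_j):
    """contacts[h][j] is the strength of the contact between input H h and dendrite J j."""
    return [[0] * n_j for _ in range(n_h)]


def learn(contacts, h_active, f_active):
    """Strengthen every contact between an event-activated H input and a
    J element activated by the user selected F pattern."""
    for h in h_active:
        for j in f_active:
            contacts[h][j] = 1


def recall(contacts, h_active, threshold):
    """With the F inputs deactivated, return the J elements whose summed
    input from the active H lines exceeds the threshold."""
    n_j = len(contacts[0])
    sums = [sum(contacts[h][j] for h in h_active) for j in range(n_j)]
    return [j for j, s in enumerate(sums) if s > threshold]


contacts = new_matrix(10, 10)
# Event A: hypothetical H pattern 0, 2, 4 paired with the F pattern 1, 3, 7, 9.
learn(contacts, h_active=[0, 2, 4], f_active=[1, 3, 7, 9])
# Event B: a second hypothetical event with a different F pattern.
learn(contacts, h_active=[5, 6, 8], f_active=[0, 2, 4])

recalled_a = recall(contacts, h_active=[0, 2, 4], threshold=2)
recalled_b = recall(contacts, h_active=[5, 6, 8], threshold=2)
```

Here each event's H pattern alone brings back the F pattern that the user paired with it during learning.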
A problem with association matrices is cross-talk. The particular patterns of horizontal inputs which characterize each event may be orthogonal in the sense that the particular inputs which represent each event do not overlap. Conversely, the horizontal inputs may be non-orthogonal in the sense that there is some overlap between active horizontal inputs characterizing two or more events. This overlap cannot be minimized by the user since the horizontal inputs which are activated for each event depend only upon the characteristics of the event and not upon the desires of the user. Where there is overlap, cross-talk results in that one event's horizontal input pattern will cause recall of some or much of another event's output pattern.
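The effect of overlap can be seen in a small sketch which compares the summed dendrite signals for orthogonal and non-orthogonal H patterns. Binary contacts and linear summing are assumed. The names `train` and `dendrite_sums` and all of the patterns are hypothetical.

```python
def train(events, n_h, n_j):
    """Build a contact matrix from (h_active, f_active) pairs using binary contacts."""
    contacts = [[0] * n_j for _ in range(n_h)]
    for h_active, f_active in events:
        for h in h_active:
            for j in f_active:
                contacts[h][j] = 1
    return contacts


def dendrite_sums(contacts, h_active):
    """Linear sum arriving on each J dendrite for a set of active H inputs."""
    n_j = len(contacts[0])
    return [sum(contacts[h][j] for h in h_active) for j in range(n_j)]


event_a = ([0, 1, 2], [1, 3])          # H pattern and F pattern of the first event

# Orthogonal case: the second event shares no H input with the first.
orthogonal = train([event_a, ([3, 4, 5], [5, 7])], n_h=6, n_j=8)
# Non-orthogonal case: H input 2 is active for both events.
overlapping = train([event_a, ([2, 3, 4], [5, 7])], n_h=6, n_j=8)

clean = dendrite_sums(orthogonal, event_a[0])       # [0, 3, 0, 3, 0, 0, 0, 0]
crosstalk = dendrite_sums(overlapping, event_a[0])  # [0, 3, 0, 3, 0, 1, 0, 1]
```

In the overlapping case, J elements 5 and 7 receive a partial signal although they belong to the other event's output pattern. A threshold can suppress this only while the overlap remains small relative to the size of the H pattern.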
Such association matrices may also be used as autocorrelational matrices. In such a structure, there are no horizontal inputs H, only vertical inputs F and target elements J, but the outputs of the target elements J are fed back through the dendrites or target elements as if they were horizontal inputs H to form a matrix. During the learning process, assume that F elements 1, 3, 5 and 7 are activated. This causes J elements 1, 3, 5 and 7 to be activated, and the outputs from these J elements are routed back through the matrix. Since J element number 1's output line contacts the dendrite or target lines of J elements 3, 5 and 7, these contacts are strengthened. The same is true for the other J outputs. The result is that the association matrix so trained will cause the entire output pattern 1, 3, 5 and 7 to appear by virtue of the input even if some of the F inputs are missing from an input pattern.
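The pattern completion behavior of an autocorrelational matrix can be sketched as follows, assuming binary feedback contacts, a single feedback pass and a fixed threshold. The names `learn_auto` and `complete`, the matrix size and the threshold are hypothetical.

```python
def learn_auto(contacts, f_active):
    """Strengthen the feedback contact from each active J output to the
    dendrite of every other active J element."""
    for i in f_active:
        for j in f_active:
            if i != j:
                contacts[i][j] = 1


def complete(contacts, f_active, threshold):
    """Activate the J elements driven by F, route their outputs back through
    the matrix once, and add every J element whose feedback sum exceeds the threshold."""
    driven = set(f_active)
    n = len(contacts)
    feedback = [sum(contacts[i][j] for i in driven) for j in range(n)]
    return sorted(driven | {j for j in range(n) if feedback[j] > threshold})


contacts = [[0] * 8 for _ in range(8)]
learn_auto(contacts, [1, 3, 5, 7])

# F input 7 is missing from the input pattern; feedback from 1, 3 and 5 restores it.
completed = complete(contacts, [1, 3, 5], threshold=1)
# A single active F input does not supply enough feedback at this threshold.
too_small = complete(contacts, [1], threshold=1)
```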
Another major class of prior art devices in this field is relaxation models. In both the perceptron and association matrix prior art, the output vector or pattern of output signals representing a particular event is assigned by the user. Usually the output vectors are chosen to be identical with or similar to the object to be recognized. More recent work on so called "connectionist" models allows the network itself to select the output pattern to represent the event sensed. This is accomplished by placing constraint rules in the matrix such that when the matrix is allowed to "free run", i.e., cycle in state space from an initial state which is somewhat like an event which has been learned, then the matrix will gravitate toward or converge toward a state represented by an output vector which represents an event which has been learned. Thus the matrix acts as a content addressable memory where the memory of a learned event is reached by supplying the matrix with input information which is a subset of the input information which originally defined the event which was learned. Any subset of adequate size will suffice. Such a matrix is said to be addressable by content rather than address. In essence, a randomly organized network of connections will cycle until a pattern emerges, and that pattern will remain stable until the network is perturbed by additional input.
An advanced perceptron has been proposed by Sejnowski, T. J., et al., "Learning Symmetry Groups with Hidden Units: Beyond the Perceptron", in Reports of the Cognitive Neuropsychology Laboratory (No. 10), Baltimore, Md.: The Johns Hopkins University, using the relaxation model concepts. Their simulation uses input elements, an intermediate layer of elements, and a final array of output or decision elements. The traditional perceptron learning rule is used but in a probabilistic fashion in the sense that synapses change when the pre- and post-synaptic elements are simultaneously active according to a sigmoid probability function. Moreover, all contacts or synapses are symmetrical so that the inputs and intermediate elements are reciprocally connected and the same is true for the intermediate and final decision elements. The strength of contacts, i.e., changing of the properties of the synapses, can become less in terms of the effects the synapses cause in response to input stimuli. In some variants of the model, the elements within a layer are mutually inhibitory. Also, the decision elements are assigned values rather than being simple, binary neuron-type elements.
Hopfield, J. J., "Neural Networks and Physical Systems with Emergent Collective Properties", Proc. Natl. Acad. Sci. (U.S.A.) (1982) 79:2554-2558, uses a single layer of randomly interconnected elements with one input line for every target line. His design thus resembles an association matrix with autocorrelational functions. Contacts can be inhibitory or additive in that each contact, when active, may either add to or subtract from the response on a particular target element. The system is largely symmetrical.
When an input is presented to a Hopfield matrix, the system cycles through a series of synaptic or contact changes until the entire network settles into a pattern. The patterns that emerge transiently across successive cycles can be thought of as successive energy states. Because of the symmetry and the possibility of sign reversal, the entire system behaves like a multibody problem in physics in seeking a local energy minimum. A particular minimum will be found depending upon the pattern of inputs that is applied to the system.
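The settling behavior can be illustrated with a sketch of the commonly described formulation of such a network, which is an assumption here and is not drawn from the text above. In that formulation the element states take the values +1 and -1, the contacts are symmetric and are fixed during recall, and only the element states are updated one at a time. The names `train_hopfield`, `energy` and `settle` and the stored pattern are hypothetical.

```python
def train_hopfield(patterns):
    """Symmetric contacts with zero self-contact, built from +1/-1 patterns."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]   # additive (+) or inhibitory (-) contact
    return w


def energy(w, s):
    """Energy of a state; updates in settle() never increase this value."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def settle(w, state, max_sweeps=10):
    """Update one element at a time until no element changes state."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            field = sum(w[i][j] * s[j] for j in range(len(s)))
            new_value = 1 if field >= 0 else -1
            if new_value != s[i]:
                s[i], changed = new_value, True
        if not changed:
            break
    return s


stored = [1, -1, 1, -1, 1, -1]
w = train_hopfield([stored])
noisy = [1, -1, 1, -1, 1, 1]        # last element disturbed
settled = settle(w, noisy)
```

In this example the disturbed input settles back to the stored pattern, and the energy of the settled state is lower than that of the disturbed input, which corresponds to the network reaching a local energy minimum.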
The foregoing association machines present several problems. First, in machines where the contacts may change either negatively or positively, when more than one event is learned, there is a possibility that later learned events may obscure the record of earlier learned events. Further, there is generally a high percentage of contacts relative to the number of input lines and target lines. This limits the number of output vectors which the system may provide for a given number of contacts. In systems where the operator selects the output vector which is to represent a particular event and the number of contacts is high relative to the number of input/target line intersections, cross-talk can occur. This creates a need for orthogonality of input vectors for good recognition properties. Further, the systems discussed above are not capable of reliable, consistent association of input stimuli with the correct output vector for cluttered or incomplete input vectors. The cluttered input vector situation arises when the input stimuli are cluttered by noise. The incomplete input vector situation arises where the input stimuli for a particular event are not all present, i.e., only part of the input vector characterizing a particular event which has been learned is active. Thus a need has arisen for a learning and association machine which can overcome some of the above-noted deficiencies and provide new abilities not heretofore present in the art.