This invention relates to artificial neuron-like devices (hereinafter referred to simply as "neurons") for use in neural processing.
One of the known ways of realising a neuron in practice is to use a random access memory (RAM). The use of RAMs for this purpose dates back a considerable number of years. Recently, a particular form of RAM has been described (see Proceedings of the First IEE International Conference on Artificial Neural Networks, IEE, 1989, No. 313, pp 242-246) which appears to have the potential for constructing neural networks which mimic more closely than hitherto the behaviour of physiological networks. This form of RAM is referred to as a pRAM (probabilistic random access memory). For a detailed discussion of the pRAM attention is directed to the paper identified above. However, a brief discussion of the pRAM is set out below, by way of introduction to the invention.
The pRAM is a hardware device with intrinsically neuron-like behaviour (FIG. 1). It maps binary inputs [5] (representing the presence or absence of a pulse on each of N input lines) to a binary output [4] (a 1 being equivalent to a firing event, a 0 to inactivity). This mapping from $\{0,1\}^N$ to $\{0,1\}$ is in general a stochastic function. If the $2^N$ address locations [3] in an N-input pRAM A are indexed by an N-bit binary address vector $u$, using an address decoder [6], the output $a \in \{0,1\}$ of A is 1 with probability

$$\mathrm{Prob}(a = 1 \mid i) = \sum_{u} \alpha_u \prod_{j=1}^{N} \left( i_j u_j + \bar{i}_j \bar{u}_j \right) \qquad (1)$$

where $i \in \{0,1\}^N$ is the vector representing input activity (and $\bar{x}$ is defined to be $1-x$ for any $x$). The quantity $\alpha_u$ represents a probability. In the hardware realisation of the device $\alpha_u$ is represented as an M-bit integer in the memory locations [3], having a value in the range 0 to $2^M - 1$, and these values represent probabilities in the range

$$\left\{ 0, \frac{1}{2^M}, \frac{2}{2^M}, \ldots, 1 - \frac{1}{2^M} \right\}$$

The $\alpha_u$ may be assigned values which have a neuro-biological interpretation: it is this feature which allows networks of pRAMs, with suitably chosen memory contents, to closely mimic the behaviour of living neural systems. In a pRAM, all $2^N$ memory components are independent random variables. Thus, in addition to possessing a maximal degree of non-linearity in its response function (a deterministic pRAM, with $\alpha \in \{0,1\}^{2^N}$, can realise any of the $2^{2^N}$ possible binary functions of its inputs), pRAMs differ from units more conventionally used in neural network applications in that noise is introduced at the synaptic rather than the threshold level; it is well known that synaptic noise is the dominant source of stochastic behaviour in biological neurons. This noise, $\nu$, is introduced by the noise generator [1]. $\nu$ is an M-bit integer which varies over time and is generated by a random number generator. The comparator [2] compares the value stored at the memory location being addressed with $\nu$.
One way of doing this is to add the value stored at the addressed location to $\nu$. If there is a carry bit in the sum, i.e. the sum has M+1 bits, a spike representing a 1 is generated on arrival of the clock pulse [7]. If there is no carry bit no such spike is generated and this represents a 0. It can be seen that the probability of a 1 being generated is equal to the probability represented by the number stored at the addressed location, and it is for this reason that the latter is referred to as a probability. It should be noted that the same result could be achieved in other ways, for example by generating a 1 if the value of the probability was greater than $\nu$. It can also be noted that because pRAM networks operate in terms of `spike trains` (streams of binary digits produced by the addressing of successive memory locations) information about the timing of firing events is retained; this potentially allows phenomena such as the observed phase-locking of visual neurons to be reproduced by pRAM nets, with the possibility of using such nets as part of an effective `vision machine`.
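The carry-based comparison described above can be illustrated with a short simulation. This is a minimal sketch rather than the circuit itself: the names `pram_fire` and `firing_frequency`, the word width M = 8, and the use of Python's `random` module in place of the hardware noise generator are all assumptions made for illustration.

```python
import random

M = 8  # assumed word width in bits; the text leaves M general


def pram_fire(alpha: int, nu: int, m: int = M) -> int:
    """Return 1 if alpha + nu produces a carry out of m bits, else 0."""
    # Both operands are at most 2**m - 1, so the shifted sum is 0 or 1.
    return (alpha + nu) >> m


def firing_frequency(alpha: int, trials: int, rng: random.Random, m: int = M) -> float:
    """Estimate the firing probability using uniformly drawn m-bit noise."""
    fires = sum(pram_fire(alpha, rng.randrange(1 << m), m) for _ in range(trials))
    return fires / trials
```

Counting over all $2^M$ equally likely noise values, exactly $\alpha$ of them produce a carry, so a stored integer $\alpha$ fires with probability $\alpha / 2^M$, which is the sense in which the stored number represents a probability.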
For information concerning, in particular, the mathematics of the pRAM, attention is directed to the paper written by the present inventors in the Proceedings of the First IEE International Conference on Artificial Neural Networks, IEE, 1989, No. 313, pp. 242-246, the contents of which are incorporated herein by reference.
FIG. 9 shows a simple neural network comprising two pRAMs denoted as RAM 1 and RAM 2. It will be understood that for practical applications much more extensive networks are required, the nature of which depends on the application concerned. Nevertheless, the network shown in FIG. 9 illustrates the basic principles. It will be seen that each pRAM has an output OUT and a pair of inputs denoted IN1 and IN2. Each output corresponds to the output [4] shown in FIG. 1. The output from RAM 1 is applied as an input to the input IN1 of RAM 1, and the output from RAM 2 is applied as an input to the input IN2 of RAM 1. The output from RAM 1 is also applied as an input to the input IN2 of RAM 2, and the output of RAM 2 is applied as an input to the input IN1 of RAM 2. The network operates in response to clock signals received from the circuit labelled TIMING & CONTROL.
The circuitry of RAM 1 is shown in detail in FIGS. 10A-10D. RAM 2 is identical, except that for each reference in FIGS. 10A-10D to RAM 1 there should be substituted a reference to RAM 2 and vice versa.
RAM 1 comprises a random number generator. This is of conventional construction and will therefore not be described here in detail. The embodiment shown here employs shift registers and 127 stages are used to give a sequence length of $2^{127}-1$. It will be noted that the random number generator has an array of three EXOR gates having inputs 2, 3 and 4 which can be connected to selected ones of the taps T of the shift registers. The taps selected in RAM 1 will be different to those selected in RAM 2 and appropriate selection, according to criteria well known to those in the art, avoids undesired correlation between the random numbers generated by the two generators. The output of the random number generator is an 8-bit random number which is fed as two 4-bit segments to two adders which make up a comparator.
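A shift-register generator of this general kind can be sketched in software. The following is illustrative only: the function names, the Fibonacci (external-feedback) arrangement, and the 7-stage register with taps (0, 1) are assumptions chosen so that the full cycle is short enough to check. The actual tap positions of the 127-stage embodiment are not given in the text and are not reproduced here.

```python
def lfsr_bits(state: int, stages: int, taps: tuple):
    """Yield one pseudo-random bit per clock from a feedback shift register.

    state  -- non-zero initial register contents, `stages` bits wide
    taps   -- stage positions (0 = output end) XORed to form the feedback bit
    """
    mask = (1 << stages) - 1
    while True:
        out = state & 1
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1  # EXOR of the selected taps
        # Shift one place towards the output and insert the feedback bit.
        state = ((state >> 1) | (feedback << (stages - 1))) & mask
        yield out


def random_byte(bits) -> int:
    """Assemble eight successive bits (one per clock) into an 8-bit number."""
    value = 0
    for _ in range(8):
        value = (value << 1) | next(bits)
    return value


# Hypothetical 7-stage example; taps (0, 1) correspond to x^7 + x + 1.
noise_source = lfsr_bits(state=1, stages=7, taps=(0, 1))
nu = random_byte(noise_source)
```

With these assumed taps the 7-stage register passes through all $2^7 - 1 = 127$ non-zero states before repeating. A 127-stage register with suitably chosen taps behaves analogously, giving the sequence length of $2^{127}-1$ mentioned above.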
The illustrated embodiment has a memory which holds four 8-bit numbers held at four addresses. The memory is thus addressed by 2-bit addresses. At each operation of the network the contents of the addressed storage location in the memory are fed to the comparator where they are added to the random number generated at that time. The output of the comparator is a `1` if the addition results in a carry bit and is a `0` otherwise.
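The construction of an 8-bit comparator from two 4-bit adders can be sketched as follows. This is a sketch under assumptions, not a description of the circuit in the Figures: it assumes that the carry out of the low-order adder is fed to the carry input of the high-order adder, and the names `add4` and `comparator_output` are hypothetical.

```python
def add4(a: int, b: int, carry_in: int = 0):
    """Model a 4-bit adder: return (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def comparator_output(stored: int, noise: int) -> int:
    """Return the carry out of adding two 8-bit numbers as two 4-bit halves."""
    # Low-order segments are added first; their carry ripples upward (assumed).
    _, carry_low = add4(stored & 0xF, noise & 0xF)
    _, carry_high = add4(stored >> 4, noise >> 4, carry_low)
    return carry_high
```

For every pair of 8-bit operands the result equals the ninth bit of the full sum, so the cascaded adders give the same `1`/`0` decision as a single 8-bit addition.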
The output of the comparator is fed to the output of the RAM (which is labelled OUT in FIG. 9) and also to a latch. Here it is held ready to form one bit of the next address to be supplied to the address decoder via which the memory is addressed. As can be seen by taking FIGS. 9 and 10A-10D together, the other bit of the address (i.e. that supplied to input IN2 of RAM 1) is the output of RAM 2.
FIGS. 10A-10D also show inputs labelled R1_LOAD and MEMORY DATA (FIG. 10D), which enable the system to be initialised by loading data into the memory at the outset, and an input SCLK by means of which clock pulses are supplied to RAM 1 from a clock generator (see below). Finally, as regards FIGS. 10A-10D, there is an input denoted GENERATE which is connected to the latch via an inverter gate and which serves to initiate the production of a new output from the pRAM and allows a set of 8 SCLK pulses to occur. The clock generator shown in FIG. 11 is of conventional construction and will therefore not be described in detail, its construction and operation being self-evident to a man skilled in the art from the Figure. This provides a burst of 8 clock signals at its output SCLK, which is supplied to the timing input of each of RAM 1 and RAM 2. Each time a GENERATE pulse occurs, each of RAM 1 and RAM 2 generates a new 8-bit random number (one bit for each SCLK pulse), addresses a given one of the four storage locations in its memory, compares the random number with the contents of the addressed location, and generates an output accordingly.
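The operation of the two-pRAM network over successive GENERATE pulses can be sketched in software. This is a simplified model under stated assumptions: the function names are hypothetical, Python's `random` module stands in for the shift-register noise generators, both outputs are assumed to start at 0, and IN1 is arbitrarily taken as the high-order address bit (the text does not say which input forms which bit).

```python
import random


def step(mem1, mem2, out1, out2, rng):
    """Advance both pRAMs by one GENERATE pulse and return the new outputs."""
    # RAM 1: IN1 is its own latched output, IN2 is the output of RAM 2.
    addr1 = (out1 << 1) | out2
    # RAM 2: IN1 is its own latched output, IN2 is the output of RAM 1.
    addr2 = (out2 << 1) | out1
    # Each pRAM adds fresh 8-bit noise to the addressed value; the carry fires.
    new1 = (mem1[addr1] + rng.randrange(256)) >> 8
    new2 = (mem2[addr2] + rng.randrange(256)) >> 8
    return new1, new2


def run(mem1, mem2, steps, seed=0):
    """Return the list of (RAM 1, RAM 2) outputs over `steps` pulses."""
    rng = random.Random(seed)
    out1 = out2 = 0  # assumed initial latch contents
    history = []
    for _ in range(steps):
        out1, out2 = step(mem1, mem2, out1, out2, rng)
        history.append((out1, out2))
    return history


# Hypothetical memory contents: four 8-bit values per pRAM.
spikes = run([16, 64, 128, 240], [240, 128, 64, 16], steps=10)
```

Each entry of `spikes` is one time step of the two spike trains. A memory filled with zeros never fires, since no 8-bit noise value can then produce a carry.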
The pRAM thus far described has no learning or training rule associated with it. The provision of a particularly advantageous form of training is claimed in our copending application filed on even date herewith under the title "Neural processing devices with learning capability." This will now be discussed.
Reinforcement training is a strategy used in problems of adaptive control in which individual behavioural units (here to be identified with pRAMs) only receive information about the quality of the performance of the system as a whole, and have to discover for themselves how to change their behaviour so as to improve this. Because it relies only on a global success/failure signal, reinforcement training is likely to be the method of choice for `on-line` neural network applications.
A form of reinforcement training for pRAMs has been devised which is fast and efficient (and which is capable, in an embodiment thereof, of being realised entirely with pRAM technology). This training algorithm may be implemented using digital or analogue hardware thus making possible the manufacture of self-contained `learning pRAMs`. Networks of such units are likely to find wide application, for example in the control of autonomous robots. Control need not be centralised; small nets of learning pRAMs could for example be located in the individual joints of a robot limb. Such a control arrangement would in many ways be akin to the semi-autonomous neural ganglia found in insects.
According to the invention of our copending application there is provided a device for use in a neural processing network, comprising a memory having a plurality of storage locations at each of which a number representing a probability is stored; means for selectively addressing each of the storage locations to cause the contents of the location to be read to an input of a comparator; a noise generator for inputting to the comparator a random number representing noise; means for causing to appear at an output of the comparator an output signal having a first or second value depending on the values of the numbers received from the addressed storage location and the noise generator, the probability of the output signal having a given one of the first and second values being determined by the number at the addressed location; means for receiving from the environment signals representing success or failure of the network; means for changing the value of the number stored at the addressed location if a success signal is received in such a way as to increase the probability of the successful action; and means for changing the value of the number stored at the addressed location if a failure signal is received in such a way as to decrease the probability of the unsuccessful action. The number stored at the addressed location may be changed by an appropriate increment or decrement operation, for example.
A preferred form of training rule is described by the equation

$$\Delta\alpha_u(t) = \rho\left( (a - \alpha_u)\,r + \lambda\,(\bar{a} - \alpha_u)\,p \right)(t)\;\delta(u - i(t)) \qquad (2)$$

where $r(t)$ and $p(t)$ are global success and failure signals in $\{0,1\}$ received from the environment at time $t$ (the environmental response might itself be produced by a pRAM, though it might be produced by many other things), $a(t)$ is the unit's binary output, $\bar{a} = 1 - a$, and $\rho$, $\lambda$ are constants. The delta function is included to make it clear that only the location which is actually addressed at time $t$ is available to be modified, the contents of the other locations being unconnected with the behaviour that led to reward or punishment at time $t$. When $r=1$ (success) the probability $\alpha_u$ changes so as to increase the chance of emitting the same value from that location in the future, whilst if $p=1$ (failure) the probability of emitting the other value when addressed increases. The constant $\lambda$ represents the ratio of punishment to reward; a non-zero value for $\lambda$ ensures that training converges to an appropriate set of memory contents and that the system does not get trapped in false minima. Note that reward and penalty take effect independently; this allows the possibility of `neutral` actions which are neither punished nor rewarded, but may correspond to a useful exploration of the environment.
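The effect of the training rule on a single addressed location can be sketched as follows. This is a sketch under assumptions rather than a hardware description: probabilities are held as floating-point values instead of M-bit integers, the penalty term is taken to use the complemented output 1 - a (as the description of punishment implies), and the function name and the default values of `rho` and `lam` are hypothetical.

```python
def reinforcement_update(alpha, address, a, r, p, rho=0.1, lam=0.05):
    """Return a copy of `alpha` with only the addressed location updated.

    alpha   -- list of stored probabilities, one per memory location
    address -- index of the location addressed at this time step
    a       -- binary output emitted by the unit (0 or 1)
    r, p    -- binary reward and penalty signals from the environment
    """
    updated = list(alpha)
    a_bar = 1 - a  # the value that was not emitted
    delta = rho * ((a - alpha[address]) * r + lam * (a_bar - alpha[address]) * p)
    updated[address] = alpha[address] + delta
    return updated


# Hypothetical example: location 2 emitted a 1 and the network was rewarded.
memory = reinforcement_update([0.5, 0.5, 0.5, 0.5], address=2, a=1, r=1, p=0)
```

With reward, the stored probability moves towards the value just emitted; with penalty, it moves towards the opposite value at a rate scaled by `lam`; with neither signal, the memory is left unchanged.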