1. Field of the Invention
The present invention generally relates to artificial intelligence and adaptive systems and, more particularly, to a neural system of generalized applicability and enhanced adaptive capability.
2. Description of the Prior Art
It has long been recognized that a major limitation of computing or data processing systems is the need for detailed programming and data entry. Many problems of great complexity exist which might be susceptible of solution with computers but for possibly incomplete knowledge of the problem or an inherent inability to characterize the problem which precludes sufficiently detailed programming for satisfactory solution of the problem. Therefore there has been a great interest in artificial intelligence and adaptive systems which can analyze conditions or other inputs in order to study the problem and adaptively seek a solution.
As an example of a problem which is inherently difficult to characterize, artificial intelligence could be used to greatly enhance the human-computer interface by allowing the computer to observe the habits of the user and optimize its own performance by operating predictively to manage the retrieval of subroutines which are likely to be needed. This has been done, to a degree, in arrangements for managing cache memories in regard to the operation of known application programs. However, given the variety of habits or ways in which different users may interact with particular application programs (e.g. if the user typically prints a document after completing a particular number of pages thereof, a spooler routine could be predictively fetched), it can be readily understood that a highly adaptive artificial intelligence arrangement would be required.
Accordingly, much effort has been expended in the study of the learning process in humans and animals in an effort to simulate such processes by computer. In theory, accurate simulation would provide an artificial intelligence system of the greatest generalization (i.e. applicability), which would have the capability of inference and extrapolation as well as optimally fast learning or adaptation.
At the present state of the art, artificial intelligence systems are of massively parallel architecture. The basic element of such systems is the so-called neural circuit which must minimally have inputs for information, storage for such information, an arrangement to assign weights to respective portions of such information, a means for mapping the combination of inputs to an output based on the assigned weights and some means for altering the weights in response to a comparison of the output produced and the desired output. It is the ability to alter the assigned weight values and correspondingly alter the output of the neuron that allows the neuron to "learn".
The highly parallel architecture of artificial intelligence systems implies that any practical system must include extremely large numbers of neural circuits. Therefore, the complexity of each neural circuit must be kept to a manageable level which will allow great numbers of neural circuits to be combined into an artificial intelligence system. Particularly within this constraint, it has not been possible to provide optimal adaptability of the map and thus provide a neural circuit of generalized applicability.
Consider a digital input of N bits, each bit representing a piece of information or a stimulus. If the neuron is to be fully generalized, some bits of the input may be crucial to determining the output, some may be conditionally important (e.g. in combination with certain states of other bits) and some may be entirely irrelevant to the output. It must be remembered in this regard that the intended function of the neural circuit is to adapt itself to respond to certain combinations of the input bits with certain desired responses and thus develop the ability to map certain input combinations to desired outputs. The neural circuit will not initially know anything about the nature of the response it is to make to any bit or combination of bits of the input and must learn to provide the desired mapping of inputs to an output by developing weights which are to be applied to the values of the inputs. It can readily be appreciated that such a circuit can learn that certain inputs are crucial or irrelevant (developing a one or zero weight, respectively), but the learning of the weights to derive the desired response to a combination of inputs may prove difficult or require an excessive amount of time. For example, it was long believed that an Exclusive-OR transfer function could not be learned. At the present state of the art, circuits of greater complexity have been able to learn the Exclusive-OR function but the learning process is slow in terms of the number of iterations which may be required.
It can be shown that the number of Boolean functions which the neural circuit should ideally be able to emulate for N binary inputs is $2^{2^{N}}$. Therefore, for only two binary inputs, the number of transfer functions would be 16. For three binary inputs, the number of transfer functions would be 256. Four inputs would require 65,536 transfer functions, and so on. Even the accommodation of a small number of Boolean functions has proven difficult in the past. Therefore, the development of a realistic neuron (e.g. a neural circuit capable of complete Boolean response), while possible, is very difficult, particularly in view of the difficulty of producing learning of the Exclusive-OR function.
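The growth in the number of transfer functions can be checked with a short sketch. Each Boolean function of N inputs is one truth table with 2^N rows, and each row's output can independently be 0 or 1. The function names below are illustrative only and do not come from the specification.

```python
from itertools import product


def count_boolean_functions(n: int) -> int:
    """Number of distinct Boolean functions of n binary inputs: 2 ** (2 ** n)."""
    return 2 ** (2 ** n)


def enumerate_truth_tables(n: int):
    """Yield every truth table for n inputs as a tuple of output bits,
    one output bit for each of the 2 ** n input combinations."""
    rows = 2 ** n
    return product((0, 1), repeat=rows)


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, count_boolean_functions(n))
```

For two inputs the enumeration yields 16 truth tables, which include AND, OR and Exclusive-OR among them.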
It is typical in neurons to provide a storage means for weights which are to be applied to the inputs. This storage is typically provided by a counter which is incremented or decremented until the desired function is unconditionally achieved, if it can be achieved at all. The criterion for incrementing or decrementing a counter for learning in this manner is referred to as the Hebbian Rule which generally provides, for example, that the weight of an "on" input is incremented and the weight of an "off" input is decremented when an actual output of the neuron is produced which is less than the desired output and vice-versa. Once the desired actual output is reached, it is assumed that learning is complete. However, since learning is usually implemented by performing small incremental weight changes requiring multiple passes through the data set, such a function does not provide true adaptability where learning can continue as conditions of desired output change. Since learning is accomplished by processing (such as by back-propagation) through all stored data, each learning event necessarily takes longer than the last until the learning performance of the circuit becomes unacceptably slow, particularly for neural circuits where N is large.
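The counter-based weight adjustment described above can be illustrated with a minimal sketch, under several assumptions that are not taken from the specification: a single threshold unit with two inputs, a bipolar encoding in which an "on" input counts as +1 and an "off" input as -1, and an always-on bias counter. In this form the update behaves like a perceptron-style error-driven rule. The sketch shows that such a unit settles on weights for AND after a few passes but never reaches an error-free pass for Exclusive-OR, however many passes are allowed.

```python
from itertools import product

# All two-bit input combinations: (0,0), (0,1), (1,0), (1,1).
PATTERNS = list(product((0, 1), repeat=2))


def output(weights, bias, bits):
    """Threshold unit. Assumption: 'on' bits count as +1, 'off' bits as -1."""
    total = bias + sum(w * (1 if b else -1) for w, b in zip(weights, bits))
    return 1 if total > 0 else 0


def train(targets, max_epochs=100):
    """Repeatedly pass through the data set, stepping integer weight counters.

    Returns (converged, epochs_used, weights, bias)."""
    weights, bias = [0, 0], 0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for bits, desired in zip(PATTERNS, targets):
            delta = desired - output(weights, bias, bits)  # -1, 0 or +1
            if delta:
                errors += 1
                # Output too low: increment "on" weights, decrement "off" weights.
                # Output too high: the reverse.
                weights = [w + delta * (1 if b else -1)
                           for w, b in zip(weights, bits)]
                bias += delta  # hypothetical always-on bias counter
        if errors == 0:
            return True, epoch, weights, bias
    return False, max_epochs, weights, bias


AND = (0, 0, 0, 1)
XOR = (0, 1, 1, 0)

if __name__ == "__main__":
    print("AND learned:", train(AND)[0])
    print("XOR learned:", train(XOR)[0])
```

Each call to `train` makes multiple passes through the whole data set, which is the cost the paragraph above attributes to incremental weight changes.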
In contrast to so-called learning architectures such as the above-noted back-propagation architecture, rapid storage is possible in so-called memory based neural circuits such as the Hopfield circuit, schematically illustrated in FIG. 1a. As will be described in greater detail below, these types of circuits allow many input vectors to be readily memorized and principally recognize differences, if any, between a given input vector and a vector previously input and stored.
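As a rough illustration of memory-based storage and recall, the following sketch uses the textbook outer-product storage rule and asynchronous threshold updates commonly associated with Hopfield networks. It is not a model of the circuit of FIG. 1a; the vector length, the stored patterns and the function names are assumptions chosen for the example.

```python
def store(patterns):
    """Build a weight matrix in a single pass: sum of outer products, zero diagonal."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def recall(w, probe, max_sweeps=10):
    """Update one element at a time until no element changes (or sweeps run out)."""
    state = list(probe)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(w[i][j] * state[j] for j in range(n))
            # A zero field leaves the element as it is.
            new = state[i] if field == 0 else (1 if field > 0 else -1)
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state


# Two stored vectors in bipolar (+1 / -1) form; they are orthogonal by construction.
P1 = [1, 1, 1, 1, -1, -1, -1, -1]
P2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = store([P1, P2])

# Probe: P1 with its first element flipped.
PROBE = [-1, 1, 1, 1, -1, -1, -1, -1]
RECALLED = recall(W, PROBE)
DIFFERING = [i for i, (a, b) in enumerate(zip(PROBE, RECALLED)) if a != b]

if __name__ == "__main__":
    print("recalled stored vector:", RECALLED == P1)
    print("positions where probe differs:", DIFFERING)
```

Storage here is a single pass over each vector, with no iterative weight adjustment, which reflects the rapid storage response attributed to memory-based circuits.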
Ideally, in either type of neural circuit, all input data which has contributed to learning should be retained in memory and learning should be continuous but nevertheless provide a rapid response to alteration of desired output for any given input vector or a desirably rapid learning curve. However, it can be readily understood that such a provision would require either extremely large memory or extended processing time or both as the number of stored items is increased. For instance, with known back-propagation processes, it is necessary to add new data to a file of historical data and to then process the entire file to determine the new response of the neural circuit. This, itself, can be a severe limitation on speed of learning response.
The memory based architectures attempt to hold extremely large address spaces by using statistical techniques. However, they are then incapable of perfect memory of learning events. Therefore, it can be seen that the attempt to provide a simulation of an ideal or biological neuron has appeared to be replete with insoluble difficulties or problems which could not be approached due to inherent practical hardware limitations.
Due to such inherent practical hardware limitations, it has been the practice in the art to resort to statistical representations of the data acquired during the learning process. However, once this has been done to bring memory requirements within practical limits, there is no way to thereafter determine which vectors, presented to the system, produce results by generalization from the statistical representation of the data or from actual previous input vectors, inherently creating an inability to determine confidence levels in the sense of determining whether exact precedent exists for any particular decision.
Additionally, it should be noted that the back-propagation type of circuit is of static configuration and can only be optimized for economy of hardware and learning performance with foreknowledge of the nature of the learning it is to do. Therefore, this approach, at the present state of the art, appears to be particularly ill-suited to being generalized sufficiently to accommodate learning where nothing is known of the problem beforehand. On the other hand, while the Hopfield type of neural circuit may be theoretically capable of rapid response, its generalization requires impractical amounts of memory due to inefficient storage, as will be discussed in greater detail below. If statistical representations are used to reduce memory requirements, the Hopfield type of neural circuit would be inherently incapable of distinguishing actual input vectors which have been memorized from the statistical representations thereof.
In summary, approaches to the design of neuron-simulating architectures have led toward either memory-based architectures or learning architectures. Memory-based architectures, such as the so-called "Hopfield network", shown in a simplified form in FIG. 1a, have a theoretical maximum efficiency of storing 2N vectors for N inputs. However, this theoretical efficiency can be approached only with particular input vectors and connections and, in practice, maximum efficiency is only about 15% (e.g. requiring 100 neural circuits to represent 15 vectors of 100 bits each). Such memory-based architectures provide for a content addressable memory and generally provide a very fast storage response to new input vectors, but are not efficient in use of hardware. Learning architectures, on the other hand, as shown in simplified form in FIG. 2a, use incremental adjustments in weights to modify an otherwise fixed transfer function. Learning architectures provide a more effective and statistically reliable basis for inferential decisions and require less memory, but are inherently slow because of the incremental nature of the development of weights by which the transfer function is modified.