This invention relates to artificial intelligence systems, methods and computer program products, and more particularly to artificial neuron systems, methods and computer program products.
Associative memories, also referred to as content addressable memories, are widely used in the fields of pattern matching and identification, expert systems and artificial intelligence. A widely used associative memory is the Hopfield artificial neural network. Hopfield artificial neural networks are described, for example, in U.S. Pat. No. 4,660,166 to Hopfield entitled "Electronic Network for Collective Decision Based on Large Number of Connections Between Signals".
Although associative memories may avoid problems in prior back-propagation networks, associative memories may present problems of scaling and spurious memories. Recent improvements in associative memories have attempted to solve these and other problems. For example, U.S. Pat. No. 6,052,679 to coinventor Aparacio, IV et al., entitled "Artificial Neural Networks Including Boolean-Complete Compartments" provides a plurality of artificial neurons and a plurality of Boolean-complete compartments, a respective one of which couples a respective pair of artificial neurons. By providing Boolean-complete compartments, spurious complement memories can be avoided.
Associative memories also have been marketed commercially. For example, a product known as MemoryAgent marketed by International Business Machines Corporation (IBM) provides a low level set of application programming interfaces that can be used for building embedded learning agents, characterized by the term "Smart Assistance". See the publication entitled "Report: IBM's Memory Agent", Intelligence In Industry, Vol. 8, No. 1, January 1999, pp. 5-9. Other vendors, including Haley Enterprises and Intellix A/S, also offer associative memory tools. In particular, Haley Enterprises supports a commercial associative memory called "The Intelligent Memory". See http://www.haley.com/TIM.html. Intellix A/S supports another commercial associative memory called "Knowman" using a software framework called SOUL (Self-Optimizing Universal Learner). See http://www.intellix.com. Some vendors offer self-organizing feature maps, as described in U.S. Pat. No. 5,870,729 to Yoda entitled "Self-Organizing Neural Network for Pattern Classification" and U.S. Pat. No. 5,943,670 to Prager entitled "System and Method for Categorizing Objects in Combined Categories", which also are a form of associative memory. Associative memories also have been applied to electronic commerce, as shown in U.S. Pat. No. 5,619,709 to Caid et al. entitled "System and Method of Context Vector Generation and Retrieval". Other applications of associative memories include handwriting recognition in hand-held devices, such as the Palm Pilot, marketed by 3Com.
Although associative memories only recently have been marketed commercially, their use is expected to grow rapidly for applications that desire personalization and knowledge management. In fact, one expert has predicted that "Building autoassociative memories will be a very large business; some day more silicon will be consumed building such devices than for any other purpose." See Technology Review, Vol. 102, No. 4, July/August 1999, p. 79.
Unfortunately, there is a fundamental scaling problem that can limit the use of associative memories to solve real world problems. In particular, many associative memories use linear weights. As shown in FIG. 1A, each input can be associated once with each output according to a weight WA-WE. However, the inputs in such linear networks generally do not associate with each other. This can severely limit the ability of such networks to learn and represent possible nonlinearities, such as interactions between inputs that arise from co-requirements or trade-offs among the inputs.
An alternative to the linear network of FIG. 1A is the geometric Hopfield network of FIG. 1B. In the Hopfield network, one-to-one connections are provided between all nodes, and a weight is provided for each arc between the nodes. As shown in FIG. 1B, it may be difficult to scale Hopfield networks for real-world applications due to the explosion of weights that is provided between all inputs. Since nonlinear networks generally interconnect all inputs with each other, an N² or geometric scaling function is produced. More specifically, the number of connections between inputs generally is equal to N·(N−1)/2, where N is the number of inputs.
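The contrast between the two scaling behaviors can be sketched as follows (an illustrative calculation only; the function names are not from the patent):

```python
def linear_weight_count(n_inputs: int) -> int:
    # Linear network (FIG. 1A): one weight per input.
    return n_inputs

def hopfield_weight_count(n_inputs: int) -> int:
    # Hopfield-style network (FIG. 1B): one weight per pair
    # of inputs, i.e. N*(N-1)/2 connections.
    return n_inputs * (n_inputs - 1) // 2

for n in (10, 1_000, 100_000):
    print(n, linear_weight_count(n), hopfield_weight_count(n))
```

At 100,000 inputs, the pairwise network already requires roughly 5 billion weights, which illustrates why geometric scaling becomes unmanageable at the problem sizes discussed below.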
This geometric scaling generally is unreasonable to support applications at the scale of complexity that warrants such technology. For example, for general purpose search and personal modeling, tens of thousands of input variables and millions of models may need to be managed. At the other extreme, machine learning in operating systems may need to be more efficient as client machines become smaller, wireless devices. In such situations, only one user's model may be needed, but the number of contexts and input variables may still be very large. Even at the level of a household with a few individuals, the number of inputs may be on the order of hundreds of thousands. It therefore may be unreasonable to use present techniques in such applications, even with the larger physical memory capacities that are expected in the next few years. Thus, applications of agent-based learning for such environments are now emerging, but the learning technology to support these applications may be difficult to implement due to the scaling problems of learning and using nonlinear associations.
The present invention can provide an artificial neuron that includes a plurality of inputs and a plurality of dendrites, a respective one of which is associated with a respective one of the plurality of inputs. Each dendrite comprises a power series of weights, and each weight in a power series includes an associated count for the associated power. It will be understood that a weight generally is a place-holder for a count, and need not be a separate physical entity. The power series of weights preferably is a base-two power series of weights, each weight in the base-two power series including an associated count that represents a bit position. It has been found, according to the present invention, that, in part, by representing the weights as a power series, the geometric scaling as a function of inputs in conventional artificial neurons can be reduced to a linear scaling as a function of inputs. Large numbers of inputs therefore may be handled by real-world systems, to thereby solve real-world applications.
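The base-two power series idea can be sketched in a few lines (a hypothetical illustration; the class and field names are assumptions, not the patent's terminology):

```python
class Dendrite:
    """Sketch of a dendrite whose weight is stored as a base-two
    power series: counts[k] is the count at bit position k."""

    def __init__(self, num_powers: int = 8):
        self.counts = [0] * num_powers

    def value(self) -> int:
        # Reconstruct the represented association count by summing
        # each per-power count scaled by its power of two.
        return sum(c * (2 ** k) for k, c in enumerate(self.counts))

d = Dendrite()
d.counts[0] = 1  # count at 2**0
d.counts[2] = 1  # count at 2**2
print(d.value())  # 5
```

The point of the representation is that the per-dendrite storage grows with the number of bit positions (logarithmically in the count) rather than with the number of pairwise connections.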
The counts for the associated power preferably are statistical counts. More particularly, the dendrites preferably are sequentially ordered, and the power series of weights preferably comprises a pair of first and second power series of weights. Each weight in the first power series includes a first count that is a function of associations of prior dendrites, and each weight of the second power series includes a second count that is a function of associations of next dendrites. More preferably, a first and second power series of weights is provided for each of multiple observation phases.
In order to propagate an input signal into the artificial neuron, a trace preferably also is provided that is responsive to an input signal at the associated input. The trace preferably includes a first trace count that is a function of associations of the input signal at prior dendrites, and a second trace count that is a function of associations of the input signal at next dendrites. The first and second power series are responsive to the respective first and second trace counts. Similar to the weights, each trace preferably comprises at least one first trace count that is a function of associations of the input signal at prior dendrites, and at least one second trace count that is a function of associations of the input signal at next dendrites. The first and second trace counts also may be represented by a power series.
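The prior/next trace counts described above can be illustrated with a small sketch (names and the binary-input simplification are assumptions for illustration):

```python
def trace_counts(inputs):
    """For binary input signals over sequentially ordered dendrites,
    compute for each dendrite a 'prior' trace count (active inputs at
    earlier dendrites) and a 'next' trace count (active inputs at
    later dendrites)."""
    n = len(inputs)
    prior = [sum(inputs[:i]) for i in range(n)]
    nxt = [sum(inputs[i + 1:]) for i in range(n)]
    return prior, nxt

prior, nxt = trace_counts([1, 0, 1, 1])
print(prior)  # [0, 1, 1, 2]
print(nxt)    # [2, 2, 1, 0]
```

Each dendrite thus summarizes its associations with all earlier and all later inputs in two counts, rather than storing an explicit weight per pair.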
In order to provide a memorizing operation, the input signal preferably is converted into the first and second trace counts, and a trace wave propagator propagates the respective first and second trace counts into the respective first and second power series of weights. The trace wave propagator preferably propagates the trace along the sequentially ordered dendrites in a forward direction and in a reverse direction. Carry results also preferably are propagated along the power series of weights in the plurality of dendrites to provide memorization of the input signal. A Double Match/Filter preferably identifies carry results for a weight in a dendrite, for propagation to a next higher power weight. The Double Match/Filter also preferably identifies carry results for a weight in a dendrite based upon co-occurrence of a weight and a trace.
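The carry propagation step of memorization can be sketched as binary addition into the power series. This is a deliberate simplification: the patent's Double Match/Filter involves co-occurrence of weights and traces, which is only described at a high level here, so the function below is a hedged illustration of carrying alone.

```python
def memorize(counts, amount):
    """Add `amount` into a base-two power series of bit counts
    (counts[k] in {0, 1}), carrying any overflow at one power
    into the next higher power."""
    k = 0
    carry = amount
    while carry:
        total = counts[k] + (carry & 1)
        counts[k] = total & 1                # bit retained at this power
        carry = (carry >> 1) + (total >> 1)  # overflow moves up one power
        k += 1
    return counts

w = [0] * 8
memorize(w, 3)   # store 3 -> bits at 2**0 and 2**1
memorize(w, 1)   # adding 1 carries up to 2**2
print(w)  # [0, 0, 1, 0, 0, 0, 0, 0]
```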
In order to provide a reading operation, an accumulator accumulates matches between the first and second trace counts and the first and second power series of weights. The accumulator preferably accumulates matches between the first and second trace counts and all of the counts in the first and second power series of weights, regardless of whether carry results are produced. A summer is responsive to the accumulator, to sum results of the accumulations of matches of the first and second trace counts to the first and second power series of weights.
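The accumulator/summer arrangement for reading can be sketched as follows (an illustrative interpretation; the match function and names are assumptions, not the patent's specified implementation):

```python
def accumulate_matches(weight_counts, trace_counts):
    # Accumulator: count co-occurring counts at every power,
    # scaled by the power of two at which they sit.
    return sum(min(w, t) * (2 ** k)
               for k, (w, t) in enumerate(zip(weight_counts, trace_counts)))

def read(prior_weights, next_weights, prior_trace, next_trace):
    # Summer: combine the accumulated matches of the first (prior)
    # and second (next) power series against their trace counts.
    return (accumulate_matches(prior_weights, prior_trace)
            + accumulate_matches(next_weights, next_trace))

print(read([1, 1, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0]))  # 1 + 2 = 3
```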
As described above, the weights preferably include first and second power series that are respective functions of associations of prior dendrites and associations of next dendrites. The association is an example of a statistical function that represents a characteristic of the associations rather than the associations themselves. Preferably, a sum of associations of prior dendrites and a sum of associations of next dendrites is used. However, other statistical functions may be used. It also will be understood that, although the prior/next relationships preferably are used with the power series weights, the prior/next relationships also may be used with conventional neural network weights to provide improved nonlinear interactions between the input nodes of the neural network. Finally, it also will be understood that the present invention may be embodied as systems, methods, computer program products and/or combinations thereof.