1. Field of the Invention
The invention relates in general to the field of neural networks, and more particularly, to implicit general computation by means of digital applications or by means of dynamic systems with underlying fractal attractors.
2. Description of the Related Art
Neural networks of some description have been discussed in the open literature since the 1960s. A few related patents (e.g., Dunn et al., U.S. Pat. No. 3,275,986) date from that period. Despite much progress on the creation of distributed memory, robust function, and pattern recognition, early work was largely abandoned after Marvin Minsky's famous attack on the Perceptron, in which he showed that a single-layer Perceptron cannot compute the “exclusive or” (XOR) logical function. Interest in this type of computation device revived after the discovery that the XOR limitation could be overcome by adding further layers, sometimes called “hidden” layers, to the Perceptron architecture.
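The XOR limitation, and how a single hidden layer removes it, can be illustrated with threshold units. The weights below are hand-picked for illustration and are not taken from any cited patent: one hidden unit computes OR, the other computes AND, and the output unit fires for "OR but not AND". The brute-force search only samples a grid of single-layer weights, but no real weights can succeed either. XOR would require b ≤ 0, w1 + b > 0, w2 + b > 0, and w1 + w2 + b ≤ 0, and adding the middle two inequalities gives w1 + w2 + b > -b ≥ 0, a contradiction.

```python
import itertools


def step(x: float) -> int:
    """Heaviside threshold unit: fires (1) when its net input is positive."""
    return 1 if x > 0 else 0


def xor_two_layer(x1: int, x2: int) -> int:
    """Perceptron with one hidden layer; weights are illustrative, hand-picked."""
    h_or = step(x1 + x2 - 0.5)   # hidden unit 1: OR
    h_and = step(x1 + x2 - 1.5)  # hidden unit 2: AND
    # Output fires when OR is on but AND is off, which is XOR.
    return step(h_or - h_and - 0.5)


def single_layer_can_fit_xor(grid) -> bool:
    """Search every (w1, w2, b) on the grid for a single threshold unit
    step(w1*x1 + w2*x2 + b) that reproduces XOR on all four inputs."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        if all(step(w1 * a + w2 * c + b) == (a ^ c)
               for a, c in itertools.product((0, 1), repeat=2)):
            return True
    return False


if __name__ == "__main__":
    for a, c in itertools.product((0, 1), repeat=2):
        print(a, c, xor_two_layer(a, c))
    print(single_layer_can_fit_xor([k / 2 for k in range(-6, 7)]))
```

The grid search returns `False` for any grid, consistent with the inequality argument above; the two-layer version prints the XOR truth table.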
In the past 25 years, the U.S. has issued more than 4,000 patents for, or including components of, neural networks. The first of these (Cooper et al., U.S. Pat. No. 3,950,733, and Cooper et al., U.S. Pat. No. 4,044,243), a three-layer design, claimed adaptive information processing modules that mapped N input values to n output values, and thus involved neural networks performing general computation functions. Since that time, however, only 11 percent of the U.S. patents related to neural networks concern general computation or generalized computational functions. More than 80 percent focus on specific applications that perform particular functions. Typically, these applications involve some kind of dynamic control (e.g., Grossberg et al., U.S. Pat. No. 4,852,018), pattern recognition (e.g., Loris et al., U.S. Pat. No. 4,876,731), or image or signal processing (e.g., Provence, U.S. Pat. No. 4,885,757). The remaining patents concern components used in neural networks, either for general computation or for specific applications (e.g., Kesselring, U.S. Pat. No. 4,896,053).
Neural networks that perform either generalized or applied functions share a number of common traits, notably an architecture of computational nodes arrayed in one or more layers, connection weights, learning rules, and procedures to adjust parameters to expedite the learning process. More than 90 percent of these designs are supervised, meaning that they require a formal training phase in which output values are “clamped” to a training set of data. Perceptrons provide an early example of such supervised networks and still find useful applications (e.g., Adler, U.S. Pat. No. 5,261,035). Perceptrons alone account for more than 10 percent of the patented designs since 1976.
More than one third of all neural network design abstracts describe the use of backpropagation procedures. These provide an example of explicit computation in neural network functions, as backpropagation executes an optimization algorithm, typically gradient descent on an error function, to adjust the weights in a neural network architecture as the network is trained (e.g., Martin, U.S. Pat. No. 5,440,651).
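As a minimal sketch of what "explicit computation" means for backpropagation, the following computes weight updates for a tiny 2-2-1 sigmoid network on the XOR data. The flat parameter layout, the sum-of-squares error, the random initialization, and the learning rate are all assumptions chosen for illustration, not details of any cited patent. The point is that the update is produced outside the network's own dynamics: a global error function is differentiated by the chain rule and the weights are stepped downhill, which is the kind of external calculation that implicit designs avoid.

```python
import math
import random

XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def init_params(seed: int = 0) -> list:
    """Nine small random parameters (hypothetical flat layout):
    p[0:4] hidden weights (unit j uses p[2j], p[2j+1]), p[4:6] hidden biases,
    p[6:8] output weights, p[8] output bias."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(9)]


def forward(p, x):
    """Return hidden activations and the network output for input x."""
    h = [sigmoid(p[2 * j] * x[0] + p[2 * j + 1] * x[1] + p[4 + j]) for j in range(2)]
    y = sigmoid(p[6] * h[0] + p[7] * h[1] + p[8])
    return h, y


def loss(p, data=XOR_DATA) -> float:
    """Sum-of-squares error over the training set."""
    return sum(0.5 * (forward(p, x)[1] - t) ** 2 for x, t in data)


def backprop_grad(p, data=XOR_DATA) -> list:
    """Gradient of loss(p), obtained by propagating the output error backwards."""
    g = [0.0] * 9
    for x, t in data:
        h, y = forward(p, x)
        delta_out = (y - t) * y * (1 - y)  # dE/dz at the output unit
        g[8] += delta_out
        for j in range(2):
            g[6 + j] += delta_out * h[j]
            # Error assigned to hidden unit j via the output weight p[6 + j].
            delta_h = delta_out * p[6 + j] * h[j] * (1 - h[j])
            g[2 * j] += delta_h * x[0]
            g[2 * j + 1] += delta_h * x[1]
            g[4 + j] += delta_h
    return g


def train(p, lr: float = 0.5, steps: int = 1) -> list:
    """Plain gradient descent: the explicit, external optimization step."""
    for _ in range(steps):
        g = backprop_grad(p)
        p = [pi - lr * gi for pi, gi in zip(p, g)]
    return p
```

Comparing `backprop_grad` against a central finite difference of `loss` is a quick way to confirm the chain-rule bookkeeping; a single small step of `train` should then lower the error.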
Backpropagation differs from feedback, which is simply the provision of inputs from other portions of an architecture or from the environment. Backpropagation also differs fundamentally from implicit computation, which occurs in networks that employ local rules to accomplish their tasks. Explicit computation in this sense describes the use of an external calculation to further the functioning of a network. The network in effect provides a framework in which the calculations, such as the multi-variate Taylor expansions in the Cooper et al. designs above, are completed. Implicit computation does not need such external calculations. The contrast between implicit and explicit computation is quite similar to the distinction between simulation and emulation of dynamic systems discussed in David L. Cooper, Linguistic Attractors, Chapter 2 (1999): a simulation attempts to capture important aspects of the dynamics, while an emulation attempts to match results without reference to internal dynamics (essentially a “black box” approach).
Two important classes of neural networks that normally rely on explicit calculation are hidden Markov models and simulated annealing models. Hidden Markov models employ calculations based on a selected probability distribution to adjust the network as it trains. These distributions describe stationary processes; that is, the probability of an event is the same at time t as at time t+Δt. For example, Brown et al., U.S. Pat. No. 5,805,832, uses a hidden Markov step and a Poisson distribution for some applications. Abe, U.S. Pat. No. 5,940,794, includes a hidden Markov step and mentions the gamma distribution in one embodiment (the gamma distribution corresponds to the distribution of waiting times for Poisson processes). Gong, U.S. Pat. No. 6,151,573, uses a hidden Markov step with combinations of Gaussian (normal) distributions. Hidden Markov models account for more than 9 percent of the U.S. patents issued in the past quarter century.
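The link between Poisson processes and the gamma distribution, and what stationarity means in practice, can be checked with a small simulation. This is an illustrative sketch, not the method of any cited patent; the rate, window positions, and sample sizes are arbitrary assumptions. The waiting time to the k-th event of a Poisson process with rate λ is the sum of k exponential inter-arrival times, a gamma variate with mean k/λ. Stationarity shows up as equal expected counts, λ·Δt, in any window of width Δt, wherever the window starts.

```python
import random


def kth_arrival_times(rate: float, k: int, samples: int, seed: int = 0) -> list:
    """Waiting times to the k-th event of a homogeneous Poisson process,
    each built as a sum of k exponential inter-arrival times (Gamma(k, rate))."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(rate) for _ in range(k)) for _ in range(samples)]


def window_count_mean(rate: float, start: float, width: float,
                      samples: int = 20000, seed: int = 0) -> float:
    """Average number of events falling in [start, start + width).
    For a stationary process this depends on width only, not on start."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        t = rng.expovariate(rate)          # time of the first event
        while t < start + width:
            if t >= start:
                total += 1
            t += rng.expovariate(rate)     # time of the next event
    return total / samples


if __name__ == "__main__":
    waits = kth_arrival_times(rate=2.0, k=3, samples=20000)
    print(sum(waits) / len(waits))           # close to k / rate = 1.5
    print(window_count_mean(2.0, 1.0, 1.0))  # close to rate * width = 2.0
    print(window_count_mean(2.0, 5.0, 1.0))  # also close to 2.0
```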
Simulated annealing designs (e.g., Leuenberger, U.S. Pat. No. 6,100,989), at least another 6 percent of issued U.S. patents, are particularly suited to explicit calculation, as such designs incorporate a “temperature” parameter that adjusts the speed at which components change their weights. These designs are typically also associated with another probability distribution for which temperature is an important parameter: the Boltzmann distribution, which allows such designs to emulate thermodynamic systems. Implicit versions of simulated annealing are possible. For example, Alspector, U.S. Pat. No. 4,874,963, implements the Boltzmann distribution with semiconductor circuits and uses a source of noise to adjust the “temperature” parameter.
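The role of the temperature parameter can be seen in a minimal simulated annealing sketch built on the Metropolis acceptance rule, which always accepts a downhill move and accepts an uphill move of size ΔE with probability exp(-ΔE/T), the Boltzmann factor. High temperatures let the search wander uphill; as T falls, the search settles into low-energy states. The geometric cooling schedule, the step count, and the toy energy function in the usage line are assumptions for illustration, not taken from the cited patents.

```python
import math
import random


def boltzmann_acceptance(delta_e: float, temperature: float) -> float:
    """Metropolis rule: accept downhill moves outright, uphill moves with
    probability exp(-delta_e / T)."""
    if delta_e <= 0:
        return 1.0
    return math.exp(-delta_e / temperature)


def anneal(energy, neighbor, x0, t_start: float = 10.0, t_end: float = 0.01,
           steps: int = 2000, seed: int = 0):
    """Generic simulated annealing with geometric cooling (assumed schedule).
    Returns the best state visited and its energy."""
    rng = random.Random(seed)
    alpha = (t_end / t_start) ** (1.0 / steps)  # per-step cooling factor
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    for _ in range(steps):
        cand = neighbor(x, rng)
        cand_e = energy(cand)
        if rng.random() < boltzmann_acceptance(cand_e - e, t):
            x, e = cand, cand_e
            if e < best_e:
                best_x, best_e = x, e
        t *= alpha  # lower the "temperature"
    return best_x, best_e


if __name__ == "__main__":
    # Toy problem: integers 0..20, energy minimized at x = 7, moves of +/-1.
    step_move = lambda x, rng: min(20, max(0, x + rng.choice((-1, 1))))
    print(anneal(lambda x: (x - 7) ** 2, step_move, x0=0))
```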
Synthetic evolutionary designs comprise another 9 percent of issued U.S. patents. These (e.g., Parry et al., U.S. Pat. No. 6,047,277) use a version of a “genetic algorithm” to produce random outputs and then cull the outputs according to some metric. For example, Altshuler et al., U.S. Pat. No. 5,729,794, uses such an algorithm to produce antenna designs, weighing computer estimates of antenna characteristics against a set of desired characteristics.
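The generate-and-cull loop described above can be sketched as a minimal genetic algorithm. The bit-string encoding, the match-count fitness (standing in for "desired characteristics"), truncation selection that keeps the better half, one-point crossover, and the mutation rate are all illustrative assumptions rather than details of the cited patents. Because the survivors are carried over unchanged, the best fitness never decreases from one generation to the next.

```python
import random


def evolve(target, pop_size: int = 30, generations: int = 100,
           mutation_rate: float = 0.02, seed: int = 0) -> list:
    """Minimal genetic algorithm: random candidates are scored against a
    desired specification, the weaker half is culled, and survivors breed
    replacements. Returns the best fitness recorded in each generation."""
    rng = random.Random(seed)
    n = len(target)

    def fitness(genome) -> int:
        # Metric: number of positions that match the desired specification.
        return sum(a == b for a, b in zip(genome, target))

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        survivors = pop[: pop_size // 2]  # cull the lower half
        children = []
        while len(survivors) + len(children) < pop_size:
            mom, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = mom[:cut] + dad[cut:]
            # Flip each bit with a small probability (mutation).
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = survivors + children
    return history


if __name__ == "__main__":
    h = evolve([1, 0] * 10)
    print(h[0], h[-1])
```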
While neural network designs requiring explicit computation are very common, implicit designs, such as Alspector's cited above, are rare. Cooper, U.S. Pat. No. 6,009,418, to which this application claims priority, is a clear example of this kind of design. It discloses an architecture that permits self-adjusting channels and that already provides at least a 26 percent improvement over other designs in digital simulations of deeply nested dependency problems. It also incorporates learning rules based on non-stationary processes, which permit non-digital implementations through dynamic systems characterized by such processes, such as systems described by Bose-Einstein statistics. Finally, it discloses a network design that can exploit the capability of fractal sets to encode and process information.
The present disclosure, in expanding on Cooper, U.S. Pat. No. 6,009,418, employs three key concepts that do not appear elsewhere in the prior art in the senses meant here: fractal sets, renormalization, and percolation. In the prior art, these terms are used in the following manner.
Except in Cooper, U.S. Pat. No. 6,009,418, “fractal” appears in three principal senses: as a method for data compression (e.g., Hoffberg et al., U.S. Pat. No. 6,081,750), in the related sense of an alternative method to reconstruct a figure (e.g., Kostrzewski et al., U.S. Pat. No. 6,167,155), and as a physical description, particularly of a texture (e.g., Nelson et al., U.S. Pat. No. 6,052,485).
“Renormalization” occurs in the sense of a calculation step that brings values back into a specified range or rescales them (e.g., Gulati, U.S. Pat. No. 6,142,681, and McCormack, U.S. Pat. No. 5,265,192). In a minor exception, Barrett, U.S. Pat. No. 5,602,964, notes that the process involving Liapunov exponents disclosed in that patent is “compatible” with renormalization group methods from statistical physics. Such methods are normally employed to analyze scale invariance and critical behavior in various systems.
“Percolation” occurs most often as a parameter that a given design can compute as part of its output (e.g., Lewis et al., U.S. Pat. No. 5,698,089). Bozich et al., U.S. Pat. No. 5,367,612, uses “back percolation” in the sense of backpropagation. Colak, U.S. Pat. No. 5,706,404, discloses a network design that uses inhomogeneities in a medium to transmit input signals as unchannelled waves. Colak comes closer to the sense employed in the present disclosure but stops short, using the percolation concept simply as a way to understand the process in that disclosure. That disclosure notes, for example, that there is no sharp cut-off in current such as a real percolation model would predict. Klersy et al., U.S. Pat. No. 5,536,947, describes a memory device that employs a material that switches back and forth between amorphous and crystalline states to store and retrieve information, and notes that percolation takes place across the material in these switches. While memory is an important component of general computation, that disclosure does not take the next step of describing how such a process can be used to perform computation in general.
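The sharp cut-off that a real percolation model predicts can be seen in a small Monte Carlo sketch. The example assumes site percolation on a square grid, a choice not specified in the text; for that lattice the threshold is known to be about p_c ≈ 0.593. Each site is open with probability p, and the question is whether open sites form a connected path from the top row to the bottom row. As the grid grows, the spanning probability jumps from near 0 to near 1 in an increasingly narrow band around p_c.

```python
import random
from collections import deque


def spans(n: int, p: float, rng: random.Random) -> bool:
    """Site percolation on an n x n grid: does an open cluster connect
    the top row to the bottom row (4-neighbour connectivity)?"""
    open_site = [[rng.random() < p for _ in range(n)] for _ in range(n)]
    seen = [[False] * n for _ in range(n)]
    queue = deque()
    for c in range(n):  # start a breadth-first search from every open top-row site
        if open_site[0][c]:
            seen[0][c] = True
            queue.append((0, c))
    while queue:
        r, c = queue.popleft()
        if r == n - 1:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < n and 0 <= cc < n and open_site[rr][cc] and not seen[rr][cc]:
                seen[rr][cc] = True
                queue.append((rr, cc))
    return False


def spanning_probability(n: int, p: float, trials: int = 200, seed: int = 0) -> float:
    """Fraction of random grids that contain a spanning open cluster."""
    rng = random.Random(seed)
    return sum(spans(n, p, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for p in (0.50, 0.55, 0.59, 0.63, 0.70):
        print(p, spanning_probability(40, p, trials=100))
```

Running the loop with larger `n` makes the transition around p_c visibly steeper, which is the threshold behavior the cited Colak disclosure reports as absent.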