Considerable study and research has continued for a number of years into the functions of living neurons, which are grouped in complex interconnected networks and which can be seen to act as a type of problem-solving computer system.
It has been felt that electronic networks which simulate these interconnected neuronic networks could provide the computational capabilities required for efficiently solving a class of problems which involve "combinatorial complexity".
Problems of this class are often found in engineering and commercial ventures, and also arise in the problems of perception and recognition which must be handled by the nervous systems of living creatures.
For example, given a map and the problem of driving from a source city to a destination city, which route should be considered the best one to travel? Another common example is the problem of designing circuit boards: where is the best location for each of the many chips so that an efficient wiring layout can connect the components?
Analogies and optimization problems of this type have been described by J. J. Hopfield and D. W. Tank in the article "Neural Computation of Decisions in Optimization Problems", published in 1985 in Biological Cybernetics, Volume 52, at pages 141-152.
Studies of neural architecture indicate that a broad category of parallel organization is in operation rather than a series of step-by-step processing functions as occurs in general purpose computers. If each of the neurons is considered as a simple processing unit, it is seen that these biological systems operate in a "collective" or a "group operative" mode with each individual neuron connected to and summing the inputs of hundreds of other individual neurons in order to determine its own output signal.
The ability to take these many sensory inputs and reduce them to a "good" or even an "optimum" solution seems to be one feature of these biological neuronic networks. Apparently, a collective solution is computed on the basis of the simultaneous interactions of hundreds of neurons (or of the processing units which simulate them).
Thus, if a number of interconnected neurons (or processors) are fed input signals, each of the individual neurons will provide an output signal at some time according to its own individual input signal and to the parallel interconnected impulses it receives from its neighboring neurons (processors). The overall result of the output signals of the collective group of neurons is a "global output signal" which represents the collective judgement of the neural network.
Modern general purpose digital computers using standard Very Large Scale Integrated (VLSI) circuitry generally involve logic gates, each of which obtains inputs from two or three other gates in order to reach certain binary decisions in the course of a computation. In contrast, non-linear neural processors (neurons) organized in a collective parallel-processing network receive inputs from practically all of the other neural processors and then compute a "collective" solution on the basis of the simultaneous interactions of the hundreds of units or devices involved.
The neural computational network will be seen to have three major forms of parallel organization: (i) parallel input channels, (ii) parallel output channels, and (iii) a large amount of interconnectivity between the processing elements (neurons).
The processing components are considered to have "time constants" which provide for the integrated summation of the synaptic input currents from other neurons in the network. In simulation of the biological neuronic network, it is possible to determine the interconnection paths so that each is of either (i) an excitatory or influential nature, or (ii) an inhibitory or negative, suppressive nature.
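The integrating behavior described above can be illustrated with a minimal sketch of a single processing unit. The differential equation, the sigmoid output, and all names and parameter values (`simulate_neuron`, `tau`, `dt`, `steps`) are assumptions chosen for illustration and are not taken from the cited article.

```python
import math

def simulate_neuron(weights, inputs, bias=0.0, tau=1.0, dt=0.01, steps=2000):
    """Euler-integrate du/dt = -u/tau + sum_j w_j * x_j + bias for one unit.

    Positive weights model excitatory paths, negative weights inhibitory ones.
    Returns the final internal state u and a sigmoid output in (0, 1).
    """
    # Net synaptic drive: weighted sum of the (constant) inputs plus a bias.
    drive = sum(w * x for w, x in zip(weights, inputs)) + bias
    u = 0.0
    for _ in range(steps):
        # The time constant tau sets how quickly u follows the drive.
        u += dt * (-u / tau + drive)
    output = 1.0 / (1.0 + math.exp(-u))
    return u, output
```

With constant inputs the internal state settles near `tau * drive`, so a unit whose excitatory input outweighs its inhibitory input ends with an output above 0.5, and below 0.5 in the opposite case.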
J. J. Hopfield of the California Institute of Technology has shown that the equations of motion for a network (with symmetric connections) will always lead to a "convergence" to "stable states" in which the outputs of all neurons remain constant.
Thus, networks of neurons with this basic organization can be used to compute solutions to specific optimization problems by first choosing connectivities and input bias currents which appropriately represent the function to be minimized.
After the programming of the network is organized, an initial set of input voltages is provided and the system then converges to a "stable state" which minimizes the function. The set of outputs in this final "stable state" is interpreted as a solution to the problem.
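The convergence described above can be sketched for a small network. This is a simplified illustration only: it assumes binary threshold units with states 0 or 1, symmetric weights with a zero diagonal, and asynchronous updates, whereas the networks discussed above use graded (analog) neurons. The names `energy` and `settle` are hypothetical.

```python
import random

def energy(weights, bias, state):
    """E = -1/2 * sum_ij w_ij s_i s_j - sum_i b_i s_i, for states in {0, 1}."""
    n = len(state)
    pair = sum(weights[i][j] * state[i] * state[j]
               for i in range(n) for j in range(n))
    return -0.5 * pair - sum(bias[i] * state[i] for i in range(n))

def settle(weights, bias, state, rng, max_sweeps=100):
    """Update units one at a time until a full sweep changes nothing."""
    state = list(state)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in rng.sample(range(n), n):  # visit units in random order
            net = sum(weights[i][j] * state[j] for j in range(n)) + bias[i]
            new = 1 if net > 0 else 0
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            return state  # stable state: no unit wants to change
    return state
```

Under the stated assumptions, each individual update can only lower the energy or leave it unchanged, which is why the network cannot cycle and must reach a stable state.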
An often-cited example of such a combinatorially complex problem is the Travelling Salesman Problem (TSP), in which a salesman is to travel to N different cities and must find the optimal tour or travel sequence which minimizes time and costs.
This type of problem, which might normally involve on the order of "N-factorial" computations using ordinary computers, is found to reach relatively rapid and efficient optimization solutions using the neural, parallel, cost-related, biological-type architecture.
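The factorial growth mentioned above can be seen in a minimal exhaustive-search sketch. The function names and the use of straight-line distance as the cost are assumptions for illustration. Fixing the starting city leaves (N-1)! orderings to examine.

```python
import itertools
import math

def tour_length(cities, order):
    """Total length of the closed tour visiting cities in the given order."""
    n = len(order)
    return sum(math.dist(cities[order[k]], cities[order[(k + 1) % n]])
               for k in range(n))

def brute_force_tsp(cities):
    """Try every tour that starts at city 0; return (best_order, tours_checked)."""
    n = len(cities)
    best, checked = None, 0
    for rest in itertools.permutations(range(1, n)):
        order = (0,) + rest
        checked += 1
        if best is None or tour_length(cities, order) < tour_length(cities, best):
            best = order
    return best, checked
```

For four cities only 6 tours are checked, but for 15 cities the same loop would have to examine 14!, or roughly 87 billion, orderings.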
A network organized neuronically for the solution of the Travelling Salesman Problem may be referred to as the "TSP network." Then, to enable the N neurons in the TSP network to compute a solution to the optimization problem, the network is described by an energy function (E) in which the lowest energy state (considered the most stable state of the network) is taken to correspond to the best path or tour.
The concept of "convergence" is used to indicate that the network has settled on a final set of condition-states in each of the neurons (processors) and these states are no longer changing.
Thus, system networks of microelectronic neurons (simulating biological neuronic systems) would appear to rapidly solve difficult optimization problems through the process of "convergence", whereby the states of the neuronic processors involved settle down to a "non-changing" minimal energy (E) state.
These systems appear to be uniquely adaptable to the handling of combinatorial optimization problems which involve finding the minimum value of a given function which normally depends on many parameters.
An article in the publication Science on May 13, 1983, in Volume 220, pages 671 through 679, entitled "Optimization by Simulated Annealing", has drawn an analogy with the annealing of solids as providing a framework for optimization of the properties of large and complex systems.
The subject of combinatorial optimization involves a set of problems which are central to the discipline of computer science. Research in this area aims at developing efficient techniques for finding minimum or maximum values of a function having many independent variables. The TSP belongs to this area of research and is often used as a test case for experimental procedures.
As indicated by the authors Kirkpatrick, Gelatt, Jr., and Vecchi in the Science article, there are two basic strategies in the heuristic methods of problem solving. The first (i) may be called "divide-and-conquer" and the second (ii) may be called "iterative improvement".
In the first strategy, (i), one divides the problem into subproblems of manageable size and solves the subproblems; the solutions to the subproblems must then be patched together in order to produce an overall solution.
In the second strategy, (ii), of iterative improvement, one starts with the system in a known configuration and then a standard "rearrangement operation" is applied to all parts of the system, in turn, until a rearranged configuration that improves the cost or energy function (E) is discovered. The "rearranged configuration" becomes the new configuration of the system and the process is continued until no further improvements can be found.
It may be noted that in strategy (ii) the search may sometimes get stuck in a "local minimum" which is not the "global optimum", and thus it is customary to carry out this process several times, starting each time from a different, randomly selected configuration, and then to save the best result.
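Strategy (ii), with the customary random restarts, can be sketched as follows for the TSP. The choice of segment reversal as the "rearrangement operation", the function names, and the tolerance value are assumptions for illustration; the Science article does not prescribe them here.

```python
import math
import random

def tour_length(cities, order):
    """Total length of the closed tour visiting cities in the given order."""
    n = len(order)
    return sum(math.dist(cities[order[k]], cities[order[(k + 1) % n]])
               for k in range(n))

def iterative_improvement(cities, order):
    """Apply segment reversals until none shortens the tour (a local minimum)."""
    order = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(order) - 1):
            for j in range(i + 1, len(order)):
                # Rearrangement: reverse the segment between positions i and j.
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                # Small tolerance guards against floating-point ties.
                if tour_length(cities, candidate) < tour_length(cities, order) - 1e-12:
                    order, improved = candidate, True
    return order

def best_of_restarts(cities, restarts, rng):
    """Run iterative improvement from several random starts; keep the best."""
    best = None
    for _ in range(restarts):
        start = list(range(len(cities)))
        rng.shuffle(start)
        result = iterative_improvement(cities, start)
        if best is None or tour_length(cities, result) < tour_length(cities, best):
            best = result
    return best
```

Because only improving rearrangements are ever accepted, a single run can stop at a tour that no single reversal improves even though a shorter tour exists; the restarts are what give the method a chance of escaping such a result.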
Condensed matter physics is a body of methods for analyzing aggregate properties of the large number of atoms to be found in samples of liquid or solid matter. This physics uses statistical mechanics as a central discipline.
Because of the tremendous numbers involved (the number of atoms is on the order of 10^23 per cubic centimeter), only the "most probable" behavior of the system in thermal equilibrium, at a given temperature, is observed in experiments.
A fundamental question concerns what happens to this type of system in the limit of low temperature--for example, whether the atoms will remain fluid or will solidify, and if they solidify, will they form a crystalline solid or a glass.
Experiments that determine the low-temperature state of a material (for example, by growing a single crystal from a melt) are done by slow, careful annealing. This is done by first melting the substance, then lowering the temperature slowly, and spending a long time at temperatures in the vicinity of the freezing point.
This is comparable to the previously mentioned condition of lowering the energy level (E) in order to find the optimum condition of the network.
Finding the low-temperature state of a system, when a prescription for calculating its energy (E) is given, is an optimization problem not unlike those encountered in combinatorial optimization.
When applied to the TSP, the "cost function" can be looked at as playing the role of the energy state (E).
Using the "cost function" in place of the energy (E) and defining the configurations by a set of parameters, it is common to use an established algorithm to generate a population of configurations of a given optimization problem at some effective temperature. This temperature is simply a control parameter in the same units as the cost function.
Now, the "simulated annealing process" consists of:
(1) Melting the system being optimized at a high effective temperature;
(2) Lowering the temperature by slow stages until the system "freezes";
(3) Noticing that no further changes are occurring.
At each temperature, the simulation must proceed long enough for the system to reach a "steady state". The sequence of temperatures, together with the number of rearrangements of the parameters attempted in reaching equilibrium at each given temperature, can be considered an "annealing schedule". This type of schedule can be applied to an electronic network which simulates neuronic networks.
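The three steps and the annealing schedule described above can be sketched for the TSP as follows. This is an illustration under stated assumptions: the acceptance rule (always accept a downhill move, accept an uphill move with probability exp(-delta/T)) is the Metropolis criterion, which the text above does not name, and the geometric cooling factor, the starting temperature, and all function and parameter names are hypothetical choices.

```python
import math
import random

def tour_length(cities, order):
    """Total length of the closed tour visiting cities in the given order."""
    n = len(order)
    return sum(math.dist(cities[order[k]], cities[order[(k + 1) % n]])
               for k in range(n))

def simulated_annealing(cities, rng, t_start=10.0, cooling=0.9,
                        moves_per_temp=200, t_min=1e-4):
    """Return (best_order, best_cost) found while cooling from t_start."""
    n = len(cities)
    order = list(range(n))
    rng.shuffle(order)
    cost = tour_length(cities, order)
    best, best_cost = list(order), cost
    t = t_start                          # (1) "melt": start at a high temperature
    while t > t_min:
        changes = 0
        for _ in range(moves_per_temp):  # rearrangements tried at this temperature
            i, j = sorted(rng.sample(range(n), 2))
            candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
            cand_cost = tour_length(cities, candidate)
            delta = cand_cost - cost
            # Assumed Metropolis rule: uphill moves pass with prob. exp(-delta/t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                if abs(delta) > 1e-12:
                    changes += 1
                order, cost = candidate, cand_cost
                if cost < best_cost:
                    best, best_cost = list(order), cost
        if changes == 0:                 # (3) no further changes: system is "frozen"
            break
        t *= cooling                     # (2) lower the temperature by a slow stage
    return best, best_cost
```

Unlike pure iterative improvement, the occasional acceptance of a costlier configuration at high temperature lets the search leave a local minimum; as the temperature falls, such moves become increasingly rare.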
The use of parallel networks having different levels of "connection strengths" between their elements is discussed in an article, "Boltzmann Machines: Constraint Satisfaction and Networks that Learn", published in May 1984 by authors Hinton, Sejnowski, and Ackley through the offices of the Department of Computer Science, Carnegie-Mellon University, and designated as technical report CMU-CS-84-119. It was later published in the journal Cognitive Science, Volume 9, 1985, pages 146-169, under the title "A Learning Algorithm for Boltzmann Machines".
This work involved the study of "connectionist" systems that store their long-term knowledge as the strengths of the connections between simple neuron-like processing elements. These networks are apparently suited to tasks like visual perception, which can be performed efficiently in parallel networks that have physical connections in just the places where processes need to communicate.
Included in the technical report were observations on a parallel constraint satisfaction network called a "Boltzmann Machine", which is apparently capable of learning the underlying constraints that characterize a "domain of information" simply by being shown examples of information from the particular domain. The network modifies the strength of its "connections" so as to construct an internal generative model that produces examples with the same probability distribution as the examples it is shown.
The Boltzmann Machine is composed of computing elements called "units" that are connected to each other by "bidirectional links". A unit is always in one of two states, "on" or "off", and it adopts these states as a probabilistic function of the states of its neighboring units and of the "weights" on its links to them. These "weights" can take on real values of either sign.
A "unit" being "on" or "off" is taken to mean that the system currently accepts or rejects some elemental hypothesis about the domain of information. The "weight" on a link represents a weak pairwise constraint between two hypotheses. A "positive weight" indicates that the two hypotheses tend to support one another: if one is currently accepted, then accepting the other should be more likely. Conversely, a "negative weight" suggests that, other things being equal, the two hypotheses should not both be accepted.
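The probabilistic behavior of a single unit can be sketched as follows. The logistic form of the probability, 1 / (1 + exp(-gap / T)), is the rule given in the cited Cognitive Science paper rather than in the text above, and the function names and the optional `threshold` parameter are assumptions for illustration.

```python
import math
import random

def energy_gap(weights, state, k, threshold=0.0):
    """Summed weighted input to unit k from the other units that are on."""
    return sum(weights[k][i] * state[i]
               for i in range(len(state)) if i != k) - threshold

def prob_on(gap, temperature):
    """Probability that a unit turns on, written to avoid overflow in exp()."""
    x = gap / temperature
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def update_unit(weights, state, k, temperature, rng):
    """Stochastically set unit k on (1) or off (0); return its new state."""
    p = prob_on(energy_gap(weights, state, k), temperature)
    state[k] = 1 if rng.random() < p else 0
    return state[k]
```

A neighbor that is on and linked by a positive weight raises the probability above one half, while one linked by a negative weight lowers it. Lowering the temperature makes the decision progressively closer to a deterministic threshold.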
One apparent aspect of the Boltzmann Machine arrangement is that it leads to a domain-independent learning algorithm that modifies the connection strengths between units in such a way that the whole network develops an internal model which captures the underlying structure of its environment.
The learning algorithm presupposes that the network reaches "thermal equilibrium" and that it uses the co-occurrence statistics, measured at equilibrium, to create an energy landscape that models the structure of the ensemble of vectors produced by the environment. At the same time it should be noted that there is nothing in this learning algorithm to prevent it from creating an energy landscape that contains large energy barriers which then prevent the network from reaching equilibrium.
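The use of equilibrium statistics in learning can be sketched as follows. The update shown, a change in each weight proportional to the difference between how often two units are on together when the environment fixes the visible units and how often they are on together when the network runs freely, follows the rule given in the cited Cognitive Science paper and is not spelled out in the text above. The function names, the learning rate, and the representation of samples as lists of 0/1 states are assumptions for illustration.

```python
def cooccurrence(samples):
    """Fraction of sampled states in which units i and j are both on."""
    n = len(samples[0])
    return [[sum(s[i] * s[j] for s in samples) / len(samples)
             for j in range(n)] for i in range(n)]

def weight_update(clamped_samples, free_samples, rate=0.1):
    """Weight changes rate * (p_clamped - p_free); the diagonal is left at zero.

    Both sample sets are assumed to have been collected at thermal equilibrium.
    """
    p_clamped = cooccurrence(clamped_samples)
    p_free = cooccurrence(free_samples)
    n = len(p_clamped)
    return [[0.0 if i == j else rate * (p_clamped[i][j] - p_free[i][j])
             for j in range(n)] for i in range(n)]
```

The sketch only computes the weight change; gathering the two sample sets is the costly part, since each requires the network to reach equilibrium, which is exactly what large energy barriers can prevent.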
Neurons are recognized as complex biochemical entities, and it is not considered that simple binary units, such as those of the Boltzmann Machine system, are a full representation or simulation of actual neuronic networks. However, the assumption is used that the "binary units" change state "asynchronously" and that they use a probabilistic decision rule.
The "energy gap" for such a "binary unit" seems to play a role similar to that played by membrane potential for a neuron. Both of these are the sum of the excitatory and inhibitory inputs and both are used to determine the output state.
The "energy gap" represents the summed output from all the recently active binary units. If the average time between updates is identified with the average duration of the neuron's "post-synaptic potential", then the binary pulse between updates can be considered to be an approximation to the post-synaptic potential. The sum of a large number of stochastic pulses is independent of the shape of the individual pulses and depends only on their amplitudes and durations. Thus, large networks can act to provide the "fan-in" effect which may be typical of the average cerebral cortex.
"Random asymmetries" or "noise" in the system would appear to be reduced through the hierarchical structure providing the "fan-in".
It is also considered that there are certain effects in the biological nervous system called "time-delays", but that these can be considered to act like added "noise". The two main ideas that led to the Boltzmann Machine are considered to be that "noise" can help with the searching computation process, and that Boltzmann distributions make it possible to assign credit on the basis of "local" information in a non-linear network.