1. Field of the Invention
The present invention is directed to a neural network node, a network and a network optimization method that finds a near optimum network configuration for a problem rapidly and, more particularly, is directed to a node that includes a nonlinear response neuron, such as a Hopfield and Tank neuron, connected to a nonmonotonic neuron and to network architectures which take advantage of this node.
2. Description of the Related Art
Optimization problems consist of two components: a set of constraints and a cost function. The object of such problems, to find configurations of the problem elements which satisfy the constraints and minimize the cost function, becomes a difficult searching task when there are a large number of configurations satisfying the constraints while very few of these are of minimal cost. Fortunately, in many applications, near minimal solutions are usually good enough and hence heuristics can be used. A large number of difficult problems encountered in practical applications, perceptual problems in particular, such as target recognition, are optimization problems.
There has been a proliferation of methods for solving optimization problems using neural networks, most stemming from two fundamental approaches: a continuous deterministic method proposed by Hopfield and Tank; and a discrete stochastic approach called simulated annealing proposed by Kirkpatrick. Most implementations of these methods use some variant of the standard Hopfield neural network, a completely interconnected single layer of neurons with symmetric interconnections. Differences between implementations arise from the use of different types of transfer functions for the neurons and different types of updating for the network. The feature common to all approaches is their treatment of optimization as an energy minimization problem, that is, a problem of finding minima on a high-dimensional surface. A mapping of an optimization problem onto a neural network consists of two parts: a representation, which assigns interpretations to particular network states within the context of the problem, and an energy function, which incorporates the constraints and cost function of the problem and associates the lowest energy with network states which both satisfy the constraints and have low cost. Since the contribution of the state of a single neuron to the global energy can be determined locally, based on the energy function, the connection weights between the neurons and the potential functions of the neurons can be designed so that the state of each neuron will only change in a manner which reduces the global energy of the network.
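The local energy-reduction property described above can be sketched as follows. This is a minimal illustration, assuming the common quadratic energy E = -1/2 s^T W s - b^T s with symmetric, zero-diagonal weights and neuron states of +1 or -1. The function names, the random weights and the bias vector `b` are hypothetical choices made for the example and are not taken from the text.

```python
import numpy as np

def energy(W, b, s):
    """Global energy E = -1/2 s^T W s - b^T s (assumed quadratic form)."""
    return -0.5 * s @ W @ s - b @ s

def async_descent(W, b, s, rng, sweeps=20):
    """Update one neuron at a time, using only its local field."""
    s = s.copy()
    trace = [energy(W, b, s)]
    for _ in range(sweeps):
        for i in rng.permutation(len(s)):
            h = W[i] @ s + b[i]             # local field seen by neuron i
            s[i] = 1.0 if h >= 0 else -1.0  # align the state with the field
            trace.append(energy(W, b, s))
    return s, trace

def random_symmetric_weights(n, rng):
    """Illustrative weights: symmetric with a zero diagonal."""
    A = rng.normal(size=(n, n))
    W = (A + A.T) / 2.0
    np.fill_diagonal(W, 0.0)
    return W

rng = np.random.default_rng(0)
n = 12
W = random_symmetric_weights(n, rng)
b = rng.normal(size=n)
s0 = rng.choice([-1.0, 1.0], size=n)
s_final, trace = async_descent(W, b, s0, rng)
```

Under these assumptions, each single-neuron update changes only that neuron's term in the energy, so the recorded `trace` never increases from one update to the next.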
The Hopfield and Tank model embeds discrete problems into a continuous decision space by using neurons with continuous sigmoid-type transfer functions. Since only discrete on/off states of the network are given an interpretation in the context of the problem to be solved, the continuous range of states which the network assumes in finding a solution can be seen as the network searching through multiple discrete states simultaneously. Comparisons between this model and discrete space models, in which neurons can only assume the values +1 or -1, show that the continuous model has vastly greater computational power. Since it is both deterministic and uses continuous time, as do biological neurons, the network can be modeled simply with basic electrical components. The major drawback of this method, stemming from its determinism, is that there is no way of avoiding poor local minima on the energy surface in which the network might become trapped.
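The continuous, deterministic behavior described above can be illustrated with a small sketch. It assumes the commonly used dynamics du/dt = -u/tau + W v + I with v = sigmoid(gain * u), integrated by a simple Euler step. The two-neuron mutual-inhibition network, the parameter values and the function names are hypothetical and serve only as an example.

```python
import numpy as np

def sigmoid(u, gain):
    """Continuous transfer function mapping internal state to (0, 1)."""
    return 0.5 * (1.0 + np.tanh(gain * u))

def relax(W, I, u0, gain=5.0, tau=1.0, dt=0.01, steps=5000):
    """Euler integration of du/dt = -u/tau + W v + I (assumed dynamics)."""
    u = np.array(u0, dtype=float)
    for _ in range(steps):
        v = sigmoid(u, gain)
        u += dt * (-u / tau + W @ v + I)
    return sigmoid(u, gain)

# Two neurons that inhibit each other: only one can end up "on".
W = np.array([[0.0, -2.0],
              [-2.0, 0.0]])
I = np.array([1.0, 1.0])

v_a = relax(W, I, [0.01, -0.01])   # neuron 0 starts slightly ahead
v_b = relax(W, I, [-0.01, 0.01])   # neuron 1 starts slightly ahead
```

In this sketch the outputs pass through intermediate values before settling near a discrete on/off state. Because the dynamics are deterministic, the final state is fixed entirely by the starting point, which is why such a network has no means of leaving a poor minimum once it has entered its basin.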
Simulated annealing is a stochastic optimization method based on an analogy with the physical process of annealing crystals. The random local interactions of large ensembles of molecules, which eventually lead the ensemble toward a state of globally minimal energy, can be mimicked using a network of probabilistic threshold elements. These threshold elements behave so as to decrease the global energy whenever the possibility arises, but are also permitted to assume states which increase the global energy of the network, with a probability that decreases as the size of the increase grows and increases with an adjustable parameter called the temperature of computation. This network, called a Boltzmann machine, performs a gradient descent but can escape local minima on the energy surface by jumping to higher global energy states, and hence will, if left long enough, reach a minimal energy state. Foremost among the practical difficulties of this method is the problem of reducing the time needed to reach a globally minimal equilibrium. One method is to use cooling of the temperature parameter to speed the process of reaching equilibrium, but this requires finding a cooling schedule (how long to leave the network at each temperature of computation) which is as short as possible but still allows the network to come close to equilibrium at each temperature. Good cooling schedules become increasingly difficult to find empirically and increasingly critical to performance as the size of the network increases. In addition, little is known about the characteristics of optimal cooling schedules or how they depend upon the problem or implementation. It is also suspected that the components of optimization problems do not behave like ensembles of identical elements, which brings into question the validity of cooling uniformly.
Finally, this model lacks appeal in that discrete nondeterministic neurons do not fit into biological models and the global parameters of temperature and cooling schedule do not easily fit into a distributed representation.
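The stochastic acceptance of energy increases and the role of a cooling schedule can be sketched as follows. This is a minimal illustration, assuming the standard Metropolis criterion (an uphill move of size dE is accepted with probability exp(-dE/T)) and a geometric cooling schedule. Neither choice is specified in the text; the toy chain energy and all parameter names and values are hypothetical.

```python
import math
import random

def accept(delta_e, temperature, rng):
    """Assumed Metropolis rule: downhill always, uphill with prob exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

def anneal(energy, n, t_start=5.0, t_end=0.05, alpha=0.9,
           sweeps_per_temp=50, seed=0):
    """Single-flip annealing over n states of +1/-1 with geometric cooling."""
    rng = random.Random(seed)
    s = [rng.choice((-1, 1)) for _ in range(n)]
    e = energy(s)
    best_s, best_e = list(s), e
    t = t_start
    while t > t_end:
        # Time spent at each temperature is part of the cooling schedule.
        for _ in range(sweeps_per_temp * n):
            i = rng.randrange(n)
            s[i] = -s[i]                    # propose flipping one element
            e_new = energy(s)
            if accept(e_new - e, t, rng):
                e = e_new
                if e < best_e:
                    best_s, best_e = list(s), e
            else:
                s[i] = -s[i]                # reject: undo the flip
        t *= alpha                          # lower the temperature
    return best_s, best_e

def chain_energy(s):
    """Toy energy: neighbors in a chain prefer to agree."""
    return -sum(a * b for a, b in zip(s, s[1:]))

best_s, best_e = anneal(chain_energy, n=10)
```

The values of `t_start`, `t_end`, `alpha` and `sweeps_per_temp` together form the cooling schedule. They are global parameters that must be tuned by hand, which reflects the practical difficulty noted above.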