General learning algorithms for multi-layer neural networks have only recently been studied in detail. Recent work in this area began with the multi-layer Boltzmann machines. Hinton, G. E., and Sejnowski, T. J.: "Optimal Perceptual Inference," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 448-453 (1983), and Hinton, G. E., Sejnowski, T. J., Ackley, D. H.: "Boltzmann Machines: Constraint Satisfaction Networks that Learn", Tech. Report CMU-CS-84-119, Carnegie-Mellon University (May 1984). Hinton and Sejnowski found that a learning algorithm for multi-layer networks could be described for their system. However, the low rate of convergence of the synaptic state of this system led them, and others, to look for alternative multi-layer learning systems. Rumelhart, Hinton and Williams found that a generalization of the delta rule could describe learning in multi-layer feedforward networks. Rumelhart, D. E., Hinton, G. E., and Williams, R. J.: "Learning Representations by Back Propagating Errors", Nature, 323, 533-536 (1986). This generalized delta rule was independently developed by Parker. Parker, D. B.: "Learning-logic (TR-47)," Massachusetts Institute of Technology, Center for Computational Research in Economics and Management Science, (1985). This system, now often called "Back Propagation", is much faster than the Boltzmann machine and is able to automatically acquire internal synaptic states which seem to solve many of the classic toy problems first posed by Minsky and Papert. Minsky, M. and Papert, S.: Perceptrons, MIT Press (1969). These complex internal states have been called "internal representations" of the pattern environment. Rumelhart, D. E., Hinton, G. E., and Williams, R. J.: "Learning Internal Representations by Error Propagation," in D. E. Rumelhart and J. L. McClelland (Eds.) Parallel Distributed Processing, MIT Press, 318-364 (1986).
However, it has been found that the convergence of the synaptic state of this system slows down much more than linearly with the number of layers in the network. Ballard, D. H.: "Modular Learning in Neural Networks," Proceedings of the Sixth National Conference on Artificial Intelligence, 1, 279-284 (1987). This property has been called the "scaling problem" since it appears to be a significant limitation on the scaling of such networks to large, real-world problems. Hinton, G. E., and Sejnowski, T. J.: "Neural Network Architectures for AI," Sixth National Conference on Artificial Intelligence, Tutorial Program MP-2 (1987). In the aforementioned article on "Modular Learning", Ballard proposed a method for handling the scaling problem by stacking auto-associating units one on top of the other. This method violates the feedforward architecture, but the system does appear to reduce the multi-layer learning time.
The Boltzmann machine was an extension of the work of Hopfield, who had been studying single layer recurrent networks. Hopfield, J. J.: "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," Proc. Natl. Acad. Sci. U.S.A. 79, 2554-2558 (April 1982), and Hopfield, J. J.: "Neurons with Graded Response Have Collective Computational Properties Like Those of Two-State Neurons," Proc. Natl. Acad. Sci. U.S.A. 81, 3088-3092 (May 1984). Hopfield introduced a method for the analysis of the settling of activity in recurrent networks. This method defined the network as a dynamical system for which a global function called the "energy" (really a Liapunov function for the autonomous system describing the state of the network) could be defined. The minima of this energy then correspond to fixed points in the system state space. Hopfield showed that flow in state space is always toward the fixed points of the dynamical system if the matrix of recurrent connections satisfies certain conditions. With this property, Hopfield was able to define the fixed points as the sites of memories of network activity.
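The settling behavior described above can be illustrated with a minimal sketch. It assumes the common binary (+1/-1) form of the Hopfield model with energy E = -(1/2) s.W.s, a symmetric connection matrix with zero diagonal (one set of conditions under which the energy cannot increase), and a Hebbian outer-product storage rule. The function names `store`, `hopfield_energy` and `settle` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def store(patterns):
    """Hebbian outer-product storage (an assumed, commonly used rule)."""
    P = np.asarray(patterns, dtype=float)
    W = P.T @ P / P.shape[1]
    np.fill_diagonal(W, 0.0)  # no self-connections; W stays symmetric
    return W

def hopfield_energy(W, s):
    """Global energy (Liapunov function) of state s."""
    return -0.5 * s @ W @ s

def settle(W, s, sweeps=10):
    """Asynchronous threshold updates; records the energy after each update."""
    s = np.array(s, dtype=float)
    energies = [hopfield_energy(W, s)]
    for _ in range(sweeps):
        changed = False
        for i in range(len(s)):
            new = 1.0 if W[i] @ s >= 0 else -1.0
            if new != s[i]:
                s[i] = new
                changed = True
            energies.append(hopfield_energy(W, s))
        if not changed:  # fixed point reached
            break
    return s, energies

# One stored memory and a probe with two corrupted elements.
memory = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
W = store([memory])
probe = memory.copy()
probe[0] = -1.0
probe[3] = 1.0

final, energies = settle(W, probe)
print("recalled memory:", np.array_equal(final, memory))
print("energy trace:", energies[0], "->", energies[-1])
```

In this sketch the recorded energy never increases from one update to the next, and the state comes to rest at the stored pattern, which is the sense in which the fixed points serve as memory sites.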
Like its forerunners, the Hopfield network suffered from limitations in storage capacity. The degradation of memory recall with increased storage density is directly related to the presence in the state space of unwanted local minima which serve as basins of flow.
Bachmann, Cooper, Dembo and Zeitouni have studied a system not unlike the Hopfield network; however, they have focused on defining a dynamical system in which the locations of the minima are explicitly known, and for which it is possible to demonstrate that there are no unwanted local minima. Bachmann, C. M., Cooper, L. N., Dembo, A., Zeitouni, O.: "A Relaxation Model for Memory with High Density Storage," Proc. Natl. Acad. Sci. U.S.A., Vol. 84, No. 21, pp. 7529-7531 (1987). In particular, they have chosen a system with a Liapunov function given by ##EQU1## where .mu. is the N-dimensional state vector and x.sub.j is the jth memory in the N-dimensional state space. This N-dimensional Coulomb energy function defines exactly m basins of attraction to the fixed points located at the charge (memory) sites x.sub.j. It can be shown that convergence to the closest distinct memory is guaranteed, independent of the number of stored memories m, for a proper choice of N and L. Dembo, A., Zeitouni, O.: "ARO Technical Report," Brown University, Center for Neural Science, Providence, R.I. Bachmann et al. have employed a network implementation of this system which is, however, strictly local and does not have some of the desirable characteristics of the distributed Hopfield representations.
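The relaxation toward a memory site can be sketched numerically. The energy expression itself is not reproduced in this text, so the sketch assumes the generalized Coulomb form used in the cited Bachmann et al. paper, E(.mu.) = -(1/L) times the sum over j of Q.sub.j |.mu. - x.sub.j|.sup.-L, with positive charges Q.sub.j. The charges, the values of N and L, the step rule and all function names are assumptions made only for illustration and are not the network implementation of Bachmann et al.

```python
import numpy as np

def coulomb_energy(mu, memories, charges, L):
    """Assumed energy: E = -(1/L) * sum_j Q_j * |mu - x_j|**(-L)."""
    d = np.linalg.norm(memories - mu, axis=1)
    return -np.sum(charges * d ** (-L)) / L

def coulomb_gradient(mu, memories, charges, L):
    """Gradient of the assumed energy: sum_j Q_j * |r_j|**(-(L+2)) * r_j."""
    r = mu - memories                      # shape (m, N)
    d = np.linalg.norm(r, axis=1)
    return np.sum((charges * d ** (-(L + 2)))[:, None] * r, axis=0)

def relax(mu, memories, charges, L, steps=200, tol=1e-6):
    """Follow the downhill direction until the state reaches a memory site.

    The step length is half the distance to the nearest memory, a
    hypothetical choice that avoids overshooting the singular charge site.
    """
    mu = np.array(mu, dtype=float)
    for _ in range(steps):
        d = np.linalg.norm(memories - mu, axis=1)
        if d.min() < tol:
            break
        g = coulomb_gradient(mu, memories, charges, L)
        mu -= 0.5 * d.min() * g / np.linalg.norm(g)
    d = np.linalg.norm(memories - mu, axis=1)
    return mu, int(np.argmin(d))

# Three memories (m = 3) in an N = 3 dimensional state space, with L = 3.
memories = np.array([[0.0, 0.0, 0.0],
                     [4.0, 0.0, 0.0],
                     [0.0, 4.0, 0.0]])
charges = np.ones(len(memories))
L = 3
probe = np.array([0.5, 0.3, 0.2])          # closest to memory 0

final, index = relax(probe, memories, charges, L)
print("settled at memory index:", index)
```

Because each charge site is a singular well of the assumed energy, the state is drawn to the nearest memory and the energy falls along the way. The sketch does not by itself demonstrate the absence of unwanted minima, which depends on the choice of N and L discussed above.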