A Boltzmann machine is a type of neural network comprising symmetrically connected nodes. The nodes are connected by weighted connections, and each node has an associated bias value. During a learning process, training data is presented to the network and a learning rule is followed to update the weights and bias values. The learning process involves repeated updating until the network reaches an equilibrium. Hidden nodes in the network are able to discover interesting features that represent complex regularities in the training data, and these hidden nodes are often referred to as “feature detectors”. The hidden nodes are those which are neither input nodes nor output nodes, and there may be many layers of hidden nodes. Nodes which are not hidden are often referred to as visible nodes.
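The following is a minimal sketch of the quantities described above, not an implementation taken from this document. It assumes binary (0/1) nodes, the energy function conventionally associated with Boltzmann machines, and a logistic rule for resampling each node. The names `energy` and `gibbs_sweep` and the example weights and biases are hypothetical, chosen only for illustration.

```python
import math
import random


def energy(state, weights, biases):
    """Energy of a binary state under symmetric weights and per-node biases."""
    n = len(state)
    # Each symmetric connection is counted once (i < j).
    pair_term = sum(weights[i][j] * state[i] * state[j]
                    for i in range(n) for j in range(i + 1, n))
    bias_term = sum(biases[i] * state[i] for i in range(n))
    return -(pair_term + bias_term)


def gibbs_sweep(state, weights, biases, rng):
    """Resample every node in turn, given the current values of all other nodes."""
    n = len(state)
    for i in range(n):
        total_input = biases[i] + sum(weights[i][j] * state[j]
                                      for j in range(n) if j != i)
        p_on = 1.0 / (1.0 + math.exp(-total_input))
        state[i] = 1 if rng.random() < p_on else 0
    return state


# Illustrative values only: three nodes, weights[i][j] == weights[j][i],
# and no self-connections (zero diagonal).
weights = [[0.0, 1.0, -0.5],
           [1.0, 0.0, 0.25],
           [-0.5, 0.25, 0.0]]
biases = [0.1, -0.2, 0.0]

rng = random.Random(0)
state = [1, 0, 1]
# Repeated sweeps move the network towards its equilibrium distribution.
for _ in range(100):
    gibbs_sweep(state, weights, biases, rng)
```

In this sketch, states with lower energy are visited more often once the sweeps have been repeated long enough, which is one way of reading the equilibrium mentioned above.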
Restricted Boltzmann machines are a type of Boltzmann machine neural network without connections between the visible nodes and without connections between the hidden nodes. There are no layers per se in a restricted Boltzmann machine, but simply input and hidden units. As a result, training of restricted Boltzmann machines is much faster than training of regular Boltzmann machines, which comprise connections between visible units and connections between hidden units. The learning process in this type of neural network is also generally much faster than for Boltzmann machines with many layers of hidden nodes.
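One way to see why the restricted connectivity helps is sketched below. Because no hidden unit is connected to another hidden unit, all hidden units can be sampled in a single pass given the visible units, and likewise for the visible units given the hidden units. The sketch assumes binary units and logistic activation; the function names, the layer sizes and the random initial weights are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_hidden(v, W, c, rng):
    """Sample all hidden units at once given the visible vector v.

    W[i][j] is the weight between visible unit i and hidden unit j;
    c[j] is the bias of hidden unit j.
    """
    probs = [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(c))]
    return [1 if rng.random() < p else 0 for p in probs], probs


def sample_visible(h, W, b, rng):
    """Sample all visible units at once given the hidden vector h."""
    probs = [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(b))]
    return [1 if rng.random() < p else 0 for p in probs], probs


# Illustrative sizes only: 4 visible units and 2 hidden units.
rng = random.Random(0)
W = [[rng.gauss(0.0, 0.1) for _ in range(2)] for _ in range(4)]
b = [0.0] * 4  # visible biases
c = [0.0] * 2  # hidden biases

v = [1, 0, 1, 1]
h, h_probs = sample_hidden(v, W, c, rng)          # visible -> hidden
v_recon, v_probs = sample_visible(h, W, b, rng)   # hidden -> visible
```

In a fully connected Boltzmann machine each node would instead have to be resampled one at a time, since its input depends on the other nodes of the same kind.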
Boltzmann machines (including restricted Boltzmann machines) are arranged to learn the distribution over the data presented to the visible units. In this way the network forms a representation of the data and hidden nodes in the network come to represent features of the data.
Restricted Boltzmann machines (RBMs) may be stacked in layers, using the hidden nodes of one as input for the next. The activations of the hidden nodes of one RBM may be used as the training data for the next RBM, so that many hidden layers are learned efficiently. The resulting network is referred to as a deep belief network.
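The stacking procedure described above might be sketched as follows. This document does not name a learning rule, so the sketch assumes one step of contrastive divergence (CD-1) with a probability-valued reconstruction, and it passes hidden activation probabilities rather than sampled binary states to the next restricted Boltzmann machine. The function names, layer sizes, learning rate, epoch count and toy data are all hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, c):
    """Activation probability of each hidden unit given visible vector v."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    """Activation probability of each visible unit given hidden vector h."""
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def train_rbm(data, n_hidden, rng, epochs=50, lr=0.1):
    """Train one restricted Boltzmann machine with CD-1 (an assumed rule)."""
    n_visible = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
         for _ in range(n_visible)]
    b = [0.0] * n_visible
    c = [0.0] * n_hidden
    for _ in range(epochs):
        for v0 in data:
            ph0 = hidden_probs(v0, W, c)
            h0 = [1 if rng.random() < p else 0 for p in ph0]
            v1 = visible_probs(h0, W, b)   # probability-valued reconstruction
            ph1 = hidden_probs(v1, W, c)
            # Move towards the data statistics, away from the reconstruction.
            for i in range(n_visible):
                for j in range(n_hidden):
                    W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
                b[i] += lr * (v0[i] - v1[i])
            for j in range(n_hidden):
                c[j] += lr * (ph0[j] - ph1[j])
    return W, b, c


def train_stack(data, layer_sizes, rng):
    """Greedy layer-wise training: each machine is trained on the hidden
    activations produced by the machine below it."""
    layers = []
    layer_input = data
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(layer_input, n_hidden, rng)
        layers.append((W, b, c))
        layer_input = [hidden_probs(v, W, c) for v in layer_input]
    return layers, layer_input


# Toy binary data: 6 visible units, then hidden layers of 4 and 2 units.
rng = random.Random(0)
data = [[1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 1, 1, 0, 0]]
layers, top_features = train_stack(data, [4, 2], rng)
```

Here `top_features` holds, for each training vector, the activations of the topmost hidden layer. Each machine is trained to completion before the next one is started, so no machine in the stack needs to be trained jointly with the others.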
Such deep belief networks are used for many applications in data processing and a non-exhaustive list of examples is: data compression; data dimensionality reduction; object recognition; document retrieval; modeling gene expression data; modeling motion capture data; representing complex data.
In general it is desired to provide data processing systems using Boltzmann machines which represent complex data in an accurate and reliable manner and in which training may be carried out quickly, reliably and with stability.
The embodiments described below are not limited to implementations which solve any or all of the disadvantages of known data processing systems which use Boltzmann and restricted Boltzmann machines.