Neural networks attempt to model or replicate human thought processes in programmed logic executable by a computer. Neural networks are typically employed in pattern-matching tasks such as speech and facial recognition. Results are generally expressed as likely candidates or matches, in contrast to conventional programmed logic, which responds rigidly to deterministic information. Stochastic values inject an element of probability or randomness, allowing neural networks to arrive at a "most likely" conclusion for complex analytical tasks.
Deep belief networks are probabilistic models that are composed of multiple layers of stochastic, latent variables. The latent variables typically have binary values and are often called hidden units or feature detectors. The top two layers have undirected, symmetric connections between them and form an associative memory. The lower layers receive top-down, directed connections from the layer above. The states of the units in the lowest layer represent a data vector.
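The stochastic, binary latent variables described above can be sketched in a few lines of NumPy. The layer sizes, random weights, and zero biases below are illustrative assumptions, not values from the text; the point is only that each hidden unit turns on with a probability given by the logistic of its bottom-up input.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes: a 6-unit visible data vector feeding a layer
# of 4 binary feature detectors (hidden units). These are assumed
# values for the sketch, not taken from the text.
n_visible, n_hidden = 6, 4
W = rng.normal(0.0, 0.1, (n_visible, n_hidden))  # assumed random weights
b = np.zeros(n_hidden)                           # assumed zero biases

v = rng.integers(0, 2, n_visible).astype(float)  # a binary data vector

# Each latent variable is stochastic and binary: it turns on with
# probability equal to the logistic of its total bottom-up input.
p_h = sigmoid(v @ W + b)
h = (rng.random(n_hidden) < p_h).astype(float)
```

Sampling rather than thresholding is what makes the units stochastic: the same input can yield different binary states on different passes.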
Significant properties of deep belief networks include an efficient, layer-by-layer procedure for learning the top-down generative weights that determine how the variables in one layer depend on the variables in the layer above. After learning, the values of the latent variables in every layer can be inferred by a single, bottom-up pass that starts with an observed data vector in the bottom layer and uses the weights in the reverse direction.
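The single bottom-up inference pass can be sketched as follows. The layer sizes and random weights are assumptions made for illustration; the pass simply propagates the observed data vector upward, layer by layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def infer_up(data, weights, biases):
    """Single bottom-up pass: infer each layer's latent variables
    from the layer below, using the weights in the reverse
    (recognition) direction."""
    states = [data]
    for W, b in zip(weights, biases):
        states.append(sigmoid(states[-1] @ W + b))
    return states

rng = np.random.default_rng(1)
sizes = [6, 4, 3]  # illustrative layer sizes, assumed for the sketch
weights = [rng.normal(0, 0.1, (sizes[i], sizes[i + 1])) for i in range(2)]
biases = [np.zeros(sizes[i + 1]) for i in range(2)]

v = rng.integers(0, 2, sizes[0]).astype(float)  # observed data vector
states = infer_up(v, weights, biases)
# states[0] is the data vector; states[1] and states[2] hold the
# inferred activation probabilities of the latent variables.
```

Note the efficiency claim in the text: inference is a single feed-forward sweep, with no iterative settling required.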
Further, pre-training of deep belief networks occurs one layer at a time by treating the values of the latent variables in one layer, when they are being inferred from data, as the data for training the next layer. This efficient, so-called “greedy” learning can be followed by, or combined with, other learning procedures that fine-tune all of the weights to improve the generative or discriminative performance of the whole network.
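The greedy, layer-at-a-time procedure can be sketched as a loop over restricted Boltzmann machines (RBMs), each trained with one-step contrastive divergence (CD-1), a common choice for this kind of pre-training. The training data, layer sizes, epoch count, and learning rate are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

def train_rbm(data, n_hidden, epochs=5, lr=0.1):
    """Train one layer as an RBM with one-step contrastive
    divergence (CD-1)."""
    n_visible = data.shape[1]
    W = rng.normal(0, 0.1, (n_visible, n_hidden))
    for _ in range(epochs):
        v0 = data
        p_h0 = sigmoid(v0 @ W)
        h0 = sample(p_h0)
        v1 = sigmoid(h0 @ W.T)   # one-step reconstruction
        p_h1 = sigmoid(v1 @ W)
        W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / len(data)
    return W

# Illustrative random binary data; sizes are assumed for the sketch.
data = sample(np.full((20, 6), 0.5))
layers, weights = data, []
for n_hidden in (4, 3):
    W = train_rbm(layers, n_hidden)
    weights.append(W)
    # Greedy step: the inferred latent-variable values for this
    # layer become the "data" used to train the next layer.
    layers = sigmoid(layers @ W)
```

The key line is the last one: each layer's inferred activities are treated as observed data for the layer above, which is exactly the greedy scheme the text describes.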
Discriminative fine-tuning of deep belief networks can be performed by adding a final layer of variables that represent the desired outputs and backpropagating error derivatives. When networks with many hidden layers are applied to highly structured input data, such as images, backpropagation works much better if the feature detectors in the hidden layers are initialized by learning a deep belief network that models the structure in the input data.
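A minimal sketch of that discriminative fine-tuning step, assuming a single hidden layer for brevity: a softmax output layer is bolted on top and cross-entropy error derivatives are backpropagated through the stack. Here the "pre-trained" weights, data, labels, and learning rate are random placeholders, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# W1 stands in for weights learned by generative pre-training;
# here it is random, purely for illustration.
n_in, n_hid, n_classes = 6, 4, 2
W1 = rng.normal(0, 0.1, (n_in, n_hid))
# The added final layer of variables representing desired outputs.
W2 = rng.normal(0, 0.1, (n_hid, n_classes))

X = rng.random((20, n_in))                 # assumed input data
y = rng.integers(0, n_classes, 20)         # assumed labels
T = np.eye(n_classes)[y]                   # one-hot targets

lr, losses = 0.5, []
for _ in range(100):
    H = sigmoid(X @ W1)                    # hidden feature detectors
    P = softmax(H @ W2)                    # predicted output distribution
    losses.append(-np.mean(np.sum(T * np.log(P + 1e-12), axis=1)))
    # Backpropagate cross-entropy error derivatives through both layers.
    dZ2 = (P - T) / len(X)
    dZ1 = (dZ2 @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ dZ2
    W1 -= lr * X.T @ dZ1
```

Fine-tuning adjusts all the weights, including the pre-trained ones; pre-training only supplies the initialization that the text says makes backpropagation work much better on structured inputs.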
Conventional language processing receives user speech and converts the received voice signals into text, typically represented as an alphanumeric character string in the target language for which the language processing application is configured. Language processing may be employed in a variety of contexts by supplementing or replacing conventional keyboard input with a speech recognition component or module that converts speech into text. Speech recognition capabilities therefore accompany other production applications to provide an alternate input path, allowing spoken commands and data as an alternative to manual keyboard entry. The speech recognition component executes as a language processing application in communication with the production application for which it performs the speech recognition.