Neural networks have a long history in speech recognition, most notably as acoustic models for hybrid or tandem hidden Markov model (HMM) systems. The recent introduction of deep networks into hybrid systems has improved recognition results.
Recurrent neural network (RNN)-HMM hybrids have also been proposed, but they do not currently perform as well as deep feedforward networks. An alternative to HMM-RNN hybrids is to train RNNs directly for speech recognition. This approach exploits the larger state space and richer dynamics of RNNs compared with HMMs, and makes it possible to use end-to-end training with no predefined alignment between the input and target sequences. Long Short-Term Memory (LSTM) is an RNN architecture with an improved memory that has been successful at end-to-end cursive handwriting recognition. However, it has so far made little impact on speech recognition.
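By way of illustration, a single step of a standard LSTM cell may be sketched as follows. The gating equations are the conventional ones (input, forget, and output gates modulating a memory cell); the weight layout, function names, and dimensions here are illustrative assumptions, not taken from the foregoing description.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4*H, H + X) and b has shape (4*H,), stacking the
    four gate transforms; the gate order (input, forget, output,
    candidate) is an arbitrary convention chosen for this sketch.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    i = sigmoid(z[0:H])        # input gate: how much new content enters
    f = sigmoid(z[H:2*H])      # forget gate: how much old memory is kept
    o = sigmoid(z[2*H:3*H])    # output gate: how much memory is exposed
    g = np.tanh(z[3*H:4*H])    # candidate cell update
    c = f * c_prev + i * g     # memory cell: gated accumulation over time
    h = o * np.tanh(c)         # hidden state passed to the next layer/step
    return h, c

# Usage: run a short random input sequence through the cell.
rng = np.random.default_rng(0)
X, H = 3, 4                                   # illustrative sizes
W = rng.standard_normal((4 * H, H + X)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):
    h, c = lstm_step(rng.standard_normal(X), h, c, W, b)
```

The multiplicative gates are what give the LSTM its improved memory: because the cell state is updated additively (scaled by the forget gate) rather than squashed through a nonlinearity at every step, information can persist over many more time steps than in a simple RNN.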
It is an object of the following to obviate or mitigate at least one of the foregoing issues.