Neural networks are commonly used in information classification systems, such as speech recognition systems that perform phoneme recognition. In one approach, an Artificial Neural Network (ANN) classifier is combined with a Hidden Markov Model (HMM) to transform the network's classifications into labeled sequences. The HMM typically models the long-range sequential structure of the data, while the ANN provides localized classifications. Using an HMM, however, imposes unnecessary assumptions on the data, such as the conditional independence of successive observations given the hidden state. A Recurrent Neural Network (RNN) may also be combined with an HMM to label input sequences, but traditional approaches fail to exploit the full potential of RNNs for modeling sequential data.
Further, many of these approaches are highly complex and may be impractical for applications with memory, power, and processing limitations, such as mobile telephones and other low-power devices. Efforts to reduce this complexity often come at the cost of reduced flexibility, inefficient memory use, or other performance penalties. In view of the foregoing, there is a need in the art for fast, resource-efficient solutions for training the neural networks used in information classification systems.