There are numerous challenges in designing a speech processing system. Conventional speech processing systems may use a model that includes one or more neural networks or other machine learning algorithms. The performance of such a model may depend on the manner in which the model is trained, and training may in turn depend on the quality of the training set. For example, conventional training techniques may train the model based on batches of correlated training data. After training is complete, the model may perform accurately when input data has speech properties similar to those of the batches of training data. However, humans have a wide range of speaking accents, speaking styles, and/or vocal irregularities. Thus, the input data may have speech properties that differ from those of the batches of training data, and performance of the model may degrade as a result. For at least this reason, conventional training techniques may be inadequate.
As another example of the challenges in designing a speech processing system, some speech processing tasks may classify speech data by associating the speech data with one or more categories or other types of classification. One example of a classification technique is intent classification. Performance of the classification, however, may depend on the amount of noise or errors present in the input. For example, classification may include converting speech data to text data based on an automated speech recognition (ASR) process. The ASR process may introduce an error, such as by recognizing one of the spoken words incorrectly (e.g., the spoken word “night” may be determined to be the word “knight” by the ASR process). This error may cause an incorrect classification to be determined. For at least this reason, conventional classification techniques may be inadequate.
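As a purely illustrative sketch of how a single ASR error may propagate into an incorrect classification, consider the toy keyword-based intent classifier below. The intent names, the keyword table, and the function `classify_intent` are hypothetical and are not drawn from any particular system; a practical classifier may instead use a trained model.

```python
# Hypothetical keyword table mapping each intent to trigger words.
# These intents and keywords are assumptions made for illustration only.
INTENT_KEYWORDS = {
    "sleep_mode": {"night", "sleep", "bedtime"},
    "play_chess": {"knight", "chess", "bishop"},
}


def classify_intent(text: str) -> str:
    """Return the intent whose keywords overlap most with the text."""
    tokens = set(text.lower().split())
    # Score each intent by the number of its keywords found in the text.
    scores = {
        intent: len(tokens & keywords)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


spoken_text = "good night"    # what the user actually said
asr_text = "good knight"      # ASR output with one misrecognized word

print(classify_intent(spoken_text))  # intent for the correct transcript
print(classify_intent(asr_text))     # intent for the erroneous transcript
```

In this sketch, the correct transcript and the erroneous transcript differ by one word yet are associated with different intents, because the classifier operates only on the text data it receives and has no access to the underlying speech data.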