Language understanding systems were initially built for single-modal applications (e.g., voice-only phone contact centers). More recently, they have been built for multimodal applications (e.g., entertainment content search), where the system can respond to the user in different modalities (e.g., voice output, text output, a user interface, and the like).

Language understanding systems may use a set of models trained with various machine learning techniques. A typical model set contains domain, intent, and slot models. These models may be trained using techniques such as Support Vector Machines (SVMs), Boosting, Maximum Entropy models, Conditional Random Fields (CRFs), Neural Networks, Deep Belief Networks, and the like. These techniques use labeled data to learn to discriminate between the various intents for intent prediction (and likewise between the various domains for domain prediction and between the various slots for slot tagging). The parameters of these models are learned by minimizing various objective functions, which tend to be functions of the errors (i.e., the differences between the predicted and true labels). The models are thus trained both to predict the reference labels and to discriminate between those labels.
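To make the training loop concrete, the following is a minimal sketch of an error-driven intent classifier, using a multiclass perceptron over bag-of-words features. This is an illustration only, not any specific system's implementation: the intent labels, utterances, and function names are hypothetical, and real systems would use richer features and one of the techniques listed above (e.g., SVMs or CRFs). The key point it shows is that the model's weights are updated only when the predicted label differs from the true label, i.e., training is driven by the prediction error.

```python
from collections import defaultdict

def featurize(utterance):
    # Bag-of-words features: each word contributes a count of 1 per occurrence.
    feats = defaultdict(float)
    for word in utterance.lower().split():
        feats[word] += 1.0
    return feats

def score(weights, feats, label):
    # Linear score of an utterance's features under a given intent label.
    return sum(weights[(label, f)] * v for f, v in feats.items())

def train(examples, labels, epochs=10):
    # Multiclass perceptron: weights change only when the predicted
    # label differs from the true label (error-driven updates).
    weights = defaultdict(float)
    for _ in range(epochs):
        for utterance, true_label in examples:
            feats = featurize(utterance)
            pred = max(labels, key=lambda lab: score(weights, feats, lab))
            if pred != true_label:
                for f, v in feats.items():
                    weights[(true_label, f)] += v
                    weights[(pred, f)] -= v
    return weights

def predict(weights, labels, utterance):
    feats = featurize(utterance)
    return max(labels, key=lambda lab: score(weights, feats, lab))

# Hypothetical labeled utterances for two intents.
data = [
    ("play the latest action movie", "find_movie"),
    ("show me comedy films", "find_movie"),
    ("what is the weather today", "get_weather"),
    ("will it rain tomorrow", "get_weather"),
]
labels = ["find_movie", "get_weather"]
w = train(data, labels)
```

After training, `predict(w, labels, "will it rain")` returns `"get_weather"`. Domain classification works the same way with domain labels in place of intent labels; slot tagging additionally assigns a label to each word, which is where sequence models such as CRFs come in.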