Building speech recognition applications can be a time-consuming process. Development of natural language understanding (NLU) systems, in particular, can be one of the most challenging aspects of developing speech applications. Such development may require specialized linguistic and software development skills. Development of natural language understanding systems may also rely on manually written grammars and on statistical models that are trained on large quantities of manually annotated text. Manual annotation may be time-consuming and error-prone, which can increase development time and reduce the quality of the grammars and statistical models produced.
Various approaches to enhance the development of grammars and statistical models have been attempted. With respect to grammar development, for example, some conventional approaches might utilize unsupervised grammar induction techniques, integrated development environments (IDEs) for authoring grammars, or graphical user interface (GUI)-based interactive tools as alternatives to manual grammar creation.
Unsupervised grammar induction techniques, however, do not provide meaningful grammar rules that are readable by humans, and they also require further manual tuning. Although IDEs may be useful in testing grammars, debugging grammars, and visualizing parsing results, such IDEs do not provide suggestions regarding the target grammar structure, the grammar rules, or the ordering of such rules. Instead, such IDEs delegate these tasks to the user. While GUI-based interactive tools may be helpful in guiding a user through an annotation process, such tools use complicated statistical and lexicalization models (e.g., hidden Markov models and context-free grammars). As a result, such GUI-based tools require a significant amount of effort to properly define slot fillers based on regular expressions.
Improved approaches to developing natural language understanding systems and annotating text samples have been described. The disclosures set forth in further detail below describe additional improvements to the development of natural language understanding systems and to text annotation processes.