1. Technical Field
This invention relates to the field of natural language understanding, and more particularly, to an integrated development tool for building a natural language understanding application.
2. Description of the Related Art
Natural language understanding (NLU) systems enable computers to understand and extract information from human speech. Such systems can function in a complementary manner with a variety of other computer applications, such as a speech recognition system, where there exists a need to understand human speech. NLU systems can extract relevant information contained within text and then supply this information to another application program or system for purposes such as booking flight reservations, finding documents, or summarizing text.
Currently within the art, NLU systems employ several different techniques for extracting information from text strings, where a text string can refer to a group of characters, words, or a sentence. The most common technique is a linguistic approach to parsing text strings using a context-free grammar, commonly represented within the art using Backus-Naur Form (BNF) comprising terminals and non-terminals. Terminals refer to words or other symbols which cannot be broken down any further, whereas non-terminals typically refer to parts of speech or phrases such as a verb phrase or a noun phrase. Thus, the grammatical approach to NLU seeks to parse each text string based on BNF grammars without the use of statistical processing.
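By way of illustration only, the grammatical approach can be sketched with a toy BNF-style grammar and a small backtracking parser. The grammar, its vocabulary, and the names GRAMMAR, parse, and parse_sentence are hypothetical and are not taken from any particular NLU system; the sketch assumes whitespace-separated tokens and a grammar with no left recursion.

```python
# Hypothetical toy grammar, written as a mapping from each non-terminal to its
# alternatives. Any symbol that is not a key of GRAMMAR is treated as a terminal.
#   <request>     ::= <verb> <noun_phrase>
#   <verb>        ::= "book" | "find"
#   <noun_phrase> ::= <determiner> <noun> | <noun>
#   <determiner>  ::= "a" | "the"
#   <noun>        ::= "flight" | "document"
GRAMMAR = {
    "<request>": [["<verb>", "<noun_phrase>"]],
    "<verb>": [["book"], ["find"]],
    "<noun_phrase>": [["<determiner>", "<noun>"], ["<noun>"]],
    "<determiner>": [["a"], ["the"]],
    "<noun>": [["flight"], ["document"]],
}


def parse(symbol, tokens, pos=0):
    """Yield (tree, next_pos) for every way `symbol` can match tokens[pos:]."""
    if symbol not in GRAMMAR:
        # Terminal: it matches only the identical token at the current position.
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for alternative in GRAMMAR[symbol]:
        yield from _match_sequence(symbol, alternative, tokens, pos)


def _match_sequence(symbol, parts, tokens, pos):
    """Match every symbol of one alternative in order, keeping all partial parses."""
    partials = [([], pos)]
    for part in parts:
        partials = [
            (children + [tree], nxt)
            for children, cur in partials
            for tree, nxt in parse(part, tokens, cur)
        ]
    for children, end in partials:
        yield (symbol, children), end


def parse_sentence(text):
    """Return a parse tree covering the whole text string, or None if none exists."""
    tokens = text.lower().split()
    for tree, end in parse("<request>", tokens):
        if end == len(tokens):
            return tree
    return None
```

In this sketch, a text string such as "book a flight" yields a tree rooted at the non-terminal <request>, whereas a text string the grammar author did not anticipate, such as "cancel my flight", yields no parse at all.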
Building such a grammar-based NLU system typically requires a linguist, which can add significant time and expense to application development. The quality of the resulting NLU application, however, can be unsatisfactory because it is difficult to predict each potential user request or response to a prompt, especially in a telephonic conversational style. Notably, such unsatisfactory results can occur despite the use of a linguist.
Another technique used by NLU systems to extract information from text strings is a statistical approach where no grammar is used in analyzing the text string. Presently, such systems learn meaning from a large corpus of annotated sentences. The annotated sentences are collected into a corpus of text which can be referred to as a training corpus. The tools used to develop statistical NLU systems and annotate text have included such disparate elements as ASCII files, conventional text editors, and keyboard macros. Using these inefficient tools, word relationships can be specified and a statistical model can be built. Thus far, however, an efficient and accurate graphical editing tool has yet to be developed. Consequently, the development of statistical NLU applications typically has been reserved for trained experts.
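As a rough illustration of learning from annotated sentences, the following sketch counts how often each word carries each label in a small training corpus and then labels new text with the most frequent label for each word. The corpus contents, the label names, and the functions train and annotate are hypothetical, and this simple frequency count merely stands in for the far richer statistical models that practical NLU systems build.

```python
from collections import Counter, defaultdict

# Hypothetical training corpus: each annotated sentence is a list of
# (word, label) pairs. The label set is an assumption made for illustration.
TRAINING_CORPUS = [
    [("book", "ACTION"), ("a", "OTHER"), ("flight", "OBJECT"),
     ("to", "OTHER"), ("boston", "CITY")],
    [("find", "ACTION"), ("a", "OTHER"), ("flight", "OBJECT"),
     ("to", "OTHER"), ("denver", "CITY")],
    [("book", "ACTION"), ("the", "OTHER"), ("flight", "OBJECT")],
]


def train(corpus):
    """Count, for every word, how many times each label was assigned to it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, label in sentence:
            counts[word][label] += 1
    return counts


def annotate(model, text, unknown="UNKNOWN"):
    """Label each word of `text` with the label seen most often in training."""
    result = []
    for word in text.lower().split():
        if word in model:
            label = model[word].most_common(1)[0][0]
        else:
            # Words absent from the training corpus cannot be labeled.
            label = unknown
        result.append((word, label))
    return result
```

No grammar appears anywhere in this sketch: what the model can label depends entirely on what the training corpus contains, which is why the size and the annotation quality of that corpus matter.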
Another disadvantage of using conventional NLU application development tools is that development in a team environment can be difficult. Notably, because existing tools make use of disparate components, such development tools are unable to track or flag changes made by one team member to prevent another team member from overwriting or re-annotating the same portion of text. Moreover, conventional development tools cannot identify situations in which multiple instances of a particular sentence within the training corpus have been annotated inconsistently with one another.
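The inconsistency described above can be made concrete with a short sketch that groups the instances of each sentence in a training corpus and reports any sentence that carries more than one distinct annotation. The function name find_inconsistent_annotations, the representation of an annotation as a tuple of labels, and the normalization of case and whitespace are all assumptions made for illustration.

```python
from collections import defaultdict


def find_inconsistent_annotations(corpus):
    """Return {sentence: [annotations]} for sentences annotated in more than one way.

    `corpus` is an iterable of (sentence_text, annotation) pairs, where each
    annotation is assumed to be a tuple of labels, one label per word.
    """
    seen = defaultdict(set)
    for sentence, annotation in corpus:
        # Normalize case and spacing so that repeated instances compare equal.
        key = " ".join(sentence.lower().split())
        seen[key].add(tuple(annotation))
    return {
        sentence: sorted(annotations)
        for sentence, annotations in seen.items()
        if len(annotations) > 1
    }


# Hypothetical corpus in which two team members annotated the same sentence
# differently.
corpus = [
    ("book a flight", ("ACTION", "OTHER", "OBJECT")),
    ("Book a  flight", ("OBJECT", "OTHER", "OBJECT")),
    ("find a document", ("ACTION", "OTHER", "OBJECT")),
]
conflicts = find_inconsistent_annotations(corpus)
```

Here conflicts contains a single entry, for "book a flight", listing both annotations so that they can be reviewed and reconciled.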