Spoken language is the most natural and convenient communication tool for people. With data storage capacities increasing rapidly, people tend to store greater amounts of information in databases. Accessing this data with spoken language interfaces offers people convenience and efficiency, but only if the spoken language interface is reliable. This is especially important for applications in eye-busy and hand-busy situations, such as driving a car. Man-machine interfaces that utilize spoken commands and voice recognition are generally based on dialog systems. A dialog system is a computer system that is designed to converse with a human using a coherent structure and text, speech, graphics, or other modes of communication on both the input and output channels. Dialog systems that employ speech are referred to as spoken dialog systems and generally represent the most natural type of man-machine interface. With the ever-greater reliance on electronic devices, spoken dialog systems are increasingly being implemented in many different machines.
Speech recognition processes involve the conversion of spoken acoustic signals into words or sets of words. Digitized speech signals are transformed into sets of useful measurements or features at a fixed rate. These features are then used to search for the most likely word candidates, subject to constraints imposed by acoustic, lexical, and language models. At the acoustic-phonetic level, speaker variability is usually modeled using statistical techniques applied to large amounts of data. Automatic speech recognition (ASR) algorithms generally use statistical and structural pattern recognition techniques and/or knowledge-based (phonetic and linguistic) principles. ASR systems can be based on methods in which entire words or sentences (segments) are directly recognized, or in which an intermediate phonetic labeling method is used before a lexical search.
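As a rough illustration of transforming a digitized signal into features at a fixed rate, the following minimal sketch slices a signal into overlapping frames and computes one log-energy value per frame. The function name `frame_log_energy`, the 25 ms frame length, the 10 ms hop, and the synthetic tone used as input are all assumptions chosen for the example; practical ASR front ends typically compute richer spectral features, and nothing here is specific to any particular system.

```python
import math

def frame_log_energy(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Return one log-energy feature per frame, with a new frame every hop_ms."""
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    hop_len = int(sample_rate * hop_ms / 1000)      # samples between frame starts
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # Small constant avoids log(0) on silent frames.
        features.append(math.log(energy + 1e-10))
    return features

# Hypothetical input: one second of a 440 Hz tone at 16 kHz, standing in for speech.
sample_rate = 16000
signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
features = frame_log_energy(signal, sample_rate)
```

Each entry in `features` corresponds to one fixed-rate time step; a recognizer would then score such feature sequences against its acoustic, lexical, and language models.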
Speech recognition systems make extensive use of training data to build a database of recognized words. The data in this application typically refers to text data. In many dialog applications, statistical models must be trained for different modules in the dialog system. In order to train a proper statistical model, a large amount of labeled training data is often needed. Training data is labeled in terms of certain syntactic and/or semantic information. Obtaining a sufficiently large amount of labeled training data is time-consuming, labor-intensive, and costly. Presently known labeling methods typically involve manually labeling each and every sentence in a data set. For large data sets, this can involve a great deal of cost and effort.
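To make the notion of labeled training data concrete, the sketch below shows one possible representation of a single sentence annotated with both syntactic and semantic labels. The class name `LabeledSentence`, the part-of-speech tags, and the `B-genre` slot tag are hypothetical choices for illustration only and are not a labeling scheme prescribed by this document.

```python
from dataclasses import dataclass

@dataclass
class LabeledSentence:
    tokens: list[str]
    pos_tags: list[str]   # syntactic labels, one per token (assumed tag set)
    slot_tags: list[str]  # semantic labels, one per token (assumed tag set)

    def __post_init__(self):
        # Every token must carry exactly one label of each kind.
        if not (len(self.tokens) == len(self.pos_tags) == len(self.slot_tags)):
            raise ValueError("each token needs one syntactic and one semantic label")

# A single manually labeled sentence; a real data set would hold many of these.
example = LabeledSentence(
    tokens=["play", "some", "jazz"],
    pos_tags=["VERB", "DET", "NOUN"],
    slot_tags=["O", "O", "B-genre"],
)
labeled_pairs = list(zip(example.tokens, example.slot_tags))
```

Because every token in every sentence needs such labels, the manual effort grows in proportion to the size of the data set.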
What is needed, therefore, is a training data labeling system that is efficient and cost-effective.