1. Field of the Invention
The present invention relates generally to speech recognition technology and in particular to robust access to large structured data using voice-based form-filling.
2. Introduction
Many web and telephony applications involve retrieval of information from large, structured databases using form-filling. A database typically consists of a number of fields. A user can retrieve an entry in the database by specifying values for a subset of the fields. In web applications accessed using a desktop computer, entry of fields using a keyboard is simple and accurate. In telephony applications, voice input of fields using automatic speech recognition (ASR) is convenient but error-prone. Every field in a form has to be correctly recognized for a task to be successfully completed. Therefore, recognition accuracy for each field has to be very high.
Acceptable ASR accuracy can be achieved for simple fields such as account numbers, dates, times, etc. However, accurate recognition of names of people or places, airport names, street names, etc., is difficult to achieve if each field is considered individually. There are often strong inter-field constraints which can be exploited to improve ASR accuracy. Simple methods for incorporating these constraints include constructing a grammar for the complete form, or dynamically constructing a grammar for each field constrained by input already provided by the user. These methods can become impractical for forms with many fields and large vocabularies. The above discussion applies not only to information retrieval from databases but also to information input. Consider an application in which the user has to schedule a service visit to an address. The address entry form could be designed to produce only valid addresses as provided by, say, the Postal Service.
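By way of illustration only, the dynamically constructed per-field grammars mentioned above may be sketched as follows. The sketch assumes a small in-memory table of address records; the field names, the values, and the helper functions `matching_records` and `field_vocabulary` are hypothetical and are not taken from any particular ASR system. It shows only how input already provided for some fields narrows the set of values that a recognizer would need to accept for another field.

```python
from typing import Dict, List, Set

Record = Dict[str, str]

# Hypothetical toy database; field names and values are illustrative only.
DATABASE: List[Record] = [
    {"city": "springfield", "street": "main street", "zip": "62701"},
    {"city": "springfield", "street": "oak avenue", "zip": "62702"},
    {"city": "shelbyville", "street": "main street", "zip": "46176"},
    {"city": "shelbyville", "street": "elm street", "zip": "46176"},
]


def matching_records(database: List[Record], filled: Dict[str, str]) -> List[Record]:
    """Return the entries that agree with every field value supplied so far."""
    return [
        record
        for record in database
        if all(record.get(field) == value for field, value in filled.items())
    ]


def field_vocabulary(database: List[Record], field: str, filled: Dict[str, str]) -> Set[str]:
    """Collect the values of `field` that remain valid given the filled fields.

    In a real system this set would be compiled into a recognition grammar;
    here it is simply returned as a set of strings.
    """
    return {r[field] for r in matching_records(database, filled) if field in r}


# With no prior input, every street in the database is a candidate.
print(sorted(field_vocabulary(DATABASE, "street", {})))
# Once the city is known, the street vocabulary shrinks.
print(sorted(field_vocabulary(DATABASE, "street", {"city": "springfield"})))
```

In this sketch each call scans the entire table, and a separate vocabulary must be rebuilt for every combination of previously filled fields, which suggests why such an approach may become impractical for forms with many fields and large vocabularies.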
There are also many user interface issues that have a significant impact on the success of form-filling. A user could specify either the value of one field, or the values of all the relevant fields, in a single utterance. The first option requires that the user select a field either by voice or by multi-modal input. In the second option, the ASR system would have to accept a variety of user responses. Finally, there are memory and CPU constraints that impact the design and performance of form-filling systems.
In view of the above, there is a need for systems and methods for providing voice-based form-filling when conventional approaches are infeasible, such as when the vocabulary and the database are very large.