Remote devices such as portable devices have gained wide acceptance throughout many modern societies. Portable devices include, but are not limited to, cell phones, pagers, personal digital assistants (PDAs), portable global positioning devices, and networked systems within automobiles. Although such portable devices are initially designed to perform predefined tasks that are often limited to a narrow application, it is also envisioned that such devices may take on additional tasks such as accessing the World Wide Web (WWW).
However, the very nature of a portable device is its convenience, which typically requires the portable device to be relatively small in physical size. Unfortunately, such requirements often constrain the processing power and the characteristics of input/output interfaces on a portable device. For example, it is generally impractical to provide a physical keyboard on a cell phone. Although an electronic keyboard can be displayed on a screen as in a PDA, such a user interface is unwieldy for performing complex tasks. Additionally, the user may be distracted while operating the portable device, e.g., while operating a vehicle.
Thus, a speech-driven user interface is desirable in a portable device. However, speech recognition systems are designed to undertake the difficult task of extracting recognized speech from an audio signal, e.g., a natural language signal. The speech recognizer within such speech recognition systems must account for diverse acoustic characteristics of speech such as vocal tract size, age, gender, dialect, and the like. Artificial recognition systems are typically implemented using powerful processors with large memory capacity to handle the various complex algorithms that must be executed to extract the recognized speech.
Unfortunately, the processing demands of speech recognition and speech processing often exceed the processing capability of current portable devices. Although the portable devices may have sufficient processing power to perform a small portion of the total functions of a full-blown speech recognition system, it is often difficult to ascertain in advance which functions and associated data a portable device will need to perform a particular task. For example, the resources and processing cycles necessary to perform a speech-driven command to locate a particular web page on the Internet via a PDA are quite different from those necessary to perform a speech-driven command to dial a preprogrammed phone number on a cellular phone system in an automobile.
Additionally, a user who is interfacing with an automatic speech recognition system must often operate within a rigid and idiosyncratic structure. Namely, a particular spoken language application (SLA) is often tailored to handle a particular task or topic, e.g., an SLA with a grammar tailored for handling restaurant requests, an SLA with a grammar tailored for handling flight information for an airport, and so on. In order for a user to switch topics, it is often necessary to inform the system in a distinctive and abrupt manner so that the system understands that the user wishes to switch to a new topic. For example, the user may have to utter “I would like information on restaurants now, are you ready?” or “Stop, new topic on restaurants”. Such rigid rules degrade the user's overall experience with the speech recognition system.
Therefore, a need exists for a fast and computationally inexpensive method that allows speech-driven control and remote access of information and services, where a change in the topic or a change in the intent of the user is detected seamlessly without the user having to inform the system of his or her intention.