1. Field of the Invention
This invention relates to spoken dialogue systems. In particular, the invention relates to unified client-server distributed architectures for spoken dialogue systems.
2. Description of Related Art
Today, speech is emerging as the natural modality for human-computer interaction. Individuals can now talk to computers via spoken dialogue systems that utilize speech recognition. Although human-computer interaction by voice is available today, a whole new range of information and communication services built on spoken dialogue systems will soon be available to the public. For example, individuals will soon be able to talk to a hand-held computing device to check e-mail, perform banking transactions, make airline reservations, look up information in a database, and perform a myriad of other functions.
Speech recognition entails machine conversion of sounds, created by natural human speech, into a machine-recognizable representation of the word or words actually spoken. Typically, the sounds are converted to a speech signal, such as a digital electrical signal, which a computer then processes. Most currently commercially available speech recognition systems include computer programs that process a speech signal using statistical models generated from a database of different spoken words. These systems are based on principles of statistical pattern recognition and generally employ an acoustic model and a language model to decode an input sequence of observations (e.g., acoustic signals) representing input speech (e.g., a word, string of words, or sentence) and to determine the most probable word, word sequence, or sentence given those observations. Thus, typical modern speech recognition systems search through candidate words, word sequences, or sentences and choose the one with the highest probability of having produced the input speech. Moreover, speech recognition systems can be speaker-dependent (i.e., trained to the characteristics of a specific user's voice) or speaker-independent (i.e., useable by any person).
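The decoding rule described above can be sketched as follows. The candidate word sequences and all probability values are purely illustrative; the sketch only shows the general form of choosing the sequence that maximizes the acoustic-model score times the language-model score:

```python
import math

# Toy illustration of statistical speech decoding: choose the word
# sequence W maximizing P(O|W) * P(W), where P(O|W) comes from the
# acoustic model and P(W) from the language model. All numbers below
# are made up for illustration only.

# Hypothetical acoustic-model scores P(O|W): how well each candidate
# word sequence explains the observed acoustic signal O.
acoustic_score = {
    "recognize speech": 0.40,
    "wreck a nice beach": 0.45,  # acoustically similar, slightly better fit
}

# Hypothetical language-model scores P(W): prior probability of each
# word sequence occurring in the language.
language_score = {
    "recognize speech": 0.30,
    "wreck a nice beach": 0.05,
}

def decode(candidates):
    """Return the candidate with the highest combined log-probability."""
    def combined(w):
        return math.log(acoustic_score[w]) + math.log(language_score[w])
    return max(candidates, key=combined)

best = decode(acoustic_score)
print(best)  # the language model tips the choice toward "recognize speech"
```

Note how the language model overrides the slightly better acoustic fit of the implausible sentence, which is exactly the role the two models play in combination.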
Further, there are different types of speech or voice recognition applications. For example, command and control applications typically have a small vocabulary and are used to direct the computer to perform specific tasks, such as looking up the address of a business associate stored in memory. Natural language processing applications, on the other hand, typically have a large vocabulary; the computer analyzes the spoken words to determine what the user wants and then performs the desired task. For example, if a user asks the computer to book a flight from Boston to Portland, the computer will determine that the user wants an airline reservation for a flight departing from Boston and arriving at Portland, and will then perform the transaction to make the reservation for the user.
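The contrast between the two application types can be sketched as below. The command set, the pattern, and all names are hypothetical stand-ins: a real command-and-control system would match against its fixed grammar, and a real natural language system would use far richer language understanding than a single pattern:

```python
import re

# Command and control: a small fixed vocabulary mapped directly to
# specific actions the computer can perform.
COMMANDS = {
    "look up address": "ACTION_LOOKUP_ADDRESS",
    "open calendar": "ACTION_OPEN_CALENDAR",
}

def command_and_control(utterance):
    """Match the utterance against a small, fixed command set."""
    return COMMANDS.get(utterance.lower())

# Natural language processing: a large vocabulary; the system extracts
# the user's intent and its parameters. A simple pattern serves here as
# a stand-in for real language understanding.
FLIGHT_PATTERN = re.compile(r"book a flight from (\w+) to (\w+)", re.IGNORECASE)

def natural_language(utterance):
    """Extract an airline-reservation intent and its parameters, if present."""
    m = FLIGHT_PATTERN.search(utterance)
    if m:
        return {"intent": "make_reservation",
                "depart": m.group(1),
                "arrive": m.group(2)}
    return None

print(command_and_control("Look up address"))
print(natural_language("Please book a flight from Boston to Portland"))
```

The key difference is that the first function can only recognize utterances it was explicitly given, while the second interprets a variable utterance to recover what the user wants done.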
Unfortunately, existing spoken dialogue systems are typically based on only one of three standard architectures: client-only, server-only, or client-server. Although each of these architectures has certain advantages, each also has disadvantages.