1. Field of the Invention
The present invention relates to functions of a telecommunication network, and more particularly, to a method and system for facilitating restoration of a voice command session with a user after a system disconnect.
2. Description of Related Art
A voice command platform provides an interface between speech communication with a user and voice command applications. Generally, a person can call the voice command platform and speak commands to the voice command platform. The voice command platform may detect certain commands spoken by the person and respond to the person based on those commands.
The voice command platform is communicatively coupled to a voice command application server (“application server”). The voice command application server may be an entity physically separate from the voice command platform or an entity integrated into the voice command platform. The application server stores and/or generates logic of a voice command application. The logic defines prompts to be spoken to a user and acceptable responses to be spoken by the user, in accordance with the voice command applications. The logic is executable by the voice command platform so as to allow the voice command platform to “speak” with the user.
For instance, a person may call the voice command platform and the voice command platform may speak to the user, “Hello. Would you like to hear a weather forecast, sports score, or stock quote?” In response, the person may state to the voice command platform, “weather forecast.” The voice command platform may detect the person's response and signal the application server to load a weather forecasting application. Then, the application server may send, to the voice command platform, logic of the weather forecasting application. The voice command platform may execute the logic, causing the voice command platform to speak another speech prompt to the person, such as “Would you like to hear today's weather or an extended forecast?” The person may then respond, and the voice command platform may further execute the logic sent to the voice command platform or signal the application server to send additional logic to the voice command platform.
Therefore, the voice command platform and the application server, collectively, should be able to (i) receive and recognize speech spoken by a user; and (ii) provide speech to a user. The voice command platform and the application server can achieve these functions in various ways.
On an incoming side, for instance, the voice command platform may include an analog-to-digital (A-D) converter. The A-D converter converts an analog speech signal from a user into a digitized incoming speech signal. The voice command platform may also include a speech recognition (SR) engine. The SR engine will typically be a software module which functions to analyze the digitized incoming speech signal and to identify words in the speech.
As noted above, the logic of the voice command application defines what responses a user can speak in response to a prompt. The responses that a user can speak in response to a prompt may take the form of acceptable “grammars.” The application server will send the logic to the voice command platform. In turn, the voice command platform will execute the logic so that the SR engine can identify one of the possible spoken responses defined by the acceptable grammars.
The SR engine will typically include or have access to a dictionary database of “phonemes” (small units of speech that distinguish one utterance from another). The SR engine will analyze a waveform represented by an incoming digitized speech signal and, based on the dictionary database, will determine whether the waveform represents particular grammars defined by the logic. For instance, if the logic sent by the application server indicates that the user should respond to a prompt with the grammars “sales”, “service”, or “operator”, then the SR engine may identify a sequence of one or more phonemes that make up each of these grammars, respectively. The SR engine may then analyze the waveform of the incoming digitized speech signal in search of a waveform that represents one of those sequences of phonemes. If the SR engine finds a match, then the voice command platform further executes the logic already sent by the application server to the voice command platform. Alternatively, the voice command platform may signal the application server to send additional logic, based on the response spoken by the user.
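By way of illustration only, the grammar matching described above might be sketched as follows. The sketch assumes that the incoming waveform has already been decoded into a list of phoneme symbols; the dictionary contents, the phoneme symbols, and the names PHONEME_DICTIONARY, contains_sequence, and match_grammar are hypothetical and are not taken from any particular SR engine.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical dictionary database: each grammar maps to the phoneme
# sequence that makes it up (symbols are illustrative only).
PHONEME_DICTIONARY: Dict[str, Tuple[str, ...]] = {
    "sales": ("S", "EY", "L", "Z"),
    "service": ("S", "ER", "V", "AH", "S"),
    "operator": ("AA", "P", "ER", "EY", "T", "ER"),
}


def contains_sequence(decoded: Sequence[str], target: Sequence[str]) -> bool:
    """Return True if `target` appears as a contiguous run inside `decoded`."""
    n = len(target)
    return any(
        tuple(decoded[i:i + n]) == tuple(target)
        for i in range(len(decoded) - n + 1)
    )


def match_grammar(decoded: List[str], acceptable: List[str]) -> Optional[str]:
    """Return the first acceptable grammar whose phonemes occur in `decoded`."""
    for grammar in acceptable:
        sequence = PHONEME_DICTIONARY.get(grammar)
        if sequence and contains_sequence(decoded, sequence):
            return grammar
    return None  # no match; the platform might re-prompt the user


# Assumed decoder output for a caller who said "sales" ("SIL" marks silence).
heard = ["SIL", "S", "EY", "L", "Z", "SIL"]
result = match_grammar(heard, ["sales", "service", "operator"])
```

A practical SR engine would score acoustic likelihoods rather than require an exact symbol match; the exact comparison here is a simplification chosen to keep the example short.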
Additionally, the SR engine or an ancillary module in the voice command platform will typically function to detect dual-tone multi-frequency (DTMF) tones dialed by a user. For instance, the logic of the voice command application might define a DTMF grammar as an acceptable response by a user. The voice command platform may execute the logic of the voice command application upon the SR engine or the ancillary module detecting the DTMF grammar.
On an outgoing side, the voice command platform may include a text-to-speech (TTS) engine. The TTS engine may function to convert text into outgoing digitized speech signals. In turn, the voice command platform may include a digital-to-analog (D-A) converter for converting the outgoing digitized speech signals into audible voice to be communicated to the user.
Thus, the application server may specify to the voice command platform the logic in the form of text that represents the audible voice to be spoken to a user. In turn, the voice command platform may execute the logic by passing the text to the TTS engine. The TTS engine will then convert the text to the outgoing digitized speech signal. The voice command platform converts the signal into the audible voice to be spoken to the user.
Also on the outgoing side, a voice command platform may include a set of stored voice prompts, in the form of digitized audio files (e.g., *.wav files). These stored voice prompts would often be common prompts, such as “Hello”, “Ready”, “Please select from the following options”, or the like. Each stored voice prompt might have an associated label (e.g., a filename under which the prompt is stored). The application server might send logic specifying that the voice command platform should speak a particular prompt to a user, identified by the associated label. The voice command platform may responsively retrieve the audio file, convert the audio file into an analog waveform, and send the analog waveform to the user.
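As a rough sketch of the two outgoing paths just described, a platform might dispatch each prompt either to a store of recorded audio, keyed by label, or to the TTS engine. The step format, the STORED_PROMPTS table, the placeholder byte strings, and the fake_tts stand-in are all assumptions made for illustration; no real TTS interface is implied.

```python
from typing import Callable, Dict

# Hypothetical store of recorded prompts, keyed by label (filename).
# The byte strings stand in for digitized audio data.
STORED_PROMPTS: Dict[str, bytes] = {
    "hello.wav": b"<audio:hello>",
    "ready.wav": b"<audio:ready>",
}


def fake_tts(text: str) -> bytes:
    """Stand-in for a TTS engine: returns placeholder 'digitized speech'."""
    return ("<tts:" + text + ">").encode("utf-8")


def render_prompt(step: Dict[str, str],
                  tts: Callable[[str], bytes] = fake_tts) -> bytes:
    """Produce outgoing digitized speech for one prompt step.

    A step carries either an "audio" label naming a stored prompt or a
    "text" string to be synthesized.
    """
    if "audio" in step:
        label = step["audio"]
        if label not in STORED_PROMPTS:
            raise KeyError("no stored prompt with label " + label)
        return STORED_PROMPTS[label]
    return tts(step["text"])


recorded = render_prompt({"audio": "hello.wav"})
synthesized = render_prompt({"text": "Ready"})
```

In either case the resulting digitized signal would then pass through the D-A converter before reaching the user.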
The logic that the application server serves to the voice command platform may reside permanently on the application server. Alternatively, the voice command application may be loaded dynamically into the application server. For instance, the application server can be communicatively coupled to a storage medium where various voice command applications reside. When a user calls the voice command platform, the voice command platform can signal the application server to load a voice command application from the storage medium and send logic of the voice command application to the voice command platform. Further, a response from the user may cause the voice command platform to signal the application server to load another voice command application and send logic of the other voice command application to the voice command platform. In this way, a user can navigate through a series of voice command applications and menus in the various voice command applications, during a given voice command session.
The voice command application can be written or rendered in any of a variety of computer languages. One such language is VoiceXML (or simply “VXML”). VXML is a tag-based language similar to HTML, which underlies most Internet web pages. Other analogous languages, such as SpeechML, VoxML, or SALT, for instance, are available as well.
An application developer can write a voice command application in VXML, SpeechML, VoxML, or SALT. Alternatively, an application developer can write a voice command application in a programming language such as Java, C, C++, etc. The application server may serve to the voice command platform the logic of the voice command application in the programming language written by the application developer. In turn, the voice command platform or some intermediate entity could transcode the logic from the programming language written by the application developer into VXML, SpeechML, VoxML, or SALT.
In at least VXML, a document may encapsulate the logic of the voice command application. The application server may serve the logic of the voice command application by serving documents that encapsulate the logic. Additionally, the documents have navigation points. The navigation points allow the voice command platform and the application server to identify documents within the voice command applications and/or menu items within the voice command applications. Each navigation point may have a respective identifier or label. For example, a voice command application can define a number of successive menus through which a user can browse, and each menu might have a respective label by which it can be referenced. The voice command platform and the application server can use these labels to move from document to document or from menu item to menu item, just as hyperlinks operate to cause a browser to move from one web page (or component of one web page) to another.
For instance, the identifier or label of the navigation point may take the form of a Uniform Resource Identifier (URI). The voice command platform and the application server may use the URI to identify documents of a voice command application. For example, a document may indicate that, if a user speaks a particular grammar, the platform should execute a particular document from a designated URI, but that, if the user speaks another grammar, then the platform should execute another document from another designated URI. If the document is not present on the voice command platform, then the voice command platform may signal the application server to send the document to the voice command platform. Responsively, the application server may locate the document at the designated URI and send the document to the voice command platform.
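The URI-based navigation described above might be sketched as follows, under the assumption that each document is modeled as a simple mapping from spoken grammars to target URIs. The example.com URIs, the DOCS table, and the ApplicationServer and VoiceCommandPlatform classes are hypothetical and serve only to show the platform requesting a document when it is not already present.

```python
from typing import Dict, List, Optional

MENU_URI = "http://example.com/menu.vxml"        # hypothetical URI
WEATHER_URI = "http://example.com/weather.vxml"  # hypothetical URI
SPORTS_URI = "http://example.com/sports.vxml"    # hypothetical URI

# Each document lists, per acceptable grammar, the URI to execute next.
DOCS: Dict[str, Dict[str, Dict[str, str]]] = {
    MENU_URI: {"transitions": {"weather forecast": WEATHER_URI,
                               "sports score": SPORTS_URI}},
    WEATHER_URI: {"transitions": {}},
    SPORTS_URI: {"transitions": {}},
}


class ApplicationServer:
    """Locates a document at a designated URI and returns it."""

    def __init__(self, documents: Dict[str, dict]) -> None:
        self._documents = dict(documents)
        self.requests: List[str] = []  # record of URIs requested

    def fetch(self, uri: str) -> dict:
        self.requests.append(uri)
        return self._documents[uri]


class VoiceCommandPlatform:
    """Executes documents, asking the server only for those not yet held."""

    def __init__(self, server: ApplicationServer) -> None:
        self._server = server
        self._held: Dict[str, dict] = {}

    def load(self, uri: str) -> dict:
        if uri not in self._held:  # document not present on the platform
            self._held[uri] = self._server.fetch(uri)
        return self._held[uri]

    def next_uri(self, current_uri: str, grammar: str) -> Optional[str]:
        """Return the URI designated for the grammar the user spoke."""
        return self.load(current_uri)["transitions"].get(grammar)


server = ApplicationServer(DOCS)
platform = VoiceCommandPlatform(server)
target = platform.next_uri(MENU_URI, "weather forecast")
platform.load(MENU_URI)  # already held, so no second request is made
```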
An example of a VXML application is a weather reporting application. The weather reporting application may have a document that includes a tag defining a welcome message and prompting a user to indicate a city or zip code. The document may further set forth a bundle of grammars that are possible city names and corresponding zip codes that a user can speak in response to the prompt.
When the application server sends this document to the voice command platform, the voice command platform may execute the logic defined by the document. The voice command platform may send a welcome message/prompt to the TTS engine so as to have the message/prompt spoken to the user. In turn, when the user speaks a response, the SR engine may identify the response as one of the acceptable grammars.
The document might next prompt the user to indicate whether he would like to hear today's weather or an extended forecast, and the user would again speak a response. In turn, the document might indicate that, if the user selects “today's weather,” the voice command platform should signal the application server to send a document identified by a designated URI. If the user selects “extended forecast,” however, the voice command platform should signal the application server to send a different document identified by another designated URI. The application server may retrieve the document at the designated URI and send that document to the voice command platform, and the voice command platform may execute the logic of the document.
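The two-step weather dialog described above could be modeled, purely for illustration, as a single document holding two prompts, a bundle of city and zip code grammars, and a pair of forecast choices that each designate a URI. The field names, the sample city and zip code pairing, and the example.com URIs below are assumptions; an actual VXML document would express the same structure with tags.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the first document of the weather application.
WEATHER_START: Dict[str, object] = {
    "city_prompt": "Welcome. Please say a city or zip code.",
    # Bundle of grammars: spoken city names and zip codes map to a zip code.
    "city_grammars": {"chicago": "60601", "60601": "60601"},
    "forecast_prompt": "Would you like to hear today's weather "
                       "or an extended forecast?",
    # Each forecast choice designates the URI of the next document.
    "forecast_targets": {
        "today's weather": "http://example.com/weather/today.vxml",
        "extended forecast": "http://example.com/weather/extended.vxml",
    },
}


def run_weather_dialog(document: Dict[str, object],
                       spoken: List[str]) -> Tuple[str, str]:
    """Resolve two spoken responses to a zip code and a next-document URI."""
    city_response, forecast_response = spoken
    zip_code = document["city_grammars"].get(city_response.lower())
    if zip_code is None:
        raise ValueError("response is not an acceptable city grammar")
    target = document["forecast_targets"].get(forecast_response.lower())
    if target is None:
        raise ValueError("response is not an acceptable forecast grammar")
    return zip_code, target


outcome = run_weather_dialog(WEATHER_START, ["Chicago", "extended forecast"])
```

The returned URI is the one the voice command platform would then pass to the application server when requesting the next document.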