A speech recognition processor uses a predefined grammar to detect words in speech. Speech recognition processors are frequently used as a front end to provide voice-enabled command and control applications. A speech recognition processor detects a word match relatively quickly, but the words that can be detected are limited to the words associated with the particular grammar. Speech recognition processors are typically neither designed for nor effective at detecting a particular relevant word or utterance in an unbounded stream of largely irrelevant words or utterances spoken at a conventional speaking rate, as might occur during a conversation between two or more participants. Rather, a speech recognition processor typically requires a speaker to speak quite distinctly and with brief pauses between commands, because the speech recognition processor attempts to process each separate utterance as a command. If a speech recognition channel were coupled to a normal conversation, the speech recognition processor would attempt to determine whether each spoken word matched a word in the predefined grammar, and would respond with an error indicator or a false positive result for each word that did not match. Speech recognition processors require significant memory and processing capabilities, and in commercial settings are frequently implemented in stand-alone devices that can handle the speech processing requirements of only a finite number of concurrent voice sessions.
Many speech-enabled applications are front ends to customer service systems that offer full-time availability, and thus use dedicated speech recognition processors. For example, many businesses now require a caller seeking customer service to interact with a voice-enabled application that asks for information such as an account number and zip code and, in response, provides information about the caller's account before allowing the caller to speak to a human. Telecommunications providers are beginning to develop ‘on-demand’ voice-enabled applications that can be initiated by a participant in a conversation on an impromptu basis. For example, a telecommunications provider may desire to provide a subscriber the ability, at any time, to request via a speech-enabled application that a third party be invited to join an existing call. Currently, such an on-demand voice-enabled application would require that a speech recognition processor channel be dedicated to a voice session for the entire duration of the voice session. Because speech recognition processors are memory- and processor-intensive, it may be impractical or cost-prohibitive to simultaneously provide hundreds or even thousands of speech recognition processing channels during hundreds or thousands of voice sessions. Moreover, because speech recognition processors are not designed to select a particular relevant word out of a stream of mainly irrelevant words occurring at a conventional speaking rate, the speech recognition processor would attempt to match each irrelevant word during the conversation to a predefined grammar of commands. Since the majority of words spoken in the conversation would not match any command in the predefined grammar, the speech recognition processor would repeatedly respond with error indicators or false positive results. Thus, there are currently several problems with using speech recognition processors in on-demand voice-enabled applications.
Speech analytics processors, in contrast to speech recognition processors, are used to search for utterances, such as words or other sounds, in large quantities of recorded voice data. Speech analytics processors are not typically employed in real-time applications. A speech analytics processor typically receives a recorded voice session as input and encodes the recorded voice session into a searchable file, or database. The speech analytics processor, or an associated query module, can then search for and detect designated sounds that may appear in the database. Speech analytics processors do not utilize a grammar and, once the database is generated, are capable of searching for any designated word, phrase, or sound. In response to a search request, a speech analytics processor provides a time offset within the recorded voice session at which the word, phrase, or sound was spoken. Speech analytics processors are extremely fast, and because they are not designed to be used in a real-time environment or to recognize complicated grammars, they use significantly fewer processor and memory resources than a speech recognition processor.
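The encode-then-search behavior described above can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not drawn from any particular product: the recorded session is taken to be already segmented into time-stamped units, the "database" is a plain in-memory mapping, and the names build_index and search are hypothetical. An actual speech analytics processor would typically index acoustic or phonetic units rather than text tokens.

```python
from collections import defaultdict


def build_index(timed_units):
    """Encode (offset_seconds, unit) pairs into a searchable index.

    Assumption: the session is already split into time-stamped units.
    """
    index = defaultdict(list)
    for offset, unit in timed_units:
        index[unit.lower()].append(offset)
    return index


def search(index, term):
    """Return every time offset at which `term` occurs (empty list if none)."""
    return list(index.get(term.lower(), []))


# Hypothetical recorded session, for illustration only.
session = [
    (0.0, "hello"), (0.6, "thanks"), (1.1, "for"), (1.4, "calling"),
    (9.8, "operator"), (10.5, "please"), (42.3, "operator"),
]
index = build_index(session)
print(search(index, "operator"))  # [9.8, 42.3]
```

Because no grammar is involved, any designated term can be looked up once the index exists, and a term that never occurs simply yields no offsets rather than an error indicator.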
To minimize costs associated with speech recognition processing, it would be beneficial if a speech recognition processor could be selectively allocated to a voice session after a determination has been made that a participant in the voice session desires a voice-enabled application, rather than dedicating a speech recognition processor to a voice session in which it may never be used. If a speech recognition processor could be allocated on an impromptu basis, a relatively small pool of speech recognition processors could be used to support a relatively large number of voice sessions. It would be further beneficial if a speech analytics processor could be used to determine when a participant in a voice session desires an on-demand speech-enabled application, because speech analytics processors require significantly fewer resources than speech recognition processors.
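The allocation idea can be sketched as follows. This is only an illustration under stated assumptions: RecognizerPool and on_utterance are hypothetical names, the trigger word "operator" is an arbitrary example, and the lightweight detector is reduced to a text comparison, whereas an actual detector would operate on audio.

```python
class RecognizerPool:
    """A fixed-size pool of hypothetical speech recognition channels."""

    def __init__(self, size):
        self._free = list(range(size))  # idle channel ids
        self._in_use = {}               # session id -> channel id

    def acquire(self, session_id):
        """Allocate a channel to a session; return None if the pool is exhausted."""
        if session_id in self._in_use:
            return self._in_use[session_id]
        if not self._free:
            return None
        channel = self._free.pop()
        self._in_use[session_id] = channel
        return channel

    def release(self, session_id):
        """Return a session's channel to the pool, if it holds one."""
        channel = self._in_use.pop(session_id, None)
        if channel is not None:
            self._free.append(channel)

    def in_use(self):
        return len(self._in_use)


def on_utterance(pool, session_id, utterance, trigger="operator"):
    """Allocate a recognizer only when the lightweight detector sees the trigger."""
    if trigger in utterance.lower().split():
        return pool.acquire(session_id)
    return None


pool = RecognizerPool(size=2)

# A thousand sessions of ordinary conversation consume no recognition channels.
for sid in range(1000):
    on_utterance(pool, sid, "just ordinary conversation")
idle_count = pool.in_use()

# One participant asks for the voice-enabled application.
channel = on_utterance(pool, 7, "Operator please add a third party")
busy_count = pool.in_use()

# The channel goes back to the pool when the application finishes.
pool.release(7)
```

In this sketch the pool size bounds only the number of sessions that use the voice-enabled application at the same moment, not the total number of voice sessions being monitored.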