1. Field of the Invention
This invention relates to Automated Speech Recognition (ASR) systems used to provide voice-automated services. More particularly, the invention concerns an ASR talkoff suppressor and related method for an ASR-based “prompt-and-collect” voice transaction system, whereby spurious ASR responses due to high-energy prompt echoes are eliminated.
2. Description of the Prior Art
In voice communication systems, such as the Public Switched Telephone Network (PSTN), ASR technology has been implemented to automate various network and non-network functions. So-called prompt-and-collect voice transaction systems, for example, play stored voice messages that prompt callers to make selections regarding service options that are available on the implementing system. For example, a network service provider may use prompt-and-collect interactions to provide automated operator assistance to callers. Similarly, a network subscriber may program prompt-and-collect features into its automated voice transaction processing equipment designed for supporting call center operations, voice-automated purchasing, and other services.
In the typical prompt-and-collect voice transaction scenario, the caller is given the option of making a selection by entering a number via a telephone handset or by providing a voice response. Often the voice response options are limited to a few one- or two-syllable menu command words and/or numbers, but the allowed responses can be longer in some cases. An ASR unit associated with the prompt-and-collect voice transaction system is programmed to detect, interpret and convert the caller's voice response into a numeric selection output that triggers a particular action.
Problems can arise in the prompt-and-collect environment if line echoing is present. In that case, a loud prompt echo may be detected by the ASR unit's end point detector and interpreted as incoming speech, such that a spurious ASR response is produced. Some prior art prompt-and-collect systems attempt to solve this problem by utilizing echo cancellation processing. The goal of such processing is to provide sufficient Echo Return Loss Enhancement (ERLE) to preclude the input signal from ever having enough energy to trigger the ASR unit's end point detector. However, when the prompt echo is sufficiently loud, there can be sufficient energy in the incoming echo-canceled signal to trigger the end point detector and cause a misrecognition. Accordingly, a need exists for an improvement in a prompt-and-collect voice transaction system that prevents prompt echo-induced end point detector misfires.
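By way of illustration only, the following Python sketch shows why ERLE alone may not be enough. ERLE is computed here in the usual way, as the ratio, in dB, of the echo energy entering the canceller to the residual energy it leaves behind. The signal values and the end point detector trigger level (`ENDPOINT_THRESHOLD_DB`) are hypothetical and chosen only to show that a residual 30 dB below a loud echo can still exceed such a level.

```python
import math

# Hypothetical end point detector trigger level, in dB relative to a squared
# sample value of 1 (one 16-bit LSB). Chosen for illustration only.
ENDPOINT_THRESHOLD_DB = 40.0


def mean_energy(samples):
    """Average energy (mean of the squared sample values) of a block."""
    return sum(s * s for s in samples) / len(samples)


def level_db(samples):
    """Block energy expressed in dB relative to a squared sample value of 1."""
    return 10.0 * math.log10(mean_energy(samples))


def erle_db(echo_in, residual_out):
    """ERLE: energy of the echo entering the canceller divided by the energy
    of the residual it leaves behind, expressed in dB."""
    return 10.0 * math.log10(mean_energy(echo_in) / mean_energy(residual_out))


# A loud prompt echo (hypothetical square-wave stand-in for speech) and the
# residual left by a canceller that achieves 30 dB of ERLE.
echo = [8000.0, -8000.0] * 80
residual = [s * 10 ** (-30 / 20) for s in echo]

print(round(erle_db(echo, residual), 1))           # 30.0
print(round(level_db(residual), 1))                # 48.1
print(level_db(residual) > ENDPOINT_THRESHOLD_DB)  # True
```

Even with 30 dB of enhancement, the residual in this example sits about 8 dB above the assumed trigger level, which is the failure mode described above.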
The foregoing problems are solved and an advance in the art is obtained by a novel ASR talkoff suppressor and related method that greatly improves the operation of a prompt-and-collect voice transaction processing system. In accordance with the inventive subject matter, a prompt signal is sent and an input signal is received. A comparison is made of a characteristic of the prompt signal and a characteristic of the input signal. Following echo cancellation processing, additional processing of the input signal is performed to reduce the likelihood of a spurious ASR response thereto if the result of the prompt signal/input signal comparison satisfies a predetermined criterion.
In preferred embodiments of the invention, the aforementioned characteristic of the prompt signal is signal energy, and the characteristic of the input signal is signal energy following echo cancellation of the input signal. Thus, the comparison step may include sampling the prompt signal and the echo-cancelled input signal and determining their respective energies. This can be done using a “leaky integrator” that employs an effective window length of a predetermined number of samples of the prompt signal and the echo-cancelled input signal. The comparison step preferably also includes determining whether the difference between the respective energies of the prompt signal and the echo-cancelled input signal is greater than a predetermined threshold.
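A minimal sketch of the energy comparison follows, under stated assumptions. The leaky integrator is modeled as a first-order recursive average E[n] = (1 - a)E[n-1] + a·x[n]² with a = 1/N, which gives an effective window of roughly N samples. Expressing the energy difference in dB, and the values of `WINDOW_LEN` and `THRESHOLD_DB`, are assumptions made for illustration; the text above specifies only a predetermined window length and a predetermined threshold.

```python
import math


class LeakyIntegrator:
    """Running energy estimate E[n] = (1 - a) * E[n-1] + a * x[n]**2 with
    a = 1 / window_len, i.e. an exponentially weighted window whose
    effective length is roughly window_len samples."""

    def __init__(self, window_len):
        self.alpha = 1.0 / window_len
        self.energy = 0.0

    def update(self, sample):
        # Equivalent to (1 - alpha) * energy + alpha * sample**2.
        self.energy += self.alpha * (sample * sample - self.energy)
        return self.energy


def to_db(energy, floor=1e-10):
    """Convert an energy value to dB, clamping at a small floor to avoid log(0)."""
    return 10.0 * math.log10(max(energy, floor))


def echo_dominated(prompt_energy, input_energy, threshold_db):
    """True when the prompt energy exceeds the echo-cancelled input energy by
    more than threshold_db, i.e. the input looks like residual prompt echo."""
    return to_db(prompt_energy) - to_db(input_energy) > threshold_db


WINDOW_LEN = 128     # hypothetical effective window, in samples
THRESHOLD_DB = 20.0  # hypothetical decision threshold

prompt_meter = LeakyIntegrator(WINDOW_LEN)
input_meter = LeakyIntegrator(WINDOW_LEN)
prompt = [4000.0, -4000.0] * 200
echo_residual = [0.01 * s for s in prompt]  # 40 dB below the prompt

for p, x in zip(prompt, echo_residual):
    prompt_meter.update(p)
    input_meter.update(x)

print(echo_dominated(prompt_meter.energy, input_meter.energy, THRESHOLD_DB))  # True
```

The recursive form needs only one stored value per signal, which is one plausible reason to prefer a leaky integrator over a sliding block sum of N squared samples.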
The processing of the echo-cancelled input signal to reduce the likelihood of a spurious ASR response can include randomizing the echo-cancelled input signal to make it more noise-like. The processing can also include attenuating the echo-cancelled input signal.
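The text does not say how the randomization is carried out, so the sketch below assumes one simple option: multiplying each sample by a random sign. This keeps every sample's magnitude but removes the sample-to-sample correlation that gives residual echo its speech-like structure, flattening its spectrum toward that of noise. The block is then attenuated. The function name `suppress_talkoff` and the 12 dB default attenuation are hypothetical.

```python
import random


def suppress_talkoff(samples, attenuation_db=12.0, rng=None):
    """Hypothetical suppression of an echo-dominated input block.

    Each sample's sign is flipped at random, which keeps the per-sample
    magnitude (and hence the block energy before attenuation) but scrambles
    the waveform so its spectrum is flattened and it behaves more like noise.
    The result is then attenuated by attenuation_db.
    """
    if rng is None:
        rng = random.Random()
    gain = 10.0 ** (-attenuation_db / 20.0)
    return [gain * s * rng.choice((-1.0, 1.0)) for s in samples]


block = [1200.0, -950.0, 400.0, 3100.0, -2200.0]
out = suppress_talkoff(block, attenuation_db=12.0, rng=random.Random(7))
```

Passing a seeded `random.Random` instance, as in the usage line, makes the output reproducible for testing; in operation the default unseeded generator would be used.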
A determination can also be made as to whether the input signal corresponds to a person speaking; if so, the prompt signal/input signal comparison is evaluated against a different criterion than the one used when the input signal does not correspond to a person speaking. This determination includes determining a peak energy of the prompt signal and a peak energy of the input signal. The peak energy of the prompt signal is determined as the square of the largest sample of a first predetermined number of samples of the prompt signal. The peak energy of the input signal is determined as the square of the largest sample of a second predetermined number of samples of the input signal. The determination also includes calculating whether the difference between the respective peak energies of the prompt signal and the input signal is less than a predetermined threshold.
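The peak-energy test can be sketched as follows. “Largest sample” is taken here to mean the largest magnitude, and the peak-energy difference is expressed in dB; both are assumptions. The block lengths `N_PROMPT` and `N_INPUT`, the threshold values, and the rule in `comparison_threshold_db` (a larger threshold while the caller is speaking, so that barge-in speech is less likely to be suppressed) are hypothetical choices for illustration only.

```python
import math


def peak_energy(samples):
    """Square of the largest-magnitude sample in the block."""
    peak = max(abs(s) for s in samples)
    return peak * peak


def person_speaking(prompt_block, input_block, threshold_db, floor=1e-10):
    """True when the prompt's peak energy exceeds the input's peak energy by
    less than threshold_db, i.e. the input is too strong to be prompt echo."""
    diff_db = 10.0 * math.log10(max(peak_energy(prompt_block), floor)
                                / max(peak_energy(input_block), floor))
    return diff_db < threshold_db


def comparison_threshold_db(speaking, quiet_db=15.0, speech_db=30.0):
    """Pick the threshold for the energy comparison: a hypothetical larger
    value while the caller is talking, so suppression is harder to trigger."""
    return speech_db if speaking else quiet_db


N_PROMPT, N_INPUT = 160, 160  # hypothetical block lengths, in samples
SPEECH_THRESHOLD_DB = 10.0    # hypothetical peak-difference threshold

prompt = [8000.0, -8000.0] * (N_PROMPT // 2)
echo_only = [200.0, -200.0] * (N_INPUT // 2)   # peak about 32 dB below prompt
barge_in = [6000.0, -5500.0] * (N_INPUT // 2)  # peak about 2.5 dB below prompt

print(person_speaking(prompt, echo_only, SPEECH_THRESHOLD_DB))  # False
print(person_speaking(prompt, barge_in, SPEECH_THRESHOLD_DB))   # True
```

Peak values react to a talker's onset faster than the windowed average energy, which may explain why a separate peak-based test is used to switch between criteria.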