Automated speech recognition (ASR) systems are used for detecting particular words or phrases contained in a voice or audio stream. In customer quality assurance applications, for example, a speech recognition engine may be used in monitoring phone calls between customers and customer service agents to evaluate the quality of customer interactions, and to ensure that an adequate level of service is provided. In some applications, the speech recognition engine may also be used to assess the customer service agent's performance in real time during a phone call. In some situations, the speech recognition engine may also be used to analyze recordings of prior communications to permit a quality compliance manager or supervisor to later assess the quality of the phone call, or to verify or confirm a transaction made during the call. In the financial services industry, for example, the speech recognition engine may be used by broker-dealers to extract information regarding trade confirmations to ensure compliance with the broker-dealer's trading and reporting obligations. Automated speech recognition systems are also used in a variety of other applications for analyzing speech content.
Software applications that utilize speech recognition engines to detect words or phrases in audio files must often employ carefully tuned search terms to ensure that the output from the engine is accurate and useful. Poorly chosen words, phrases, or other search terms may result in the speech recognition engine not detecting a particular search term that is present in the audio file (i.e., a false negative), or may result in the detection of terms that do not exist in the audio file (i.e., a false positive). Relatively long words such as “imperfection,” “constraining,” and “international” are more likely to be accurately detected by speech recognition engines than relatively short words such as “and,” “if,” and “me.” Multiple-word phrases or words containing particular sounds or combinations of sounds are also more likely to be accurately detected by speech recognition engines. This is often related to the ease with which the speech recognition engine can correctly identify particular phonemes or groups of phonemes within the audio file. The overall efficacy of the system in accurately detecting particular words or phrases is thus dependent on the phonemic characteristics of the search terms.
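The false negative and false positive outcomes described above can be counted by comparing the engine's hits against a hand-labeled reference for the same audio file. The following is a minimal sketch under stated assumptions: the function name `score_detections`, the list-of-terms input format, and the precision and recall summaries are illustrative choices and are not taken from any particular speech recognition engine.

```python
from collections import Counter


def score_detections(detected, reference):
    """Compare engine hits with hand-labeled reference occurrences.

    Both arguments are lists of search terms, one entry per occurrence
    (an assumed format chosen for illustration only).
    """
    detected_counts = Counter(detected)
    reference_counts = Counter(reference)

    # Occurrences present in both lists are correct detections.
    true_pos = sum((detected_counts & reference_counts).values())
    # Hits with no matching reference occurrence are false positives.
    false_pos = sum((detected_counts - reference_counts).values())
    # Reference occurrences the engine missed are false negatives.
    false_neg = sum((reference_counts - detected_counts).values())

    precision = true_pos / (true_pos + false_pos) if detected else 0.0
    recall = true_pos / (true_pos + false_neg) if reference else 0.0
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "precision": precision,
        "recall": recall,
    }


if __name__ == "__main__":
    # Hypothetical data: the engine reports "me" twice but misses "and".
    result = score_detections(
        detected=["international", "me", "me"],
        reference=["international", "and", "me"],
    )
    print(result)
```

Counting by term occurrence, rather than by time offset within the audio, keeps the sketch short; a practical evaluation would likely also check that each hit falls near the labeled position.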
The process of training and tuning automated speech recognition engines to accurately detect a list of words or phrases in an audio file is typically accomplished by testing the list of search terms against a recorded audio file, assessing the accuracy of the results or hits detected by the speech recognition engine, making changes to the search terms, and then rerunning the test using the new search terms. This process is often repeated multiple times until the results from the speech recognition engine are deemed to be sufficiently accurate and robust for the application. Such an iterative process of tuning speech recognition systems is often a manual, time-intensive process, typically performed by professionals with knowledge of linguistics and speech recognition technology. In some applications, the process of tuning the speech recognition engine to accurately detect search terms may take months or even years to complete, and must be repeated as new search terms are added to the system.
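The test, assess, revise, and rerun cycle described above can be sketched as a simple loop. This is an illustrative outline only: `tune_search_terms` and the three toy helpers are hypothetical names, the toy engine merely pretends that terms of eight or more characters are detected, and the replacement phrases are made-up examples standing in for the manual revisions a linguist would make.

```python
def tune_search_terms(terms, run_engine, revise, is_accurate, max_rounds=10):
    """Repeat test -> assess -> revise until the results are accepted.

    Returns (terms, rounds_used, converged). If the round limit is reached,
    the returned terms are the latest revision and have not been retested.
    """
    for round_number in range(1, max_rounds + 1):
        hits = run_engine(terms)          # test the current term list
        if is_accurate(terms, hits):      # assess the hits
            return terms, round_number, True
        terms = revise(terms, hits)       # change the terms and rerun
    return terms, max_rounds, False


def toy_engine(terms):
    # Toy stand-in for an engine: only terms of 8+ characters are "detected".
    return [t for t in terms if len(t) >= 8]


def toy_is_accurate(terms, hits):
    # Accept the results only when every search term was detected.
    return set(hits) == set(terms)


# Hypothetical manual revisions: short words replaced by longer phrases.
REPLACEMENTS = {"me": "call me back", "if": "if you agree"}


def toy_revise(terms, hits):
    missed = set(terms) - set(hits)
    return [REPLACEMENTS.get(t, t) if t in missed else t for t in terms]


if __name__ == "__main__":
    final_terms, rounds, converged = tune_search_terms(
        ["international", "me", "if"], toy_engine, toy_revise, toy_is_accurate
    )
    print(final_terms, rounds, converged)
```

In practice each pass through the loop involves human review rather than an automatic check, which is why the process described above can be so slow.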