In various applications, it is desirable to detect when a specific phrase has been spoken. However, current phrase spotting products can be inaccurate. In addition, such products often give the speaker no means to guide the system and improve its performance. In particular, the individual accents of speakers can adversely affect the accuracy with which specific phrases are spotted.
In order to improve the accuracy of phrase spotting systems, a training mode can be utilized. During training, a user is asked to provide speech samples in response to prompts. While such training can be effective at increasing the accuracy of speech-to-text systems for individual users, it is time-consuming. Additionally, when attempting to spot specific phrases spoken by an unknown or random speaker, traditional training as described above may be impractical. In order to provide improved accuracy for a larger group of users, such as callers into a contact center, individual users can be assigned to profile categories. For example, a user with an Australian accent can be associated with a profile that is intended to accurately spot phrases spoken with that accent.
In certain contexts, such as in contact centers, it can be desirable to monitor audio signals comprising speech for certain key words or phrases. For example, an enterprise might be interested in monitoring conversations between contact center agents and customers for certain words. As a particular example, a contact center server can monitor calls in real time for the word “supervisor.” If that word is detected, the detection can be used as a trigger for a supervisor to intervene in the call, or to monitor the ongoing call. As another example, a financial institution may routinely record customer calls, so that an accurate record of customer instructions can be maintained. If a question later arises as to the content of a customer's earlier instructions, it can be desirable to search through the recordings of earlier conversations between the customer and contact center agents, to locate and play back those instructions. However, in such situations, there is little or no opportunity to train the system to accurately recognize the speech being monitored. Accordingly, previous techniques for training systems have been ineffective in these other contexts.