Large organizations, such as commercial organizations, financial organizations, or public safety organizations, conduct numerous audio and textual interactions with customers, users, suppliers, or other persons on a daily basis. Some of these interactions are vocal, such as telephone conversations, or at least comprise a vocal component, such as the audio part of a video or face-to-face interaction.
Many organizations record some or all of these interactions, whether because recording is required by law or regulation, or for business intelligence, dispute resolution, quality assurance, quality management, or any other purpose.
In many situations, an agent is required to repeat a predetermined script under the appropriate circumstances. For example, when a user calls a call center and requests an action that involves a fee, the handling agent may be required to repeat the following script: “There is a fee of $50 to process your request, do you authorize this fee?” in order to make sure that the payment is authorized.
As part of agent training or quality control, it may be required to verify that the agent indeed repeated all necessary scripts accurately, or to identify interactions in which the agent failed to do so.
Known methods for script compliance determination include manually listening to audio conversations, which is highly labor-intensive and impractical for significant call volumes.
In some embodiments, one or a few predetermined words may be detected in an audio signal. However, the detected words may not constitute the full script to be verified, or may be found at dispersed locations within the interaction, so this approach does not provide a good solution either. Searching for the full script, on the other hand, is highly likely to fail due to detection errors, background noise, insignificant errors by the agent, or other problems.