1. Field of the Invention
The present invention relates to audio content analysis in general, and to a method and apparatus for retrieving business insight from auditory information in particular.
2. Discussion of the Related Art
Within organizations or organizational units that handle auditory data including interactions, such as call centers, customer relations centers, trade floors, law enforcement agencies, homeland security offices or the like, it is often required to extract information from the audio segments in an automated and efficient manner. The audio segments may be of various types, including phone calls using all types of phone systems, transmitted radio, recorded audio events, walk-in center events, video conferences, e-mails, chats, instant messaging, access through a web site, radio or TV broadcast, audio segments downloaded from the internet, audio files or streams, the audio part of video files or streams or the like. The information to be extracted from the segments may relate to various aspects, such as content of the segments, categories to which the segments may be classified, entities participating, subject, products, interaction type, up-sale opportunities, detection of high-risk calls, detection of legal threats, customer churn analysis, customer satisfaction, first call resolution, or others. Having structured information related to segments may be important for analyses such as trend analysis, identification of frequently raised subjects, hidden link analysis between segments, identification of the main contributors to call volume, pattern detection, determination of how the volume can be reduced, and others. The analysis can also be used for taking business actions, such as locating missed opportunities, locating dissatisfied customers, more accurate resource allocation, such as allocating more agents to handle calls related to one or more subjects, business process optimization, cost reduction, improving quality/service/product, agent tutoring, preventing customer churn, or for other purposes, for example purposes related to security such as relating segments, relating speakers, or the like.
Raw material for audio analysis tools includes the text of the segments to be analyzed, such as interactions, broadcasts or the like, as well as additional information, such as indications of emotional parts within the interaction, call flow information, computer telephony integration (CTI) data, or others. The text in its entirety, subject to quality limitations, can be obtained by using a speech-to-text engine, and sporadic words can be extracted by using word-spotting engines.
However, speech-to-text engines, which receive captured audio as input and produce the full text of the captured audio, generally consume significant time and computing resources, thus enabling transcription of only a fraction of the collected interactions. If a larger part of the interactions is to be transcribed, then significant computing power is required. On the other hand, word-spotting engines or phonetic search engines, which spot single words, word parts or syllables in audio interactions, are faster but are generally efficient only for a limited word list of tens to thousands of words, or a set of predefined syllables or word parts. Thus, analysis tools which require the full text of a large corpus of interactions cannot practically be used with transcription engines, phonetic search engines, or word-spotting engines.
There is therefore a need for an automated system and method that will enable the usage of analysis tools for analyzing audio segments in general, and text analysis tools in particular, while being efficient enough to enable analysis of a significant number of audio interactions.