Within organizations or organizational units that handle interactions with customers, suppliers, employees, colleagues or the like, it is often required to extract information from the interactions in an automated and efficient manner. The organization can be, for example, a call center, a customer relations center, a trade floor, a law enforcement agency, a homeland security office, or the like. The interactions may be of various types and may in particular include an audio part, such as phone calls using all types of phone systems, recorded audio events, walk-in center events, video conferences, chats, captured web sessions, audio segments downloaded from the internet, audio files or streams, the audio part of video files or streams, or the like.
The interactions received or handled by an organization constitute a rich source of customer-related information, product-related information, or any other type of information that is significant for the organization. However, searching through this information efficiently is typically a problem. A call center or other organizational unit handling interactions receives a large number of interactions containing a vocal part, the volume depending mainly on the number of agents employed. Listening to, viewing, or otherwise searching through a significant percentage of the interactions would require time and manpower of the same order of magnitude as the initial handling of the interactions, which is clearly impractical.
Currently used search mechanisms include phonetic indexing and search, and word-based indexing and search. Phonetic indexing requires significant storage space, on the order of magnitude of the audio input itself, and is less accurate than word-based speech-to-text. However, phonetic search is not limited to words appearing in the dictionary or lexicon according to which the indexing was done: any word can be searched for within indexed segments, including words that were not known at the time of indexing. This can happen, for example, with new competitor product names that customers mention before the names are entered into the dictionary according to which the audio signals are indexed. In addition, users such as business-intelligence departments supply search terms as words rather than as phonemes, so the indexed phonemes are not directly accessible to them. Word-based indexed data, on the other hand, generated by a speech-to-text engine using a predetermined dictionary, is easier to search, but the indexing time and the error rate are high, especially on low-quality input. In addition, only words that were included in the lexicon according to which the segments were indexed can be searched for.
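The out-of-vocabulary trade-off between the two mechanisms can be illustrated with a toy Python sketch. It is not the indexing method of this disclosure and does not process audio: both recognizers are simulated from a list of spoken words, and the pronunciation table G2P, the lexicon LEXICON, the product name "zorbix" and the ARPAbet-style phoneme symbols are hypothetical. A real speech-to-text engine would typically substitute an acoustically similar in-lexicon word rather than emit an explicit "<unk>" token, and a real system would convert query words to phonemes with a grapheme-to-phoneme model rather than a fixed table.

```python
from typing import Dict, List

# Hypothetical pronunciations (ARPAbet-style symbols). A real system would
# derive these with a grapheme-to-phoneme model that handles any spelling.
G2P: Dict[str, List[str]] = {
    "price": ["P", "R", "AY", "S"],
    "too": ["T", "UW"],
    "high": ["HH", "AY"],
    "zorbix": ["Z", "AO", "R", "B", "IH", "K", "S"],  # hypothetical new product name
}

# Dictionary of the word-based index, fixed at indexing time.
LEXICON = {"price", "too", "high"}


def word_index(spoken: List[str]) -> List[str]:
    """Simulated speech-to-text: words outside the lexicon cannot be emitted."""
    return [w if w in LEXICON else "<unk>" for w in spoken]


def phonetic_index(spoken: List[str]) -> List[str]:
    """Simulated phonetic indexing: store the phoneme stream, no lexicon needed."""
    return [p for w in spoken for p in G2P[w]]


def word_search(index: List[str], term: str) -> bool:
    """Exact word lookup; only terms present in the index can be found."""
    return term in index


def phonetic_search(index: List[str], term: str) -> bool:
    """The user supplies a word, which is converted to phonemes and then
    matched as a contiguous subsequence of the indexed phoneme stream."""
    query = G2P[term]
    n = len(query)
    return any(index[i:i + n] == query for i in range(len(index) - n + 1))


spoken = ["zorbix", "price", "too", "high"]
words = word_index(spoken)
phones = phonetic_index(spoken)
print(word_search(words, "zorbix"), phonetic_search(phones, "zorbix"))  # False True
```

In this toy run the new product name is lost by the word-based index but still found in the phoneme stream, while in-lexicon words such as "price" are found by both. The phonetic search only works because the query word is first converted to phonemes, which mirrors the accessibility problem noted above.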
There is therefore a need in the art for a method and apparatus for indexing and searching audio signals. The method and apparatus should provide fast, efficient, high-quality and accessible results.