Many organizations, such as commercial organizations, financial institutions, government agencies or public safety organizations, conduct numerous interactions with customers, users, suppliers and the like on a daily basis. Many of these interactions are vocal, or at least comprise a vocal or audio component, for example, the voices of participants in a phone call or the audio portion of a video or face-to-face interaction.
Many organizations record some or all of these interactions, whether because recording is required by law or regulation, for quality assurance or quality management purposes, or for any other reason.
Once the interactions are recorded, the organization may want to extract as much information as possible from them. A common way to extract information from the interactions relies on speech recognition, and in particular on searching for particular words uttered by the participants of the interactions. The searched words may be product names, service names, competitor names, competing product names, or the like. The words may be searched in textual transcripts that are generated by applying Large Vocabulary Continuous Speech Recognition (LVCSR) to the vocal interactions. A common metric of the performance of an LVCSR system is the word error rate (WER). The WER is calculated by comparing LVCSR transcripts with manual transcripts of a collection of vocal interactions. The WER is defined as the sum of word substitutions, word insertions and word deletions in the LVCSR transcripts, divided by the total number of words in the manual transcripts. The WER of conversational speech transcripts generated by LVCSR may be as high as 30%-50%. Such a high WER significantly lowers the recall of terms (words or phrases) that are searched in the textual transcripts, because a term that was substituted or deleted during recognition does not appear in the transcript and therefore cannot be matched.
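By way of illustration only, the WER definition above may be sketched as follows. The sketch assumes that the minimal number of substitutions, insertions and deletions is obtained by a word-level edit-distance (Levenshtein) alignment between the manual transcript and the LVCSR transcript, and that words are separated by whitespace. The function name `word_error_rate` and the sample sentences are hypothetical and are not taken from any particular LVCSR system.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / words in reference.

    `reference` is the manual transcript; `hypothesis` is the LVCSR
    transcript. Both are assumed to be whitespace-tokenized text.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the minimal number of edits needed to turn the
    # reference words processed so far into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("account" -> "count") and one
# deletion ("today") against a six-word manual transcript.
manual = "please cancel my premium account today"
lvcsr = "please cancel my premium count"
wer = word_error_rate(manual, lvcsr)
```

In this hypothetical example the WER is 2/6, and a search for the term "account" in the LVCSR transcript finds no match even though the word was uttered, which is the loss of recall described above.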
In order to enhance the recall of searched terms over LVCSR transcripts, there is a need in the art for a method and apparatus for expansion of search queries on textual transcripts that are generated by LVCSR.