Technical Field
The present invention relates to a method for retrieving speech from speech data. Particularly, the present invention relates to a method for retrieving a character string designated by a keyword from speech data.
Related Art
In call monitoring at a call center, for example, specific words or inappropriate statements (hereinafter also simply called “keywords”) are searched for in a large number of recorded calls in order to improve the quality of the call center or to evaluate communicators (e.g., customer service representatives (“CSR”) or telephone sales representatives (“TSR”)).
In recent years, call monitoring using speech recognition has been implemented, making it possible to monitor all calls.
Speech recognition, particularly large vocabulary continuous speech recognition (“LVCSR”), is used in various fields, for example, to create transcripts at call centers, to automatically create court records, and to create captions for video lectures at colleges.
In Patent Literature 1, speech recognition of an input speech is performed with reference to language models, which are divided into language units, and acoustic models, which model features of speech, and a phonemic transcription is output. A collation unit conversion means divides the phonemic transcription into the same units as the division units of a text retrieval dictionary, which is divided into units smaller than those of the language models. A text retrieval means then uses the division result to search the text retrieval dictionary (para. [0008]).
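The collation step of Patent Literature 1 can be illustrated with a minimal sketch. Patent Literature 1 states only that the division units are smaller than the language-model units; this sketch assumes, for illustration, that the collation unit is an overlapping phoneme bigram and that the dictionary entries are already phonemized. The function names `to_units`, `build_index`, and `retrieve` and the sample dictionary are hypothetical.

```python
from collections import defaultdict

# Assumption: the collation unit is an overlapping phoneme n-gram (bigram by
# default). Patent Literature 1 only says the units are smaller than the
# language-model units; the n-gram choice here is illustrative.
def to_units(phonemes, n=2):
    """Divide a phoneme sequence into overlapping n-phoneme collation units."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]

def build_index(dictionary, n=2):
    """Map each collation unit to the ids of dictionary entries containing it."""
    index = defaultdict(set)
    for entry_id, phonemes in dictionary.items():
        for unit in to_units(phonemes, n):
            index[unit].add(entry_id)
    return index

def retrieve(transcription, index, n=2):
    """Rank dictionary entries by the number of matching collation units."""
    scores = defaultdict(int)
    for unit in to_units(transcription, n):
        for entry_id in index.get(unit, ()):
            scores[entry_id] += 1
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda e: (-scores[e], e))

# Hypothetical dictionary whose entries are already phonemized.
dictionary = {
    "hello": "h e l o".split(),
    "help": "h e l p".split(),
    "yellow": "y e l o".split(),
}
index = build_index(dictionary)
print(retrieve("h e l o".split(), index))  # ['hello', 'help', 'yellow']
```

Because the transcription and the dictionary are divided into the same units, a partially misrecognized transcription can still match an entry through the units it shares with that entry.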
In Patent Literature 2, words in the speech recognition vocabulary are converted into word strings by large vocabulary continuous speech recognition. For words not in the speech recognition vocabulary and for misrecognized words, phoneme and syllable recognition is used to recognize phoneme strings and syllable strings, which are units shorter than words. This makes it possible to provide a speech retrieval apparatus and method that retrieve keywords given as speech or text input from a large amount of speech data, including unknown words not in the dictionary and words affected by recognition errors (para. [0027]).
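The routing idea of Patent Literature 2 can be sketched as follows: an in-vocabulary keyword is matched against the word string, and an out-of-vocabulary keyword is matched against the phoneme string. This is a simplified sketch, not the method of Patent Literature 2. The recognizer outputs, the `pronunciations` lexicon, and the `search` function are hypothetical, letters stand in for phonemes, the syllable layer is omitted, and matching is exact, so recognition errors are not tolerated.

```python
# Hypothetical recognizer outputs for one call: a word-level LVCSR result and a
# phoneme-level result for the same audio. Letters stand in for phonemes.
VOCABULARY = {"confirm", "the", "order", "for", "a", "tiller"}
WORD_HYP = ["confirm", "the", "order", "for", "a", "tiller"]  # "Attila" misrecognized
PHONE_HYP = ["k", "a", "n", "f", "er", "m", "dh", "a", "o", "r",
             "d", "er", "f", "o", "r", "a", "t", "i", "l", "a"]
# Hypothetical pronunciation lexicon (e.g. produced from the text input).
PRONUNCIATIONS = {"attila": ["a", "t", "i", "l", "a"]}

def search(keyword, vocabulary, word_hyp, phone_hyp, pronunciations):
    """Return (layer, positions): word indices for in-vocabulary keywords,
    phoneme offsets for out-of-vocabulary keywords."""
    if keyword in vocabulary:
        # In-vocabulary keyword: exact match against the LVCSR word string.
        return "word", [i for i, w in enumerate(word_hyp) if w == keyword]
    # OOV keyword: match its phoneme string against the phoneme recognition result.
    target = pronunciations[keyword]
    k = len(target)
    return "phoneme", [i for i in range(len(phone_hyp) - k + 1)
                       if phone_hyp[i:i + k] == target]

print(search("order", VOCABULARY, WORD_HYP, PHONE_HYP, PRONUNCIATIONS))   # ('word', [2])
print(search("attila", VOCABULARY, WORD_HYP, PHONE_HYP, PRONUNCIATIONS))  # ('phoneme', [15])
```

The OOV keyword "attila" is invisible in the word string, where it was recognized as "a tiller", but it is found in the phoneme string. This is the motivation for the shorter recognition units.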
In Patent Literature 3, candidate segments are narrowed down in advance on the basis of a sub-word string generated from a keyword. The candidate segment serving as the retrieval result is selected by ranking the candidate segments through a simple process of incrementing the count value of each candidate segment that includes a given sub-word. This enables high-speed retrieval of speech data. In addition, accurate retrieval is achieved by generating the candidate segments after erroneous speech recognition results have been corrected for in the sub-word string generated from the keyword (para. [0015]).
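The count-based ranking of Patent Literature 3 can be sketched under several assumptions: sub-words are overlapping syllable bigrams, candidate segments are given in advance as recognized syllable lists, and the correction of recognition errors in the keyword's sub-word string is not modeled. The names `subwords` and `rank_segments` and the sample segments are hypothetical.

```python
from collections import Counter

def subwords(syllables, n=2):
    """Overlapping syllable n-grams used as the sub-word units (an assumption)."""
    return [tuple(syllables[i:i + n]) for i in range(len(syllables) - n + 1)]

def rank_segments(keyword_syllables, segments, n=2, top_k=3):
    """Count, per candidate segment, how many distinct keyword sub-words it
    contains, and return the ids of the highest-scoring segments."""
    query = set(subwords(keyword_syllables, n))
    counts = Counter()
    for seg_id, syllables in segments.items():
        for sw in set(subwords(syllables, n)):
            if sw in query:
                counts[seg_id] += 1  # increment the segment's count value
    # Segments with no matching sub-word never enter `counts`, i.e. they are
    # narrowed away; ties are broken by segment id for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [seg_id for seg_id, _ in ranked[:top_k]]

segments = {
    "s1": ["ka", "ta", "ka", "na", "de"],
    "s2": ["a", "ta", "ka", "ra"],
    "s3": ["ha", "na", "shi"],
}
print(rank_segments(["ka", "ta", "ka", "na"], segments))  # ['s1', 's2']
```

Only integer increments are needed per sub-word hit, which is why this kind of ranking is cheap compared with aligning the keyword against every segment.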
In Non-patent Literature 1, individual syllables are used as sub-word units in continuous conversational speech recognition, and n-gram arrays of syllables are used as retrieval units to address the problem of out-of-vocabulary (OOV) keywords and misrecognized words in conversational speech.

Patent Literature 1: JP2008-262279A
Patent Literature 2: JP2011-175046A
Patent Literature 3: JP2009-128508A
Non-patent Literature 1: Keisuke Iwami et al., “Out-of-vocabulary term detection by n-gram array with distance from continuous syllable recognition results”, SLT 2010, pages 200-205, Dec. 15, 2010.
Non-patent Literature 2: Hagen Soltau et al., “The IBM Attila Speech Recognition Toolkit”, Spoken Language Technology Workshop (SLT), 2010 IEEE, pages 97-102, Dec. 15, 2010, available from <URL:http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5700829&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5700829>