In connection with call center operations, there is a demand to grasp operational contents such as the types of clients' inquiries, the contents of questions, and talk time, and to utilize the determined contents for operational analysis and planning. Here, the talk time means the amount of time a call center agent spends with a caller during a transaction. Thus, in many call centers, telephone operators record the contents of each of their responses so that the response records can be analyzed later. However, in small-scale call centers, responses are not recorded, or response records are available but contain only a small amount of information. Therefore, dialogue between clients and telephone operators needs to be recorded so that speech dialogue data can be analyzed.
However, listening to all of the speech data from the beginning in order to grasp the response contents from the speech dialogue data is costly and difficult. Thus, in order to determine sections sufficient for understanding the content of speech data mainly composed of spoken language, such as the dialogue between clients and telephone operators, keywords extracted based on speech recognition are used.
However, in speech recognition, an unknown keyword may be misrecognized as a known word, or may fail to be recognized and remain undetected. Thus, a keyword dictionary (keyword list) needs to be maintained and managed. In particular, when dealing with speech data that records dialogue between clients and operators in a call center, technical terms and unique words uttered during the responses are specified as keywords. Thus, an effective speech recognition process cannot be achieved with a general keyword dictionary.
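The dependence of keyword-based section selection on the keyword dictionary can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not a process described in this document: the recognizer output format (time-stamped words sorted by start time), the function name `keyword_sections`, and the `margin` parameter are all hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical recognizer output: (start_sec, end_sec, word), sorted by start time.
Word = Tuple[float, float, str]


def keyword_sections(
    words: List[Word], dictionary: Set[str], margin: float = 2.0
) -> List[Tuple[float, float]]:
    """Return merged time spans around each recognized word found in the dictionary.

    The dictionary is assumed to hold lowercase keywords.
    """
    spans: List[Tuple[float, float]] = []
    for start, end, word in words:
        if word.lower() not in dictionary:
            continue  # a word outside the dictionary is never spotted
        lo, hi = max(0.0, start - margin), end + margin
        if spans and lo <= spans[-1][1]:
            # Overlaps the previous span: extend it instead of adding a new one.
            spans[-1] = (spans[-1][0], max(spans[-1][1], hi))
        else:
            spans.append((lo, hi))
    return spans
```

In this sketch, a domain-specific word that is missing from the dictionary contributes no section at all, which is why a general keyword dictionary leaves relevant parts of the dialogue undetected.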
A conventional keyword dictionary creating process involves extracting a keyword from manuals and related documents for the contents of operations in the call center, and adding speech data on the keyword to the keyword dictionary. Alternatively, a maintenance operator actually listens to the speech dialogue data from the beginning, extracts a keyword, and adds the keyword to the keyword dictionary.
Furthermore, a processing technique for extracting an unknown word during speech recognition is known. For example, Patent Document 1 discloses a process of preparing a speech recognition grammar that assumes that unknown words will appear, extracting speech characteristic information and a phoneme sequence for a section in which an unknown word is assumed to appear, carrying out clustering based on the speech characteristic information, detecting a representative phoneme sequence among the clustered phoneme sequences as an unknown word, and additionally registering the phoneme sequence in a dictionary. Japanese Patent Laid-Open No. 2002-358095, as Patent Document 1, may disclose a technique related to the invention.
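The clustering and representative-selection steps described above can be sketched roughly as follows. This is not the implementation of Patent Document 1. It assumes, for illustration only, that each candidate section carries a numeric feature vector, that sections are grouped by a simple greedy distance threshold, and that the representative is the phoneme sequence with the smallest total edit distance to the others in its cluster. The names `Section`, `cluster_sections`, `representative`, and `unknown_word_candidates` are hypothetical.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Sequence, Tuple


@dataclass
class Section:
    features: Sequence[float]   # assumed speech characteristic vector
    phonemes: Tuple[str, ...]   # phoneme sequence recognized for the section


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cluster_sections(sections: List[Section], threshold: float) -> List[List[Section]]:
    """Greedy clustering: a section joins the first cluster whose seed is close enough."""
    clusters: List[List[Section]] = []
    for s in sections:
        for c in clusters:
            if dist(s.features, c[0].features) <= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters


def representative(cluster: List[Section]) -> Tuple[str, ...]:
    """Pick the phoneme sequence with the smallest total edit distance to the cluster."""
    best = min(
        cluster,
        key=lambda s: sum(edit_distance(s.phonemes, o.phonemes) for o in cluster),
    )
    return best.phonemes


def unknown_word_candidates(sections: List[Section], threshold: float = 1.0):
    """One representative phoneme sequence per cluster, as a dictionary candidate."""
    return [representative(c) for c in cluster_sections(sections, threshold)]
```

Note that the sketch takes the candidate sections as given. In the process of Patent Document 1, those sections are the ones in which the speech recognition grammar assumes an unknown word to appear.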
Keywords extracted from related documents by the conventional keyword extracting process may not be usable as proper keywords because the speech dialogue data to be recognized is spoken language.
On the other hand, if keywords are manually extracted from speech dialogue data by actual listening, listening to all of the speech data requires a long time, disadvantageously resulting in very high operation costs.
Furthermore, in the process disclosed in Patent Document 1, the section in which the unknown word is expected to be uttered is pre-specified based on the speech recognition grammar structure. Thus, the process is difficult to apply to speech data which records dialogues that are difficult to stylize.
As described above, no technique for directly extracting an unknown keyword from speech data has been realized.