1. Field of the Invention
The present invention relates to natural language processing, and in particular to a language understanding device which carries out language understanding based on speech recognition results or the like.
2. Description of Related Art
As a method for language understanding based on speech recognition, non-patent document 1 (Stephanie Seneff, “TINA: A natural language system for spoken language applications.”, Computational Linguistics, Vol. 18, No. 1, pp. 61-86, 1992) discloses a method in which utterances of a user are classified in accordance with keyword spotting or heuristic rules. Moreover, non-patent document 2 (Katsuhito Sudoh and Hajime Tsukada, “Tightly integrated spoken language understanding using word-to-concept translation.”, In Proc. EUROSPEECH, pp. 429-432, 2005) discloses a method in which occurrence probabilities are learned by using a corpus-based method. Furthermore, non-patent document 3 (Alexandros Potamianos and Hong-Kwang J. Kuo, “Statistical recursive finite state machine parsing for speech understanding.”, In Proc. ICSLP, pp. 510-513, 2000) and non-patent document 4 (Chai Wutiwiwatchai and Sadaoki Furui, “Hybrid statistical and structural semantic modeling for Thai multi-stage spoken language understanding.”, In Proc. HLT-NAACL Workshop on Spoken Language Understanding for Conversational Systems and Higher Level Linguistic Information for Speech Processing, pp. 2-9, 2004) disclose methods in which a Weighted Finite State Transducer (WFST) is used.
Language understanding (LU) in spoken dialogue systems needs to be robust against automatic speech recognition (ASR) errors. Moreover, it is preferable that a language understanding device can be constructed from a small amount of training data, because collecting data for a new domain takes a great deal of time and effort, and a smaller data requirement makes it easier to construct a language understanding device for a new spoken dialogue system. Several methods of implementing a language understanding device in spoken dialogue systems have been proposed. Using a grammar-based speech recognizer is one of the simplest methods. Although the ASR result can be transformed into concepts without difficulty, complicated grammars are required to understand utterances of various expressions, which imposes a high cost on the system developer.
Classifying users' utterances using keyword spotting or heuristic rules is another method (non-patent document 1). In this method, utterances can be transformed into concepts without large modification of the rules. However, as with the grammar-based speech recognition method, it costs the system developer a great deal of time and effort, because many complicated rules must be prepared manually.
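By way of illustration only, keyword spotting of the kind described above may be sketched as follows. The sketch is not taken from non-patent document 1; the rule table `KEYWORD_RULES`, the function `spot_concepts`, and the weather-domain keywords are hypothetical names and data assumed for this example. It shows how recognized words are mapped directly to concepts, and why every additional expression requires another manually prepared rule.

```python
# Hypothetical keyword-to-concept rules for an assumed weather domain.
# Each keyword maps to a (slot, value) concept; none of these entries
# comes from the cited documents.
KEYWORD_RULES = {
    "tokyo": ("city", "Tokyo"),
    "osaka": ("city", "Osaka"),
    "today": ("date", "today"),
    "tomorrow": ("date", "tomorrow"),
    "weather": ("topic", "weather"),
}


def spot_concepts(asr_words):
    """Return the concepts whose keywords occur in the recognized words.

    Words without a rule (fillers, misrecognized words) are skipped,
    and each concept is reported once, in order of first occurrence.
    """
    concepts = []
    for word in asr_words:
        rule = KEYWORD_RULES.get(word.lower())
        if rule is not None and rule not in concepts:
            concepts.append(rule)
    return concepts


if __name__ == "__main__":
    words = "uh what is the weather in Tokyo tomorrow".split()
    print(spot_concepts(words))
```

An utterance that names a city absent from the table yields no city concept until the developer adds a rule by hand, which is the manual cost noted above.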
To cope with these problems, a corpus-based method in which occurrence probabilities are learned from a corpus (non-patent document 2) and Weighted Finite State Transducer (WFST)-based methods (non-patent documents 3 and 4) have been proposed. These methods, however, require a large amount of training data to implement the device, and are therefore not suitable for constructing a language understanding device for a new domain. In addition, the trained results depend on the domain of the corpus used. Furthermore, because the weights are fixed, such methods cannot deal with changes in the state of the speech or of the users.