1. Field of the Invention
The present invention relates to speech recognition, and more particularly, to an apparatus and method for adaptively and automatically generating a grammar network for use in speech recognition based on the contents of a previous dialogue, and to an apparatus and method for recognizing dialogue speech by using the grammar network for speech recognition.
2. Description of the Related Art
Grammar generation algorithms used in a decoder, which is one of the elements of a speech recognition apparatus such as a virtual machine or a computer, include well-known methods such as an n-gram method, a hidden Markov model (HMM) method, a speech application programming interface (SAPI), a voice eXtensible markup language (VXML), and a speech application language tags (SALT) method. In the n-gram method, real-time discourse information between a speech recognition apparatus and a user is not reflected in utterance prediction. In the HMM method, each moment of utterance by a user is assumed to be an individual probability event completely independent of other utterance moments of the user or of the speech recognition apparatus. Meanwhile, in the SAPI, VXML, and SALT methods, a predefined grammar for a simple discourse fixed in advance is loaded at predefined time points.
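The limitation of the n-gram method noted above can be illustrated with a minimal sketch. The toy corpus, the function names `train_bigram` and `predict_next`, and the `dialogue_history` parameter are hypothetical and are introduced only for illustration; they are not taken from any of the methods named above. The sketch assumes a bigram (n = 2) model estimated from raw counts.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Estimate P(next word | previous word) from raw pair counts."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {w: c / sum(nxts.values()) for w, c in nxts.items()}
        for prev, nxts in counts.items()
    }


def predict_next(model, previous_word, dialogue_history=None):
    """Return the next-word distribution for previous_word.

    dialogue_history is accepted but never read: a plain n-gram model
    conditions only on the preceding n-1 words, so what the system or
    the user said earlier in the dialogue cannot change the prediction.
    """
    return model.get(previous_word, {})


# Hypothetical toy corpus, for illustration only.
corpus = ["i want tea", "i want coffee", "i want coffee", "you want tea"]
model = train_bigram(corpus)

# Two different system prompts lead to the same prediction.
after_tea_prompt = predict_next(model, "want", ["would you like tea"])
after_coffee_prompt = predict_next(model, "want", ["would you like coffee"])
```

In this sketch the two distributions are identical regardless of the preceding system prompt, which is the sense in which discourse information is not reflected in utterance prediction.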
As a result, when the content of an utterance by a user falls outside a predefined standard grammar structure, it becomes difficult for the speech recognition apparatus to recognize the utterance, and the speech recognition apparatus therefore prompts the user to utter again. Consequently, the time taken by the speech recognition apparatus to recognize the utterance of the user becomes longer, and the dialogue between the speech recognition apparatus and the user becomes unnatural as well as tedious.
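The re-prompting behavior described above can be sketched as follows. The grammar rules, the vocabulary, and the names `GRAMMAR`, `expand`, and `recognize` are hypothetical and chosen only for illustration; the sketch assumes a small, non-recursive context free grammar whose sentences can be enumerated in advance, and it is not the SAPI, VXML, or SALT grammar format.

```python
from itertools import product

# Hypothetical fixed grammar: nonterminals map to lists of alternatives.
GRAMMAR = {
    "COMMAND": [["VERB", "STATE", "the", "DEVICE"]],
    "VERB": [["turn"], ["switch"]],
    "STATE": [["on"], ["off"]],
    "DEVICE": [["light"], ["radio"]],
}


def expand(symbol):
    """Return every word sequence a symbol can produce.

    Assumes the grammar is non-recursive, so the expansion is finite.
    """
    if symbol not in GRAMMAR:
        return [[symbol]]  # terminal word
    results = []
    for rule in GRAMMAR[symbol]:
        parts = [expand(s) for s in rule]
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results


# The full set of utterances the fixed grammar allows.
ACCEPTED = {" ".join(words) for words in expand("COMMAND")}


def recognize(utterance):
    """Return the normalized utterance if it is in the grammar, else None.

    A None result stands for the case in which the apparatus would have
    to prompt the user to utter again.
    """
    text = utterance.lower().strip()
    return text if text in ACCEPTED else None


in_grammar = recognize("Turn on the light")
out_of_grammar = recognize("could you make it brighter")
```

Here `in_grammar` holds the recognized command, while `out_of_grammar` is `None` because the utterance, although reasonable in conversation, was not anticipated when the grammar was designed.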
Furthermore, the grammar network generation method of the n-gram method, which uses a statistical model, may be appropriate for a grammar network generator of a speech recognition apparatus for dictation utterance, but it is not appropriate for a speech recognition apparatus for conversational utterance, because real-time discourse information is not utilized for utterance prediction. In addition, the grammar network generation methods of the SAPI, VXML, and SALT methods, which employ a context free grammar (CFG) using a computational language model, may be appropriate for a grammar network generator of a speech recognition apparatus for command and control utterance, but they are not appropriate for conversational utterance, because the discourse and speech content of the user cannot go beyond a pre-designed fixed discourse.