As is well known, connected-word or continuous speech recognition technology may require a language model as a component in order to determine the connections between the words forming a sentence. Such language models may be broadly classified into two types: statistical language models based on large corpora, such as N-gram models, and grammar-based language models, represented by context-free grammar (CFG).
The statistical language model requires a large corpus, so it is used where such corpora can be obtained, for example in dictation systems, broadcast news recognition, and lecture and public speaking recognition. Although this model has the advantage of being able to recognize a relatively wide variety of sentences, it always carries the possibility of erroneous word connections, because statistical modeling represents the connections between words using probabilities, and those probabilities cannot be ascertained exactly.
Therefore, the grammar-based language model is adopted in fields that require high accuracy and in which the patterns of human utterances are relatively simple. Specifically, the grammar-based language model is mainly adopted for interactive speech interface systems such as robots, home networks and interactive TV guides, or for automatic interpretation systems used in specific fields such as the military and tourism. The grammar of a grammar-based language model is either prepared by an expert or automatically acquired from a corpus. In both cases, as the number of words to be recognized increases and the sentences to be recognized become more complicated, the grammar inevitably comes to allow sentences with non-viable meanings. For example, a domain-specific CFG-type speech recognition grammar written to recognize a sentence such as “eat an apple” may be described in Extended Backus-Naur Form (EBNF), as shown in the following Table 1:
TABLE 1
<eat> ::= eat | would like to eat | will eat;
<article> ::= an | a | the;
<fruits> ::= apple | pear | grapes | banana;
<sentence> ::= <eat> [<article>] <fruits>;
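As a rough illustration of what the grammar of Table 1 accepts, the following sketch enumerates every word sequence matched by <sentence>. The list names and the helper expand_sentences are hypothetical and chosen only for this example; the optional <article> is modeled as an empty string.

```python
from itertools import product

# Alternatives copied from Table 1.
EAT = ["eat", "would like to eat", "will eat"]
# "" models the case where the optional <article> is omitted.
ARTICLE = ["", "an", "a", "the"]
FRUITS = ["apple", "pear", "grapes", "banana"]


def expand_sentences():
    """Enumerate every word sequence matched by <sentence>."""
    return [
        " ".join(word for word in parts if word)
        for parts in product(EAT, ARTICLE, FRUITS)
    ]


sentences = expand_sentences()
print(len(sentences))               # 3 * 4 * 4 = 48
print("eat an apple" in sentences)  # True
```

Even this small grammar expands to 48 sentences, which is the search space a recognizer built on it would have to cover.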
The above-described conventional grammar has no problem in the above situation. However, if a rule such as “<pare> ::= please pare | would you pare” is added to the existing grammar in order to additionally recognize a sentence such as “please pare an apple,” that sentence is recognized normally, but sentences such as “please pare grapes” or “please pare a banana,” which would not be produced in light of their meaning, are also allowed, and thus the possibility of erroneous recognition increases. In other words, the resulting grammar allows sentences that humans do not utter. Further, the search space used in the speech recognition process is unnecessarily enlarged, which is disadvantageous in terms of memory and speed.
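The over-generation described above can be made concrete with a small sketch. It rests on two assumptions that are not stated in the source: that the new <pare> rule is combined with the existing rules as <sentence> ::= (<eat> | <pare>) [<article>] <fruits>, and that only apples and pears can sensibly be pared (the hypothetical set PARABLE). All names are illustrative.

```python
from itertools import product

EAT = ["eat", "would like to eat", "will eat"]
PARE = ["please pare", "would you pare"]
ARTICLE = ["", "an", "a", "the"]  # "" models the omitted optional article
FRUITS = ["apple", "pear", "grapes", "banana"]

# Assumption for illustration only: fruits that can sensibly be pared.
PARABLE = {"apple", "pear"}


def count_sentences(verbs):
    """Number of sentences generated as verbs x [article] x fruits."""
    return len(verbs) * len(ARTICLE) * len(FRUITS)


def non_viable_pare_sentences():
    """Sentences the extended grammar allows although the fruit is not pared."""
    return [
        " ".join(word for word in (verb, article, fruit) if word)
        for verb, article, fruit in product(PARE, ARTICLE, FRUITS)
        if fruit not in PARABLE
    ]


before = count_sentences(EAT)
after = count_sentences(EAT + PARE)
bad = non_viable_pare_sentences()
print(before, after, len(bad))      # 48 80 16
print("please pare grapes" in bad)  # True
```

Under these assumptions, adding one two-alternative rule enlarges the sentence set from 48 to 80, and half of the 32 new sentences are semantically non-viable.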