1. Field of the Invention
The present invention relates to a method and apparatus for recognizing continuous speech using search space restriction based on phoneme recognition. More specifically, it relates to a technique for recognizing continuous speech in which the search space is reduced by restricting, based on a phoneme recognition result, the words to which a transition may be made at a boundary between words, thereby improving the speed and performance of speech recognition.
2. Discussion of Related Art
In general, a continuous speech recognition system employs a word network in order to restrict the search space, and the word network is principally embodied as a finite state network (FSN), a word-pair grammar, or an N-gram. The general idea of the word network is to determine which words may follow a given word, either by fixing the permissible subsequent words according to rules or by connecting words according to statistical probabilities.
According to the word-pair grammar, a specific word is connected only to the words that are permitted to follow it. The word-pair grammar is a technique of searching words on the principle that, for example, the words “want” and “to eat” can be connected in this order, but cannot be connected in the reverse order. However, according to the word-pair grammar, when a user speaks without following the previously defined, standard grammar rules, the spoken words cannot be found by the search.
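The word-pair principle described above can be illustrated with a minimal Python sketch. The vocabulary, the table `WORD_PAIRS`, and the function names are hypothetical and chosen only for illustration; they are not taken from any particular recognizer.

```python
# Hypothetical word-pair grammar: each word maps to the set of words
# that are permitted to follow it. Order matters: "want" -> "to eat"
# is listed, but "to eat" -> "want" is not.
WORD_PAIRS = {
    "i": {"want"},
    "want": {"to eat"},
    "to eat": {"pizza"},
}


def allowed_successors(word):
    """Return the words the search may transition to after `word`."""
    return WORD_PAIRS.get(word, set())


def is_accepted(words):
    """Check that every adjacent pair of words is a permitted word pair."""
    return all(nxt in allowed_successors(prev)
               for prev, nxt in zip(words, words[1:]))
```

Under these assumptions, `is_accepted(["i", "want", "to eat"])` holds, whereas `is_accepted(["to eat", "want"])` does not, which mirrors the limitation noted above: an utterance that departs from the predefined pairs cannot be found by the search.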
The N-gram makes use of statistical probability to connect words. Specifically, the probability of a certain word following another word is calculated from a body of training data, so that the search process is performed on highly probable words. However, practical use of the N-gram necessarily requires a large corpus, and the N-gram is inadequate for dialogic speech recognition.
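As a rough illustration of the statistical approach, the following Python sketch estimates bigram (N = 2) probabilities by relative frequency from a toy corpus. The corpus, the function names, and the choice of unsmoothed relative-frequency estimation are assumptions made for this example; practical systems typically use far larger corpora and smoothing.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def bigram_prob(counts, prev, nxt):
    """Relative-frequency estimate of P(nxt | prev); 0.0 if prev is unseen."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return followers[nxt] / total if total else 0.0


def top_successors(counts, prev, k=2):
    """Return the k most probable successors of `prev` with their probabilities."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return [(word, count / total) for word, count in followers.most_common(k)]


# Hypothetical toy corpus, far smaller than anything usable in practice.
corpus = ["i want to eat", "i want to sleep", "i need to eat"]
counts = train_bigram(corpus)
```

With this toy corpus, `top_successors(counts, "to", k=1)` selects "eat" as the most probable word after "to", and a word pair never seen in the training data receives probability zero, which suggests why a large corpus is needed in practice.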
According to the FSN, all sentences that can be composed are combined into a single network. Although the FSN can speed up the recognition process, as the number of sentence patterns to be recognized increases, the size of the search space expressed by the FSN also increases, so that the search time increases and speech recognition performance deteriorates.
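The growth of an FSN with the number of sentence patterns can be sketched in Python as follows. This is a simplified illustration that shares only common sentence prefixes (a tree-shaped network); the function names and example sentences are hypothetical, and an actual FSN may merge states more aggressively.

```python
def build_fsn(sentences):
    """Build a prefix-sharing network from a list of sentence patterns.

    State 0 is the start state; transitions[state][word] gives the next state.
    """
    transitions = [{}]
    final_states = set()
    for sentence in sentences:
        state = 0
        for word in sentence.split():
            if word not in transitions[state]:
                transitions.append({})  # allocate a new state
                transitions[state][word] = len(transitions) - 1
            state = transitions[state][word]
        final_states.add(state)
    return transitions, final_states


def accepts(fsn, sentence):
    """Return True only if the sentence follows a complete path in the network."""
    transitions, final_states = fsn
    state = 0
    for word in sentence.split():
        if word not in transitions[state]:
            return False
        state = transitions[state][word]
    return state in final_states


small = build_fsn(["i want to eat"])
large = build_fsn(["i want to eat", "i want to sleep", "you want to eat"])
```

In this sketch, `len(small[0])` is smaller than `len(large[0])`: each added sentence pattern contributes new states, so the network that must be searched grows with the number of patterns, while a sentence outside the network, such as "you want to sleep", is not accepted.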