1. Field of the Invention
The present invention relates to a method, apparatus, and medium for large-vocabulary speech recognition based on a multi-layer central lexicon, and more particularly, to a speech recognition method, apparatus, and medium in which a large-vocabulary phonetic dictionary is layered in a tree structure, a central phonetic dictionary is selected at each node of the tree, candidate vocabularies are selected by symbol matching against a phoneme sequence acquired via a phoneme decoder, and a final recognition result is detected.
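By way of illustration only, the tree-structured lookup described above may be sketched as follows. The sketch assumes that symbol matching is scored with a plain Levenshtein (edit) distance and that the tree is descended greedily toward the closest central entry; neither choice is stated in this section. The names `Node`, `edit_distance`, and `select_candidates` are hypothetical.

```python
from dataclasses import dataclass, field


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution or match
        prev = cur
    return prev[-1]


@dataclass
class Node:
    # Assumption: each node stores one phoneme sequence as its central entry.
    central: tuple
    # Leaf nodes only: word -> phoneme sequence.
    words: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def select_candidates(root, decoded, n_best=2):
    """Descend toward the closest central entry, then rank the leaf's words."""
    node = root
    while node.children:
        # Assumption: greedy descent, following only the best-matching child.
        node = min(node.children,
                   key=lambda c: edit_distance(decoded, c.central))
    ranked = sorted(node.words,
                    key=lambda w: edit_distance(decoded, node.words[w]))
    return ranked[:n_best]
```

In this sketch only one central entry per visited node is compared with the decoded phoneme sequence, so the number of comparisons grows with the depth and branching of the tree rather than with the total number of vocabularies.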
2. Description of the Related Art
U.S. Pat. No. 6,178,401 discloses a method for reducing search complexity in a speech recognition system in which each node of a search network is treated as a virtual single-state model, the probability value of the model is the highest probability value among the M states of an original acoustic model, N candidates are searched using the simplified model, and a final recognition result is acquired via a detailed matching operation. However, this conventional speech recognition method has a problem in that memory demand increases in proportion to the number of vocabularies, because the size of the search network is not reduced.
Another conventional speech recognition method is introduced by L. Fissore et al., “Very large vocabulary isolated utterance recognition: a comparison between one pass and two pass strategies” (CSELT, Italy; ICASSP '88, published by IEEE), in which a given speech is segmented by using six representative phonemes to construct a phoneme lattice, the vocabularies to be recognized are also modeled with the six representative phonemes, and the N best-matched vocabularies are selected as first-pass candidates by using the two kinds of representative phoneme information. However, this conventional speech recognition method also has problems in that the first-pass candidates are imprecise because a coarse model is used, and the number of candidates that must be considered in the detailed matching process increases in proportion to the number of vocabularies.
In a conventional multilevel speech recognition method that groups vocabularies in a tree structure, vocabularies with similar pronunciations are grouped in the tree structure, a virtual vocabulary representing each group is estimated, and speech recognition is performed through multiple levels. However, the conventional multilevel speech recognition method has problems in that producing a lexicon representing each node is complicated and the amount of computation required to calculate the matching score of each representative lexicon is large.
Accordingly, there is a strong need for a method capable of smoothly performing recognition of large vocabularies in a device with restricted resources.
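For illustration only, the simplification described above (replacing the M state scores of a model with their maximum, then keeping N candidates for detailed matching) may be sketched as follows. This is a heavily simplified sketch and not the method of the cited patent: it assumes that per-frame log-likelihoods for each state are already available and accumulates them per candidate rather than searching a network. The names `single_state_score` and `fast_match` are hypothetical.

```python
def single_state_score(state_logliks):
    """Collapse the M per-state log-likelihoods of one frame into one score."""
    # The maximum is never lower than any individual state's score,
    # so the simplified model is optimistic with respect to the original.
    return max(state_logliks)


def fast_match(frames_by_model, n_best):
    """Rank models by accumulated single-state scores and keep the N best.

    frames_by_model maps a model name to a list of frames, where each frame
    is a list of M per-state log-likelihoods (assumed precomputed).
    """
    totals = {
        name: sum(single_state_score(frame) for frame in frames)
        for name, frames in frames_by_model.items()
    }
    return sorted(totals, key=totals.get, reverse=True)[:n_best]
```

Only the N names returned by `fast_match` would then be passed to the detailed matching operation. The sketch does not reduce the number of models that must be held in memory, which corresponds to the drawback noted above.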