1. Field of the Invention
The present invention relates to the field of speech recognition and, more particularly, to a node-based method for generating candidate word strings in speech recognition without expanding the strings.
2. Description of Related Art
In current speech recognition systems, a speech recognition module does not output only a single recognition result. To achieve higher recognition accuracy, it instead provides a plurality of possible results so that a subsequent process may select the best one from among them.
Therefore, a speech recognition module must provide many possible results to the subsequent process, and the generation of a plurality of candidate word strings from a speech signal for that process is a major concern in developing a speech recognition system.
U.S. Pat. No. 5,241,619 discloses a method for searching candidate word strings in which N candidate word strings are maintained while speech signals are matched against words, and the N candidate word strings are obtained when the matching process is complete. In this method, the N candidate word strings maintained so far must be expanded and modified in every time frame. If the vocabulary contains M words, then, as shown in FIG. 6, expanding one candidate word string generates M new candidate word strings, so N·M new strings arise in each time frame. The best N candidate word strings are selected from all of the expanded candidate word strings and serve as the basis for expansion in the next time frame. Consequently, a large memory space is required to store the expanded candidate word strings, and sorting must be performed in every time frame to maintain the N candidate word strings.
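The per-frame expand, sort, and prune cycle described above can be illustrated with a toy sketch. It assumes, purely for illustration, that each time frame contributes exactly one word, that the hypothetical `frame_scores` dictionary maps every vocabulary word to a log-likelihood score for that frame, and that higher scores are better. It shows only the generic pattern, not the procedure of U.S. Pat. No. 5,241,619 itself.

```python
import heapq
from typing import Dict, List, Tuple

# A hypothesis is (cumulative log score, word sequence); higher scores are better.
Hypothesis = Tuple[float, Tuple[str, ...]]


def expand_and_prune(beam: List[Hypothesis], frame_scores: Dict[str, float],
                     n: int) -> Tuple[List[Hypothesis], int]:
    # Expand each of the N kept strings by each of the M vocabulary words.
    candidates = [
        (score + word_score, words + (word,))
        for score, words in beam
        for word, word_score in frame_scores.items()
    ]
    # Keep only the N best expanded strings for the next frame.
    best = heapq.nlargest(n, candidates, key=lambda h: h[0])
    return best, len(candidates)


def n_best_by_expansion(frames: List[Dict[str, float]],
                        n: int) -> Tuple[List[Hypothesis], int]:
    beam: List[Hypothesis] = [(0.0, ())]
    generated = 0  # total number of expanded strings created over all frames
    for frame_scores in frames:
        beam, count = expand_and_prune(beam, frame_scores, n)
        generated += count
    return beam, generated


frames = [{"a": -1.0, "b": -2.0, "c": -3.0}, {"a": -2.0, "b": -0.5, "c": -1.0}]
print(n_best_by_expansion(frames, 2))
```

With N kept strings and M vocabulary words, each call to `expand_and_prune` materializes N·M candidates before sorting them, which is the memory and sorting burden noted above.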
Another approach to searching candidate word strings uses a two-stage design. In the first stage, a modified Viterbi algorithm is employed to generate a word lattice from the input speech signal. In the second stage, a stack search traces back through the word lattice generated in the first stage to generate the candidate word strings. Detailed descriptions of this method can be found in U.S. Pat. No. 5,805,772, entitled "Systems, Methods and Architecture of Manufacture for Performing High Resolution N-best String Hypothesization", and in "A Tree-trellis Based Fast Search for Finding the N Best Sentence Hypotheses in Continuous Speech Recognition" by F. K. Soong and E. F. Huang, ICASSP '91, pp. 705-708, 1991, which are hereby incorporated by reference into this patent application. As is known, this method must continuously perform stack operations, such as push and pop, to expand word strings and obtain the possible candidate word strings, so it inevitably spends much time expanding candidate word strings.
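The two-stage idea can be sketched as a forward Viterbi-style pass over a word lattice followed by a backward best-first stack search that uses the forward scores as its estimate of the unexplored part of each string. The lattice format (edges of `(start_node, end_node, word, log_score)` whose node numbers increase with time) and the function names are assumptions made for illustration; this is a simplified sketch, not the implementation of U.S. Pat. No. 5,805,772 or of Soong and Huang.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical lattice edge: (start_node, end_node, word, log_score), start < end.
Edge = Tuple[int, int, str, float]


def forward_best(edges: List[Edge], start: int) -> Dict[int, float]:
    # Stage 1: best log score from `start` to each node (Viterbi-style).
    # Sorting by end node is a topological order because start < end on every edge.
    best: Dict[int, float] = {start: 0.0}
    for s, e, _, sc in sorted(edges, key=lambda x: (x[1], x[0])):
        if s in best:
            best[e] = max(best.get(e, float("-inf")), best[s] + sc)
    return best


def stack_search(edges: List[Edge], start: int, final: int,
                 n: int) -> List[Tuple[float, Tuple[str, ...]]]:
    # Stage 2: trace back from `final`, ranking each partial string by
    # (exact best forward score to its left end) + (backward score so far).
    fwd = forward_best(edges, start)
    if final not in fwd:
        return []
    incoming: Dict[int, List[Tuple[int, str, float]]] = defaultdict(list)
    for s, e, w, sc in edges:
        incoming[e].append((s, w, sc))
    # heapq is a min-heap, so estimates are stored negated.
    stack = [(-fwd[final], final, 0.0, ())]
    results = []
    while stack and len(results) < n:
        _, node, back, words = heapq.heappop(stack)  # pop best partial string
        if node == start:
            results.append((back, words))  # complete strings pop in score order
            continue
        for s, w, sc in incoming[node]:
            if s in fwd:
                new_back = back + sc
                # Push the partial string extended one word to the left.
                heapq.heappush(stack, (-(fwd[s] + new_back), s, new_back, (w,) + words))
    return results


lattice = [(0, 1, "a", -1.0), (0, 1, "b", -2.0), (1, 2, "c", -1.0),
           (1, 2, "d", -1.5), (0, 2, "e", -2.2)]
print(stack_search(lattice, 0, 2, 3))
```

Because the forward scores are exact best scores, no unfinished string can overtake a completed one, so complete strings leave the stack in descending score order. Every `heappush` and `heappop` still corresponds to extending or retrieving a partial word string, which is where the time cost noted above arises.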
Still another method for searching candidate word strings uses a two-stage design similar to that of the above method. In the first stage, the 408 Mandarin syllables are used as recognition units to generate a syllable lattice. In the second stage, the N best syllables are selected for a back-tracing operation using a stack search in order to generate a plurality of candidate word strings. A detailed description of this method can be found in "An Efficient Algorithm for Syllable Hypothesization in Continuous Mandarin Speech Recognition" by E. F. Huang and H. C. Wang, IEEE Transactions on Speech and Audio Processing, pp. 446-449, 1994, which is incorporated herein by reference.
A further method for searching candidate word strings also uses a two-stage design. In the first stage, a word graph algorithm is employed to generate a word graph and a best word string; a detailed description can be found in "A Word Graph Algorithm for Large Vocabulary Continuous Speech Recognition" by S. Ortmanns, H. Ney, and X. Aubert, Computer Speech and Language, pp. 43-72, 1997, which is incorporated herein by reference. In the second stage, the search for candidate word strings is performed on the nodes of the best word string, and the output is recorded in a tree structure to save memory space. A detailed description of this method can be found in U.S. Pat. No. 5,987,409, entitled "Method of and Apparatus for Deriving a Plurality of Sequences of Words From a Speech Signal", which is incorporated herein by reference.
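Recording candidate strings in a tree can save memory because strings that begin with the same words share the nodes for those words. The following minimal sketch assumes the tree is a prefix tree (trie) over words; the class and method names are hypothetical and are not taken from U.S. Pat. No. 5,987,409.

```python
from typing import Dict, List, Sequence


class WordTrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "WordTrieNode"] = {}
        self.is_end = False  # True if a stored string ends at this node


class WordStringTree:
    """Stores word strings so that shared leading words are stored only once."""

    def __init__(self) -> None:
        self.root = WordTrieNode()
        self.node_count = 0  # word nodes, excluding the empty root

    def add(self, words: Sequence[str]) -> None:
        node = self.root
        for w in words:
            if w not in node.children:
                node.children[w] = WordTrieNode()
                self.node_count += 1
            node = node.children[w]
        node.is_end = True

    def strings(self) -> List[List[str]]:
        # Depth-first walk; children keep insertion order (Python 3.7+ dicts).
        out: List[List[str]] = []

        def walk(node: WordTrieNode, prefix: List[str]) -> None:
            if node.is_end:
                out.append(list(prefix))
            for w, child in node.children.items():
                prefix.append(w)
                walk(child, prefix)
                prefix.pop()

        walk(self.root, [])
        return out


tree = WordStringTree()
for s in (["I", "want", "tea"], ["I", "want", "coffee"], ["I", "went", "home"]):
    tree.add(s)
print(tree.node_count, tree.strings())
```

Stored separately, these three three-word strings would need nine word entries; the tree holds them in six nodes.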
Basically, all of the above methods perform search operations based on the expansion of word strings. Such operations require a large memory space to store word strings and spend much time expanding them. Therefore, it is desirable to improve the search for candidate word strings.
The object of the present invention is to provide a method for quickly generating candidate word strings without expanding the word strings. The method comprises the steps of: (A) determining an associated maximum string score for each node; (B) sorting all nodes by their associated maximum string scores and grouping the nodes with the same string score into the same node set; and (C) selecting the node sets with relatively high string scores generated in step (B) and connecting the nodes according to their starting and ending time frames, thereby generating the candidate word strings.
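Steps (A) to (C) can be sketched under several assumptions that the summary above does not state, so this is one possible reading for illustration rather than the claimed method. Each node is a hypothetical `(word, start_frame, end_frame, log_score)` tuple in a lattice spanning frames 0 to `t_end`; node i may precede node j when i ends at the frame where j starts; a node's maximum string score is taken to be the score of the best complete string passing through it, computed with one forward and one backward pass; and, when a selected node set does not by itself span the utterance, the gaps are filled by following each node's best predecessor and successor links. Scores are rounded before grouping so that floating-point noise does not split a set.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical node layout: (word, start_frame, end_frame, log_score).
Node = Tuple[str, int, int, float]


def candidate_strings(nodes: List[Node], t_end: int,
                      k: int) -> List[Tuple[float, List[str]]]:
    # Assumption: node i can precede node j when nodes[i] ends where nodes[j] starts.
    starts: Dict[int, List[int]] = defaultdict(list)
    ends: Dict[int, List[int]] = defaultdict(list)
    for i, (_, s, e, _) in enumerate(nodes):
        starts[s].append(i)
        ends[e].append(i)
    # Predecessors always start earlier, so start-frame order is a topological order.
    order = sorted(range(len(nodes)), key=lambda i: nodes[i][1])

    # Forward pass: best score from frame 0 up to and including node i.
    fwd: Dict[int, float] = {}
    prev: Dict[int, int] = {}
    for i in order:
        _, s, _, sc = nodes[i]
        if s == 0:
            fwd[i] = sc
            continue
        best = max(((fwd[p], p) for p in ends[s] if p in fwd), default=None)
        if best is not None:
            fwd[i], prev[i] = best[0] + sc, best[1]

    # Backward pass: best score from node i (inclusive) to frame t_end.
    bwd: Dict[int, float] = {}
    nxt: Dict[int, int] = {}
    for i in reversed(order):
        _, _, e, sc = nodes[i]
        if e == t_end:
            bwd[i] = sc
            continue
        best = max(((bwd[j], j) for j in starts[e] if j in bwd), default=None)
        if best is not None:
            bwd[i], nxt[i] = best[0] + sc, best[1]

    # Step (A): maximum string score of each node (its own score counted once).
    max_score = {i: fwd[i] + bwd[i] - nodes[i][3] for i in fwd if i in bwd}

    # Step (B): group nodes sharing a score into one node set, then rank the sets.
    node_sets: Dict[float, List[int]] = defaultdict(list)
    for i, score in max_score.items():
        node_sets[round(score, 6)].append(i)
    ranked = sorted(node_sets.items(), key=lambda kv: kv[0], reverse=True)

    # Step (C): for the k best sets, connect nodes through their frame boundaries.
    results = []
    for score, members in ranked[:k]:
        chain = [min(members, key=lambda i: nodes[i][1])]
        while chain[0] in prev:
            chain.insert(0, prev[chain[0]])
        while chain[-1] in nxt:
            chain.append(nxt[chain[-1]])
        results.append((score, [nodes[i][0] for i in chain]))
    return results


lattice = [("a", 0, 1, -1.0), ("b", 0, 1, -2.0), ("c", 1, 2, -1.0),
           ("d", 1, 2, -1.5), ("e", 0, 2, -2.2)]
print(candidate_strings(lattice, 2, 3))
```

No partial string is pushed, popped, or expanded; each candidate is read off by following links from one node of its set. Under these assumptions the sketch returns only strings that are the best path through at least one node, and two distinct strings with exactly equal scores would fall into a single set.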
Other objects, advantages, and novel features of the invention will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings.