The present invention provides a novel data structure stored within a computer-readable memory and a method for generating this data structure. The data structure is an important component that may be used to address the above letter-to-pronunciation problems. Specifically, the invention provides a mixed decision tree having a plurality of internal nodes and a plurality of leaf nodes. A typical implementation would employ one of these mixed decision trees for each letter in the alphabet.
The internal nodes are each populated with a yes-no question. The decision tree is mixed in that some of these questions pertain to a given letter and its neighboring letters in a spelled word sequence, while others pertain to a given phoneme and its neighboring phonemes in a pronunciation or phoneme sequence corresponding to the spelled word. The letters of the spelled word are aligned with the corresponding phonemes in the pronunciation sequence. The leaf nodes are populated with probability data, obtained during training upon a known corpus, that ranks or scores different phonetic transcriptions of the given letter. The probability data can be used, for example, to select the best pronunciation of a spelled name from a list of hypotheses generated by an upstream process. The probability data can also be used to score pronunciations developed by lexicographers, allowing questionable transcriptions to be quickly identified and corrected.
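The structure described above can be illustrated with a minimal sketch. It assumes a one-to-one alignment between letters and phonemes, a "#" symbol for positions beyond the word boundary, and a score formed as the product of per-letter leaf probabilities. The class names, the helper functions, the phoneme symbols, and all probability values are hypothetical and chosen only for illustration; they are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Union


@dataclass(frozen=True)
class Question:
    """A yes-no question about a letter or a phoneme near position i."""
    kind: str                 # "letter" or "phoneme"
    offset: int               # position relative to the current letter
    members: FrozenSet[str]   # symbols for which the answer is "yes"

    def ask(self, letters: List[str], phonemes: List[str], i: int) -> bool:
        seq = letters if self.kind == "letter" else phonemes
        j = i + self.offset
        # Assumption: "#" stands for any position outside the word.
        symbol = seq[j] if 0 <= j < len(seq) else "#"
        return symbol in self.members


@dataclass
class Leaf:
    probs: Dict[str, float]   # phoneme -> probability for the given letter


@dataclass
class Internal:
    question: Question
    yes: "Node"
    no: "Node"


Node = Union[Leaf, Internal]


def leaf_for(tree: Node, letters, phonemes, i) -> Leaf:
    """Walk from the root to a leaf by answering each node's question."""
    node = tree
    while isinstance(node, Internal):
        node = node.yes if node.question.ask(letters, phonemes, i) else node.no
    return node


def score(trees: Dict[str, Node], letters, phonemes, floor=1e-6) -> float:
    """Multiply the leaf probabilities of each aligned letter-phoneme pair."""
    total = 1.0
    for i, (letter, phoneme) in enumerate(zip(letters, phonemes)):
        leaf = leaf_for(trees[letter], letters, phonemes, i)
        total *= leaf.probs.get(phoneme, floor)  # floor for unseen phonemes
    return total


def best_pronunciation(trees, letters, hypotheses):
    """Pick the highest-scoring hypothesis from an upstream candidate list."""
    return max(hypotheses, key=lambda p: score(trees, letters, p))


# A mixed tree for the letter "c": one letter question, one phoneme question.
tree_c = Internal(
    Question("letter", +1, frozenset("eiy")),        # next letter e, i or y?
    yes=Leaf({"s": 0.9, "k": 0.1}),
    no=Internal(
        Question("phoneme", -1, frozenset({"s"})),   # previous phoneme /s/?
        yes=Leaf({"k": 0.7, "-": 0.3}),
        no=Leaf({"k": 0.95, "s": 0.05}),
    ),
)
# Trivial single-leaf trees for the other letters of the example word.
trees = {"c": tree_c, "a": Leaf({"ae": 1.0}), "t": Leaf({"t": 1.0})}
hypotheses = [["s", "ae", "t"], ["k", "ae", "t"]]
chosen = best_pronunciation(trees, list("cat"), hypotheses)
```

The same `score` function could be applied to a lexicographer's transcription: an unusually low value would flag the entry for review.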
According to the invention, these mixed decision trees are generated by providing two sets of yes-no questions, a first set pertaining to letters and their adjacent neighbors, and a second set pertaining to phonemes and their adjacent neighbors. These sets of questions are supplied to a decision tree generator along with a corpus of predetermined word spelling-pronunciation pairs. The generator uses a predefined set of rules, optionally including predefined pruning rules, to grow a decision tree for each letter found in the training corpus. When the corpus covers all letters of the alphabet, the decision tree generator produces a mixed tree for each letter of the alphabet. Probability data are assigned to the leaf nodes based on the actual letter-phoneme pairs in the training corpus.
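The generation procedure can be sketched as follows. The text above specifies only "a predefined set of rules", so this sketch substitutes assumptions of its own: a greedy choice of the question with the largest entropy reduction, and a minimum-gain threshold standing in for a pruning rule. The function names, the tuple and dictionary layouts, the phoneme symbols, and the toy corpus are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict

# A question is a tuple (kind, offset, members); a sample is a tuple
# (letters, phonemes, i, target) where target is the phoneme aligned
# with letters[i]. Both layouts are assumptions made for this sketch.


def answer(question, letters, phonemes, i):
    kind, offset, members = question
    seq = letters if kind == "letter" else phonemes
    j = i + offset
    symbol = seq[j] if 0 <= j < len(seq) else "#"  # "#" = outside the word
    return symbol in members


def entropy(samples):
    counts = Counter(s[3] for s in samples)
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def grow(samples, questions, min_gain=1e-9):
    """Recursively split on the question with the largest entropy reduction."""
    base = entropy(samples)
    best = None
    for q in questions:
        yes = [s for s in samples if answer(q, s[0], s[1], s[2])]
        no = [s for s in samples if not answer(q, s[0], s[1], s[2])]
        if not yes or not no:
            continue  # the question does not separate these samples
        remaining = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(samples)
        gain = base - remaining
        if gain > min_gain and (best is None or gain > best[0]):
            best = (gain, q, yes, no)
    if best is None:
        # Leaf: relative frequencies of the phonemes seen in the corpus.
        counts = Counter(s[3] for s in samples)
        return {"probs": {p: c / len(samples) for p, c in counts.items()}}
    _, q, yes, no = best
    return {"question": q,
            "yes": grow(yes, questions, min_gain),
            "no": grow(no, questions, min_gain)}


def train(corpus, letter_questions, phoneme_questions):
    """Grow one mixed tree for each letter found in the aligned corpus."""
    by_letter = defaultdict(list)
    for letters, phonemes in corpus:
        for i, letter in enumerate(letters):
            by_letter[letter].append((letters, phonemes, i, phonemes[i]))
    questions = letter_questions + phoneme_questions
    return {letter: grow(samples, questions)
            for letter, samples in by_letter.items()}


# Toy aligned corpus; the phoneme symbols are illustrative only.
corpus = [
    (list("cat"), ["k", "ae", "t"]),
    (list("cot"), ["k", "aa", "t"]),
    (list("cent"), ["s", "eh", "n", "t"]),
    (list("city"), ["s", "ih", "t", "iy"]),
]
letter_questions = [("letter", 1, frozenset("eiy"))]
# Phoneme questions here look only at neighbors (nonzero offsets), so they
# never ask about the phoneme that is being predicted.
phoneme_questions = [("phoneme", 1, frozenset({"ae", "aa"}))]
trees = train(corpus, letter_questions, phoneme_questions)
```

In this toy run both questions separate the samples for "c" equally well, and the sketch keeps the first one it evaluated. A fuller generator would need an explicit tie-breaking rule and a real pruning stage.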
The memory containing the mixed tree data structure can be incorporated into a variety of different speech processing products. For example, the mixed tree can be connected to a speech recognition system to allow the end user to add words to the recognition dictionary without needing to understand the nuances of building a phonetic transcription. The decision tree can also be used in a speech synthesis system to generate pronunciations for words not found in the current dictionary.
For a more complete understanding of the invention, its objects and advantages, refer to the following specification and to the accompanying drawings.