Numerous advances, both with respect to hardware and software, have been made in recent years relating to computer-based speech recognition and to the conversion of text into electronically generated synthetic speech. Thus, there now exist computer-based systems in which data that is to be synthesized is stored as text in a binary format so that as needed the text can be electronically converted into speech in accordance with a text-to-speech conversion protocol. One advantage of this is that it reduces the memory overhead that would otherwise be needed to store “digitized” speech.
Notwithstanding these advances, however, one problem persists in transforming textual input into intelligible human speech, namely, the handling of homographs that are sometimes encountered in any textual input. A homograph comprises one or more words that have identical spellings but different meanings and different pronunciations. For example, the word BASS has two different meanings—one pertaining to a type of fish and the other to a type of musical instrument. The word also has two distinct pronunciations. Such a word obviously presents a problem for any text-to-speech engine that must predict the phonemes that correspond to the character string B-A-S-S.
In some instances, the meaning and pronunciation may be dictated by the function that the homograph performs; that is, the part of speech to which the word corresponds. For example, the homograph CONTRACT, when it functions as a verb has one meaning—and, accordingly, one pronunciation—and another meaning and corresponding pronunciation when it functions as a noun. Therefore, since nouns frequently precede predicates, knowing the order of appearance of the homograph in a word string may give a clue as to its appropriate pronunciation. In other instances, however, homographs function as the same parts of speech, and accordingly, word order may not be helpful in determining a correct pronunciation. The word BASS is one such homograph: whether as a fish or a musical instrument, it functions as a noun.
In contexts other than word recognition, one method of pattern classification that has been successfully utilized is recursive partitioning. Recursive partitioning is a method that, using a plurality of training samples, tests parameter values to determine a parameter and value that best separate data into categories. The testing uses an objective function to measure a degree of separation effected by partitioning the training sample into different categories. Once an initial partitioning test has been found, the algorithm is recursively applied on each of the two subsets generated by the partitioning. The partitioning continues until either a subset comprising one unadulterated, or pure, category is obtained or a stopping criterion is satisfied. On the basis of this recursive partitioning and iterative testing, a decision tree results which specifies tests and sub-tests that can jointly categorize different data elements.
Although recursive partitioning has been widely applied in other contexts, the technique is not immediately applicable to the disambiguation of homographs owing to the large amounts of missing data that typically occur. Thus, there remains in the art a need for an effective and efficient technique for implementing a recursive partitioning in the context of disambiguating homographs during a text-to-speech conversion. Specifically, there is a need for a technique to recursively partition a training set to construct a statistical test, in the form of a decision tree, that can determine with a satisfactory level of accuracy the pronunciations of homographs that may occur during a text-to-speech event.