Speech generation is a process that allows the transformation of a string of symbols into a synthetic speech signal. An input text string is divided into graphemes (e.g. letters, words or other units) and for each grapheme a corresponding phoneme is determined. In linguistic terms a “grapheme” is the visual form of a character string, while a “phoneme” is the corresponding phonetic pronunciation.
The task of grapheme-to-phoneme alignment is intrinsically related to text-to-speech conversion and provides the basic toolset of grapheme-phoneme correspondences for use in predicting the pronunciation of a given word. In a speech synthesis system, the grapheme-to-phoneme conversion of the words to be spoken is of decisive importance. In particular, if the grapheme-to-phoneme transcription rules are automatically obtained from a large transcribed lexicon, the lexicon alignment is the most important and critical step of the whole training scheme of an automatic rule-set generator algorithm, as it builds up the data on which the algorithm extracts the transcription rules.
The core of the process is based on a dynamic programming algorithm. The dynamic programming algorithm aligns two strings finding the best alignment with respect to a distance metric between the two strings.
A lexicon alignment process iterates the application of the dynamic programming algorithm on the grapheme and phoneme sequences, where the distance metric is given by the probability P(f|g) that a grapheme g will be transcribed as a phoneme f. The probabilities P(f|g) are estimated during training each iteration step.
In document Baldwin Timoty and Tanaka Hozumi, “A comparative Study of Unsupervised Grapheme-Phoneme Alignment Methods”, Dept of Computer Science-Tokyo Institute of Technology, two well-known unsupervised algorithms to automatically align grapheme and phoneme strings are compared. A first algorithm is inspired by the TF-IDF model, including enhancements to handle phonological determine frequency through analysis variation and of “alignment potential”. A second algorithm relies on the C4.5 classification system, and makes multiple passes over the alignment data until consistency of output is achieved.
In document Walter Daelemans and Antal Van den Bosch, “Data-oriented Methods for Grapheme-to-Phoneme Conversion”, Institute for Language Technology and AI, Tilburg University, NL-5000 LE Tilburg, two further grapheme-to-phoneme conversion methods are shown. In both cases the alignment step and the rule generation step are blended using a lookup table. The algorithms search for all unambiguous one-to-one grapheme-phoneme mappings and stores these mappings in the lookup table.
In U.S. Pat. No. 6,347,295 a computer method and apparatus for grapheme-to-phoneme rule-set-generation is proposed. The alignment and rule-set generation phases compare the character string entries in the dictionary, determining a longest common subsequence of characters having a same respective location within the other character string entries.
In the methods disclosed in the above-mentioned documents, the graphemes and the phonemes belong respectively to a grapheme-set and a phoneme-set that are defined in advance and fixed, and that cannot be modified during the alignment process.
The assignment of graphemes to phonemes is not, however, yielded uniquely from the phonetic transcription of the lexicon. A word having N letters may have a corresponding number of phonemes different from N, since a single phoneme can be produced by two or more letters, as well as one letter can, produce two or more phonemes. Therefore, the uncertainty in the grapheme-phoneme assignment is a general problem, particularly when such assignment is performed by an automatic system.
The Applicant has tackled the problem of improving the grapheme-to-phoneme alignment quality, particularly where there are a different number of symbols in the two corresponding representation forms, graphemic and phonetic. In such cases a coherent grapheme-phoneme association is particularly important, in presence of automatic learning algorithms, to allow the system to correctly detect the statistic relevance of each association.
The Applicant observes that particular grapheme-phoneme associations, in which for example a single letter produces two phonemes, or vice versa, may recur very often during the alignment process of a lexicon.
The Applicant has determined that, if such particular grapheme-phoneme associations are identified during the alignment process and treated accordingly in a coherent and well defined manner, such alignment can be particularly precise.
In view of the above, it is an object of the invention to provide a method of generating grapheme-phoneme rules comprising a particularly accurate alignment phase, which is language independent and is not bound by the lexical structures of a language.