The invention relates to a method, a computer program product, a data medium and a computer system for the assignment of phonemes to the graphemes producing them in a lexicon having words (grapheme sequences) and their associated phonetic transcription (phoneme sequences).
Speech processing methods are disclosed, for example, in U.S. Pat. No. 6,029,135, U.S. Pat. No. 5,732,388, DE 19636739 C1 and DE 19719381 C1. Routines for grapheme-phoneme conversion, that is to say for converting written words into spoken sounds, are required for automatically reading aloud or extending the vocabulary of dictation systems or of automatic speech recognition systems. Neural networks are frequently used for this purpose.
The training of these neural networks is performed with the aid of patterns. A pattern consists of a number of letters from a word, which are applied to the input nodes of a neural network, and of the associated phoneme, which corresponds to the output node. Each phoneme is frequently also assigned what is termed a grouping value. The grouping value specifies the number of graphemes which produce the associated phoneme.
The patterns are obtained from what are termed training lexica. A training lexicon assigns grapheme sequences (as a rule words, numerals, etc., that is to say everything which is to be converted) to phoneme sequences; it thus contains grapheme-phoneme transcriptions at the level of words. The phoneme sequences are produced in the training lexicon by a suitable type of phonetic transcription. The SAMPA phonetic transcription or the Spicos inventory, both of which are based on ASCII characters, are frequently used in the field of automatic speech recognition. A few German words may be listed by way of example with the associated phonetic transcription in SAMPA:
Quatsch    kv'atS
spät       SpE:t
Schutz     SUts
schwer     Sve:6
Sprache    Spra:x@
The sound “sch” is represented, for example, by [S], and lengthening by a colon. Here, phonemes are represented in square brackets [ ], graphemes in angle brackets < >. All the examples of phonetic transcription in the description are reproduced in SAMPA.
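The notation can be made concrete with a small Python sketch that splits a SAMPA string into phoneme symbols. It is illustrative only and rests on simplifying assumptions that are not taken from the source: every character is treated as one phoneme, a colon is attached to the preceding symbol as a length mark, and the stress mark ' is dropped. Real SAMPA inventories also contain multi-character symbols, which this sketch does not handle. The function name split_sampa is hypothetical.

```python
def split_sampa(transcription):
    """Split a SAMPA string into phoneme symbols (simplified sketch)."""
    phonemes = []
    for ch in transcription:
        if ch == "'":
            # Assumption: the stress mark is not a phoneme and is skipped.
            continue
        if ch == ":" and phonemes:
            # The colon marks lengthening of the preceding symbol.
            phonemes[-1] += ch
        else:
            # Assumption: every other character is one phoneme.
            phonemes.append(ch)
    return phonemes


# Small excerpt of a training lexicon: word -> SAMPA transcription.
lexicon = {"Sprache": "Spra:x@", "spät": "SpE:t", "Schutz": "SUts"}

for word, sampa in lexicon.items():
    symbols = split_sampa(sampa)
    print(word, len(word), "letters ->", symbols, len(symbols), "phonemes")
```

For <Sprache> this yields the six symbols S, p, r, a:, x and @, set against seven letters.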
Although these training lexica include the phonetic transcription, they do not include the unique assignment of phonemes to the graphemes producing them, as required for the patterns. For example, the following assignment would be desirable for the word <Sprache>:
Graphemes    S       p       r       a       c h     e
Phonemes     S, 1    p, 1    r, 1    a:, 1   x, 2    @, 1

from which it is easier to derive the patterns for training the neural network. In the case of an input window with 7 letters, the following 6 patterns are yielded directly from the unique assignment:
Pattern    Input            Output
1st        S p r a          S, 1

The grapheme sequence of 3 empty characters, <S>, <p>, <r> and <a>, <S> being located centrally in the input window, is assigned to the sound [S] with the grouping value 1. The following are obtained correspondingly as further patterns:
Pattern    Input            Output
2nd        S p r a c        p, 1
3rd        S p r a c h      r, 1
4th        S p r a c h e    a:, 1
5th        p r a c h e      x, 2

The “Ach” sound, or voiceless velar fricative “ch”, is assigned a grouping value of 2 in accordance with the segmentation rules, since it is assigned the two letters <c> and <h>. The letter window can therefore be displaced in the following pattern by 2 letters:
Pattern    Input            Output
6th        a c h e          @, 1
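The derivation of the six patterns from the unique assignment can be sketched in Python as follows. This is a minimal illustration, not the implementation of any particular system: the function name make_patterns and the use of "_" to stand for an empty character are assumptions made here. The window is centered on the first letter producing each phoneme and is then displaced by that phoneme's grouping value.

```python
def make_patterns(word, assignment, window=7, pad="_"):
    """Build (input window, phoneme, grouping value) patterns.

    assignment: list of (phoneme, grouping_value) pairs in word order.
    pad: hypothetical symbol standing for an empty character.
    """
    half = window // 2
    padded = pad * half + word + pad * half
    patterns = []
    pos = 0  # index in `word` of the first letter producing the phoneme
    for phoneme, group in assignment:
        # padded[pos:pos + window] is the window centered on word[pos].
        patterns.append((padded[pos:pos + window], phoneme, group))
        # Displace the window by the grouping value of this phoneme.
        pos += group
    if pos != len(word):
        raise ValueError("grouping values do not cover the word exactly")
    return patterns


assignment = [("S", 1), ("p", 1), ("r", 1), ("a:", 1), ("x", 2), ("@", 1)]

for letters, phoneme, group in make_patterns("Sprache", assignment):
    print(letters, "->", phoneme, group)
```

With the empty characters written out, the six input windows are ___Spra, __Sprac, _Sprach, Sprache, prache_ and ache___, which correspond to the 1st to 6th patterns above.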
The assignment of letters to phonemes is not, however, yielded uniquely from the phonetic transcription of the lexicon. The word <Sprache> consists of 7 letters, but of only 6 phonemes. The question arises as to which of the phonemes is produced by 2 letters. Since 2 phonemes can also be produced by one letter, for example [ks] by <x>, the uncertainty in the grapheme-phoneme assignment is a general problem for the patterns.
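The ambiguity can be illustrated by enumerating the candidate assignments for <Sprache>. The following Python sketch assumes, purely for illustration, that each phoneme is produced by one or two consecutive letters; it deliberately ignores the case of one letter producing two phonemes. The function name candidate_groupings is hypothetical.

```python
def candidate_groupings(n_letters, n_phonemes, max_group=2):
    """Yield every tuple of grouping values (1..max_group) that
    distributes n_letters over n_phonemes."""
    if n_phonemes == 0:
        if n_letters == 0:
            yield ()
        return
    for group in range(1, max_group + 1):
        if group <= n_letters:
            for rest in candidate_groupings(n_letters - group,
                                            n_phonemes - 1, max_group):
                yield (group,) + rest


word = "Sprache"
phonemes = ["S", "p", "r", "a:", "x", "@"]

for groups in candidate_groupings(len(word), len(phonemes)):
    pos, parts = 0, []
    for g in groups:
        parts.append(word[pos:pos + g])  # letters assigned to this phoneme
        pos += g
    print(list(zip(parts, phonemes)))
```

Under these assumptions there are six candidates, one for each phoneme that could take two letters. Only one of them assigns <ch> to [x], and the lexicon alone does not indicate which candidate is correct.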
To date, the grapheme-phoneme assignment has been carried out semi-automatically, starting from empirical rules evident to a native speaker, but this is error-prone, particularly in the case of multilingual systems, and entails substantial effort.