1. Field of the Invention
This invention relates to a converter for generating a sequence of linguistic elements from a signal representing text. In particular, this invention relates to a converter for generating a sequence of phonemes from a textual signal. Such a converter is commonly referred to as a grapheme-to-phoneme converter, a grapheme being a subsequence of one or more letters, and a phoneme being a particular type of linguistic element which represents the pronunciation of part of a word. A grapheme-to-phoneme converter may be used in speech synthesis during text analysis, prior to synthesis of speech from the sequence of phonemes. It may also be used in speech recognition in order to generate a sequence of linguistic elements required to create a speech recognition template. Another use for such a converter could be in a process to linguistically analyse text (for example, sentences) to determine the linguistic properties of the text, for example in terms of the number of phonemes, biphones or triphones.
2. Related Art
One technique for converting a sequence of graphemes to a sequence of phonemes is to use a set of letter-to-sound rules. However, unless the spelling of a language is phonetic, such rules will often produce incorrect phoneme sequences (or pronunciations) for some words. An alternative is to use a large lexicon which provides a phonemic transcription for as many words in a language as possible.
In Celtic languages (for example, Welsh) and other languages which exhibit the phenomenon known as mutation, the initial letter of a word changes depending on the context of the word. If every possible mutation of a word is included in a lexicon, the result is an enormous dictionary which requires a large amount of memory and long search times.
In this invention a linguistic analyser is provided which uses a phonemic look-up table or dictionary with a smaller number of dictionary entries than would be required if a phonemic transcription were provided for each possible word in a language, thus reducing the memory and search time required by the analyser.
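The saving in dictionary entries can be illustrated with a minimal sketch. The mutation table and the three Welsh base words below are assumptions chosen for illustration only; they are not taken from this specification, and the function name `expand` is hypothetical.

```python
# Illustrative (assumed) table: initial grapheme -> its mutated spellings.
MUTATIONS = {
    "c": ["g", "ngh", "ch"],
    "p": ["b", "mh", "ph"],
    "t": ["d", "nh", "th"],
}

def expand(word):
    """Return the word followed by every mutated spelling of its initial grapheme."""
    forms = [word]
    for base, mutated in MUTATIONS.items():
        if word.startswith(base):
            forms += [m + word[len(base):] for m in mutated]
            break
    return forms

# Assumed sample vocabulary of unmutated words.
base_lexicon = ["cath", "pen", "tad"]

# A lexicon listing every mutation needs one entry per surface form ...
full_lexicon = [form for word in base_lexicon for form in expand(word)]

# ... whereas the approach described here stores only the base words plus a
# mutation table that is shared by the whole vocabulary.
table_entries = sum(len(mutated) for mutated in MUTATIONS.values())
print(len(full_lexicon), "entries versus", len(base_lexicon), "entries plus",
      table_entries, "table rows")
```

In this sketch the size of the mutation table stays fixed as the vocabulary grows, while the fully expanded lexicon grows by several entries for each mutable word.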
According to the present invention there is provided an apparatus for receiving an input signal representing a word, each word comprising a sequence of one or more graphemes, and for providing a sequence of one or more symbols, each symbol representing a phonetic element of said word, said apparatus comprising
a first store containing a plurality of representations of words and corresponding symbol sequences;
a second store containing a plurality of duples comprising a substitutable grapheme and a corresponding substitute grapheme;
a third store containing a plurality of duples comprising a substitutable grapheme and a corresponding symbol; and
a processor arranged to
receive said input signal;
provide a first signal corresponding to a grapheme in the word and a second signal corresponding to any graphemes other than said grapheme;
access the second store using the first signal to retrieve a corresponding substitute grapheme;
access the third store using the first signal to retrieve a corresponding symbol;
provide a modified signal comprising a signal corresponding to said substitute grapheme and said second signal;
access the first store using the modified signal to retrieve a corresponding sequence of symbols; and
provide a modified sequence of symbols comprising the symbol retrieved from the third store and symbols of the retrieved sequence, which symbols correspond to the second signal.
In a preferred embodiment the first signal corresponds to the first grapheme in the word.
This invention also provides a method for analysing a word, each word comprising a sequence of one or more graphemes, and for providing a sequence of one or more symbols, each symbol representing a phonetic element of said word, the method comprising steps of
a) providing a first signal corresponding to a grapheme in the word and a second signal corresponding to any graphemes other than said grapheme;
b) using the first signal to determine a corresponding substitute grapheme;
c) using the first signal to determine a corresponding symbol;
d) providing a modified signal comprising a signal corresponding to said substitute grapheme and said second signal;
e) using the modified signal to determine a corresponding sequence of symbols; and
f) providing a modified sequence of symbols comprising the symbol determined at step c) and symbols of the retrieved sequence, which symbols correspond to the second signal.
In a preferred embodiment the first signal corresponds to the first grapheme in the word.
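The method above, with the first signal taken as the first grapheme of the word, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the claimed implementation: the three dictionaries stand in for the first, second and third stores, the Welsh entries and the phoneme symbols ("T", "N_0", "x") are hypothetical, each substitutable grapheme is assumed to have a single substitute, and the substituted grapheme is assumed to account for exactly one symbol of the retrieved sequence. The initial direct look-up is also an assumption.

```python
# Assumed contents of the three stores (hypothetical symbols and entries).
LEXICON = {"cath": ["k", "a", "T"]}              # first store: word -> symbols
SUBSTITUTE = {"g": "c", "ngh": "c", "ch": "c"}   # second store: grapheme -> substitute
SYMBOL = {"g": "g", "ngh": "N_0", "ch": "x"}     # third store: grapheme -> symbol

def split_first_grapheme(word):
    """Step a): split off a substitutable first grapheme, longest match first."""
    for grapheme in sorted(SUBSTITUTE, key=len, reverse=True):
        if word.startswith(grapheme):
            return grapheme, word[len(grapheme):]
    return None

def transcribe(word):
    """Return the symbol sequence for a word, or None if it cannot be found."""
    if word in LEXICON:                    # assumed: try the unmodified word first
        return list(LEXICON[word])
    split = split_first_grapheme(word)
    if split is None:
        return None
    first, rest = split                    # a) first signal and second signal
    substitute = SUBSTITUTE[first]         # b) substitute grapheme
    symbol = SYMBOL[first]                 # c) symbol for the first grapheme
    modified = substitute + rest           # d) modified signal
    retrieved = LEXICON.get(modified)      # e) look up the modified signal
    if retrieved is None:
        return None
    return [symbol] + retrieved[1:]        # f) replace the first symbol only

print(transcribe("gath"))
print(transcribe("nghath"))
```

Only the unmutated form "cath" is stored in the lexicon; the mutated spellings are resolved through the two small grapheme tables.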
In an improved version, in the event that no sequence of symbols corresponding to the modified signal is determined at step e), the method further comprises steps of
g) providing a suffix signal corresponding to a subsequence of graphemes at the end of the word and a whole stem signal corresponding to the subsequence of graphemes other than those corresponding to the suffix signal;
h) using the whole stem signal to determine a corresponding sequence of symbols;
i) in the event that a sequence of symbols corresponding to the stem signal is not determined at step h), providing an ending signal corresponding to a sequence of graphemes with which a word may end and using a signal comprising the whole stem signal and the ending signal to determine a corresponding sequence of symbols;
j) using the suffix signal to determine a corresponding sequence of symbols; and
k) providing a sequence of symbols comprising the symbol sequence corresponding to the stem signal and the symbol sequence corresponding to the suffix signal.
A further improvement gives a method in which, in the event that no sequence of symbols corresponding to the stem signal is determined at step h), the method further comprises steps of
l) providing a first stem signal corresponding to a grapheme in the sequence of graphemes corresponding to the stem signal and a second stem signal corresponding to any graphemes other than said grapheme;
m) using the first stem signal to determine a corresponding substitute grapheme;
n) using the first stem signal to determine a corresponding symbol;
o) providing a modified signal comprising a signal corresponding to said substitute grapheme and said second stem signal;
p) using the modified signal to determine a corresponding sequence of symbols; and
q) providing a modified sequence of symbols comprising the symbol determined at step n), symbols of the retrieved sequence, which symbols correspond to the second stem signal, and symbols corresponding to the suffix signal.
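The suffix and stem fallbacks described above can be sketched in the same way. The example is a hedged illustration only: the suffix table `SUFFIXES`, the Welsh entries and the phoneme symbols are assumptions, the function names are hypothetical, and the ending-signal step is left out because the text above does not say how the symbols belonging to the added ending are handled.

```python
# Assumed stores (hypothetical symbols and entries).
LEXICON = {"cath": ["k", "a", "T"]}              # word -> symbols
SUFFIXES = {"od": ["o", "d"]}                    # suffix -> symbols
SUBSTITUTE = {"g": "c", "ngh": "c", "ch": "c"}   # grapheme -> substitute grapheme
SYMBOL = {"g": "g", "ngh": "N_0", "ch": "x"}     # grapheme -> symbol

def lookup_with_mutation(word):
    """Look the word up directly, then with its first grapheme substituted."""
    if word in LEXICON:
        return list(LEXICON[word])
    for grapheme in sorted(SUBSTITUTE, key=len, reverse=True):
        if word.startswith(grapheme):
            retrieved = LEXICON.get(SUBSTITUTE[grapheme] + word[len(grapheme):])
            if retrieved is not None:
                # Assumes the substituted grapheme accounts for one symbol.
                return [SYMBOL[grapheme]] + retrieved[1:]
    return None

def transcribe(word):
    """Whole-word look-up first, then the suffix and stem fallbacks."""
    symbols = lookup_with_mutation(word)
    if symbols is not None:
        return symbols
    for suffix, suffix_symbols in SUFFIXES.items():
        stem = word[:-len(suffix)]
        if word.endswith(suffix) and stem:
            # Whole stem look-up, falling back to the mutated stem.
            stem_symbols = lookup_with_mutation(stem)
            if stem_symbols is not None:
                # Stem symbols followed by suffix symbols.
                return stem_symbols + suffix_symbols
    return None

print(transcribe("cathod"))
print(transcribe("gathod"))
```

Here neither "cathod" nor "gathod" is stored; both are resolved from the single entry "cath", the suffix table and the grapheme tables.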
This invention also provides a speech synthesiser incorporating a linguistic analyser as described above and a speech recogniser incorporating a linguistic analyser as described above.