The field of voice input/output (I/O) systems has undergone considerable change in the last decade. A recent example of this change is disclosed in U.S. Pat. No. 4,979,216, entitled, Text to Speech Synthesis System and Method Using Context Dependent Vowel Allophones. The patent discloses a text-to-speech conversion system which converts specified text strings into corresponding strings of consonant and vowel phonemes. A parameter generator converts the phonemes into formant parameters, and a formant synthesizer uses the formant parameters to generate a synthetic speech waveform.
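The formant synthesizer stage mentioned above can be illustrated with a minimal sketch. The sketch assumes a cascade of second-order digital resonators of the Klatt type, driven by an impulse train at a fixed fundamental frequency. The function names, the 10 kHz sampling rate, and the formant and bandwidth values are illustrative assumptions and are not taken from the patent.

```python
import math


def resonator(signal, freq, bw, fs):
    """Second-order digital resonator (Klatt-style) with unity gain at 0 Hz."""
    t = 1.0 / fs
    c = -math.exp(-2 * math.pi * bw * t)
    b = 2 * math.exp(-math.pi * bw * t) * math.cos(2 * math.pi * freq * t)
    a = 1 - b - c  # normalises the DC gain to 1
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(formants, bandwidths, f0, duration, fs=10000):
    """Cascade formant synthesis of a steady vowel from an impulse-train source.

    formants/bandwidths are in Hz; f0 is the (assumed constant) pitch in Hz.
    """
    n = int(duration * fs)
    period = int(fs / f0)
    # Glottal source approximated by one impulse per pitch period (assumption).
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    # Pass the source through one resonator per formant, in cascade.
    for f, bw in zip(formants, bandwidths):
        signal = resonator(signal, f, bw, fs)
    return signal
```

For example, `synthesize_vowel((700, 1220, 2600), (130, 70, 160), f0=100, duration=0.05)` returns 500 samples of a steady vowel-like waveform. A cascade arrangement is chosen here only because it needs no per-formant amplitude controls.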
A library of vowel allophones is stored, each stored vowel allophone being represented by formant parameters for four formants. The vowel allophone library includes a context index for associating each vowel allophone with one or more pairs of phonemes preceding and following the corresponding vowel phoneme in a phoneme string. When synthesizing speech, a vowel allophone generator uses the vowel allophone library to provide formant parameters representative of a specified vowel phoneme.
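A context-indexed allophone library of this kind can be sketched as a table of four-formant vectors plus a dictionary keyed by the vowel and its neighbouring phonemes. The class and field names, the `#` boundary symbol, and the per-vowel fallback allophone used when no context entry matches are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# One vowel allophone described by four formant parameters (F1..F4, in Hz).
Formants = tuple[float, float, float, float]


@dataclass
class AllophoneLibrary:
    allophones: list[Formants]
    # Context index: (vowel, preceding phoneme, following phoneme) -> allophone number.
    # Several context pairs may point at the same stored allophone.
    context_index: dict[tuple[str, str, str], int]
    # Hypothetical fallback allophone for each vowel when no context matches.
    default: dict[str, int]

    def lookup(self, prev: str, vowel: str, nxt: str) -> Formants:
        """Formant parameters for `vowel` in the context prev _ nxt."""
        idx = self.context_index.get((vowel, prev, nxt), self.default[vowel])
        return self.allophones[idx]


def vowel_formants(phonemes, vowels, lib):
    """Select a context-dependent allophone for every vowel in a phoneme string."""
    out = []
    for i, p in enumerate(phonemes):
        if p in vowels:
            prev = phonemes[i - 1] if i > 0 else "#"  # "#" marks a boundary
            nxt = phonemes[i + 1] if i + 1 < len(phonemes) else "#"
            out.append(lib.lookup(prev, p, nxt))
    return out
```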
The vowel allophone generator coacts with the context index to select the proper vowel allophone, as determined by the phonemes preceding and following the specified vowel phoneme. As a result, the synthesized pronunciation of vowel phonemes is improved by using vowel allophone formant parameters which correspond to the context of the vowel phonemes. The formant data for large sets of vowel allophones is efficiently stored using code books of formant parameters selected using vector quantization methods. The formant parameters for each vowel allophone are specified, in part, by indices pointing to formant parameters in the code books.
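The codebook storage described above can be approximated by generic vector quantization: a codebook is trained on formant vectors, and each stored vector is then replaced by the index of its nearest codeword. The sketch below assumes plain k-means (Lloyd) training with Euclidean distance and initialisation from the first k training vectors; the patent is not stated to use this particular procedure, and the function names are hypothetical.

```python
import math


def nearest(vector, codebook):
    """Index of the codeword closest (Euclidean distance) to `vector`."""
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))


def train_codebook(vectors, k, iterations=20):
    """k-means codebook design, initialised with the first k vectors (assumption)."""
    codebook = [tuple(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, codebook)].append(v)
        # Move each codeword to its cluster centroid; keep empty clusters as-is.
        codebook = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else codebook[j]
            for j, members in enumerate(clusters)
        ]
    return codebook


def encode(vectors, codebook):
    """Store each formant vector as an index into the codebook."""
    return [nearest(v, codebook) for v in vectors]


def decode(indices, codebook):
    """Recover approximate formant vectors from stored indices."""
    return [codebook[i] for i in indices]
```

Storing a small integer index per vector instead of several formant values is what makes large allophone sets compact; the cost is that decoded values are codeword approximations rather than the original measurements.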
Another recent example of an advance in this technology is disclosed in U.S. Pat. No. 4,914,702, entitled, Formant Pattern Matching Vocoder. The patent discloses a vocoder for matching an input speech signal with a reference speech signal on the basis of mutual angular data developed through spherical coordinate conversion of a plurality of formant frequencies obtained from the input and reference speech signals.
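One way to read "angular data developed through spherical coordinate conversion" is to treat the first three formant frequencies as a point in three-dimensional space and keep only its polar and azimuthal angles, which depend on the ratios between formants rather than on their absolute values. The sketch below rests on that interpretation; the particular angle convention, the sum-of-absolute-differences distance, and the function names are assumptions rather than details of the patent.

```python
import math


def to_spherical_angles(f1, f2, f3):
    """Polar angle theta and azimuth phi (radians) of the point (F1, F2, F3)."""
    r = math.sqrt(f1 * f1 + f2 * f2 + f3 * f3)
    theta = math.acos(f3 / r)  # angle measured from the F3 axis
    phi = math.atan2(f2, f1)   # angle within the F1-F2 plane
    return theta, phi


def angular_distance(a, b):
    """Assumed matching score: summed absolute difference of the two angles."""
    ta, pa = to_spherical_angles(*a)
    tb, pb = to_spherical_angles(*b)
    return abs(ta - tb) + abs(pa - pb)


def best_match(frame, references):
    """Name of the reference whose formant angles are closest to the input frame."""
    return min(references, key=lambda name: angular_distance(frame, references[name]))
```

Because uniformly scaling all formants leaves both angles unchanged, a reference can still match an input whose formants are shifted proportionally, for instance by a different vocal tract length.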
Yet another example of an advance in speech synthesis is found in U.S. Pat. No. 4,802,223, entitled, Low Data Rate Speech Encoding Employing Syllable Pitch Patterns. The patent discloses a speech encoding technique suited to low data rate applications. Spoken input is analyzed to determine its basic phonological linguistic units and syllables. The pitch track for each syllable is compared with each of a predetermined set of pitch patterns, and the pitch pattern forming the best match to the actual pitch track is selected for that syllable. Phonological linguistic unit indicia and pitch pattern indicia are transmitted to a speech synthesis apparatus, which matches the pitch pattern indicia to syllable groupings of the phonological linguistic unit indicia. During speech synthesis, sounds are produced corresponding to the phonological linguistic unit indicia, with their primary pitch controlled by the pitch pattern indicia of the corresponding syllable. This technique approximates the primary pitch of the original spoken input at a low data rate. In the preferred embodiment, each pitch pattern includes an initial pitch slope, which may be zero to indicate no change in pitch, a final pitch slope, and a turning point between the two slopes.
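The pattern selection step can be sketched by rendering each candidate pattern (initial slope, turning point, final slope) as a piecewise-linear contour over the syllable and choosing the pattern with the smallest squared error against the measured pitch track. The assumptions here are that each contour starts at the syllable's first measured pitch value, that slopes are expressed in Hz per syllable duration, and that squared error is the matching criterion; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PitchPattern:
    initial_slope: float  # Hz per syllable duration; 0 means no change in pitch
    final_slope: float    # Hz per syllable duration after the turning point
    turning_point: float  # fraction of the syllable (0..1) where the slope changes

    def contour(self, start_pitch, n):
        """Pitch values at n equally spaced frames across the syllable."""
        values = []
        for i in range(n):
            x = i / (n - 1) if n > 1 else 0.0
            if x <= self.turning_point:
                values.append(start_pitch + self.initial_slope * x)
            else:
                turn = start_pitch + self.initial_slope * self.turning_point
                values.append(turn + self.final_slope * (x - self.turning_point))
        return values


def best_pattern(pitch_track, patterns):
    """Index of the pattern whose contour has the least squared error."""
    def error(p):
        model = p.contour(pitch_track[0], len(pitch_track))
        return sum((a - b) ** 2 for a, b in zip(pitch_track, model))
    return min(range(len(patterns)), key=lambda i: error(patterns[i]))
```

Only the returned index, together with the phonological unit indicia, would need to be transmitted per syllable, which is the source of the low data rate.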
Still another example of an advance in speech synthesis is found in U.S. Pat. No. 4,689,817, entitled, Device for Generating The Audio Information of a Set of Characters. The patent discloses a device for generating the audio information of a set of characters in which some characters are intoned or pronounced with a different voice character. The device includes means for distinguishing between capital and small letters presented to it. For a capital letter, a speech pattern is formed in which the pitch or the voice character is modified, while the identity of the letter is maintained, relative to the speech pattern for the small letter of the same character. The device also includes means for determining the position of a letter, preferably the last letter, in a word composed of the characters presented, and for forming a speech pattern for that letter in which the pitch or the voice character is modified while its identity is maintained.
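The behaviour described above can be sketched as a function that assigns each character its letter identity plus a pitch factor, raising the factor for capital letters and altering it for the last letter of a word. The specific factors (1.2 and 0.9), the choice to modify pitch rather than voice character, and the function name are hypothetical values chosen only for illustration.

```python
CAPITAL_PITCH = 1.2       # hypothetical: raise pitch by 20% for a capital letter
FINAL_LETTER_PITCH = 0.9  # hypothetical: lower pitch by 10% for a word's last letter


def spelling_prosody(word):
    """(letter identity, pitch factor) for each character of `word`.

    The identity is always the lower-case letter, so a capital and a small
    letter are pronounced as the same letter and differ only in pitch.
    """
    letters = [i for i, ch in enumerate(word) if ch.isalpha()]
    last = letters[-1] if letters else None
    result = []
    for i, ch in enumerate(word):
        factor = 1.0
        if ch.isupper():
            factor *= CAPITAL_PITCH
        if i == last:
            factor *= FINAL_LETTER_PITCH
        result.append((ch.lower(), factor))
    return result
```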
A final example of a recent advance in speech synthesis is disclosed in U.S. Pat. No. 4,896,359, entitled, Speech Synthesis System by Rule Using Phonemes as Synthesis Units. The patent discloses a speech synthesizer that synthesizes speech by actuating a voice source and a filter which processes the output of the voice source according to speech parameters in each successive short interval of time, the speech parameters being given by feature vectors that include formant frequencies, formant bandwidths, speech rate and so on. Each feature vector, or speech parameter, is defined by two target points (r₁, r₂), a value at each target point, and a connection curve between the target points. A speech rate is defined by a speech rate curve which specifies elongation or shortening of the speech rate by a start point (d₁) of the elongation (or shortening), an end point (d₂), and an elongation ratio between d₁ and d₂. The ratios between the relative time of each speech parameter and absolute time are calculated in advance, according to the speech rate table, in each predetermined short interval.
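The two mechanisms described above can be sketched together: a speech parameter interpolated between target points r₁ and r₂, and a speech rate curve that maps relative time to absolute time by stretching the interval from d₁ to d₂ by the elongation ratio. A straight line is assumed as the connection curve, parameters are assumed to hold their target values outside [r₁, r₂], and the function names and fixed-step table layout are hypothetical.

```python
def relative_to_absolute(tau, d1, d2, ratio):
    """Map relative time tau to absolute time, stretching [d1, d2] by `ratio`.

    ratio > 1 elongates the segment; ratio < 1 shortens it. Time outside the
    segment runs at the normal rate, shifted by the change in segment length.
    """
    if tau <= d1:
        return tau
    if tau <= d2:
        return d1 + ratio * (tau - d1)
    return d1 + ratio * (d2 - d1) + (tau - d2)


def parameter_value(tau, r1, v1, r2, v2):
    """Parameter value at relative time tau, joined by an assumed straight line
    between the target values v1 (at r1) and v2 (at r2)."""
    if tau <= r1:
        return v1
    if tau >= r2:
        return v2
    return v1 + (v2 - v1) * (tau - r1) / (r2 - r1)


def rate_table(step, end, d1, d2, ratio):
    """Precomputed (relative, absolute) time pairs at fixed relative-time steps."""
    n = int(round(end / step))
    return [(i * step, relative_to_absolute(i * step, d1, d2, ratio)) for i in range(n + 1)]
```

Precomputing the table once per utterance means the per-interval synthesis loop only has to look up times rather than re-evaluate the rate curve.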
Neither the aforementioned patents nor any other prior art known to the applicant employs a model in which formant analysis and modification are applied to speech synthesis to improve the quality and perception of speech.