The present invention is related to the field of real-time speech synthesis, used, for example, as a substitute for voice-generated speech by persons who have lost control over their vocal apparatus due to disease.
Different types of speech synthesis techniques and systems are known, including text-to-speech systems, articulatory synthesizers, and acoustic or “formant” synthesizers. Text-to-speech systems are generally not suited for conversation-like speech, because they require users to type out all words, are limited to fixed or “canned” words, and are relatively inexpressive. The requirement for typing may be especially problematic for users who are paralyzed or otherwise limited in the speed at which they can generate user input to the system. Both articulatory and formant synthesizers generally require adjustment of a large number of parameters for accurate operation, i.e., to achieve desired speech quality, and thus are viewed as high-dimensional systems from a control perspective. If not properly adjusted, these systems may provide relatively low speech quality, for example due to difficulty with rendering consonants.