The present invention relates to the fields of speech synthesis and speech processing.
Pitch modification is an important processing component of expressive Text-To-Speech (TTS) synthesis and voice transformation. The pitch modification task may generally appear either in the context of TTS synthesis or in the context of natural speech processing, e.g., for entertainment applications, voice disguise applications, etc.
Applications such as affective Human-Computer Interfaces (HCI), emotional conversational agents, and entertainment demand an extreme pitch modification capability that preserves speech naturalness. However, it is widely acknowledged that pitch modification and synthesized speech naturalness are conflicting requirements.
Pitch modification may be performed, for example, over a non-parameterized speech waveform using the Pitch-Synchronous Overlap and Add (PSOLA) method, or by using a parametric speech representation. Regardless of the method used, substantially raising or lowering the original tone of speech segments may significantly degrade the perceived naturalness of the modified speech signal.
The foregoing examples of the related art and limitations related therewith are intended to be illustrative and not exclusive. Other limitations of the related art will become apparent to those of skill in the art upon a reading of the specification and a study of the figures.