In voice conversion applications, such as voice services, man-machine spoken dialog applications or text-to-speech synthesis, auditory rendering is essential and, to achieve acceptable quality, the parameters related to the prosody of the voice signals must be tightly controlled.
Conventionally, the main acoustic or prosodic parameters modified by voice conversion methods are the parameters relating to the spectral envelope and/or, for voiced sounds, which involve vibration of the vocal cords, the parameters relating to a periodic structure, i.e. the fundamental period, the inverse of which is called the fundamental frequency or pitch.
Conventional voice conversion methods generally comprise the determination of at least one function for transforming acoustic features of a source speaker into acoustic features similar to those of a target speaker, and the transformation of a voice signal to be converted by applying this function or these functions.
This transformation is a lengthy operation that is costly in terms of computation time.
Indeed, such transformation functions are conventionally expressed as linear combinations of a large, finite number of transformation elements applied to elements representing the voice signal to be converted.
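By way of illustration only, the cost of such a linear combination can be sketched as follows. The sketch assumes that each transformation element is an affine map applied to one frame of acoustic features, and that the combination weights are soft weights derived from the distance of the frame to a per-element center. These choices, and all names used (`component_weights`, `apply_element`, `convert_frame`, `multiply_count`), are hypothetical and are not taken from any particular conventional method.

```python
import math


def component_weights(x, centers):
    """Hypothetical soft weights: softmax of the negative squared
    distance between the frame x and each element's center."""
    scores = [-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def apply_element(matrix, offset, x):
    """One transformation element, assumed here to be an affine map A x + b."""
    return [sum(a * xi for a, xi in zip(row, x)) + b
            for row, b in zip(matrix, offset)]


def convert_frame(x, centers, matrices, offsets):
    """Weighted linear combination of all K transformation elements."""
    weights = component_weights(x, centers)
    y = [0.0] * len(offsets[0])
    for w, matrix, offset in zip(weights, matrices, offsets):
        for i, value in enumerate(apply_element(matrix, offset, x)):
            y[i] += w * value
    return y


def multiply_count(num_elements, dim):
    """Rough multiplications per frame: K matrix-vector products
    (dim * dim each) plus dim weighting multiplications per element.
    The cost of computing the weights themselves is not counted."""
    return num_elements * (dim * dim + dim)
```

Under these assumptions, every element is evaluated for every frame, so the per-frame cost grows in proportion to the number of elements: `multiply_count(64, 20)` gives 26880 multiplications for 64 elements and 20-dimensional frames, before the weights are computed.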