Voice conversion systems may be used in a wide variety of applications. In general, “voice conversion” refers to techniques for modifying the voice of a first (or source) speaker to sound as though it were the voice of a second (or target) speaker. As such, voice conversion transforms speech signals to change the perceived identity of the speaker while preserving the speech content. Such transformations typically use conversion models trained on speech provided by source and target speakers.
Gaussian Mixture Modeling (GMM), codebook and frequency warping methods are commonly used for voice conversion. For instance, frequency warping is a voice conversion technique that provides high quality converted speech, but has limited ability to provide speaker identity conversion. Conversely, GMM is a technique which offers good speaker identity conversion but may significantly degrade the quality of the converted speech.