In 1928 Mickey Mouse was introduced to the public in the first "talking" animated film, "Steamboat Willie". Walt Disney, who created Mickey Mouse, was also the voice of Mickey Mouse. Consequently, when Walt Disney died in 1966 the world lost a creative genius and Mickey Mouse lost his voice.
It is not unusual to discover during the editing of a dramatic production that one or more scenes are artistically flawed. Minor background problems can sometimes be corrected by altering the scene images. However, if the problem lies with the performance itself or there is a major visual problem, a scene must be done over. Not only is this expensive, but occasionally an actor in the scene will no longer be available to redo the scene. The editor must then either accept the artistically flawed scene or make major changes in the production to circumvent the flawed scene.
A double could typically be used to visually replace a missing actor in a scene that is being redone. However, it is extremely difficult to convincingly imitate the voice of a missing actor.
A need thus exists for a high-quality voice transformation system that can convincingly transform the voice of any given source speaker to the voice of a target speaker. In addition to its use for motion picture and television productions, a voice transformation system would have great entertainment value. People of all ages could take great delight in having their voices transformed to those of characters such as Mickey Mouse or Donald Duck or even to the voice of their favorite actress or actor. Alternatively, an actor dressed in the costume of a character and imitating that character could be even more entertaining if he or she could speak in the voice of the character.
A great deal of research has been conducted in the field of voice transformation and related fields. Much of the research has been directed to transformation of source voices to a standardized target voice that can be more easily recognized by computerized voice recognition systems.
A more general speech transformation system is suggested in an article by Masanobu Abe, Satoshi Nakamura, Kiyohiro Shikano and Hisao Kuwabara, "Voice Conversion Through Vector Quantization," IEEE International Conference on Acoustics, Speech and Signal Processing, (April 1988), pp. 655-658. While the disclosed method produced a voice transformation, the transformed voice was less than ideal. It contained a considerable amount of distortion and was recognizable as the target voice less than two-thirds of the time in an experimental evaluation.