1. Technical Field
The present invention is directed to an apparatus and method for high performance voice transformation. In particular, the present invention is directed to an apparatus and method for transforming an input voice into an output voice that differs from the input voice while preserving some characteristics of the input voice in the output voice.
2. Description of Related Art
Voice recognition devices are generally known in the art of voice technologies. With voice recognition devices, a user speaks into a microphone and the voice recognition device recognizes words and phrases from the user's speech. These recognized words and phrases may then be used, for example, to generate textual messages on a computer display.
Voice synthesis is also generally known in the art. With voice synthesis, textual messages are input to a voice synthesis device, which then synthesizes the text into speech output. The quality of the output speech of such devices is limited because they analyze the textual messages in a purely objective manner. As a result, the speech output by a voice synthesis device typically has a mechanical quality and does not accurately reflect human speech patterns.
Moreover, with the increased use of computer games and, in particular, modem or networked video games, the ability to speak with other players during play has become increasingly important. Current video game technology limits players to conversing with one another through typed messages or through their own digitized speech.
With the latter manner of communicating, if a player has a speech impediment or a thick accent, other players may find it difficult to communicate with him or her. Furthermore, players may find it more enjoyable to speak in a voice other than their own, such as the voice of a character in the video game they are playing.
Thus, it would be advantageous to have an apparatus and method that transform an input voice into a different output voice while preserving some of the characteristics of the input voice, so that the output more closely resembles actual human speech.
The present invention provides a high performance voice transformation apparatus and method. The voice transformation apparatus includes a controller, an input device interface, an input voice characteristic extraction device, a voice recognition device, a voice dictionary interface, and a speech output generator.
The input device interface provides a communication pathway to a voice input device. Voice input received from the voice input device is provided to the voice transformation apparatus, whereupon the controller instructs the input voice characteristic extraction device to extract voice characteristics from the voice input.
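The characteristics to be extracted are not enumerated here beyond the pitch and volume mentioned later in the summary. As a minimal sketch, assuming the extraction device works frame by frame on float samples, volume can be taken as the RMS amplitude and pitch estimated with a basic autocorrelation search. The names `VoiceCharacteristics` and `extract_characteristics` and the 60 to 400 Hz search range are illustrative assumptions, not part of the described apparatus.

```python
import math
from dataclasses import dataclass


@dataclass
class VoiceCharacteristics:
    """Hypothetical per-frame characteristics of the input voice."""
    rms_volume: float  # root-mean-square amplitude of the frame
    pitch_hz: float    # estimated fundamental frequency; 0.0 if silent or undetected


def extract_characteristics(frame, sample_rate, min_hz=60.0, max_hz=400.0):
    """Estimate volume (RMS) and pitch (autocorrelation peak) for one frame.

    `frame` is a sequence of float samples, e.g. in [-1.0, 1.0].
    The pitch search range [min_hz, max_hz] is an assumed default.
    """
    n = len(frame)
    rms = math.sqrt(sum(x * x for x in frame) / n) if n else 0.0
    if rms == 0.0:
        return VoiceCharacteristics(0.0, 0.0)

    # Only examine lags that correspond to plausible voice pitches.
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), n - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame with a shifted copy of itself.
        corr = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    pitch = sample_rate / best_lag if best_lag else 0.0
    return VoiceCharacteristics(rms, pitch)
```

For a pure 200 Hz tone sampled at 8 kHz, the strongest correlation occurs at a lag of 40 samples, giving an estimate of 200 Hz. Real speech would call for windowing and a more robust estimator; this sketch only illustrates what "extracting voice characteristics" could mean concretely.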
At the same time as, before, or after the input voice characteristic extraction is performed, the controller instructs the voice recognition device to perform voice recognition functions on the voice input. These functions include breaking the voice input down into symbolic representations of its constituent phonemes, which are then forwarded to the voice dictionary interface.
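Because the extraction and recognition steps may run in either order or simultaneously, the controller can dispatch them independently. The following is a minimal sketch of the concurrent option, assuming a thread pool; both worker functions are hypothetical stubs standing in for the extraction device and the voice recognition device, and the ARPAbet-style phoneme symbols are an illustrative choice of symbolic representation.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_characteristics_stub(samples):
    # Hypothetical stand-in for the input voice characteristic extraction device.
    peak = max((abs(s) for s in samples), default=0.0)
    return {"peak": peak}


def recognize_phonemes_stub(samples):
    # Hypothetical stand-in for the voice recognition device. A real recognizer
    # would analyze `samples`; here it returns fixed ARPAbet-style symbols ("hello").
    return ["HH", "AH", "L", "OW"]


def controller(samples):
    """Run extraction and recognition concurrently and return both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        chars_future = pool.submit(extract_characteristics_stub, samples)
        phonemes_future = pool.submit(recognize_phonemes_stub, samples)
        # .result() blocks until each task finishes, so ordering does not matter.
        return chars_future.result(), phonemes_future.result()
```

Since neither step depends on the other's output, running them in parallel is one reasonable design choice; a sequential controller would produce the same results.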
The voice dictionary interface provides a communication pathway to one or more voice dictionaries. Each voice dictionary consists of an array of symbolic phoneme representations, each associated with a target speaker output speech pattern segment. The voice dictionary interface “looks up” target speaker output speech pattern segments based on the symbolic representations of the phonemes from the input voice pattern.
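The look-up described above amounts to a mapping from phoneme symbols to recorded target-speaker segments. As a minimal sketch, assuming each segment is simply a list of float samples, a voice dictionary can be modeled as a Python `dict`. The class `VoiceDictionary` and the function `look_up_all` are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Optional, Sequence

# A target-speaker output speech pattern segment, assumed here to be raw samples.
Segment = List[float]


class VoiceDictionary:
    """Hypothetical voice dictionary: phoneme symbol -> target speaker segment."""

    def __init__(self, speaker: str, entries: Dict[str, Segment]):
        self.speaker = speaker
        self.entries = entries

    def look_up(self, symbol: str) -> Optional[Segment]:
        # Return the target speaker's segment, or None if the symbol is absent.
        return self.entries.get(symbol)


def look_up_all(dictionary: VoiceDictionary,
                symbols: Sequence[str]) -> List[Optional[Segment]]:
    # One look-up per recognized phoneme, preserving the order of the input.
    return [dictionary.look_up(s) for s in symbols]
```

Returning `None` for a missing symbol, rather than raising an error, leaves the caller free to decide how to handle a failed look-up.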
The target speaker output speech pattern segments are forwarded to the speech output generator, which generates output speech signals that are then output by the output device. The speech output generator generates these signals from the target speaker output speech pattern segments forwarded by the voice dictionary interface, applying to them the voice input characteristics extracted from the voice input by the input voice characteristic extraction device.
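How the extracted characteristics are applied is not specified. The following is a minimal sketch assuming only two characteristics: volume, applied by scaling the dictionary segment to the input frame's RMS, and pitch, applied by resampling the segment by the ratio of input pitch to dictionary pitch. The function names and the `pitch_ratio` parameter are hypothetical. Note that this naive linear-interpolation resampling also changes the segment's duration; a practical system would likely use a duration-preserving pitch-shifting method instead.

```python
import math
from typing import List


def rms(samples: List[float]) -> float:
    # Root-mean-square amplitude; 0.0 for an empty list.
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def apply_characteristics(segment: List[float], target_rms: float,
                          pitch_ratio: float = 1.0) -> List[float]:
    """Shape a dictionary segment with characteristics taken from the input voice.

    - Pitch: resample by `pitch_ratio` (input pitch / dictionary pitch) using
      linear interpolation. A ratio above 1 raises pitch and shortens the segment.
    - Volume: scale the result so its RMS matches `target_rms`.
    """
    if pitch_ratio <= 0:
        raise ValueError("pitch_ratio must be positive")
    if not segment:
        return []
    out_len = max(1, int(len(segment) / pitch_ratio))
    resampled = []
    for k in range(out_len):
        pos = k * pitch_ratio  # fractional read position in the original segment
        i = int(pos)
        frac = pos - i
        nxt = segment[min(i + 1, len(segment) - 1)]
        resampled.append(segment[i] * (1 - frac) + nxt * frac)
    current = rms(resampled)
    gain = target_rms / current if current else 0.0
    return [s * gain for s in resampled]
```

With `pitch_ratio=1.0`, only the volume changes, so the target speaker's segment is output at the user's loudness.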
If some sounds in the voice input cannot be recognized, the voice recognition device may forward the unrecognized segment of the voice input to the speech output generator without performing a voice dictionary look-up. In this way, the output device outputs the unrecognized voice input segment itself, rather than an output voice pattern segment obtained through an erroneous look-up.
In addition, to provide a more graceful transition in the output of the output device between the output voice pattern segments and the voice input segments that could not be recognized, voice pattern characteristics of the selected voice dictionary speaker may be applied to each unrecognized voice input segment. These voice pattern characteristics may be obtained from the voice dictionary as a default setting.
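The fallback path for unrecognized sounds can be sketched as a loop that either takes the dictionary segment or passes the raw input through, optionally shaped by a dictionary default. This is a minimal sketch under several assumptions: each recognizer token is a pair of (phoneme symbol, or `None` if unrecognized) and the raw input samples for that span; the only default characteristic modeled is a target RMS volume (`default_rms`); and a recognized symbol missing from the dictionary is treated the same way as an unrecognized one. All names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Segment = List[float]


def _scale_to_rms(samples: Segment, target_rms: float) -> Segment:
    # Scale samples so their RMS equals target_rms (unchanged if silent).
    current = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
    return [s * target_rms / current for s in samples] if current else list(samples)


def assemble_output(tokens: Sequence[Tuple[Optional[str], Segment]],
                    dictionary: Dict[str, Segment],
                    default_rms: Optional[float] = None) -> Segment:
    """Concatenate output segments, passing unrecognized input spans through.

    Each token is (phoneme symbol or None, raw input samples for that span).
    """
    output: Segment = []
    for symbol, raw in tokens:
        if symbol is not None and symbol in dictionary:
            output.extend(dictionary[symbol])  # normal dictionary look-up path
        elif default_rms is not None:
            # No look-up: raw input shaped by the dictionary speaker's default.
            output.extend(_scale_to_rms(raw, default_rms))
        else:
            output.extend(raw)  # plain pass-through of the unrecognized span
    return output
```

Skipping the look-up for unrecognized spans avoids substituting a wrong phoneme, while the optional default keeps those spans closer in loudness to the surrounding dictionary segments.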
Thus, with the present invention, a user may input his or her voice and designate an output voice different from his or her own to be used for outputting transformed speech. Furthermore, the output voice may more closely resemble actual human speech because the characteristics of the user's input voice pattern are applied to it. As a result, the output voice exhibits the same voice fluctuations, pitch, volume, and other characteristics as the user's voice.