Human speech contains at least two kinds of information: (1) a message, i.e., the content of what is being said, and (2) information related to the identity of the human speaker. The first kind of information, the message, is generally not dependent on the particular speech signal comprising the human speech. However, a particular speech signal generally does contain characteristics relating to the identity of the speaker. Thus, to alter information relating to the identity of a speaker, it is necessary to alter certain characteristics of a speech signal. Accordingly, speech conversion techniques enable the conversion of a first speech signal exhibiting a first set of identifying characteristics to a second speech signal or a converted first speech signal exhibiting a second set of desired characteristics. Thus, the first speech signal in effect receives a new identity, while its message is preserved. That is, speech conversion transforms how something is said without changing what is said.
In general, the object of using speech conversion technology is to make one person's speech sound like that of another. Approaches for accomplishing speech conversion are described in numerous technical publications, for example: "Voice Conversion through Transformation of Spectral and Intonation Features," D. Rentzos et al., Acoustics, Speech, and Signal Processing, 2004, Proceedings, Volume 1, 17-21 May 2004, pages 21-24; "On the Transformation of the Speech Spectrum for Voice Conversion," G. Baudoin et al., Spoken Language, 1996, Proceedings, Volume 3, 3-6 Oct. 1996, pages 1405-1408; "A Segment-Based Approach to Voice Conversion," M. Abe, Acoustics, Speech, and Signal Processing, 1991, Volume 2, 14-17 Apr. 1991, pages 765-768; "Voice Conversion through Vector Quantization," M. Abe et al., Acoustics, Speech, and Signal Processing, 1988, Volume 1, 11-14 Apr. 1988, pages 655-658; and "Speechalator: two-way speech-to-speech translation on a consumer PDA," A. Waibel et al., Applied Technology, Human Computer Interaction, Eurospeech 2003-Geneva, Sep. 1-4, 2003, Technical paper, posted at cmu.edu/~awb/papers/_speechalator.pdf, pages 369-372. Each of the foregoing references is hereby incorporated herein by reference in its entirety.
Examples of speech conversion include, but are not limited to, speech-tone translation, gender translation, accent translation, and speech enhancement for persons with impaired speech characteristics. Further, some speech converters are capable of altering the spectral characteristics of a speech signal. Moreover, some speech converters are capable of converting an original speech signal to a different language. Those skilled in the art may be aware of yet other examples of speech conversion.
In general, speech converters work by analyzing speech samples from at least one speaker, and usually from several speakers. This analysis requires collecting data relating to the voice characteristics, e.g., gender, speech accent, speech tone, etc., of original and target speakers. Once such data has been collected, a conversion heuristic may be created for converting an original speaker's speech characteristics into those of a target speaker.
Speech conversion techniques are presently used in isolated settings to convert the speech signal of a particular human speaker, i.e., to make a particular person sound like someone else. Thus, present speech converters have not been adapted for use on a large scale, or in systems in which they may be called upon to transform a wide variety of speech signals. Accordingly, although speech conversion techniques and systems are known to be used for making one person's speech sound like that of another person, such techniques and systems have not been used to facilitate public voice communications.
Nonetheless, present systems and networks for voice communications are required to accommodate speakers with widely varying speech characteristics, even where different speakers are speaking the same language. In different regions of the United States, for example, people speak with widely varying accents, some of which may sound quite strong and be difficult to understand for a person from another region of the country. Further, in light of ever-increasing globalization, it is not uncommon for persons using public voice communications to be speaking in their second or even third language, again producing an accent and other speech characteristics that may make the person difficult to understand. It is also not unusual for persons who share no common language to need to conduct a conversation. Further, in certain situations it may be desirable for a speaker to mask certain voice characteristics, even where the speaker can be perfectly understood. For example, law enforcement personnel may want to alter speech characteristics indicative of a person's gender or age. Similarly, there are situations in which a user's security would be enhanced by the alteration of certain speech characteristics. For example, there may be situations in which it would enhance a woman's safety to convert her speech signal so that her voice sounds male. Further, many speakers with speech impairments are presently unable to communicate effectively, if at all, using public communications networks.
Accordingly, there is a need for a public voice communication network whereby subscribers to the network can selectively have original speech signals converted to a different speech signal. Such a voice communication network would provide at least the benefits of safety, surveillance, amusement, and/or enhanced comprehension.