1. Field of the Invention
The present invention relates to utilization of voice fonts for speech synthesis applications and, more particularly, to creation and availability of a network-based voice font platform for use by network subscribers.
2. Introduction
Compression of speech data is an important problem in various applications. For example, in wireless communication and voice over IP (VoIP), effective real-time transmission and delivery of voice data over a network may require efficient speech compression. In entertainment applications such as computer games, reducing the bandwidth for transmitting player-to-player voice correspondence may have a direct impact on the quality of the products and the experience of the end-users. One well-known family of speech compression coding schemes is phoneme-based speech compression. Phonemes are the basic sounds of a language that distinguish different words in that base language. To perform phoneme-based coding, phonemes in speech data are extracted so that the speech data can be transformed into a phoneme stream which is represented symbolically as a text string, in which each phoneme in the stream is coded using a distinct symbol.
With a phoneme-based coding scheme, a phonetic dictionary may be used. A phonetic dictionary characterizes the sound of each phoneme in the base language. It may be speaker-dependent or speaker-independent, and can be created via training using recorded spoken words collected with respect to the underlying population (either a particular speaker or a predetermined population). For example, a phonetic dictionary may describe the phonetic properties of different phonemes in terms of expected rate, tonal pitch and volume. When based on American English, there are a set of 40 different phonemes, according to the International Phoneme Association (24 consonants and 16 vowels).
What is known as a “voice font” may be the phoneme patterns for all 40 phonemes stored in the phoneme dictionary. However, for higher quality voice fonts, sub-phoneme units, such as, for example, bi-phones or even smaller units are typically stored as the voice font. Thus, there can be an essentially unlimited number of voice fonts that can be created, by modifying one or more of the phoneme or sub-phoneme patterns in a stored set.
There may arise situations where an individual may desire to select a “voice font” other that his/her natural voice for a speech signal transmission. Some systems exist that store a limited number of different voice fonts in a memory associated with an individual's communication device (e.g., cell phone, computer, etc.). However, as the number of voice fonts increases, the ability to store and/or update a listing of voice fonts has become problematic.