This patent document contains information subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent, as it appears in the U.S. Patent and Trademark Office files or records but otherwise reserves all copyright rights whatsoever.
Aspects of the present invention relate to data compression in general. Other aspects of the present invention relate to speech compression.
Compression of speech data is an important problem in various applications. For example, in wireless communication and voice over IP (VoIP), effective real-time transmission and delivery of voice data over a network may require efficient speech compression. In entertainment applications such as computer games, reducing the bandwidth for transmitting player to player voice correspondence may have a direct impact on products' quality and end users' experience.
Different speech compression schemes have been developed for various applications. For example, one family of speech compression methods is based on linear predictive coding (LPC). LPC utilizes the coefficients of a set of linear filters to code speech data. Another family of speech compression methods is phoneme based. Phonemes are the basic sounds of a language that distinguish different words in that language. To perform phoneme based coding, phonemes are extracted from the speech data so that the speech data can be transformed into a phoneme stream. The phoneme stream is represented symbolically as a text string, in which each phoneme is coded using a distinct symbol.
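The symbolic representation of a phoneme stream described above can be illustrated with a minimal sketch. The phoneme labels, the single-character symbols, and the function names below are hypothetical and chosen only for illustration; the extraction of phonemes from speech data is assumed to have been done already.

```python
# Hypothetical phoneme inventory: each phoneme label maps to one distinct symbol.
# Neither the labels nor the symbols come from any particular coding scheme.
PHONEME_SYMBOLS = {"HH": "h", "AH": "a", "L": "l", "OW": "o"}

# Inverse table used to turn a coded text string back into phoneme labels.
SYMBOL_PHONEMES = {symbol: label for label, symbol in PHONEME_SYMBOLS.items()}


def encode_phonemes(phonemes):
    """Code a sequence of phoneme labels as a text string, one symbol per phoneme."""
    return "".join(PHONEME_SYMBOLS[label] for label in phonemes)


def decode_phonemes(text):
    """Recover the sequence of phoneme labels from a coded text string."""
    return [SYMBOL_PHONEMES[symbol] for symbol in text]


coded = encode_phonemes(["HH", "AH", "L", "OW"])
```

Because each phoneme is coded with a distinct symbol, the coded string can be mapped back to the original phoneme sequence without ambiguity.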
With a phoneme based coding scheme, a phonetic dictionary may be used. A phonetic dictionary characterizes the sound of each phoneme in a language. It may be speaker dependent or speaker independent and can be created via training using recorded spoken words collected with respect to the underlying population (either a particular speaker or a pre-determined population). For example, a phonetic dictionary may describe the phonetic properties of different phonemes in terms of expected rate, tonal, pitch, and volume qualities.
To recover speech from a phoneme stream, the waveform of the speech may be reconstructed by concatenating the waveforms of individual phonemes. The waveforms of individual phonemes are determined according to a phonetic dictionary. When a speaker dependent phonetic dictionary is employed, a speaker identification may also be transmitted with the compressed phoneme stream to facilitate the reconstruction.
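The reconstruction by concatenation may be sketched as follows. This is an illustration under stated assumptions, not an actual decoder: the dictionary contents, the speaker identifier, the sample rate, and the use of a pure sine tone as a stand-in for each phoneme waveform are all hypothetical.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed value)

# Hypothetical speaker dependent phonetic dictionaries, keyed by speaker identification.
# Each phoneme symbol maps to (pitch in Hz, duration in seconds).
PHONETIC_DICTIONARIES = {
    "speaker_a": {"h": (120.0, 0.05), "a": (220.0, 0.10)},
}


def phoneme_waveform(pitch_hz, duration_s):
    """Stand-in waveform for one phoneme: a sine tone at the dictionary pitch."""
    sample_count = int(round(duration_s * SAMPLE_RATE))
    return [
        math.sin(2 * math.pi * pitch_hz * i / SAMPLE_RATE)
        for i in range(sample_count)
    ]


def reconstruct(phoneme_stream, speaker_id):
    """Concatenate the per-phoneme waveforms selected by the speaker's dictionary."""
    dictionary = PHONETIC_DICTIONARIES[speaker_id]
    samples = []
    for symbol in phoneme_stream:
        samples.extend(phoneme_waveform(*dictionary[symbol]))
    return samples


waveform = reconstruct("ha", "speaker_a")
```

In this sketch the output depends only on the phoneme symbols and the selected dictionary, so any property of the original speech that the dictionary does not describe is absent from the reconstructed waveform.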
With phoneme based approaches, if the acoustic properties of the speech deviate from the phonetic dictionary, the reconstruction may not yield speech that is reasonably close to the original. For example, if a speaker dependent phonetic dictionary is created using a speaker's voice in normal conditions, then when the speaker has a cold or speaks with a raised voice (corresponding to higher pitch), the distinct acoustic properties of the words spoken under the abnormal condition may not be faithfully recovered. When a speaker independent phonetic dictionary is used, the individual differences among different speakers may not be recovered. This is because existing phoneme based speech coding methods do not encode the deviations of the speech from the typical speech pattern described by a phonetic dictionary.