The present invention relates to speech coding systems and methods and, more particularly, to systems and methods for speech coding via speech recognition and synthesis based on pre-enrolled phonetic tokens.
It is known that conventional speech coders generally fall into two classes: transform coders and analysis-by-synthesis coders. In a transform coder, the speech signal is transformed using an invertible or pseudo-invertible transform, and the result is then compressed by a lossless and/or lossy compression procedure. In an analysis-by-synthesis coder, the speech signal is used to build a model, often based on a speech production model or an articulatory model, and the parameters of the model are obtained by minimizing a reconstruction error.
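The following minimal Python sketch illustrates the transform-coder class in concrete terms. It is not taken from any particular codec. It assumes an orthonormal DCT-II as the invertible transform and uniform scalar quantization as the lossy compression step. The function names (dct_ii, idct_ii, quantize, dequantize, code_frame) are hypothetical and are used only for illustration.

```python
import math


def _scale(k, n):
    # Normalization factors that make the DCT-II orthonormal (assumed choice).
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_ii(x):
    """Orthonormal DCT-II of one frame: the invertible transform."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n))
            for k in range(n)]


def idct_ii(c):
    """Exact inverse of the orthonormal DCT-II (a scaled DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
                for k in range(n))
            for i in range(n)]


def quantize(coeffs, step):
    """Lossy step: map each coefficient to an integer index (uniform quantizer)."""
    return [round(c / step) for c in coeffs]


def dequantize(indices, step):
    """Map integer indices back to coefficient values."""
    return [q * step for q in indices]


def code_frame(frame, step):
    """Hypothetical transform coder: returns the indices that would be
    transmitted and the waveform the decoder would reconstruct."""
    indices = quantize(dct_ii(frame), step)
    return indices, idct_ii(dequantize(indices, step))
```

Because the transform is orthonormal, the squared waveform error equals the squared quantization error of the coefficients. For a frame of n samples, that error is therefore at most n·(step/2)². The coder here tries only to keep this waveform perturbation small and does not use what is being said. A practical coder would also apply lossless entropy coding to the indices, which this sketch omits.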
All of these conventional approaches code the speech signal by trying to minimize the perturbation of the waveform for a given compression rate and to hide the resulting distortions by exploiting the perceptual limitations of the human auditory system. However, when coding is performed by the above-mentioned conventional methods, the minimum amount of information needed to reconstruct the original waveform is quite large. Coding so much data is prohibitive in time and/or cost, so such conventional systems are limited in the data bandwidth they can achieve. Such conventional systems attempt to minimize the information necessary to reconstruct the original speech waveform without examining the content of the message. An analysis-by-synthesis coder does exploit the properties of speech production, but it too does not take into account any information about what is being spoken.