1. Technical Field
The present invention relates generally to speech coding; and, more particularly, it relates to low bit rate speech coding systems that employ pitch enhancement to improve the perceptual quality of reproduced speech.
2. Description of Related Art
Conventional code-excited linear prediction speech coding systems typically employ only forward pitch enhancement. This is largely because conventional speech codecs, having relatively large bandwidth availability, use sub-frame sizes for which forward pitch enhancement alone provides sufficient perceptual quality. However, at the lower bit rates imposed by various communication media employed in speech coding systems, the reproduced speech, after synthesis, fails to maintain a high perceptual quality.
For conventional speech coding systems that operate at these decreased bit rates, the pitch lag that is generated during pitch prediction is commonly much shorter than the overall sub-frame size, i.e., it covers a relatively small portion of the overall sub-frame. This characteristic is more pronounced for those speakers having a higher pitch (shorter pitch lag), such as females and children. Traditional excitation codebook structures do not afford a sufficiently high perceptual quality when operating at low bit rates. This is primarily because the periodicity of the voiced signal is not sufficiently established, or because the excitation vector extracted from the codebook is insufficiently rich to generate a synthesized speech signal having a high perceptual quality.
As the sub-frame size of speech coding systems becomes larger, as is commonly associated with communication systems that have decreasing bit rates, the fact that pitch enhancement is performed in only the forward direction results in significantly poorer perceptual quality. This is due, among other reasons, to the significant amount of dead space left in the sub-frame by the absence of many pulses. In conventional speech coding systems that operate at higher bit rates, and consequently have shorter sub-frames, this effect is not typically audible to the human ear. This loss of perceptual quality arises in nearly all speech coding systems that operate at relatively low available bit rates.
Further limitations and disadvantages of conventional and traditional systems will become apparent to one of skill in the art through comparison of such systems with the present invention as set forth in the remainder of the present application with reference to the drawings.
Various aspects of the present invention can be found in a speech coding system that employs forward pitch enhancement and backward pitch enhancement. In certain embodiments of the invention, the forward pitch enhancement and the backward pitch enhancement are performed in a single portion of the entire speech coding system. For example, in speech coding systems having a speech codec, wherein the speech codec contains an encoder and a decoder, the forward pitch enhancement and the backward pitch enhancement are performed in both the encoder and the decoder of the speech codec. Alternatively, in other embodiments of the invention, the forward pitch enhancement and the backward pitch enhancement are performed only in the decoder of the speech codec. In still other embodiments, as determined by the specific application, the forward pitch enhancement and the backward pitch enhancement are performed in a distributed manner, each being performed, at least in part, in each one of the encoder and the decoder of the speech codec.
In certain embodiments of the invention, the backward pitch enhancement is generated using the forward pitch enhancement itself. The backward pitch enhancement is a mirror image of the forward pitch enhancement that was previously generated; that is, the backward pitch enhancement is generated dependent on the forward pitch enhancement. Alternatively, in other embodiments of the invention, the backward pitch enhancement is generated independent of the forward pitch enhancement; that is, the backward pitch enhancement is generated irrespective of any forward pitch enhancement that has previously been generated.
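By way of illustration only, the relationship between forward pitch enhancement and a mirror-image backward pitch enhancement can be sketched as follows. This is a minimal sketch under stated assumptions and does not represent the claimed implementation: the function names, the single `gain` parameter, the recursive form of the enhancement, and the way the two directions are combined are all hypothetical choices made for the example.

```python
def forward_enhance(pulses, lag, gain):
    """Repeat each pulse later in the sub-frame, one pitch lag at a time.

    Assumed form: out[n] = pulses[n] + gain * out[n - lag] for n >= lag.
    """
    out = list(pulses)
    for n in range(lag, len(out)):
        out[n] += gain * out[n - lag]
    return out


def backward_enhance(pulses, lag, gain):
    """Mirror image of the forward case: repeat each pulse earlier in the
    sub-frame by time-reversing, enhancing forward, and reversing back."""
    return forward_enhance(pulses[::-1], lag, gain)[::-1]


def enhance_both(pulses, lag, gain):
    """Combine both directions (hypothetical combination rule).

    The original pulses appear in both results, so they are subtracted once.
    """
    fwd = forward_enhance(pulses, lag, gain)
    bwd = backward_enhance(pulses, lag, gain)
    return [f + b - p for f, b, p in zip(fwd, bwd, pulses)]


if __name__ == "__main__":
    # Hypothetical 40-sample sub-frame with one pulse late in the sub-frame
    # and a pitch lag much shorter than the sub-frame.
    sub_frame = [0.0] * 40
    sub_frame[25] = 1.0

    fwd_only = forward_enhance(sub_frame, lag=10, gain=0.5)
    both = enhance_both(sub_frame, lag=10, gain=0.5)

    print("forward only:", [n for n, v in enumerate(fwd_only) if v != 0.0])
    print("forward + backward:", [n for n, v in enumerate(both) if v != 0.0])
```

In this example, forward enhancement alone leaves every sample before the pulse empty, whereas adding the mirror-image backward enhancement places attenuated copies of the pulse in the earlier portion of the sub-frame as well.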
The speech coding system, built in accordance with the present invention, is appropriately geared toward those speech coding systems that operate using communication media having limited or constrained bandwidth availability. Any communication media may be employed within the invention, without departing from the scope and spirit thereof. Examples of such communication media include, but are not limited to, wireless communication media, wire-based telephonic communication media, fiber-optic communication media, and Ethernet.