This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
Many speech models rely on a linear prediction (LP)-based approach, in which the vocal tract is modeled using the LP coefficients. The excitation signal, i.e., the LP residual, is then modeled using further techniques. Several conventional techniques are as follows.
First, the excitation can be modeled either as periodic pulses (during voiced speech) or as noise (during unvoiced speech). However, the achievable quality is limited by the hard voiced/unvoiced decision.
Second, the excitation can be modeled using an excitation spectrum that is considered to be voiced below a time-variant cut-off frequency and unvoiced above that frequency. This split-band approach can perform satisfactorily on many portions of speech signals, but problems can still arise, especially with the spectra of mixed sounds and noisy speech.
Third, a multiband excitation (MBE) model can be used. In this model, the spectrum can comprise several voiced and unvoiced bands (up to the number of harmonics), and a separate voiced/unvoiced decision is made for every band. Although the MBE model performs reasonably well in some situations, its quality is still limited by the hard voiced/unvoiced decision made for each band.
Fourth, in waveform interpolation (WI) speech coding, the excitation is modeled as a slowly evolving waveform (SEW) and a rapidly evolving waveform (REW). The SEW corresponds to the voiced contribution, and the REW represents the unvoiced contribution. Unfortunately, this model suffers from high complexity, and it is not always possible to obtain a perfect separation into a SEW and a REW.
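By way of illustration only, the LP-based framework and the first conventional technique described above can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation taken from any particular codec: the autocorrelation method with the Levinson-Durbin recursion is assumed as one common way of obtaining the LP coefficients, and the function names (lp_coefficients, lp_residual, lp_synthesize, binary_excitation) and parameters (order, pitch_period) are hypothetical.

```python
import numpy as np


def lp_coefficients(frame, order):
    """Estimate LP coefficients a = [1, a1, ..., ap] for A(z) = 1 + sum a_k z^-k.

    Assumption: autocorrelation method solved by Levinson-Durbin recursion.
    """
    n = len(frame)
    # Autocorrelation lags r[0..order] of the analysis frame.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction error energy, updated at each order
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient for order i
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lp_residual(frame, a):
    """Inverse-filter the frame with A(z) to obtain the excitation (LP residual)."""
    return np.convolve(frame, a)[:len(frame)]


def lp_synthesize(excitation, a):
    """Pass an excitation through the all-pole filter 1/A(z) (zero initial state)."""
    order = len(a) - 1
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, min(order, n) + 1):
            acc -= a[k] * out[n - k]
        out[n] = acc
    return out


def binary_excitation(length, voiced, pitch_period, rng):
    """Hard voiced/unvoiced excitation: a pulse train if voiced, else white noise."""
    if voiced:
        pulses = np.zeros(length)
        pulses[::pitch_period] = 1.0  # one unit pulse per pitch period
        return pulses
    return rng.standard_normal(length)
```

In such a sketch, a frame would be analyzed with lp_coefficients, and a decoder would drive lp_synthesize with the output of binary_excitation instead of the true residual. Because each frame is forced to be entirely periodic or entirely noise-like, the sketch exhibits the quality limitation attributed above to the hard voiced/unvoiced decision.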
It would therefore be desirable to provide an improved system and method for modeling speech spectra that addresses many of the above-identified issues.