This invention is directed in general to a method and an apparatus for coding signals, and more particularly, for coding both speech signals and music signals.
Speech and music are intrinsically represented by very different signals. With respect to the typical spectral features, the spectrum for voiced speech generally has a fine periodic structure associated with pitch harmonics, with the harmonic peaks forming a smooth spectral envelope, while the spectrum for music is typically much more complex, exhibiting multiple pitch fundamentals and harmonics. The spectral envelope may be much more complex as well. Coding technologies for these two signal modes are also very disparate, with speech coding being dominated by model-based approaches such as Code Excited Linear Prediction (CELP) and Sinusoidal Coding, and music coding being dominated by transform coding techniques such as Modified Lapped Transformation (MLT) used together with perceptual noise masking.
There has recently been an increase in the coding of both speech and music signals for applications such as Internet multimedia, TV/radio broadcasting, teleconferencing or wireless media. However, production of a universal codec to efficiently and effectively reproduce both speech and music signals is not easily accomplished, since coders for the two signal types are optimally based on separate techniques. For example, linear prediction-based techniques such as CELP can deliver high quality reproduction for speech signals, but yield unacceptable quality for the reproduction of music signals. On the other hand, the transform coding-based techniques provide good quality reproduction for music signals, but the output degrades significantly for speech signals, especially in low bit-rate coding.
An alternative is to design a multi-mode coder that can accommodate both speech and music signals. Early attempts to provide such coders are for example, the Hybrid ACELP/Transform Coding Excitation coder and the Multi-mode Transform Predictive Coder (MTPC). Unfortunately, these coding algorithms are too complex and/or inefficient for practically coding speech and music signals.
It is desirable to provide a simple and efficient hybrid coding algorithm and architecture for coding both speech and music signals, especially adapted for use in low bit-rate environments.
The invention provides a transform coding method for efficiently coding music signals. The transform coding method is suitable for use in a hybrid codec, whereby a common Linear Predictive (LP) synthesis filter is employed for reproduction of both speech and music signals. The LP synthesis filter input is switched between a speech excitation generator and a transform excitation generator, pursuant to the coding of a speech signal or a music signal, respectively. In a preferred embodiment, the LP synthesis filter comprises an interpolation of the LP coefficients. In the coding of speech signals, a conventional CELP or other LP technique may be used, while in the coding of music signals, an asymmetrical overlap-add transform technique is preferably applied. A potential advantage of the invention is that it enables a smooth output transition at points where the codec has switched between speech coding and music coding.
Additional features and advantages of the invention will be made apparent from the following detailed description of illustrative embodiments that proceeds with reference to the accompanying figures.