State-of-the-art conversational codecs represent clean speech signals with very good quality at bitrates of around 8 kbps and approach transparency at 16 kbps. To sustain this high speech quality at low bitrates, a multi-modal coding scheme is generally used. The input signal is usually classified into categories reflecting its characteristics, e.g. voiced speech, unvoiced speech, and voiced onsets. The codec then uses different coding modes, each optimized for one of these categories.
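The classify-then-select principle of multi-modal coding can be sketched as follows. This is a minimal illustration only. It assumes a simple frame classifier based on short-term energy and zero-crossing rate. The thresholds, the extra `inactive` category, and the mode names are hypothetical. Deployed codecs use considerably richer features and decision logic.

```python
import math

# Hypothetical parameters, for illustration only (not taken from any codec standard).
SILENCE_ENERGY = 1e-4   # mean-square energy below which a frame is treated as inactive
ZCR_UNVOICED = 0.3      # zero-crossing rate above which a frame is treated as unvoiced

# Hypothetical mapping from signal category to a coding mode optimized for it.
CODING_MODES = {
    "inactive": "inactive_mode",
    "unvoiced": "unvoiced_mode",
    "voiced_onset": "transition_mode",
    "voiced": "voiced_mode",
}


def frame_energy(frame):
    """Mean-square energy of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def classify_frame(frame, prev_category):
    """Assign a frame to one of the categories used for mode selection."""
    if frame_energy(frame) < SILENCE_ENERGY:
        return "inactive"
    if zero_crossing_rate(frame) > ZCR_UNVOICED:
        return "unvoiced"
    # A voiced frame that follows a non-voiced frame is labeled as an onset.
    if prev_category in ("inactive", "unvoiced"):
        return "voiced_onset"
    return "voiced"


def select_modes(frames):
    """Classify each frame and return (category, coding mode) pairs."""
    prev = "inactive"
    result = []
    for frame in frames:
        category = classify_frame(frame, prev)
        result.append((category, CODING_MODES[category]))
        prev = category
    return result


if __name__ == "__main__":
    fs, n = 8000, 160  # 20 ms frames at 8 kHz
    silence = [0.0] * n
    tone = [0.5 * math.sin(2 * math.pi * 100 * i / fs) for i in range(n)]  # voiced-like
    noise_like = [0.1 if i % 2 == 0 else -0.1 for i in range(n)]  # high zero-crossing rate
    for category, mode in select_modes([silence, tone, tone, noise_like]):
        print(category, mode)
```

With the frames above, the sequence is classified as inactive, voiced onset, voiced, unvoiced. The classifier tracks the previous category because an onset cannot be recognized from a single frame in isolation.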
Speech-model-based codecs usually do not render generic audio signals, such as music, well. Consequently, some deployed speech codecs do not represent music with good quality, especially at low bitrates. Once a codec is deployed, its encoder is difficult to modify, because the bitstream is standardized and any modification to the bitstream would break the interoperability of the codec.
Therefore, there is a need to improve the rendering of music content by speech-model-based codecs, for example linear-prediction (LP) based codecs.