A state-of-the-art conversational codec can represent a clean speech signal with very good quality at a bit rate of around 8 kbps and approach transparency at a bit rate of 16 kbps. However, at bit rates below 16 kbps, low-processing-delay conversational codecs, which most often code the input speech signal in the time domain, are not suitable for generic audio signals such as music and reverberant speech. To overcome this drawback, switched codecs have been introduced, which use a time-domain approach for coding speech-dominated input signals and a frequency-domain approach for coding generic audio signals. However, such switched solutions typically require a longer processing delay, needed both for speech-music classification and for the transform to the frequency domain.
To overcome the longer processing delay of switched solutions, a more unified time-domain and frequency-domain model is proposed.