The present invention relates to audio processing and, in particular, to an apparatus and method for processing an audio signal and for providing a higher temporal granularity for a Combined Unified Speech and Audio Codec (USAC).
USAC, like other audio codecs, uses a fixed frame size (USAC: 2048 samples/frame). Although it is possible to switch to a limited set of shorter transform sizes within one frame, the frame size still limits the temporal resolution of the complete system. To increase the temporal granularity of the complete system, traditional audio codecs increase the sampling rate, leading to a shorter duration of one frame in time (e.g. in milliseconds). However, this is not easily possible for the USAC codec:
The USAC codec comprises a combination of tools from traditional general audio codecs, such as the AAC (Advanced Audio Coding) transform coder, SBR (Spectral Band Replication) and MPEG Surround (MPEG=Moving Picture Experts Group), plus tools from traditional speech coders, such as ACELP (ACELP=Algebraic Code Excited Linear Prediction). Both ACELP and the transform coder usually run at the same time within the same environment (i.e. frame size, sampling rate), and can be easily switched: usually, the ACELP tool is used for clean speech signals, and the transform coder is used for music and mixed signals.
At the same time, the ACELP tool is limited to comparatively low sampling rates. For 24 kbit/s, a sampling rate of only 17075 Hz is used. At higher sampling rates, the performance of the ACELP tool drops significantly. The transform coder, SBR and MPEG Surround, however, would benefit from a much higher sampling rate, for example 22050 Hz for the transform coder and 44100 Hz for SBR and MPEG Surround. So far, however, the ACELP tool has limited the sampling rate of the complete system, leading to a suboptimal system, in particular for music signals.
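The relationship between sampling rate and frame duration described above can be illustrated with a short calculation. The following is only a sketch: the function name frame_duration_ms is hypothetical, and it assumes, purely for illustration, that the fixed frame size of 2048 samples applies unchanged at each of the sampling rates mentioned, ignoring any resampling between tools.

```python
# Frame size in samples, as stated for USAC above.
FRAME_SIZE = 2048


def frame_duration_ms(sampling_rate_hz, frame_size=FRAME_SIZE):
    """Return the duration of one frame in milliseconds.

    Hypothetical helper for illustration only: duration = samples / rate.
    """
    return 1000.0 * frame_size / sampling_rate_hz


# Sampling rates taken from the text above.
for rate_hz in (17075, 22050, 44100):
    print(f"{rate_hz} Hz -> {frame_duration_ms(rate_hz):.1f} ms per frame")
```

Under these assumptions, a frame lasts roughly 119.9 ms at 17075 Hz, 92.9 ms at 22050 Hz and 46.4 ms at 44100 Hz, which shows how a sampling rate dictated by the ACELP tool results in a coarse temporal granularity for the complete system.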