In current audio content production and delivery chains, the digitally available master content (a PCM stream) is encoded, e.g., by a professional AAC encoder at the content creation site. The resulting AAC bitstream is then made available for purchase, e.g., through the Apple iTunes Music Store. It has been observed in rare cases that some decoded PCM samples are “clipping”, which means that two or more consecutive samples reach the maximum level that can be represented by the underlying bit resolution (e.g., 16 bit) of a uniformly quantized fixed-point representation (PCM) of the output waveform. This may lead to audible artifacts (clicks or short distortion). Since this happens at the decoder side, there is no way of resolving the problem after the content has been delivered. The only way to handle this problem at the decoder side would be to create a “plug-in” for decoders providing anti-clipping functionality. Technically, this would mean a modification of the energy distribution in the subbands (however, only in a forward mode, i.e., there would be no iteration loop which takes into account the psychoacoustic model ...).
Assuming an audio signal at the encoder's input that is below the threshold of clipping, the reasons for clipping in a modern perceptual audio encoder are manifold. First of all, the audio encoder applies quantization to the transmitted signal, which is available in a frequency decomposition of the input waveform, in order to reduce the transmission data rate. Quantization errors in the frequency domain result in small deviations of the signal's amplitude and phase with respect to the original waveform. If amplitude or phase errors add up constructively, the resulting amplitude in the time domain may temporarily be higher than that of the original waveform. Secondly, parametric coding methods (e.g., Spectral Band Replication, SBR) parameterize the signal power in a rather coarse manner. Phase information is omitted. Consequently, the signal at the receiver side is only regenerated with correct power but without waveform preservation. Signals with an amplitude close to full scale are prone to clipping.
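The first cause named above, quantization errors in the frequency domain adding up constructively, can be illustrated with a small numerical sketch. It rests on several assumptions that are not taken from the text: a plain DFT stands in for the codec's actual filterbank, a uniform quantizer with one fixed step size stands in for the codec's quantization, and the frame length, amplitudes and step size are chosen purely for illustration. All function names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (stand-in for a codec filterbank)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT, returning the real part of the time-domain signal."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def quantize(spectrum, step):
    """Uniform quantization of real and imaginary parts (illustrative only)."""
    return [complex(round(c.real / step) * step, round(c.imag / step) * step)
            for c in spectrum]

def peak(x):
    """Largest absolute sample value."""
    return max(abs(v) for v in x)

def demo():
    n = 8            # assumed frame length
    amp = 0.48       # two components, so the input peak is 0.96 (full scale = 1.0)
    step = 0.7       # assumed quantizer step size
    x = [amp * math.cos(2 * math.pi * t / n) + amp * math.cos(4 * math.pi * t / n)
         for t in range(n)]
    y = idft(quantize(dft(x), step))
    return peak(x), peak(y)

peak_before, peak_after = demo()
print(peak_before, peak_after)
```

In this constructed case both spectral components are rounded upward, so the reconstructed peak rises from 0.96 to about 1.05 and exceeds full scale although the input did not. With other step sizes or amplitudes the errors may just as well cancel or reduce the peak.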
Since the dynamic range of the frequency decomposition in the compressed bitstream representation is much larger than a typical 16-bit PCM range, the bitstream can carry higher signal levels. Consequently, the actual clipping appears only when the decoder's output signal is converted (and limited) to a fixed-point PCM representation.
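The conversion step described above can be sketched as follows. This is a minimal illustration, not the behavior of any particular decoder: it assumes the decoder's internal output is a list of floating-point samples with full scale at 1.0, that conversion scales by 32768 and saturates to the 16-bit range, and that clipping is reported using the definition given earlier (two or more consecutive samples at the maximum representable level). The function names are hypothetical.

```python
INT16_MAX = 32767
INT16_MIN = -32768

def to_pcm16(samples):
    """Scale float samples (assumed full scale = 1.0) to 16-bit integers,
    limiting values that fall outside the representable range."""
    pcm = []
    for s in samples:
        v = int(round(s * 32768.0))
        pcm.append(max(INT16_MIN, min(INT16_MAX, v)))
    return pcm

def clipped_runs(pcm, min_run=2):
    """Return (start_index, length) for every run of at least min_run
    consecutive samples sitting at the same full-scale limit."""
    runs = []
    i = 0
    while i < len(pcm):
        if pcm[i] in (INT16_MAX, INT16_MIN):
            j = i
            while j < len(pcm) and pcm[j] == pcm[i]:
                j += 1
            if j - i >= min_run:
                runs.append((i, j - i))
            i = j
        else:
            i += 1
    return runs

# Hypothetical decoder output that briefly exceeds full scale.
decoded = [0.5, 0.99, 1.02, 1.05, 1.01, 0.7, -1.2, -0.3]
pcm = to_pcm16(decoded)
print(pcm)
print(clipped_runs(pcm))
```

The floating-point values above 1.0 are perfectly representable before conversion; the information is lost only when they are limited to 32767. A single sample at the limit is not reported, in line with the definition of clipping as two or more consecutive full-scale samples.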
It would be desirable to prevent the occurrence of clipping at the decoder by providing the decoder with an encoded signal that does not exhibit clipping, so that there is no need to implement clipping prevention at the decoder. In other words, it would be desirable for the decoder to be able to perform standard decoding without having to process the signal with respect to clipping prevention. In particular, a large number of decoders are already deployed today, and these decoders would have to be upgraded in order to benefit from decoder-side clipping prevention. Furthermore, once clipping has occurred (i.e., the audio signal has been encoded in a manner that is prone to the occurrence of clipping), some information may be irrecoverably lost, so that even a clipping prevention-enabled decoder may have to resort to extrapolating or interpolating the clipped signal portion on the basis of preceding and/or subsequent signal portions.