HE-AAC (High-Efficiency Advanced Audio Coding) is an efficient music audio codec at low and moderate bitrates (e.g. 24-96 kb/s for stereo content). In HE-AAC, the audio signal is down-sampled by a factor of two and the resulting lowband signal is AAC waveform coded. The removed high frequencies are coded parametrically using Spectral Band Replication (SBR) at a low additional bitrate (typically 3 kb/s per audio channel). As a result, the total bitrate can be reduced significantly compared to plain AAC waveform coding across the full spectral band of the audio signal.
The transmitted SBR parameters describe the way the higher frequency bands are generated from the AAC decoded lowband output. This generation process comprises a copy-and-paste or copy-up process of patches from the lowband signal to the high frequency bands. In HE-AAC, a patch describes a group of adjacent subbands that are copied up to higher frequencies in order to recreate high frequency content that was not AAC coded. Typically, 2-3 patches are applied, depending on the coding bitrate conditions. Usually the patch parameters do not change over time for one coding bitrate condition. However, the MPEG standard allows changing the patch parameters over time. The spectral envelopes of the artificially generated higher frequency bands are modified based on envelope parameters which are transmitted within the encoded bitstream. As a result of the copy-up process and the envelope adjustment, the characteristics of the original audio signal may be perceptually maintained.
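The copy-up and envelope adjustment described above can be illustrated with a minimal sketch. It is not the MPEG SBR decoder: it assumes the signal is already available as a matrix of complex subband samples (subbands by time slots), and the function name `copy_up`, the patch tuples `(source_start, target_start, width)` and the per-band energy dictionary `band_energy` are hypothetical names introduced only for illustration.

```python
import numpy as np

def copy_up(subbands, patches, band_energy=None):
    """Toy copy-up of lowband subbands into higher subbands.

    subbands:    complex array of shape (num_bands, num_slots); the high
                 bands are assumed to be empty before the copy-up.
    patches:     iterable of (source_start, target_start, width) tuples
                 (hypothetical layout, not the bitstream syntax).
    band_energy: optional dict {band_index: target mean energy} standing in
                 for the transmitted envelope parameters.
    """
    subbands = np.asarray(subbands, dtype=complex)
    out = subbands.copy()

    # Copy each group of adjacent lowband subbands to its target position.
    for source_start, target_start, width in patches:
        out[target_start:target_start + width] = \
            subbands[source_start:source_start + width]

    # Envelope adjustment: rescale a band to a target mean energy.
    # The gain is real and positive, so the phase of each sample is kept.
    if band_energy is not None:
        for band, energy in band_energy.items():
            current = np.mean(np.abs(out[band]) ** 2)
            if current > 0:
                out[band] *= np.sqrt(energy / current)
    return out
```

For example, with 64 subbands of which the lower 32 carry signal, `copy_up(bands, [(8, 32, 16), (8, 48, 16)])` would fill bands 32-63 with two copies of bands 8-23. Because the envelope gain only changes magnitudes, the copied bands keep the phase of their source bands.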
SBR coding may use other SBR parameters in order to further adjust the signal in the extended frequency range, i.e. to adjust the high-band signal, by noise and/or tone addition/removal.
The present document provides means to evaluate if a PCM audio signal has been coded (encoded and decoded) using parametric frequency extension audio coding such as MPEG SBR technology (e.g. using HE-AAC). In other words, the present document provides means for analyzing a given audio signal in the uncompressed domain and for determining if the given audio signal had been previously submitted to parametric frequency extension audio coding. In yet other words, given a (decoded) audio signal (e.g. in PCM format), it may be desirable to know whether or not the audio signal had previously been encoded using a certain encoding/decoding scheme. In particular, it may be desirable to know whether or not the high-frequency spectral components of the audio signal were generated by a spectral bandwidth replication process. In addition, it may be desirable to know if a stereo signal was created based on a transmitted mono signal or if certain time/frequency regions of a stereo signal originate from time/frequency data of the same mono signal.
It should be noted that even though the methods outlined in the present document are described in the context of audio coding, they are applicable to any form of audio processing that incorporates duplication of time/frequency data. In particular, the methods may be applied in the context of blind SBR which is a special case in audio coding where no SBR parameters are transmitted.
A possible use case may be the protection of SBR related intellectual property rights, e.g. the monitoring of unauthorized usage of MPEG SBR technology or any other new parametric frequency extension coding tool fundamentally based on SBR, e.g. Enhanced SBR (eSBR) in MPEG-D Unified Speech and Audio Coding (USAC). Furthermore, trans-coding and/or re-encoding may be improved when no information other than the (decoded) PCM audio signal is available. By way of example, if it is known that the high-frequency spectral components of the decoded PCM audio signal have been generated by a bandwidth extension process, then this information could be used when re-encoding the audio signal. In particular, the parameters (e.g. the cross-over frequency and patch parameters) of the re-encoder could be set such that the high-frequency spectral components are SBR encoded, while the lowband signal is waveform encoded. This would result in bit-rate savings compared to plain waveform coding and higher quality bandwidth extension. Furthermore, knowledge regarding the encoding history of a (decoded) audio signal could be used for quality assurance of high bit-rate waveform encoded (e.g. AAC or Dolby Digital) content. This could be achieved by making sure that SBR coding or some other parametric coding scheme, which is not a transparent coding method, was not applied to the (decoded) audio signal in the past. In addition, the knowledge regarding the encoding history could be the basis for a sound quality assessment of the (decoded) audio signal, e.g. by taking into account the number and size of SBR patches detected within the (decoded) audio signal.
As such, the present document relates to the detection of parametric audio coding schemes in PCM encoded waveforms. The detection may be carried out by the analysis of repetitive patterns across frequency and/or audio channels. Identified parametric coding schemes may be MPEG Spectral Band Replication (SBR) in HE-AACv1 or v2, Parametric Stereo (PS) in HE-AACv2, Spectral Extension (SPX) in Dolby Digital Plus and Coupling in Dolby Digital or Dolby Digital Plus. Since the analysis may be based on signal phase information, the proposed methods are robust against magnitude modifications as typically applied in parametric audio coding. In SBR coding schemes, high frequency content is generated in the audio decoder by copying low frequency subbands into higher frequency regions and by adjusting the energy envelope in a perceptual sense. In parametric spatial audio coding schemes (e.g. PS, Coupling), data in multiple audio channels may be generated from transmitted data relating to only a single audio channel. The duplication of data may be traced back robustly from PCM waveforms by analyzing phase information in frequency subbands.
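The idea of tracing duplicated data through subband phase can be illustrated with a minimal sketch. The similarity measure below (magnitude of the averaged phase difference phasor between two subbands) and the threshold of 0.9 are assumptions chosen for illustration, not the specific measure of the present document. The sketch further assumes that the PCM signal has already been transformed into complex subband samples by some analysis filterbank, and the names `phase_similarity` and `find_copied_bands` are hypothetical.

```python
import numpy as np

def phase_similarity(subbands):
    """Pairwise phase-only similarity between subbands, in [0, 1].

    subbands: complex array of shape (num_bands, num_slots).
    Each sample is reduced to a unit phasor, so any real positive gain
    applied to a band (e.g. an envelope adjustment) has no effect.
    """
    subbands = np.asarray(subbands, dtype=complex)
    magnitude = np.abs(subbands)
    phasors = np.divide(subbands, magnitude,
                        out=np.zeros_like(subbands), where=magnitude > 0)
    num_slots = subbands.shape[1]
    # Entry (a, b) is |mean over time of exp(j * (phase_a - phase_b))|.
    return np.abs(phasors @ phasors.conj().T) / num_slots

def find_copied_bands(subbands, threshold=0.9):
    """Return (lower_band, higher_band) pairs whose phases nearly coincide."""
    similarity = phase_similarity(subbands)
    # Only the strict upper triangle: each pair once, no self-comparison.
    candidates = np.triu(similarity, k=1) > threshold
    return [(int(a), int(b)) for a, b in np.argwhere(candidates)]
```

A band that is a scaled copy of a lower band yields a similarity close to 1, whereas unrelated bands yield values that shrink as more time slots are averaged. The same comparison could in principle be applied between the subbands of two audio channels to look for data derived from a common mono signal, under the same assumptions.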