The present invention relates to a scheme for manipulating an audio signal by modifying phases of spectral values of the audio signal, for example within a bandwidth extension (BWE) scheme.
Storage or transmission of audio signals is often subject to strict bitrate constraints. In the past, coders were forced to drastically reduce the transmitted audio bandwidth when only a very low bitrate was available. Modern audio codecs are able to code wide-band signals by using bandwidth extension methods, as described in M. Dietz, L. Liljeryd, K. Kjörling and O. Kunz, “Spectral Band Replication, a novel approach in audio coding,” in 112th AES Convention, Munich, May 2002; S. Meltzer, R. Böhm and F. Henn, “SBR enhanced audio codecs for digital broadcasting such as ‘Digital Radio Mondiale’ (DRM),” in 112th AES Convention, Munich, May 2002; T. Ziegler, A. Ehret, P. Ekstrand and M. Lutzky, “Enhancing mp3 with SBR: Features and Capabilities of the new mp3PRO Algorithm,” in 112th AES Convention, Munich, May 2002; International Standard ISO/IEC 14496-3:2001/FPDAM 1, “Bandwidth Extension,” ISO/IEC, 2002; Vasu Iyengar et al., Speech bandwidth extension method and apparatus; E. Larsen, R. M. Aarts, and M. Danessis, Efficient high-frequency bandwidth extension of music and speech, in AES 112th Convention, Munich, Germany, May 2002; R. M. Aarts, E. Larsen, and O. Ouweltjes, A unified approach to low- and high-frequency bandwidth extension, in AES 115th Convention, New York, USA, October 2003; K. Käyhkö, A Robust Wideband Enhancement for Narrowband Speech Signal, Research Report, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, 2001; E. Larsen and R. M. Aarts, Audio Bandwidth Extension: Application of Psychoacoustics, Signal Processing and Loudspeaker Design, John Wiley & Sons, Ltd, 2004; J. Makhoul, Spectral Analysis of Speech by Linear Prediction, IEEE Transactions on Audio and Electroacoustics, AU-21(3), June 1973; U.S. patent application Ser. No. 08/951,029, Ohmori, et al., Audio band width extending system and method; and U.S. Pat. No. 6,895,375, Malah, D. & Cox, R. V., System for bandwidth extension of Narrow-band speech. These algorithms rely on a parametric representation of the high-frequency content (HF), which is generated from the waveform-coded low-frequency part (LF) of the decoded signal by means of transposition into the HF spectral region (“patching”) and application of a parameter-driven post-processing.
Lately, a new algorithm that employs phase vocoders for the patch generation has been presented in Frederik Nagel, Sascha Disch, “A harmonic bandwidth extension method for audio codecs,” ICASSP International Conference on Acoustics, Speech and Signal Processing, IEEE CNF, Taipei, Taiwan, April 2009. Phase vocoders are described, for example, in M. Puckette, Phase-locked Vocoder, IEEE ASSP Conference on Applications of Signal Processing to Audio and Acoustics, Mohonk 1995; Röbel, A., Transient detection and preservation in the phase vocoder, citeseer.ist.psu.edu/679246.html; Laroche J., Dolson M., “Improved phase vocoder time-scale modification of audio”, IEEE Trans. Speech and Audio Processing, vol. 7, no. 3, pp. 323-332; and U.S. Pat. No. 6,549,884, Laroche, J. & Dolson, M., Phase-vocoder pitch-shifting. However, this method, called “harmonic bandwidth extension” (HBE), is prone to quality degradations of transients contained in the audio signal, as described in Frederik Nagel, Sascha Disch, Nikolaus Rettelbach, “A phase vocoder driven bandwidth extension method with novel transient handling for audio codecs,” 126th AES Convention, Munich, Germany, May 2009. There are two reasons for this: vertical coherence over sub-bands is not guaranteed to be preserved in the standard phase vocoder algorithm, and the re-calculation of the Discrete Fourier Transform (DFT) phases has to be performed on isolated time blocks of a transform that implicitly assumes circular periodicity.
Specifically, two kinds of artifacts caused by the block-based phase vocoder processing are known to occur: dispersion of the waveform, and temporal aliasing, which arises from cyclic convolution effects in time when newly calculated phases are applied to the signal.
In other words, because of the application of a phase modification on the spectral values of the audio signal in the BWE algorithm, a transient contained in a block of the audio signal may be wrapped around the block, i.e. cyclically convolved back into the block. This results in temporal aliasing and, consequently, leads to a degradation of the audio signal.
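The wrap-around effect can be illustrated with a minimal numerical sketch. It is not the BWE algorithm itself: as a simplifying assumption, the phase modification is a plain linear phase term that delays the block by a whole number of samples, and the names `apply_linear_phase`, `block_len` and `delay` are hypothetical. A transient placed near the end of the block reappears near the start of the same block after the inverse DFT.

```python
import numpy as np


def apply_linear_phase(block, delay):
    """Modify the DFT phases of `block` by a linear phase term.

    For an integer `delay` this is equivalent to a circular (cyclic)
    shift of the block, which is what makes the wrap-around visible.
    """
    n = len(block)
    k = np.arange(n)                      # DFT bin indices
    spectrum = np.fft.fft(block)
    modified = spectrum * np.exp(-2j * np.pi * k * delay / n)
    return np.fft.ifft(modified).real


block_len = 16                            # illustrative block size (assumption)
delay = 5                                 # illustrative phase slope (assumption)

block = np.zeros(block_len)
block[13] = 1.0                           # transient close to the block end

out = apply_linear_phase(block, delay)
peak = int(np.argmax(np.abs(out)))

# 13 + 5 exceeds the block length, so the transient wraps to index
# (13 + 5) % 16 instead of leaving the block.
print(peak)
```

In this sketch the transient would have to land at sample 18, outside the 16-sample block; because the DFT treats the block as periodic, it lands at sample 2 instead, ahead of its original position in time.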
Therefore, signal parts containing transients call for special treatment. However, especially since the BWE algorithm is performed on the decoder side of a codec chain, computational complexity is a serious issue. Accordingly, measures against the just-mentioned audio signal degradation should advantageously not come at the price of a greatly increased computational complexity.