The present invention relates to audio signal processing and, particularly, to audio signal manipulation in the context of applying audio effects to a signal containing transient events.
It is known to manipulate audio signals such that the reproduction speed is changed while the pitch is maintained. Known methods for such a procedure are implemented by phase vocoders or by methods such as (pitch-synchronous) overlap-add, (P)SOLA, as described, for example, in J. L. Flanagan and R. M. Golden, The Bell System Technical Journal, November 1966, pp. 1493 to 1509; U.S. Pat. No. 6,549,884, Laroche, J. & Dolson, M.: Phase-vocoder pitch-shifting; Jean Laroche and Mark Dolson, “New Phase-Vocoder Techniques for Pitch-Shifting, Harmonizing And Other Exotic Effects”, Proc. 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, N.Y., Oct. 17-20, 1999; and Zölzer, U.: DAFX: Digital Audio Effects; Wiley & Sons; Edition: 1 (Feb. 26, 2002); pp. 201-298.
Additionally, audio signals can be subjected to a transposition using such methods, i.e. phase vocoders or (P)SOLA. The distinguishing feature of this kind of transposition is that the transposed audio signal has the same reproduction/replay length as the original audio signal before transposition, while the pitch is changed. This is obtained by an accelerated reproduction of the stretched signal, where the acceleration factor depends on the stretching factor used for stretching the original audio signal in time. For a time-discrete signal representation, this procedure corresponds to a down-sampling (decimation) of the stretched signal by a factor equal to the stretching factor, with the sampling frequency being maintained.
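The relationship between stretching and decimation described above can be illustrated with a minimal Python sketch. It does not implement a phase vocoder or (P)SOLA; as a stated assumption, an ideal pitch-preserving stretch of a pure tone is simulated by simply synthesizing the same tone for a longer duration. All function names, the sampling rate, and the tone frequency are hypothetical choices made for illustration only.

```python
import math

def sine(freq_hz, n_samples, sample_rate, phase=0.3):
    """Synthesize a pure tone; the phase offset avoids samples that are exactly zero."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate + phase)
            for n in range(n_samples)]

def decimate(signal, factor):
    """Keep every `factor`-th sample (integer factor; no anti-aliasing filter, which a real system would need)."""
    return signal[::factor]

def zero_crossing_frequency(signal, sample_rate):
    """Estimate the frequency of a single tone from its number of sign changes."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0.0) != (b < 0.0))
    duration_s = (len(signal) - 1) / sample_rate
    return crossings / (2.0 * duration_s)

SAMPLE_RATE = 8000      # assumed sampling frequency in Hz
STRETCH_FACTOR = 2      # assumed integer stretching factor

# Original signal: one second of a 200 Hz tone.
original = sine(200.0, SAMPLE_RATE, SAMPLE_RATE)

# Assumption: an ideal time stretch keeps the pitch and doubles the length.
stretched = sine(200.0, STRETCH_FACTOR * SAMPLE_RATE, SAMPLE_RATE)

# Decimating by the stretching factor at an unchanged sampling frequency
# restores the original length and raises the pitch by the same factor.
transposed = decimate(stretched, STRETCH_FACTOR)

original_pitch = zero_crossing_frequency(original, SAMPLE_RATE)
transposed_pitch = zero_crossing_frequency(transposed, SAMPLE_RATE)
print(len(original), len(transposed), round(original_pitch), round(transposed_pitch))
```

In this idealized case the transposed signal has the same number of samples as the original, while the estimated pitch is scaled by the stretching factor.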
A specific challenge in such audio signal manipulations is posed by transient events. Transient events are events in a signal in which the energy of the signal in the whole band or in a certain frequency range is rapidly changing, i.e. rapidly increasing or rapidly decreasing. A characteristic feature of specific transients (transient events) is the distribution of signal energy in the spectrum. Typically, the energy of the audio signal during a transient event is distributed over the whole frequency range while, in non-transient signal portions, the energy is normally concentrated in the low frequency portion of the audio signal or in specific bands. This means that a non-transient signal portion, which is also called a stationary or tonal signal portion, has a spectrum which is non-flat. In other words, the energy of the signal is contained in a comparatively small number of spectral lines/spectral bands, which are strongly raised over the noise floor of the audio signal. In a transient portion, however, the energy of the audio signal will be distributed over many different frequency bands and, specifically, will extend into the high frequency portion, so that a spectrum for a transient portion of the audio signal will be comparatively flat and will, in any event, be flatter than a spectrum of a tonal portion of the audio signal. Typically, a transient event is a strong change in time, which means that the signal will include many higher harmonics when a Fourier decomposition is performed. An important feature of these many higher harmonics is that their phases are in a very specific mutual relationship, so that a superposition of all these sine waves will result in a rapid change of signal energy. In other words, there exists a strong correlation across the spectrum.
The specific phase situation among all harmonics can also be termed “vertical coherence”. This “vertical coherence” relates to a time/frequency spectrogram representation of the signal, in which the horizontal direction corresponds to the development of the signal over time and the vertical direction describes the interdependence, over frequency, of the spectral components (transform frequency bins) within one short-time spectrum.
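The spectral contrast between tonal and transient portions described above can be made concrete with a small Python sketch. It uses spectral flatness (geometric mean of the power spectrum divided by its arithmetic mean), which is a common measure chosen here for illustration and is not named in the text above. The frame length, the test signals, the small numerical floor, and all function names are assumptions.

```python
import cmath
import math

def power_spectrum(frame):
    """Power of the non-negative-frequency DFT bins of a real frame (direct O(N^2) DFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def spectral_flatness(frame, floor=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum; near 1 is flat, near 0 is peaky."""
    power = [p + floor for p in power_spectrum(frame)]  # floor avoids log(0)
    log_mean = sum(math.log(p) for p in power) / len(power)
    arithmetic_mean = sum(power) / len(power)
    return math.exp(log_mean) / arithmetic_mean

FRAME_LENGTH = 64  # assumed analysis frame length

# Idealized transient: a single click inside the frame.
transient_frame = [0.0] * FRAME_LENGTH
transient_frame[10] = 1.0

# Idealized tonal portion: a sine that falls exactly on DFT bin 4.
tonal_frame = [math.sin(2.0 * math.pi * 4 * i / FRAME_LENGTH) for i in range(FRAME_LENGTH)]

transient_flatness = spectral_flatness(transient_frame)
tonal_flatness = spectral_flatness(tonal_frame)
print(transient_flatness, tonal_flatness)
```

The click spreads its energy evenly over all bins and therefore yields a flatness close to 1, whereas the sine concentrates its energy in one bin and yields a flatness close to 0. The measure looks only at magnitudes; the phase relationship among the bins is not captured by it.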
The typical processing steps performed in order to time stretch or shorten an audio signal destroy this vertical coherence. As a result, a transient is “smeared” over time when it is subjected to a time stretching or time shortening operation as performed, e.g., by a phase vocoder or by any other method that performs a frequency-dependent processing introducing phase shifts into the audio signal which are different for different frequency coefficients.
When the vertical coherence of transients is destroyed by an audio signal processing method, the manipulated signal will be very similar to the original signal in stationary or non-transient portions, but the transient portions will have a reduced quality in the manipulated signal. The uncontrolled manipulation of the vertical coherence of a transient results in temporal dispersion of the same, since many harmonic components contribute to a transient event and changing the phases of all these components in an uncontrolled manner inevitably results in such artifacts.
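The temporal dispersion described above can be reproduced in a minimal Python sketch. It is not a phase vocoder; as a deliberately crude stand-in for uncontrolled frequency-dependent phase shifts, each DFT bin of a click is rotated by an independent random phase while all magnitudes are left untouched. The frame length, the random seed, and all function names are assumptions made for illustration.

```python
import cmath
import math
import random

def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]

def idft_real(spectrum):
    """Inverse DFT, returning the real part (the spectrum is kept conjugate-symmetric)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * i / n) for k in range(n)).real / n
            for i in range(n)]

def scramble_phases(spectrum, rng):
    """Rotate each bin by a random phase; magnitudes stay unchanged (even length assumed)."""
    n = len(spectrum)
    out = list(spectrum)
    for k in range(1, n // 2):          # DC and Nyquist bins are left as they are
        rotation = cmath.exp(1j * rng.uniform(-math.pi, math.pi))
        out[k] = spectrum[k] * rotation
        out[n - k] = spectrum[n - k] * rotation.conjugate()  # keeps the output real
    return out

FRAME_LENGTH = 64  # assumed frame length

# Idealized transient: all harmonics add up coherently at sample 16.
click = [0.0] * FRAME_LENGTH
click[16] = 1.0

smeared = idft_real(scramble_phases(dft(click), random.Random(0)))

click_peak = max(abs(v) for v in click)
smeared_peak = max(abs(v) for v in smeared)
click_energy = sum(v * v for v in click)
smeared_energy = sum(v * v for v in smeared)
print(click_peak, smeared_peak, click_energy, smeared_energy)
```

Because only the phases are changed, the total energy of the frame is preserved, but it is no longer concentrated in a single sample: the peak drops and the click is dispersed over the whole frame.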
However, transient portions are extremely important for the dynamics of an audio signal, such as a music signal or a speech signal, where sudden changes of energy at a specific time account for a great deal of the user's subjective impression of the quality of the manipulated signal. In other words, transient events in an audio signal are typically quite prominent “milestones” of the audio signal, which have a disproportionate influence on the subjective quality impression. Manipulated transients in which the vertical coherence has been destroyed by a signal processing operation, or has been degraded with respect to the transient portion of the original signal, will sound distorted, reverberant and unnatural to the listener.
Some current methods apply a greater degree of time stretching to the signal around a transient, so that no or only minor time stretching has to be performed during the transient itself. Known references and patents describing such methods for time and/or pitch manipulation are: Laroche J., Dolson M.: “Improved phase vocoder timescale modification of audio”, IEEE Trans. Speech and Audio Processing, vol. 7, no. 3, pp. 323-332; Emmanuel Ravelli, Mark Sandler and Juan P. Bello: Fast implementation for non-linear time-scaling of stereo audio; Proc. of the 8th Int. Conference on Digital Audio Effects (DAFx'05), Madrid, Spain, Sep. 20-22, 2005; Duxbury, C., M. Davies, and M. Sandler (2001, December). Separation of transient information in musical audio using multiresolution analysis techniques. In Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-01), Limerick, Ireland; and Röbel, A.: A NEW APPROACH TO TRANSIENT PROCESSING IN THE PHASE VOCODER; Proc. of the 6th Int. Conference on Digital Audio Effects (DAFx-03), London, UK, Sep. 8-11, 2003.
During time stretching of audio signals by phase vocoders, transient signal portions are “blurred” by dispersion, since the so-called vertical coherence of the signal is impaired. Methods based on overlap-add, such as (P)SOLA, may generate disturbing pre- and post-echoes of transient sound events. These problems can be addressed by increased time stretching in the vicinity of transients; however, if a transposition is to occur, the transposition factor will no longer be constant in the vicinity of the transients, i.e. the pitch of superimposed (possibly tonal) signal components will change and will be perceived as a disturbance.