Embodiments according to the invention are related to an apparatus, a method and a computer program for obtaining a parameter describing a variation of a signal characteristic of a signal on the basis of actual transform-domain parameters describing the signal in a transform domain.
Embodiments according to the invention are related to an apparatus, a method and a computer program for obtaining a parameter describing a temporal variation of a signal characteristic of an audio signal on the basis of actual transform-domain parameters describing the audio signal in a transform domain.
Further embodiments according to the invention are related to signal variation estimation.
While the primary scope of the current invention is the analysis of temporal variations of audio signals, the same method can be readily adapted to any digital signal and to the variations that such signals exhibit along any of their axes. Such signals and variations include, for example, spatial and temporal variations in characteristics such as intensity and contrast of images and movies, modulations (variations) in characteristics such as amplitude and frequency of radar and radio signals, and variations in properties such as heterogeneity of electrocardiogram signals.
In the following, a brief introduction regarding the concept of signal variation estimation will be given. Classical signal processing usually begins with the assumption of locally stationary signals, and for many applications this is a reasonable assumption. However, to claim that signals such as speech and audio are locally stationary stretches the truth beyond acceptable levels in some cases. Signals whose characteristics change rapidly introduce distortions into analysis results that are difficult to contain by classical approaches, and thus necessitate a methodology specifically tailored to rapidly varying signals.
For example, the coding of a speech signal with a transform-based coder may be considered. Here, the input signal is analyzed in windows, whose contents are transformed to the spectral domain. When the signal is a harmonic signal whose fundamental frequency rapidly changes, the locations of the spectral peaks, corresponding to the harmonics, change over time. If, for example, the analysis window is long relative to the rate at which the fundamental frequency changes, the spectral peaks are spread over neighboring frequency bins. In other words, the spectral representation becomes smeared. This distortion may be especially severe at the upper frequencies, where the locations of the spectral peaks move more rapidly when the fundamental frequency changes.
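The smearing effect described above can be illustrated with a small numerical sketch. It is not part of any embodiment; the sampling rate, window length, Hann window, fundamental frequencies and the threshold used to count "occupied" bins are all assumptions chosen only for illustration, and the helper names (harmonic_frame, magnitude_spectrum, occupied_bins) are hypothetical. The sketch synthesizes a single harmonic of a gliding fundamental, transforms one analysis window with a plain discrete Fourier transform, and counts how many frequency bins carry a magnitude above a fraction of the peak.

```python
import cmath
import math

FS = 8000.0  # sampling rate in Hz (assumed for illustration)
N = 512      # analysis window length in samples (assumed for illustration)


def harmonic_frame(k, f0_start, f0_end):
    """Hann-windowed k-th harmonic of a fundamental gliding linearly
    from f0_start to f0_end (in Hz) across one analysis window."""
    duration = N / FS
    rate = (f0_end - f0_start) / duration  # change of the fundamental in Hz/s
    frame = []
    for n in range(N):
        t = n / FS
        # Phase is the integral of the instantaneous frequency of harmonic k.
        phase = 2.0 * math.pi * k * (f0_start * t + 0.5 * rate * t * t)
        hann = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / N)
        frame.append(hann * math.cos(phase))
    return frame


def magnitude_spectrum(frame):
    """Magnitudes of a direct DFT for bins 0 .. len(frame) // 2."""
    size = len(frame)
    twiddle = [cmath.exp(-2j * math.pi * m / size) for m in range(size)]
    return [
        abs(sum(x * twiddle[(b * n) % size] for n, x in enumerate(frame)))
        for b in range(size // 2 + 1)
    ]


def occupied_bins(k, f0_start, f0_end, threshold=0.1):
    """Number of bins whose magnitude exceeds threshold times the peak."""
    mags = magnitude_spectrum(harmonic_frame(k, f0_start, f0_end))
    peak = max(mags)
    return sum(1 for m in mags if m > threshold * peak)


for k in (1, 5, 10):
    steady = occupied_bins(k, 200.0, 200.0)
    gliding = occupied_bins(k, 200.0, 220.0)
    print(f"harmonic {k}: steady={steady} bins, gliding={gliding} bins")
```

Under these assumptions, the tenth harmonic sweeps ten times as many hertz as the fundamental within the same window, so its energy is distributed over more bins than that of the first harmonic, which corresponds to the stronger smearing at upper frequencies.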
While methods exist for compensation of changes in the fundamental frequency, such as the time-warped modified discrete cosine transform (TW-MDCT) (see references [8] and [3]), pitch variation estimation has remained a challenge.
In the past, pitch variation has been estimated by measuring the pitch and simply taking the time derivative. However, since pitch estimation is a difficult and often ambiguous task, the pitch variation estimates were littered with errors. Pitch estimation suffers, among other things, from two common types of error (see, for example, reference [2]). Firstly, when the harmonics have greater energy than the fundamental, estimators are often misled into treating a harmonic as the fundamental, whereby the output is a multiple of the true frequency. Such errors can be observed as discontinuities in the pitch track and produce a huge error in terms of the time derivative. Secondly, most pitch estimation methods basically rely on peak picking in the autocorrelation (or a similar) domain by some heuristic. Especially in the case of varying signals, these peaks are broad (flat at the top), whereby a small error in the autocorrelation estimate can move the estimated peak location significantly. The pitch estimate is thus an unstable estimate.
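The first type of error can be made concrete with a short numerical sketch. The pitch track, the frame hop and the position of the erroneous frame are invented for illustration and are not measured data; the helper name pitch_derivative is hypothetical. A slowly gliding pitch track is differentiated by finite differences, once as is and once with a single frame in which the estimator is assumed to have locked onto the second harmonic.

```python
HOP = 0.010  # time between pitch estimates in seconds (assumed)

# Hypothetical true pitch track: a slow glide of 1 Hz per frame (100 Hz/s).
true_pitch = [200.0 + 1.0 * i for i in range(11)]

# Same track with one assumed octave error: frame 5 reports the second
# harmonic (twice the true pitch) instead of the fundamental.
measured_pitch = list(true_pitch)
measured_pitch[5] *= 2.0


def pitch_derivative(track, hop):
    """Finite-difference estimate of the pitch change in Hz per second."""
    return [(b - a) / hop for a, b in zip(track, track[1:])]


true_rate = pitch_derivative(true_pitch, HOP)
measured_rate = pitch_derivative(measured_pitch, HOP)

print("largest true rate     :", max(abs(r) for r in true_rate), "Hz/s")
print("largest measured rate :", max(abs(r) for r in measured_rate), "Hz/s")
```

In this sketch a single wrong frame produces two derivative values that are more than a hundred times larger in magnitude than the true rate of change, which is the discontinuity effect described above.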
As indicated above, the general approach in signal processing is to assume that the signal is constant in short time intervals and estimate the properties in such intervals. If, then, the signal is actually time-varying, it is assumed that the time evolution of the signal is sufficiently slow, so that the assumption of stationarity in a short interval is sufficiently accurate and analysis in short intervals will not produce significant distortion.
In view of the above, it is desirable to provide a concept for obtaining a parameter describing a temporal variation of a signal characteristic with improved robustness.