The present application is related to audio signal processing and, particularly, to audio processing usable in artificial reverberators.
The determination of a measure for a perceived level of reverberation is, for example, desired for applications where an artificial reverberation processor is operated in an automated way and needs to adapt its parameters to the input signal such that the perceived level of the reverberation matches a target value. It is noted that the term reverberance, while alluding to the same theme, does not appear to have a commonly accepted definition, which makes it difficult to use as a quantitative measure in a listening test and prediction scenario.
Artificial reverberation processors are often implemented as linear time-invariant systems and operated in a send-return signal path, as depicted in FIG. 6, with pre-delay d, reverberation impulse response (RIR) and a scaling factor g for controlling the direct-to-reverberation ratio (DRR). When implemented as parametric reverberation processors, they feature a variety of parameters, e.g. for controlling the shape and the density of the RIR, and the inter-channel coherence (ICC) of the RIRs for multi-channel processors in one or more frequency bands.
FIG. 6 shows a direct signal x[k] input at an input 600, and this signal is forwarded to an adder 602 for adding this signal to a reverberation signal component r[k] output from a weighter 604, which receives, at its first input, a signal output by a reverberation filter 606 and which receives, at its second input, a gain factor g. The reverberation filter 606 may have an optional delay stage 608 connected upstream of the reverberation filter 606. However, because the reverberation filter 606 introduces some delay by itself, the delay in block 608 can be included in the reverberation filter 606, so that the upper branch in FIG. 6 may comprise only a single filter incorporating both the delay and the reverberation, or only the reverberation without any additional delay. A reverberation signal component is output by the filter 606, and this reverberation signal component can be modified by the weighter (multiplier) 604 in response to the gain factor g in order to obtain the manipulated reverberation signal component r[k], which is then combined with the direct signal component input at 600 in order to finally obtain the mix signal m[k] at the output of the adder 602. It is noted that the term “reverberation filter” refers to common implementations of artificial reverberation (either as convolution, which is equivalent to FIR filtering, or as implementations using recursive structures, such as Feedback Delay Networks, networks of allpass filters and feedback comb filters, or other recursive filters), but more generally designates any processing which produces a reverberant signal. Such processing may involve non-linear processes or time-varying processes, such as low-frequency modulations of signal amplitudes or delay lengths. In these cases, the term “reverberation filter” would not apply in the strict technical sense of a linear time-invariant (LTI) system.
In fact, the “reverberation filter” refers to a processing which outputs a reverberant signal, possibly including a mechanism for reading a computed or recorded reverberant signal from memory.
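The signal flow of FIG. 6 can be illustrated by the following minimal sketch, which assumes the simplest case named above: an LTI reverberation filter realized as FIR convolution with an impulse response h, preceded by a pre-delay of d samples. The function names `send_return_mix` and `drr_db` and the use of NumPy are illustrative assumptions and are not taken from the application.

```python
import numpy as np

def send_return_mix(x, h, g, d=0):
    """Sketch of FIG. 6: m[k] = x[k] + r[k], with r[k] = g * (h * x)[k - d].

    x: direct signal, h: reverberation impulse response (FIR assumption),
    g: gain factor, d: pre-delay in samples (hypothetical parameter names).
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    # Delay stage 608: prepend d zeros to the signal sent to the filter.
    delayed = np.concatenate([np.zeros(d), x])
    # Reverberation filter 606, here as plain convolution.
    wet = np.convolve(delayed, h)
    # Weighter 604: scale the filter output by the gain factor g.
    r = g * wet
    # Adder 602: zero-pad the direct signal to the length of r and sum.
    direct = np.zeros_like(r)
    direct[:len(x)] = x
    m = direct + r
    return m, r

def drr_db(x, r):
    """Long-term direct-to-reverberation energy ratio in decibels."""
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum(r ** 2))
```

In this sketch, increasing g lowers the DRR by 20·log10 of the gain change, which is why g serves as the control for the direct-to-reverberation ratio. Non-linear or time-varying reverberators, which the text also covers, are outside the scope of this example.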
These reverberation parameters have an impact on the resulting audio signal in terms of perceived level, distance, room size, coloration and sound quality. Furthermore, the perceived characteristics of the reverberation depend on the temporal and spectral characteristics of the input signal [1]. Focusing on a very important sensation, namely loudness, it can be observed that the loudness of the perceived reverberation is monotonically related to the non-stationarity of the input signal. Intuitively speaking, an audio signal with large variations in its envelope excites the reverberation at high levels and allows it to become audible at lower levels. In a typical scenario where the long-term DRR expressed in decibels is positive, the direct signal can mask the reverberation signal almost completely at time instants where its energy envelope increases. On the other hand, whenever the signal ends, the previously excited reverberation tail becomes apparent in gaps exceeding a minimum duration determined by the slope of the post-masking (at maximum 200 ms) and the integration time of the auditory system (at maximum 200 ms for moderate levels).
To illustrate this, FIG. 4a shows the time signal envelopes of a synthetic audio signal and of an artificially generated reverberation signal, and FIG. 4b shows predicted loudness and partial loudness functions computed with a computational model of loudness. An RIR with a short pre-delay of 50 ms is used here, omitting early reflections and synthesizing the late part of the reverberation with exponentially decaying white noise [2]. The input signal has been generated from a harmonic wide-band signal and an envelope function such that one event with a short decay and a second event with a long decay are perceived. While the long event produces more total reverberation energy, it comes as no surprise that it is the short sound which is perceived as being more reverberant. Whereas the decaying slope of the longer event masks the reverberation, the short sound has already disappeared before the reverberation has built up, and thereby a gap opens in which the reverberation is perceived. Please note that the definition of masking used here includes both complete and partial masking [3].
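An RIR of the kind described above can be sketched as follows: a 50 ms pre-delay, no early reflections, and a late part formed by white noise under an exponentially decaying envelope. Only the pre-delay value and the noise-based construction come from the text; the sampling rate, the reverberation time `rt60_s`, the tail length, the random seed and the function name `synthetic_late_rir` are assumptions chosen for illustration.

```python
import numpy as np

def synthetic_late_rir(fs=48000, pre_delay_s=0.05, rt60_s=1.0,
                       length_s=1.5, seed=0):
    """Late-reverberation RIR sketch: pre-delay plus decaying white noise.

    All defaults except the 50 ms pre-delay are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_delay = int(round(pre_delay_s * fs))   # pre-delay in samples
    n_tail = int(round(length_s * fs))       # length of the noise tail
    t = np.arange(n_tail) / fs
    # Amplitude envelope that has dropped by 60 dB after rt60_s seconds.
    envelope = 10.0 ** (-3.0 * t / rt60_s)
    tail = rng.standard_normal(n_tail) * envelope
    # Silence during the pre-delay, then the decaying noise tail.
    return np.concatenate([np.zeros(n_delay), tail])
```

Convolving the two-event input signal with such an impulse response would yield a reverberation signal of the kind whose envelope is plotted in FIG. 4a; the exact parameters used for the figure are not stated in the text.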
Although such observations have been made many times [4, 5, 6], they are still worth emphasizing because they illustrate qualitatively why models of partial loudness can be applied in the context of this work. In fact, it has been pointed out that the perception of reverberation arises from stream segregation processes in the auditory system [4, 5, 6] and is influenced by the partial masking of the reverberation due to the direct sound.
The considerations above motivate the use of loudness models. Related investigations were performed by Lee et al. and focus on the prediction of the subjective decay rate of RIRs when listening to them directly [7] and on the effect of the playback level on reverberance [8]. A predictor for reverberance using loudness-based early decay times is proposed in [9]. In contrast to these works, the prediction methods proposed here process the direct signal and the reverberation signal with a computational model of partial loudness (and with simplified versions of it in the quest for low-complexity implementations) and thereby consider the influence of the input (direct) signal on the sensation. Recently, Tsilfidis and Mourjopoulos [10] investigated the use of a loudness model for the suppression of the late reverberation in single-channel recordings. In their approach, an estimate of the direct signal is computed from the reverberant input signal using a spectral subtraction method, and a reverberation masking index, which controls the reverberation processing, is derived by means of a computational auditory masking model.
It is a feature of multi-channel synthesizers and other devices to add reverberation in order to make the sound better from a perceptual point of view. On the other hand, the generated reverberation is an artificial signal which, when added to the signal at too low a level, is barely audible and, when added at too high a level, leads to an unnatural and unpleasant sounding final mixed signal. What makes things even worse is that, as discussed in the context of FIGS. 4a and 4b, the perceived level of reverberation is strongly signal-dependent and, therefore, a certain reverberation filter might work very well for one kind of signal, but may have no audible effect or, even worse, may generate serious audible artifacts for a different kind of signal.
An additional problem related to reverberation is that the reverberated signal is intended for the ear of an entity or individual, such as a human being, and the final goal of generating a mix signal having a direct signal component and a reverberation signal component is that the entity perceives this mixed signal or “reverberated signal” as sounding good or as sounding natural. However, the auditory perception mechanism, i.e., the mechanism by which sound is actually perceived by an individual, is strongly non-linear, not only with respect to the bands in which the human hearing works, but also with respect to the processing of signals within the bands. Additionally, it is known that the human perception of sound is governed not so much by the sound pressure level, which can be calculated by, for example, squaring digital samples, but rather by a sense of loudness. Furthermore, for mixed signals, which include a direct signal component and a reverberation signal component, the sensation of the loudness of the reverberation component depends not only on the kind of direct signal component, but also on the level or loudness of the direct signal component.
Therefore, there exists a need for determining a measure for a perceived level of reverberation in a signal consisting of a direct signal component and a reverberation signal component in order to cope with the above problems related to the auditory perception mechanism of an entity.