Embodiments according to the invention relate to audio signal processing and particularly to an apparatus and method for modifying an input audio signal.
There have been many attempts to develop a satisfactory objective method of measuring loudness. Fletcher and Munson determined in 1933 that human hearing is less sensitive at low and high frequencies than at middle (or voice) frequencies. They also found that the relative change in sensitivity decreased as the level of the sound increased. An early loudness meter consisted of a microphone, amplifier, meter and a combination of filters designed to roughly mimic the frequency response of hearing at low, medium and high sound levels.
Even though such devices provided a measurement of the loudness of a single, constant level, isolated tone, measurements of more complex sounds did not match the subjective impressions of loudness very well. Sound level meters of this type have been standardized but are only used for specific tasks, such as the monitoring and control of industrial noise.
In the early 1950s, Zwicker and Stevens, among others, extended the work of Fletcher and Munson in developing a more realistic model of the loudness perception process. Stevens published a method for the “Calculation of the Loudness of Complex Noise” in the Journal of the Acoustical Society of America in 1956, and Zwicker published his “Psychological and Methodical Basis of Loudness” article in Acustica in 1958. In 1959 Zwicker published a graphical procedure for loudness calculation, as well as several similar articles shortly after. The Stevens and Zwicker methods were standardized as ISO 532, parts A and B (respectively). Both methods involve similar steps.
First, the time-varying distribution of energy along the basilar membrane of the inner ear, referred to as the excitation, is simulated by passing the audio through a bank of band-pass auditory filters with center frequencies spaced uniformly on a critical band rate scale. Each auditory filter is designed to simulate the frequency response at a particular location along the basilar membrane of the inner ear, with the filter's center frequency corresponding to this location. A critical-band width is defined as the bandwidth of one such filter. Measured in units of Hertz, the critical-band width of these auditory filters increases with increasing center frequency. It is therefore useful to define a warped frequency scale such that the critical-band width for all auditory filters measured in this warped scale is constant. Such a warped scale is referred to as the critical band rate scale and is very useful in understanding and simulating a wide range of psychoacoustic phenomena. See, for example, Psychoacoustics-Facts and Models by E. Zwicker and H. Fastl, Springer-Verlag, Berlin, 1990. The methods of Stevens and Zwicker utilize a critical band rate scale referred to as the Bark scale, in which the critical-band width is constant below 500 Hz and increases above 500 Hz. More recently, Moore and Glasberg defined a critical band rate scale, which they named the Equivalent Rectangular Bandwidth (ERB) scale (B. C. J. Moore, B. Glasberg, T. Baer, “A Model for the Prediction of Thresholds, Loudness, and Partial Loudness,” Journal of the Audio Engineering Society, Vol. 45, No. 4, April 1997, pp. 224-240). Through psychoacoustic experiments using notched-noise maskers, Moore and Glasberg demonstrated that the critical-band width continues to decrease below 500 Hz, in contrast to the Bark scale where the critical-band width remains constant.
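The different behavior of the two critical band rate scales below 500 Hz can be illustrated with a minimal sketch. The closed-form approximations used here (the Zwicker and Terhardt expressions for the Bark scale and the Glasberg and Moore expression for the ERB) are commonly cited formulas and are an assumption of this example; they are not taken from the text above, and the function names are hypothetical.

```python
import math

def bark_rate(f_hz):
    """Critical band rate in Bark (Zwicker/Terhardt approximation, assumed)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def bark_bandwidth_hz(f_hz):
    """Critical-band width in Hz on the Bark scale (assumed approximation)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

def erb_rate(f_hz):
    """ERB-rate (number of ERBs below f_hz), Glasberg/Moore approximation (assumed)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_bandwidth_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz (assumed approximation)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

# Print both bandwidths for a few center frequencies.
for f in (50, 100, 250, 500, 1000, 4000):
    print(f, round(bark_bandwidth_hz(f), 1), round(erb_bandwidth_hz(f), 1))
```

Under these approximations the Bark bandwidth stays close to 100 Hz at low center frequencies, whereas the ERB keeps shrinking as the center frequency decreases.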
The term “critical band” goes back to the work by Harvey Fletcher in 1938 on the masking of sound sensation by accompanying signals (“J. B. Allen, “A short history of telephone psychophysics”, Audio Eng. Soc. Convention, 1997”). Critical bands can be expressed using the Bark scale proposed by Zwicker in 1961: each critical band has a width of one Bark (a unit named after Heinrich Barkhausen). Other filter banks mimicking human auditory perception exist, e.g., those based on the Equivalent Rectangular Bandwidth (ERB) scale (“B. C. J. Moore, B. R. Glasberg and T. Baer, “A model for the prediction of thresholds, loudness, and partial loudness”, J. Audio Eng. Soc., 1997”).
The term “specific loudness” describes the sensation of loudness caused by a signal in a certain region of the basilar membrane, corresponding to a certain frequency bandwidth measured in critical bands. It is measured in units of Sone/Bark. The term “critical band” relates to the frequency bands of an auditory filter bank, which comprises non-uniform band-pass filters designed to imitate the frequency resolution of human hearing. The overall loudness of a sound equals the sum (or integral) of the specific loudness across all critical bands.
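The relation between specific loudness and overall loudness can be sketched as a discrete sum. The function name, the uniform band width parameter and the example values are hypothetical and serve illustration only.

```python
def overall_loudness(specific_loudness, band_width_bark=1.0):
    """Approximate the integral of specific loudness (Sone/Bark) over the
    critical band rate by a sum over bands of equal width (in Bark)."""
    return sum(n * band_width_bark for n in specific_loudness)

# Illustrative specific loudness values for four bands of one Bark each.
example = [0.5, 1.0, 1.5, 1.0]
print(overall_loudness(example))  # 4.0 (Sone)
```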
A method for processing an audio signal has been described in “A. J. Seefeldt, “Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal”. US Patent 2009/0097676, 2009”. This method aims at the control of the specific loudness of the audio signal, with applications to volume control, dynamic range control, dynamic equalization and background noise compensation. In this document an input audio signal (normally in the frequency domain) is modified such that its specific loudness matches a target specific loudness.
To illustrate the advantage of the processing as presented in “A. J. Seefeldt, “Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal”. US Patent 2009/0097676, 2009”, consider the volume control of an audio signal. Changing the level of an audio signal in sound reproduction normally aims at changing its perceived loudness. Said differently, the control of the loudness is traditionally implemented as the control of the sound level. However, our daily experience and the knowledge of psychoacoustics indicate that this is not optimal.
The sensitivity of the human hearing varies with both frequency and level such that a decrease of the sound intensity level attenuates the sensation of low and high frequencies (e.g., around 100 Hz and 10000 Hz, respectively) more than the sensation of middle frequencies (e.g., between 2000 and 4000 Hz). When decreasing the playback level from a “comfortably loud” level (e.g., 75-80 dBA) to a lower level by e.g., 18 dB, the perceived spectral balance of the audio signal changes. This is illustrated in the well-known Equal-Loudness Contours, often referred to as Fletcher-Munson Curves (after the researchers who first measured the Equal-Loudness Contours in 1933). The Equal-Loudness Contour shows the sound pressure level (SPL) over the frequency spectrum, for which a listener perceives a constant loudness when presented with pure steady tones.
Equal-Loudness Contours are depicted in e.g. “B. C. J. Moore, B. R. Glasberg and T. Baer, “A model for the prediction of thresholds, loudness, and partial loudness”, J. Audio Eng. Soc., 1997”, p. 232, FIG. 13. A revised measurement was standardized in 2003 as ISO 226:2003.
Consequently, conventional loudness control changes not only the loudness but also the timbre. The impact of this effect depends on the SPL (it is less pronounced when changing the SPL from e.g., 86 dBA to 68 dBA compared to a change from 76 dBA to 58 dBA), but is not desired in all cases.
This is compensated by the processing as described in “A. J. Seefeldt, “Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal”. US Patent 2009/0097676, 2009”.
FIG. 7 shows a flow chart of a method 700 described in “A. J. Seefeldt, “Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal”. US Patent 2009/0097676, 2009”.
The input signal is processed by calculating 710 the excitation signal, calculating 720 the specific loudness, calculating 730 the target specific loudness, calculating 740 the target excitation signal, calculating 750 the spectral weights, and applying 760 the spectral weights to the input signal and resynthesizing the output signal.
The spectral weights H are weightings of the frequency bands which depend on the specific loudness of the input signal and on the target specific loudness. Their calculation, as described in “A. J. Seefeldt, “Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal”. US Patent 2009/0097676, 2009”, comprises the calculation of the specific loudness and the inverse process of the calculation of the specific loudness, which is applied to the target specific loudness.
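The chain of steps 720 to 750 can be sketched with a deliberately simplified loudness model. The compressive power law with a per-band threshold term, the exponent ALPHA, the uniform loudness scaling and all function names are assumptions chosen for illustration; they do not reproduce the model of the cited document, which is considerably more elaborate.

```python
import math

ALPHA = 0.23  # hypothetical compressive exponent of the toy loudness model

def specific_loudness(excitation, threshold):
    """Step 720 (toy model): N = (E + T)^ALPHA - T^ALPHA per band."""
    return [(e + t) ** ALPHA - t ** ALPHA for e, t in zip(excitation, threshold)]

def inverse_specific_loudness(loudness, threshold):
    """Step 740 (toy model): invert the relation above to obtain an excitation."""
    return [(n + t ** ALPHA) ** (1.0 / ALPHA) - t for n, t in zip(loudness, threshold)]

def spectral_weights(excitation, threshold, loudness_scale):
    """Steps 720 to 750: derive one amplitude weight H per frequency band."""
    n = specific_loudness(excitation, threshold)             # step 720
    n_target = [loudness_scale * v for v in n]               # step 730 (assumed target)
    e_target = inverse_specific_loudness(n_target, threshold)  # step 740
    # Step 750: excitation is treated as power-like, so the amplitude
    # weight is the square root of the excitation ratio.
    return [math.sqrt(et / e) for et, e in zip(e_target, excitation)]

# Two bands with equal excitation but different (hypothetical) thresholds.
excitation = [1.0e6, 1.0e6]
threshold = [1.0, 1.0e3]
print(spectral_weights(excitation, threshold, loudness_scale=0.5))
```

Even in this toy model the weights differ between bands when the specific loudness is scaled uniformly: the band with the higher threshold is attenuated less. The sketch also shows that both the forward and the inverse loudness calculation have to be evaluated for every band and every processed frame.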
Both processing steps introduce a high computational load. Methods for the calculation of the specific loudness have been presented in “E. Zwicker, H. Fastl, U. Widmann, K. Kurakata, S. Kuwano and S. Namba, “Program for calculating loudness according to DIN 45631 (ISO 532 B)”, J. Acoust. Soc. Jpn. (E), vol. 12, 1991” and “B. C. J. Moore, B. R. Glasberg and T. Baer, “A model for the prediction of thresholds, loudness, and partial loudness”, J. Audio Eng. Soc., 1997”.