There have been many attempts to develop a satisfactory objective method of measuring loudness. Fletcher and Munson determined in 1933 that human hearing is less sensitive at low and high frequencies than at middle (or voice) frequencies. They also found that the relative change in sensitivity decreased as the level of the sound increased. An early loudness meter consisted of a microphone, amplifier, meter and a combination of filters designed to roughly mimic the frequency response of hearing at low, medium and high sound levels.
Even though such devices provided a measurement of the loudness of a single, constant-level, isolated tone, measurements of more complex sounds did not match the subjective impressions of loudness very well. Sound level meters of this type have been standardized but are used only for specific tasks, such as the monitoring and control of industrial noise.
In the early 1950s, Zwicker and Stevens, among others, extended the work of Fletcher and Munson by developing a more realistic model of the loudness perception process. Stevens published a method for the “Calculation of the Loudness of Complex Noise” in the Journal of the Acoustical Society of America in 1956, and Zwicker published his “Psychological and Methodical Basis of Loudness” article in Acustica in 1958. In 1959 Zwicker published a graphical procedure for loudness calculation, followed shortly afterward by several similar articles. The Stevens and Zwicker methods were standardized as ISO 532, parts A and B (respectively). Both methods involve similar steps.
First, the time-varying distribution of energy along the basilar membrane of the inner ear, referred to as the excitation, is simulated by passing the audio through a bank of band-pass auditory filters with center frequencies spaced uniformly on a critical band rate scale. Each auditory filter is designed to simulate the frequency response at a particular location along the basilar membrane of the inner ear, with the filter's center frequency corresponding to this location. A critical-band width is defined as the bandwidth of one such filter. Measured in units of Hertz, the critical-band width of these auditory filters increases with increasing center frequency. It is therefore useful to define a warped frequency scale such that the critical-band width for all auditory filters measured in this warped scale is constant. Such a warped scale is referred to as the critical band rate scale and is very useful in understanding and simulating a wide range of psychoacoustic phenomena. See, for example, Psychoacoustics—Facts and Models by E. Zwicker and H. Fastl, Springer-Verlag, Berlin, 1990. The methods of Stevens and Zwicker utilize a critical band rate scale referred to as the Bark scale, in which the critical-band width is constant below 500 Hz and increases above 500 Hz. More recently, Moore and Glasberg defined a critical band rate scale, which they named the Equivalent Rectangular Bandwidth (ERB) scale (B. C. J. Moore, B. Glasberg, T. Baer, “A Model for the Prediction of Thresholds, Loudness, and Partial Loudness,” Journal of the Audio Engineering Society, Vol. 45, No. 4, April 1997, pp. 224-240). Through psychoacoustic experiments using notched-noise maskers, Moore and Glasberg demonstrated that the critical-band width continues to decrease below 500 Hz, in contrast to the Bark scale where the critical-band width remains constant.
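By way of illustration only, the difference between the two critical band rate scales can be sketched numerically. The closed-form approximations below are assumptions drawn from the general psychoacoustics literature (the Glasberg and Moore approximation for the ERB scale and the Zwicker and Terhardt approximation for the Bark scale); they are not given in the text above, and the function names are hypothetical.

```python
import math


def erb_bandwidth_hz(f_hz: float) -> float:
    """Approximate ERB width in Hz (assumed Glasberg-Moore approximation)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def erb_rate(f_hz: float) -> float:
    """Approximate position on the ERB scale for a frequency in Hz."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def bark_bandwidth_hz(f_hz: float) -> float:
    """Approximate Bark critical-band width in Hz (assumed Zwicker-Terhardt approximation)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


def bark_rate(f_hz: float) -> float:
    """Approximate position on the Bark scale for a frequency in Hz."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


if __name__ == "__main__":
    # Compare the two bandwidth definitions at a few center frequencies.
    for f in (100.0, 250.0, 500.0, 1000.0, 4000.0):
        print(f"{f:7.0f} Hz  Bark width {bark_bandwidth_hz(f):7.1f} Hz  "
              f"ERB width {erb_bandwidth_hz(f):7.1f} Hz")
```

Under these assumed formulas, the Bark bandwidth stays near 100 Hz for center frequencies below 500 Hz, whereas the ERB bandwidth keeps shrinking as the center frequency falls, which is the contrast described above.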
Following the computation of excitation is a non-linear compressive function that generates a quantity referred to as “specific loudness”. Specific loudness is a measure of perceptual loudness as a function of frequency and time and may be measured in units of perceptual loudness per unit frequency along a critical band rate scale, such as the Bark or ERB scale discussed above. Finally, the time-varying “total loudness” is computed by integrating specific loudness across frequency. When specific loudness is estimated from a finite set of auditory filters distributed uniformly along a critical band rate scale, total loudness may be computed by simply summing the specific loudness from each filter.
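The last two steps may be sketched as follows. This is a minimal sketch under stated assumptions, not the method of either standard: the compressive function is taken to be a simple power law with an illustrative exponent of 0.23, the excitation values are assumed to be available already as one non-negative number per auditory filter, and all names are hypothetical.

```python
from typing import List, Sequence


def specific_loudness(excitation: Sequence[float],
                      exponent: float = 0.23,
                      scale: float = 1.0) -> List[float]:
    """Map per-band excitation to specific loudness with a compressive power law.

    The exponent and scale are illustrative assumptions, not values from the text.
    """
    return [scale * max(e, 0.0) ** exponent for e in excitation]


def total_loudness(specific: Sequence[float], band_spacing: float = 1.0) -> float:
    """Approximate the integral of specific loudness across frequency.

    With filters spaced uniformly on a critical band rate scale, the integral
    reduces to a sum; band_spacing is the filter spacing in critical band units.
    """
    return band_spacing * sum(specific)


if __name__ == "__main__":
    excitation_per_band = [1.0, 4.0, 0.5, 0.0]  # made-up example values
    n_prime = specific_loudness(excitation_per_band)
    print(total_loudness(n_prime))
```

Because the exponent is less than one, doubling the excitation in a band less than doubles its specific loudness, which is the compressive behavior referred to above.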
Loudness may be measured in units of phon. The loudness of a given sound in phon is the sound pressure level (SPL) of a 1 kHz tone having a subjective loudness equal to that of the sound. Conventionally, the reference 0 dB for SPL is a root mean square pressure of 2×10⁻⁵ pascal (20 µPa), and this is therefore also the reference 0 phon. Using this definition in comparing the loudness of tones at frequencies other than 1 kHz with the loudness at 1 kHz, a contour of equal loudness can be determined for a given phon level. FIG. 11 shows equal loudness contours for frequencies between 20 Hz and 12.5 kHz, and for phon levels between 4.2 phon (considered to be the threshold of hearing) and 120 phon (ISO 226:1987 (E), “Acoustics – Normal equal loudness level contours”). The phon measurement takes into account the varying sensitivity of human hearing with frequency, but the results do not allow the assessment of the relative subjective loudnesses of sounds at varying levels because there is no attempt to correct for the non-linearity of the growth of loudness with SPL, that is, for the fact that the spacing of the contours varies.
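For reference, the SPL on which the phon is based follows from the standard decibel definition relative to the stated reference pressure. The short sketch below shows the arithmetic; the function and constant names are hypothetical.

```python
import math

P_REF_PA = 2e-5  # reference RMS pressure for 0 dB SPL, in pascal


def spl_db(p_rms_pa: float) -> float:
    """Sound pressure level in dB for an RMS pressure given in pascal."""
    if p_rms_pa <= 0.0:
        raise ValueError("RMS pressure must be positive")
    return 20.0 * math.log10(p_rms_pa / P_REF_PA)


if __name__ == "__main__":
    for p in (2e-5, 2e-3, 0.2, 1.0):
        print(f"{p:g} Pa -> {spl_db(p):.2f} dB SPL")
```

By the definition above, a 1 kHz tone at a given SPL has a loudness level in phon numerically equal to that SPL; at other frequencies the equal loudness contours are needed to make the conversion.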
Loudness may also be measured in units of “sone”. There is a one-to-one mapping between phon units and sone units, as indicated in FIG. 11. One sone is defined as the loudness of a 40 dB (SPL) 1 kHz pure sine wave and is equivalent to 40 phon. The units of sone are such that a twofold increase in sone corresponds to a doubling of perceived loudness. For example, 4 sone is perceived as twice as loud as 2 sone. Thus, expressing loudness levels in sone is more informative than expressing them in phon when relative loudness is to be compared. Given the definition of specific loudness as a measure of perceptual loudness as a function of frequency and time, specific loudness may be measured in units of sone per unit frequency. Thus, when using the Bark scale, specific loudness has units of sone per Bark and likewise when using the ERB scale, the units are sone per ERB.
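The mapping between the two units can be sketched with the commonly used rule that loudness doubles for every 10 phon increase. That rule is an assumption here: it is a widely quoted approximation that holds reasonably well at and above about 40 phon, it is not stated in the text above, and it is not a substitute for the mapping indicated in the figure. The function names are hypothetical.

```python
import math


def phon_to_sone(phon: float) -> float:
    """Approximate loudness in sone for a loudness level in phon.

    Assumes loudness doubles per 10 phon, anchored at 40 phon = 1 sone.
    The approximation is intended for levels of about 40 phon and above.
    """
    return 2.0 ** ((phon - 40.0) / 10.0)


def sone_to_phon(sone: float) -> float:
    """Inverse of phon_to_sone under the same assumption."""
    if sone <= 0.0:
        raise ValueError("loudness in sone must be positive")
    return 40.0 + 10.0 * math.log2(sone)


if __name__ == "__main__":
    for level in (40.0, 50.0, 60.0, 80.0):
        print(f"{level:.0f} phon -> {phon_to_sone(level):.1f} sone")
```

Under this assumption, 50 phon maps to 2 sone and 60 phon maps to 4 sone, so a 10 phon step corresponds to the doubling of perceived loudness described above.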
As mentioned above, the sensitivity of the human ear varies with both frequency and level, a fact well documented in the psychoacoustics literature. One of the results is that the perceived spectrum or timbre of a given sound varies with the acoustic level at which the sound is heard. For example, for a sound containing low, middle and high frequencies, the perceived relative proportions of such frequency components change with the overall loudness of the sound; when the sound is quiet, the low and high frequency components sound quieter relative to the middle frequencies than they do when the sound is loud. This phenomenon is well known and has been mitigated in sound reproducing equipment by so-called loudness controls. A loudness control is a volume control that applies low- and sometimes also high-frequency boost as the volume is turned down. Thus, the lower sensitivity of the ear at the frequency extremes is compensated by an artificial boost of those frequencies. Such controls are completely passive; the degree of compensation applied is a function of the setting of the volume control or some other user-operated control, not of the content of the audio signals.
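The passive nature of such a control can be illustrated with a small sketch in which the low-frequency boost is computed from the volume setting alone. The linear mapping, the slope and the ceiling are purely hypothetical values chosen for illustration and do not describe any particular product.

```python
def loudness_control_bass_boost_db(volume_db: float,
                                   reference_db: float = 0.0,
                                   boost_per_db: float = 0.5,
                                   max_boost_db: float = 12.0) -> float:
    """Low-frequency boost in dB applied by a hypothetical passive loudness control.

    The boost depends only on how far the volume setting lies below the
    reference setting; the audio signal itself is never inspected.
    """
    attenuation_db = max(0.0, reference_db - volume_db)
    return min(max_boost_db, boost_per_db * attenuation_db)


if __name__ == "__main__":
    for setting in (0.0, -10.0, -20.0, -40.0):
        boost = loudness_control_bass_boost_db(setting)
        print(f"volume {setting:6.1f} dB -> bass boost {boost:4.1f} dB")
```

Because the function takes no signal input, a loud passage and a quiet passage played at the same volume setting receive the same boost.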
In practice, changes in perceived relative spectral balance among low, middle and high frequencies depend on the signal, in particular on its actual spectrum and on whether it is intended to be loud or soft. Consider the recording of a symphony orchestra. Reproduced at the same level that a member of the audience would hear in a concert hall, the balance across the spectrum may be correct whether the orchestra is playing loudly or quietly. If the music is reproduced 10 dB quieter, for example, the perceived balance across the spectrum changes in one manner for loud passages and changes in another manner for quiet passages. A conventional passive loudness control does not apply different compensations as a function of the music.
In International Patent Application No. PCT/US2004/016964, filed May 27, 2004, published Dec. 23, 2004 as WO 2004/111994 A2, Seefeldt et al. disclose, among other things, a system for measuring and adjusting the perceived loudness of an audio signal. Said PCT application, which designates the United States, is hereby incorporated by reference in its entirety. In said application, a psychoacoustic model calculates the loudness of an audio signal in perceptual units. In addition, the application introduces techniques for computing a wideband multiplicative gain, which, when applied to the audio, results in the loudness of the gain-modified audio being substantially the same as a reference loudness. Application of such wideband gain, however, changes the perceived spectral balance of the audio.
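The idea of a wideband gain that matches a reference loudness may be sketched as a one-dimensional search. The sketch below is not the technique of said application, whose details are not reproduced here. It assumes only that the loudness measure increases monotonically with gain, and it uses a toy power-law loudness model and hypothetical names purely for illustration.

```python
from typing import Callable, Sequence


def toy_loudness(band_energies: Sequence[float], gain: float) -> float:
    """Toy loudness model: compressive power law per band, summed across bands.

    A linear amplitude gain scales each band energy by gain squared.
    The exponent 0.23 is an illustrative assumption.
    """
    return sum((gain * gain * e) ** 0.23 for e in band_energies)


def solve_wideband_gain(band_energies: Sequence[float],
                        reference_loudness: float,
                        loudness_fn: Callable[[Sequence[float], float], float] = toy_loudness,
                        lo: float = 1e-4,
                        hi: float = 1e4,
                        iterations: int = 100) -> float:
    """Find the gain whose resulting loudness matches reference_loudness.

    Uses bisection on a logarithmic gain axis, which is valid when loudness
    increases monotonically with gain and the solution lies within [lo, hi].
    """
    for _ in range(iterations):
        mid = (lo * hi) ** 0.5  # geometric midpoint of the current bracket
        if loudness_fn(band_energies, mid) < reference_loudness:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5


if __name__ == "__main__":
    energies = [1.0, 4.0, 0.5]  # made-up band energies
    target = 2.0 * toy_loudness(energies, 1.0)  # ask for twice the loudness
    print(solve_wideband_gain(energies, target))
```

The same gain is applied to every band, so the search fixes the overall loudness but leaves the relative band levels unchanged, which is why the perceived spectral balance shifts as noted above.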