The human perception of loudness has been studied for several decades, and accordingly many properties of this perception have been discovered and subsequently modeled. In the 1930's Fletcher and Munson found that at low signal levels, middle frequencies are perceived as louder than lower and higher frequencies but that this variation in sensitivity decreases as the level of sound increases. In the 1950's Zwicker and Stevens built on the work of Fletcher and Munson and developed more accurate and realistic models. FIG. 1, published by E. Zwicker in “Psychoacoustics,” (Zwicker's FIG. 8.4) shows the growth of loudness of both a 1 kHz tone and Uniform Excited Noise (“UEN,” i.e., noise with equal power in all critical bands). For a signal level below what is often termed the “hearing threshold,” no loudness is perceived. Above this threshold, there is a quick rise in perceived loudness up to an asymptote where loudness grows linearly with signal level. In FIG. 2 the equal loudness contours of ISO 226 show the same behavior, but as a function of frequency for sinusoidal tones. The contour lines, at increments of 10 phon, show the sound pressure levels across frequency that the human ear perceives as equally loud. The lowest line represents the “hearing threshold” as a function of frequency.
The non-linear and frequency varying behavior of the human auditory system has a direct impact on the perceived timbre and imaging of audio signals. A complex, wideband audio signal, for example music, presented at a particular sound pressure level is perceived as having a particular spectral balance or timbre. If the same audio signal is presented at a different sound pressure level, the perceived spectral balance or timbre of the audio signal is different because, as shown in FIG. 2, the growth of perceived loudness is different for different frequencies.
A complex, wideband multichannel audio signal, presented over multiple loudspeakers, is also perceived as having a particular spatial balance. Spatial balance refers to the impression of the location of sound elements in the mix as well as the overall diffuseness of the mix due to the relative level of audio signals between two or more loudspeakers. If the same multichannel audio signal is presented at a different overall sound pressure level, the non-linear growth in perceived loudness and differing growth of loudness across frequency leads to a change in the perceived spatial balance of the multichannel audio signal. This effect is especially apparent when there is a significant difference in level between channels. Quieter channels are affected differently than louder channels which, for example, can lead to quiet channels dropping below the hearing threshold and audibly disappearing when the overall level is reduced.
In many situations there is a desire to adjust or scale the perceived loudness of an audio signal. The most obvious example is the traditional volume control that appears on many devices including consumer music players, home theater receiver/amplifiers and professional mixing consoles. The traditional simple volume or level control adjusts the audio signal by applying the same wideband gain to all channels, without any consideration of the human auditory system or of the resulting change in perceived timbre and spatial balance.
Recently Seefeldt in said WO 2006/047600 A1 published application and Seefeldt et al. in said WO 2007/123608 A1 published application proposed a method that enables accurate scaling of the perceived loudness of monophonic and multichannel audio signals while maintaining the perceived timbre and spatial balance as the loudness is scaled. In this method, a desired loudness scaling or target loudness is achieved by, in essence, inverting a psychoacoustic loudness measurement model and calculating frequency- and time-variant gains that can be applied to the audio signal. For the case of a volume control, a particular setting of the control corresponds to a fixed scaling of the perceptual loudness spectrum, referred to as specific loudness, and the gains are computed to achieve a target specific loudness equal to the original specific loudness of the audio multiplied by the scaling.
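By way of illustration only, the inversion described above may be sketched for a single critical band as follows. The sketch assumes a simplified compressive loudness function chosen because it can be inverted in closed form; it is not the psychoacoustic model of the cited applications. The exponent ALPHA, the function names, and the parameter threshold_excitation are hypothetical and are introduced solely for this example.

```python
import math

# Illustrative compressive exponent (an assumption, not taken from the cited applications).
ALPHA = 0.23


def specific_loudness(excitation, threshold_excitation):
    """Toy specific loudness of one band: zero for zero excitation, compressive above."""
    return (excitation + threshold_excitation) ** ALPHA - threshold_excitation ** ALPHA


def excitation_for_loudness(loudness, threshold_excitation):
    """Closed-form inverse of the toy specific_loudness function."""
    return (loudness + threshold_excitation ** ALPHA) ** (1.0 / ALPHA) - threshold_excitation


def band_gain(excitation, threshold_excitation, scale):
    """Amplitude gain that multiplies the band's toy specific loudness by `scale`."""
    if excitation <= 0.0:
        return 1.0  # nothing to scale in a silent band
    target = scale * specific_loudness(excitation, threshold_excitation)
    target_excitation = excitation_for_loudness(target, threshold_excitation)
    # Excitation is a power-like quantity, so the amplitude gain is the square root.
    return math.sqrt(target_excitation / excitation)


# Halve the specific loudness of a quiet, a moderate and a loud band.
for band_excitation in (1e-3, 1e2, 1e6):
    gain = band_gain(band_excitation, threshold_excitation=1.0, scale=0.5)
    print(f"excitation {band_excitation:g}: gain {20.0 * math.log10(gain):.1f} dB")
```

Under this toy model the same loudness scaling yields a different gain in each band, depending on how far the band lies above threshold, whereas a traditional volume control would apply one gain to all bands. In a complete system such gains would be recomputed per band and per time block.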
Scaling of specific loudness is most easily implemented in the digital domain where transforms may be utilized to divide the audio into the requisite critical bands for which the gains are computed and applied. The loudness-scaled digital audio may then be transformed into an analog signal using a digital-to-analog converter and then played back through an amplifier and speakers. FIG. 3 depicts a block diagram of such a loudness-compensating process or device. Digital audio is applied to a volume control process or volume controller (“Digital Loudness Compensating Volume Controller”) 2 that has a desired volume setting (“Volume Selection”), as an input from a user, for example. The process or device 2 computes and applies frequency and time variant gains to achieve the specific loudness scaling corresponding to a Volume Selection setting. Such a process or device may be implemented in accordance with the teachings of one or more of said above-cited patent applications of Seefeldt and Seefeldt et al. The modified digital audio is then converted to analog using a digital-to-analog conversion process or converter (“D/A”) 4.
Any practical digital implementation utilizes a limited bit depth to represent the audio signal (16 bits, for example, for Compact Disc audio). As the desired volume, or loudness scaling, is decreased, the resulting gains applied to the signal also decrease. Accordingly, the average level of the modified digital audio approaches the noise floor corresponding to the bit depth. If the modified audio is simply re-quantized after the attenuation, audible distortion may result. With a slight increase in computational complexity, the audio may be re-dithered with a white dither noise to remove the distortion. Audibly, the dither introduces a constant white noise at approximately the level of the least significant bit of the digital representation. For 16-bit audio, this level corresponds to approximately 96 dB below full scale. However, the perceptual dynamic range of human hearing is significantly greater, approximately 120 dB. Thus, the still-audible attenuated audio may approach or even fall below the quantization noise floor, resulting in a low signal-to-noise ratio (SNR) listening condition. A noise shaper may be used to move the dither noise to less perceptible areas of the spectrum, effectively reducing the perceived level of the dither noise below the threshold of hearing. Such noise-shaping processes are computationally expensive, requiring high-precision arithmetic, and may not be practical in a consumer device.
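The 96 dB figure and the effect of re-dithering may be illustrated with the following minimal sketch. It assumes floating-point samples in the range [-1, 1) and a triangular-PDF white dither of plus or minus one least significant bit peak; the dither distribution, the function names, and the parameters are assumptions made for this example and are not taken from the cited applications.

```python
import math
import random


def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB (about 6.02 dB per bit)."""
    return 20.0 * math.log10(2 ** bits)


def requantize(samples, bits, dither=True, rng=None):
    """Round float samples in [-1, 1) to signed `bits`-bit integer codes.

    With dither enabled, white noise with a triangular PDF (an assumed choice)
    is added before rounding so that the quantization error is decorrelated
    from the signal.
    """
    rng = rng or random.Random(0)
    scale = 2 ** (bits - 1)
    codes = []
    for x in samples:
        v = x * scale
        if dither:
            v += rng.random() - rng.random()  # triangular PDF on (-1, 1) LSB
        q = math.floor(v + 0.5)  # round to nearest integer code
        codes.append(max(-scale, min(scale - 1, q)))  # clip to the legal range
    return codes


# A sine attenuated to 0.4 LSB peak: plain rounding erases it entirely,
# while dithered rounding preserves it beneath a constant noise floor.
quiet = [0.4 / 32768 * math.sin(2 * math.pi * n / 50) for n in range(1000)]
plain = requantize(quiet, 16, dither=False)
dithered = requantize(quiet, 16, dither=True)
print(f"16-bit dynamic range: {dynamic_range_db(16):.2f} dB")
print("nonzero codes without dither:", sum(1 for q in plain if q != 0))
print("nonzero codes with dither:", sum(1 for q in dithered if q != 0))
```

The sketch omits noise shaping, which would additionally filter the quantization error to move it toward less perceptible frequencies.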
Thus, there is a need for a loudness-compensating volume control that provides the functionality of digitally-implemented loudness compensation while reducing the problems associated with all-digital implementations thereof.