VL processing is performed on audio data in order to remove or at least reduce inconsistencies in loudness levels when the audio data is rendered. Loudness may be measured in units of phon. The loudness of a given sound in phon is the sound pressure level (SPL) of a 1 kHz tone having a subjective loudness equal to that of the sound. Loudness may also be measured in units of “sone”. There is a one-to-one mapping between phon units and sone units. One sone is defined as the loudness of a 40 dB (SPL) 1 kHz pure sine wave and is equivalent to 40 phon. The units of sone are such that a twofold increase in sone corresponds to a doubling of perceived loudness. For example, 4 sone is perceived as twice as loud as 2 sone.
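The one-to-one mapping between phon and sone can be illustrated numerically. The sketch below assumes the commonly used approximation that each 10 phon increase doubles the loudness in sone, which is generally taken to hold only at or above 40 phon; that rule and the function names `phon_to_sone` and `sone_to_phon` are illustrative assumptions and are not taken from the text above.

```python
import math


def phon_to_sone(phon: float) -> float:
    """Convert loudness level (phon) to loudness (sone).

    Assumes the 10-phon-per-doubling approximation, which is only
    applied here for levels of 40 phon and above.
    """
    if phon < 40.0:
        raise ValueError("approximation assumed valid only for phon >= 40")
    return 2.0 ** ((phon - 40.0) / 10.0)


def sone_to_phon(sone: float) -> float:
    """Inverse mapping, under the same assumption (sone >= 1)."""
    if sone < 1.0:
        raise ValueError("approximation assumed valid only for sone >= 1")
    return 40.0 + 10.0 * math.log2(sone)


if __name__ == "__main__":
    for level in (40.0, 50.0, 60.0):
        print(level, "phon ->", phon_to_sone(level), "sone")
```

Under this assumption, 40 phon maps to 1 sone, and 50 phon and 60 phon map to 2 sone and 4 sone respectively, consistent with 4 sone being perceived as twice as loud as 2 sone.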
Inconsistencies in loudness levels may be experienced when, for example, switching between television channels. This may be appreciated from FIG. 1, which shows respective audio waveforms before (top) and after (bottom) VL processing during switching between two television channels. After VL processing, the loudness of Television Channel 1 is a little lower and the loudness of Television Channel 2 is considerably higher. Such inconsistencies may also be experienced within a single audio program. This may be appreciated from FIG. 2, which shows respective audio waveforms before (top) and after (bottom) VL processing for two different audio objects in a movie's audio program, namely dialogue and a sound effect of wind blowing in the background. After VL processing, the loudness of the dialogue audio object is reduced without noticeably reducing the loudness of the sound effect audio object.
State-of-the-art VL processing technologies, such as Dolby® Volume, tend to employ a psychoacoustic model of human hearing to reduce the difference between a reference loudness level and the estimated loudness level that a listener would hear without the VL processing. That is, such technologies apply different gains to different parts of the audible frequency spectrum, to maintain consistent timbre as the loudness is changed. This may be appreciated from FIG. 3, which schematically shows four different plots of sound pressure level (in dB) versus frequency (in Hz) in the human-audible frequency range, before (left-hand side) and after (right-hand side) VL processing. As can be seen, the amount of (negative) gain applied to the input audio signal is frequency dependent, with the aim of keeping all frequency components above the threshold of hearing while maintaining, so far as possible, the characteristics originally intended by the audio content producers and/or mixers.
State-of-the-art VL processing technologies tend to use at least one filter to apply frequency-dependent gain. The coefficients of the filter(s) are changed in real time in dependence on the audio data to be processed, typically on a frame-by-frame basis. A typical arrangement for changing the filter coefficients, as shown in FIG. 4 of the accompanying drawings, is to run two filters in parallel on the same audio data, alternately changing the filter coefficients of one or the other of the two filters, and then cross-fading between the respective outputs of the two filters over the course of a frame (of the audio data). Any suitable cross-fading profile can be chosen; in the arrangement shown in FIG. 4, the cross-fading profile is linear.
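The two-filter arrangement described above can be sketched as follows. This is a minimal illustration only: the one-pole filter `OnePole` is a hypothetical stand-in for whatever frequency-dependent filter a real VL processor would use, the function name `crossfade_frames` and the per-frame coefficient list are assumptions, and the linear ramp is just one possible cross-fading profile.

```python
class OnePole:
    """Hypothetical stand-in filter: y[n] = (1 - a) * x[n] + a * y[n - 1]."""

    def __init__(self, a=0.0):
        self.a = a
        self.state = 0.0

    def process(self, frame):
        out = []
        for x in frame:
            self.state = (1.0 - self.a) * x + self.a * self.state
            out.append(self.state)
        return out


def crossfade_frames(frames, coeffs):
    """Run two filters in parallel and cross-fade linearly within each frame.

    frames: list of frames, each a list of samples.
    coeffs: one new filter coefficient per frame (assumed to be supplied
            by some upstream loudness analysis, not modelled here).
    """
    filters = [OnePole(), OnePole()]
    active = 0  # index of the filter currently being faded out
    output = []
    for frame, a in zip(frames, coeffs):
        incoming = 1 - active
        # Only the filter being faded in receives the new coefficient.
        filters[incoming].a = a
        # Both filters process the same audio data.
        old = filters[active].process(frame)
        new = filters[incoming].process(frame)
        n = len(frame)
        for i in range(n):
            w = (i + 1) / n  # linear ramp, reaching 1 at the last sample
            output.append((1.0 - w) * old[i] + w * new[i])
        # The roles swap for the next frame.
        active = incoming
    return output


if __name__ == "__main__":
    frames = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
    print(crossfade_frames(frames, coeffs=[0.5, 0.9]))
```

Because both filters process every frame, the per-sample filtering cost is roughly twice that of a single filter, and two sets of filter state must be kept in memory.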
It will be appreciated that running two filters in parallel, e.g. as shown in FIG. 4, incurs a significant computational overhead in terms of both processing cycles and memory usage. To avoid this overhead where possible, some state-of-the-art VL processing technologies disable the arrangement shown in FIG. 4 for audio data that is deemed not to require VL processing, and apply a simple gain instead.