Loudness is the magnitude of the intensity of a sound as perceived by a human listener, and it is measured in sones. Experiments have shown that critical bandwidths play an important role in loudness summation. Accordingly, elaborate models that mimic the successive stages of the human auditory system (outer ear, middle ear, and inner ear) have been proposed. Such models represent the cochlea as a bank of auditory filters whose bandwidths correspond to the critical bandwidths. One advantage of these models is that, in addition to a final loudness estimate, they yield intermediate auditory patterns, such as excitation patterns (e.g., the magnitude of the basilar membrane vibrations) and loudness patterns (e.g., neural activity patterns).
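The filter-bank structure described above can be illustrated with a minimal sketch. It is not the model of any particular reference. It assumes that critical bandwidths are approximated by the equivalent rectangular bandwidth (ERB) formula of Glasberg and Moore, that each auditory filter has a symmetric rounded-exponential (roex) shape, and that specific loudness is a simple compressive power law of excitation. The outer-ear and middle-ear stages are omitted, and the exponent `alpha`, the 0.5-ERB filter spacing, the frequency range, and all function names are illustrative assumptions, so the result is not calibrated in sones.

```python
import math

def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def erb_number(f_hz):
    """Map frequency in Hz to the ERB-number scale."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_number_to_hz(e):
    """Inverse of erb_number."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def filter_centers(f_lo=50.0, f_hi=15000.0, step_erb=0.5):
    """Center frequencies spaced uniformly on the ERB-number scale (assumed grid)."""
    e_lo, e_hi = erb_number(f_lo), erb_number(f_hi)
    n = int((e_hi - e_lo) / step_erb) + 1
    return [erb_number_to_hz(e_lo + i * step_erb) for i in range(n)]

def excitation_pattern(freqs_hz, powers, centers_hz):
    """Output power of each auditory filter for a line spectrum.

    Each filter is a symmetric roex(p) weighting W(g) = (1 + p*g) * exp(-p*g),
    where g is the normalized deviation from the center frequency and
    p = 4 * fc / ERB(fc).
    """
    pattern = []
    for fc in centers_hz:
        p = 4.0 * fc / erb_hz(fc)
        total = 0.0
        for f, pw in zip(freqs_hz, powers):
            g = abs(f - fc) / fc
            total += pw * (1.0 + p * g) * math.exp(-p * g)
        pattern.append(total)
    return pattern

def loudness_pattern(excitation, alpha=0.2):
    """Specific loudness per filter: compressive power law (placeholder exponent)."""
    return [e ** alpha for e in excitation]

def total_loudness(spec_loudness, step_erb=0.5):
    """Integrate the loudness pattern over the ERB-number scale."""
    return step_erb * sum(spec_loudness)

def auditory_model(freqs_hz, powers, step_erb=0.5):
    """Return (centers, excitation pattern, loudness pattern, total loudness)."""
    centers = filter_centers(step_erb=step_erb)
    exc = excitation_pattern(freqs_hz, powers, centers)
    spec = loudness_pattern(exc)
    return centers, exc, spec, total_loudness(spec, step_erb)
```

For example, `auditory_model([1000.0], [1.0])` returns the two intermediate patterns together with the scalar estimate for a single 1 kHz component of unit power. Even in this reduced form, every spectral component is weighted by every filter and then passed through a non-linear compression, which hints at where the cost and the non-linearity of the full models come from.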
These auditory patterns correspond to different aspects of hearing sensations and are also directly related to the spectrum of the audio signal. Several speech and audio processing algorithms have therefore used excitation patterns and loudness patterns to process audio signals according to the perceptual properties of the human auditory system. Examples of such applications include bandwidth extension, sinusoidal analysis-synthesis, rate determination, audio coding, and speech enhancement. Excitation and loudness patterns have also been used in objective measures that predict subjective quality, in volume control, and in hearing aid applications.
However, obtaining the excitation and loudness patterns typically requires an elaborate auditory model that includes models of sound transmission through the outer ear, the middle ear, and the inner ear. These models have a high computational complexity, which makes real-time determination of the auditory patterns impractical or impossible. Moreover, these models typically involve non-linear transformations, which present difficulties, particularly in applications that involve optimization of perceptually based objective functions. A perceptually based objective function is usually directed toward modifying the frequency spectrum to obtain a maximum perceptual benefit, where the benefit is measured with an auditory model that generates the perceptual quantities (such as excitation and/or loudness patterns). The difficulty in optimizing such an objective function is that an optimal solution can be obtained only by searching the entire space of candidate solutions. An alternative, sub-optimal approach is to follow an iterative optimization technique. In both cases, however, the auditory model has to be evaluated multiple times, and the resulting computational complexity is extremely high and often unsuitable for real-time applications.
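The difference in cost between the two search strategies can be made concrete with a toy sketch. Everything in it is hypothetical: the `CountingModel` class is a cheap stand-in for a real auditory model, and the candidate gains, the number of bands, and the target value are arbitrary choices made only so that the number of model evaluations can be counted.

```python
import itertools

CANDIDATE_GAINS = (0.25, 0.5, 1.0, 2.0)  # hypothetical per-band gain choices
NUM_BANDS = 6                            # hypothetical number of frequency bands
TARGET = 5.0                             # hypothetical target loudness value

class CountingModel:
    """Toy non-linear 'auditory model' that counts how often it is evaluated."""

    def __init__(self):
        self.calls = 0

    def loudness(self, gains):
        self.calls += 1
        # Compressive non-linearity per band, summed across bands.
        return sum((g * (band + 1)) ** 0.2 for band, g in enumerate(gains))

def cost(model, gains):
    """Perceptual objective: distance of the model output from the target."""
    return abs(model.loudness(gains) - TARGET)

def exhaustive_search(model):
    """Optimal on the grid: evaluates every combination of candidate gains."""
    return min(itertools.product(CANDIDATE_GAINS, repeat=NUM_BANDS),
               key=lambda gains: cost(model, gains))

def coordinate_search(model, passes=2):
    """Sub-optimal iterative search: tune one band at a time."""
    gains = [1.0] * NUM_BANDS
    for _ in range(passes):
        for band in range(NUM_BANDS):
            def trial_cost(g):
                trial = gains[:band] + [g] + gains[band + 1:]
                return cost(model, trial)
            gains[band] = min(CANDIDATE_GAINS, key=trial_cost)
    return tuple(gains)

exhaustive_model = CountingModel()
best_exhaustive = exhaustive_search(exhaustive_model)
iterative_model = CountingModel()
best_iterative = coordinate_search(iterative_model)
print(exhaustive_model.calls)  # 4 ** 6 = 4096 evaluations
print(iterative_model.calls)   # 2 passes * 6 bands * 4 gains = 48 evaluations
```

The exhaustive count grows exponentially with the number of bands, while the iterative count grows linearly but carries no guarantee of reaching the optimum. Either count is multiplied by the cost of a single model evaluation, which for a full auditory model is already high.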
Accordingly, there is a need for a computationally efficient process that can determine a total loudness estimate, as well as auditory patterns such as the excitation pattern and the loudness pattern.