Loudness is the intensity of sound as perceived by a listener. The human auditory system, upon reception of an auditory stimulus, produces neural electrical impulses, which are transmitted to the auditory cortex in the brain, where the perception of loudness is formed. Hence, loudness is a subjective phenomenon and, as a quantity, differs from the sound pressure level measured in dB SPL. Experiments on test subjects (also referred to as psychophysical experiments) have shown that the sensitivity of a human listener differs from signal to signal, so that different sounds having the same sound pressure level can each have a different perceived loudness. Accordingly, quantifying loudness requires incorporating knowledge of the workings of the human auditory system. Generally, methods to quantify loudness are based on psychoacoustic models that mathematically characterize the properties of the human auditory system.
Early attempts to quantify loudness were based on subjective judgments by human test subjects, and suffered from various accuracy problems. In an attempt to create an “absolute” scale for loudness (i.e., a scale on which scaling the measure of loudness by a factor ‘x’ also scales the loudness perceived by a listener by the factor ‘x’), auditory pattern based loudness estimation was developed. One notable auditory pattern based loudness estimation model is the Moore-Glasberg method. A flow diagram illustrating the Moore-Glasberg method is shown in FIG. 1. First, a power spectrum of an auditory stimulus (i.e., a sound) is determined (step 100). This may be accomplished by performing a Fourier transform or a fast Fourier transform on the auditory stimulus. Next, an effective power spectrum is determined by applying a filter response representative of the response of the outer and middle ear to the power spectrum (step 102). An excitation pattern is then determined from the effective power spectrum by applying to it a filter response representative of the response of the basilar membrane in the cochlea along its length, via a full calculation method that is discussed in detail below (step 104). Generally, the response of the basilar membrane is approximated with a bank of bandpass filters, each of which is referred to herein as a “detector”. These detectors are evenly spaced throughout an auditory frequency range at a number of detector locations, and the energies of the signals produced by the detectors together constitute the excitation pattern. A specific loudness is then determined from the excitation pattern (step 106), and a total loudness is determined from the specific loudness (step 108). This measure of loudness is also referred to as instantaneous loudness. An averaged measure of the instantaneous loudness, referred to as the short-term loudness, may be determined from the total loudness (step 110). Further, an averaged measure of the short-term loudness, referred to as the long-term loudness, may be determined from the short-term loudness (step 112). Details of each one of the steps of the Moore-Glasberg method are discussed below.
FIG. 2 shows details of step 104 discussed above in FIG. 1. In order to determine the excitation pattern, an intensity pattern is determined from the effective power spectrum (step 104A). Details of determining the intensity pattern are discussed below. Next, an excitation at each one of a large number of detector locations is determined to obtain the excitation pattern (step 104B). The large number of detector locations are equally spaced within an auditory frequency range with high enough resolution to accurately determine the excitation pattern. Generally, the large number of detector locations used in such a determination greatly increases the computational complexity of the Moore-Glasberg method, as discussed in detail below.
The human outer ear accepts an auditory stimulus and transforms it as it is transferred to the eardrum. The transfer function of the outer ear is defined as the ratio of the sound pressure of the stimulus at the eardrum to the free-field sound pressure of the stimulus. The free-field sound pressure is the sound pressure measured at the position of the center of the listener's head when the listener is not present. The outer ear response used in the Moore-Glasberg method is derived from stimuli incident from a frontal direction. Other angles of incidence would require correction factors in the response. The outer ear can thus be modeled as a linear filter, whose response is shown in FIG. 3. As can be observed, the resonance of the outer ear canal at about 4 kHz results in the sharp peak around the same frequency in the response.
The middle ear transformation provides an important contribution to the increase in the absolute threshold of hearing at lower frequencies. The middle ear essentially attenuates the lower frequencies. The middle ear functions in this manner to prevent the amplification of the low level internal noise at the lower frequencies. These low frequency internal noises commonly arise from heartbeats, pulse, and activities of muscles. Hence, it is assumed in the Moore-Glasberg method that the middle ear has equal sensitivity to all frequencies above 500 Hz. Further, it is assumed that below 500 Hz the response of the middle ear filter is roughly the inverted shape of the absolute threshold curve at the same frequencies.
The combined outer and middle ear filter's magnitude frequency response is shown in FIG. 4. Such a filter response is used in step 102 described above. An input sound $x(n)$ with a power spectrum $S_x(\omega_i)$, where $\omega_i = \exp\left(j\,2\pi f_i / f_s\right)$ when the sampling frequency is $f_s$, is processed with the combined outer-middle ear filter. If the frequency response of the outer-middle ear filter is $M(\omega_i)$, then the output power spectrum of the filter is $S_x^c(\omega_i) = |M(\omega_i)|^2 S_x(\omega_i)$. This spectrum $S_x^c(\omega_i)$ reaches the inner ear and is referred to as the effective spectrum.
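As an illustration of steps 100 and 102, the following minimal Python sketch computes a power spectrum with a naive discrete Fourier transform and then applies the weighting $|M(\omega_i)|^2$ bin by bin. The function names and the flat gain of 0.5 are hypothetical placeholders, not data from FIG. 4, and a fast Fourier transform would normally replace the quadratic-time loop.

```python
import cmath
import math


def power_spectrum(x):
    """Return |X(k)|^2 for bins k = 0..N/2 using a naive O(N^2) DFT."""
    n = len(x)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def effective_spectrum(s_x, m_response):
    """Apply S_x^c(w_i) = |M(w_i)|^2 * S_x(w_i) bin by bin."""
    if len(s_x) != len(m_response):
        raise ValueError("spectrum and filter response must have equal length")
    return [abs(m) ** 2 * s for s, m in zip(s_x, m_response)]


# Eight samples of a cosine completing two cycles: the energy lands in bin 2.
signal = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
s_x = power_spectrum(signal)
# Hypothetical flat outer-middle ear gain of 0.5 at every bin (not FIG. 4 data).
s_xc = effective_spectrum(s_x, [0.5] * len(s_x))
```

With a measured outer-middle ear response, the list of gains would simply hold one complex or real value of $M(\omega_i)$ per frequency bin.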
The basilar membrane receives the stimulating signal filtered by the outer and middle ear to produce mechanical vibrations. Each point on the membrane is tuned to a specific frequency and has a narrow bandwidth of response around that frequency. Hence, each location on the membrane acts as a “detector” of a particular frequency. To model this response, a bank of bandpass filters is used. Each filter represents the response of the basilar membrane at a specific location on the membrane. The combined filter response of the bank of bandpass filters is modeled as a rounded exponential filter, and the rising and falling slopes of the combined filter response are dependent upon the intensity level of the signal at the corresponding frequency band.
The detector locations on the membrane are represented on an auditory scale measured by an equivalent rectangular bandwidth (ERB) at each frequency. For a given center frequency f, the equivalent rectangular bandwidth is given by Equation (1):
$$\mathrm{ERB}(f) = 24.67\left(\frac{4.37 f}{1000} + 1\right) \tag{1}$$

The bandpass filters are represented on an auditory scale derived from the center frequencies of the filters. This auditory scale represents the frequencies based on their ERB values. Each frequency is mapped to an “ERB number”, which is why the scale is also referred to as the ERB scale. The ERB number for a frequency represents the number of ERB bandwidths that can be fitted below that frequency. A frequency $f$ in Hz maps to $d$ on the ERB scale as shown in Equation (2):

$$d\ (\text{in ERB units}) = 21.4 \log_{10}\left(\frac{4.37 f}{1000} + 1\right) \tag{2}$$
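Equations (1) and (2) translate directly into code. The sketch below is illustrative only; the function names are hypothetical, and the inverse mapping is not stated in the text but follows from solving Equation (2) for $f$.

```python
import math


def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz, Equation (1)."""
    return 24.67 * (4.37 * f_hz / 1000.0 + 1.0)


def hz_to_erb_number(f_hz):
    """Map a frequency in Hz to its ERB number d, Equation (2)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_number_to_hz(d):
    """Algebraic inverse of Equation (2): ERB number back to Hz."""
    return (10.0 ** (d / 21.4) - 1.0) * 1000.0 / 4.37


bandwidth_1k = erb(1000.0)             # about 132.5 Hz
number_1k = hz_to_erb_number(1000.0)   # about 15.6 ERB units
```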
Let $D$ be the number of auditory filters that are used to represent responses of discrete locations of the basilar membrane. Let $L_r = \{d_k \mid |d_k - d_{k-1}| = 0.1,\ k = 1, 2, \ldots, D\}$ be the set of detector locations equally spaced at a distance of 0.1 ERB units on the ERB scale. Each detector represents the center frequency of the corresponding bandpass filter. The magnitude frequency response of the bandpass filter at a detector location $d_k$ is defined in Equation (3) as:

$$W(k,i) = (1 + p_{k,i}\, g_{k,i}) \exp(-p_{k,i}\, g_{k,i}), \quad k = 1, \ldots, D \text{ and } i = 1, \ldots, N \tag{3}$$

where $p_{k,i}$ is the slope of the auditory filter corresponding to the detector $d_k$ at frequency $f_i$, and $g_{k,i} = |(f_i - f_{c_k})/f_{c_k}|$ is the normalized deviation of the frequency component $f_i$ from the center frequency $f_{c_k}$ of the detector.
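The detector grid and the filter response of Equation (3) can be sketched as follows. This is a minimal illustration under stated assumptions: the 50 Hz to 15 kHz range and all function names are hypothetical, and the slope `p` is passed in as a plain number rather than derived from the signal.

```python
import math


def hz_to_erb_number(f_hz):
    """Equation (2): frequency in Hz to ERB number."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_number_to_hz(d):
    """Inverse of Equation (2), obtained by solving for f."""
    return (10.0 ** (d / 21.4) - 1.0) * 1000.0 / 4.37


def detector_locations(f_min_hz, f_max_hz, step=0.1):
    """ERB numbers d_k spaced `step` ERB units apart across the given range."""
    d_min = hz_to_erb_number(f_min_hz)
    d_max = hz_to_erb_number(f_max_hz)
    count = int((d_max - d_min) / step + 1e-9) + 1
    return [d_min + k * step for k in range(count)]


def roex_weight(p, f_hz, fc_hz):
    """Equation (3): W = (1 + p*g) * exp(-p*g), with g the normalized deviation."""
    g = abs(f_hz - fc_hz) / fc_hz
    return (1.0 + p * g) * math.exp(-p * g)


# Hypothetical auditory range of 50 Hz to 15 kHz (an assumption, not from the text).
detectors = detector_locations(50.0, 15000.0)
centres_hz = [erb_number_to_hz(d) for d in detectors]
```

Over this assumed range the grid already holds several hundred detectors, which gives a sense of why the number of detector locations dominates the cost of the method.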
The auditory filter slope pk,i is dependent on the intensity level of the effective spectrum of the signal within the equivalent rectangular bandwidth around the center frequency of that detector. The intensity pattern, I(k), is the total intensity of the effective power spectrum within one ERB around the center frequency of the detector dk, as shown in Equation (4):
$$I(k) = \sum_{i \in A_k} S_x^c(\omega_i), \quad A_k = \left\{\, i \;\middle|\; d_k - 0.5 < 21.4 \log_{10}\left(\frac{4.37 f_i}{1000} + 1\right) \le d_k + 0.5,\ i = 1, \ldots, N \right\} \tag{4}$$

Accordingly, determining the intensity pattern from the effective power spectrum as in step 104A of FIG. 2 may involve solving Equation (4). As known through experiments, an auditory filter has different slopes for the lower and upper skirts of the filter response. In the Moore-Glasberg method, the slope of the lower skirt $p_{k,i}^{l}$ is dependent on the corresponding intensity pattern value, but the slope of the upper skirt $p_{k,i}^{u}$ is fixed. The parameters are given by Equation (5) and Equation (6):
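The intensity pattern of Equation (4) amounts to summing the effective power of every spectral component whose ERB number lies within half an ERB of the detector. The sketch below is a minimal illustration; the function names and the example values are hypothetical.

```python
import math


def hz_to_erb_number(f_hz):
    """Equation (2): frequency in Hz to ERB number."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def intensity_pattern(freqs_hz, s_eff, detectors):
    """Equation (4): I(k) is the effective power within one ERB centred on d_k."""
    component_d = [hz_to_erb_number(f) for f in freqs_hz]
    pattern = []
    for d_k in detectors:
        total = sum(
            s for d, s in zip(component_d, s_eff)
            if d_k - 0.5 < d <= d_k + 0.5
        )
        pattern.append(total)
    return pattern


# One hypothetical component at 1 kHz (ERB number about 15.62) with power 2.0.
example = intensity_pattern([1000.0], [2.0], [15.0, 15.6, 17.0])
```

Only the detector at 15.6 ERB units has the 1 kHz component inside its one-ERB window, so only that entry of `example` is non-zero.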
$$p_{k,i}^{l} = p_k^{51} - 0.38\left(\frac{p_k^{51}}{p_{100}^{51}}\right)\left(I(k) - 51\right) \tag{5}$$

$$p_{k,i}^{u} = p_k^{51} \tag{6}$$

In the above equations, $p_k^{51}$ is the value of $p_{k,i}$ at the corresponding detector location when the intensity $I(k)$ is at a level of 51 dB. It can be computed as shown in Equation (7):

$$p_k^{51} = \frac{4 f_{c_k}}{\mathrm{ERB}(f_{c_k})} \tag{7}$$
Thus, it can be seen that the slope of the lower skirt matches that of the auditory filter centered at a frequency of 1 kHz when the effective spectrum of the auditory stimulus has an intensity of 51 dB in the same critical band. The slope $p_{k,i}$ takes the lower-skirt or the upper-skirt value according to Equation (8):

$$p_{k,i} = \begin{cases} p_{k,i}^{l}, & g_{k,i} < 0 \\ p_{k,i}^{u}, & g_{k,i} \ge 0 \end{cases} \tag{8}$$
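The slope selection of Equations (5) through (8) can be sketched as below. Several points are assumptions rather than statements from the text: the function names are hypothetical, the level is taken as an input already expressed in dB, the reference slope in Equation (5) is taken to be the 51 dB slope of the filter centered at 1 kHz, and the lower skirt is selected when the component frequency lies below the center frequency (a negative signed deviation).

```python
def erb(f_hz):
    """Equation (1): equivalent rectangular bandwidth in Hz."""
    return 24.67 * (4.37 * f_hz / 1000.0 + 1.0)


def p51(fc_hz):
    """Equation (7): filter slope at a level of 51 dB."""
    return 4.0 * fc_hz / erb(fc_hz)


def filter_slope(f_hz, fc_hz, level_db, ref_hz=1000.0):
    """Equations (5), (6), (8): level-dependent lower skirt, fixed upper skirt.

    level_db is the intensity-pattern value for this detector in dB (assumed
    to be supplied by the caller); ref_hz is the assumed reference filter.
    """
    p_centre = p51(fc_hz)
    if f_hz < fc_hz:  # assumed reading of g < 0: component below the centre
        return p_centre - 0.38 * (p_centre / p51(ref_hz)) * (level_db - 51.0)
    return p_centre


lower_at_61_db = filter_slope(900.0, 1000.0, 61.0)   # shallower than p51
upper_at_90_db = filter_slope(1100.0, 1000.0, 90.0)  # unaffected by level
```

At 51 dB both skirts share the same slope; raising the level lowers the lower-skirt slope, which broadens the filter on its low-frequency side.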
The excitation pattern is thus evaluated from Equation (9) and Equation (10):

$$E(k) = \sum_{i=1}^{N} W(k,i)\, S_x^c(\omega_i), \quad k = 1, \ldots, D \tag{9}$$

$$E(k) = \sum_{i=1}^{N} (1 + p_{k,i}\, g_{k,i}) \exp(-p_{k,i}\, g_{k,i})\, S_x^c(\omega_i), \quad k = 1, \ldots, D \tag{10}$$

Accordingly, determining the excitation pattern as in step 104B in FIG. 2 may involve solving Equation (9) and Equation (10). The specific loudness pattern represents the neural excitations generated by hair cells, which convert basilar membrane vibrations at each point along its length (which is the excitation pattern) to electrical impulses.
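Equation (10) is a weighted sum over all N spectral components for each of the D detectors. The following minimal sketch makes that double loop explicit; the function names are hypothetical, and the slope is supplied through a caller-provided function `slope_fn(k, i)` so that the example stays independent of how the slopes are obtained.

```python
import math


def excitation_pattern(freqs_hz, s_eff, centres_hz, slope_fn):
    """Equations (9)-(10): E(k) = sum_i (1 + p*g) * exp(-p*g) * S_x^c(w_i)."""
    pattern = []
    for k, fc in enumerate(centres_hz):
        total = 0.0
        for i, (f, s) in enumerate(zip(freqs_hz, s_eff)):
            g = abs(f - fc) / fc   # normalized deviation g_{k,i}
            p = slope_fn(k, i)     # slope p_{k,i}, provided by the caller
            total += (1.0 + p * g) * math.exp(-p * g) * s
        pattern.append(total)
    return pattern


# One hypothetical component at 1 kHz with power 3.0, two detectors, and a
# constant slope of 30 (an illustrative value only).
excitation = excitation_pattern(
    [1000.0], [3.0], [1000.0, 1100.0], lambda k, i: 30.0
)
```

The work grows as D times N, which is why a fine detector grid makes the full calculation expensive.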
The specific loudness, or partial loudness, is a measure of the perceived loudness per ERB, and is computed from the excitation pattern as per Equation (11):

$$S(k) = c\left(\left(E(k) + A(k)\right)^{\alpha} - A^{\alpha}(k)\right) \quad \text{for } k = 1, \ldots, D \tag{11}$$

where the constants are chosen as $c = 0.047$ and $\alpha = 0.2$. It can be observed that the specific loudness pattern is derived through a non-linear compression of the excitation pattern. $A(k)$ is a frequency dependent constant which is equal to twice the peak excitation pattern produced by a sinusoid at absolute threshold, which is denoted by $E_{THRQ}$ (i.e., $A(k) = 2E_{THRQ}(k)$). It can be inferred from this expression that the specific loudness is greater than zero for any sound, even if below the absolute threshold of hearing. Hence, the total loudness, which would be derived by integrating the specific loudness over the ERB scale, will also be positive for any sound.

At frequencies greater than or equal to 500 Hz, the value of $E_{THRQ}$ is constant. For frequencies below 500 Hz, the cochlear gain is reduced, which increases the excitation $E_{THRQ}$ at the corresponding frequencies. This can be modeled as a gain $g$ for each frequency, relative to the gain at 500 Hz and above (the gain at and above 500 Hz is constant), acting on the excitation pattern. It is assumed that the product of $g$ and $E_{THRQ}$ is constant. The specific loudness pattern is then expressed in Equation (12):

$$S(k) = c\left(\left(gE(k) + A(k)\right)^{\alpha} - A^{\alpha}(k)\right) \quad \text{for } k = 1, \ldots, D \tag{12}$$
The rate of decrease of specific loudness is higher when the stimulus is below absolute threshold than what is predicted in Equation (12). This is modeled by introducing an additional factor dependent on the excitation pattern strength. Hence, if $E(k) < E_{THRQ}(k)$, Equation (13) holds for the specific loudness pattern:

$$S(k) = c\left(\frac{E(k)}{E(k) + E_{THRQ}(k)}\right)^{1.5}\left(\left(gE(k) + A(k)\right)^{\alpha} - A^{\alpha}(k)\right) \tag{13}$$
Similarly, when the intensity is higher than 100 dB, the rate of increase of specific loudness is higher, and is modeled by Equation (14), which is valid when $E(k) > 10^{10}$:

$$S(k) = c\left(\frac{E(k)}{1.04 \times 10^{6}}\right)^{0.5} \tag{14}$$
Hence, putting together Equations (12), (13) and (14), the specific loudness function can be expressed as in Equation (15), where the constant $1.04 \times 10^{6}$ is chosen to make $S(k)$ continuous at $E(k) = 10^{10}$:

$$S(k) = \begin{cases} c\left(\dfrac{E(k)}{E(k) + E_{THRQ}(k)}\right)^{1.5}\left(\left(gE(k) + A(k)\right)^{\alpha} - A^{\alpha}(k)\right), & E(k) < E_{THRQ}(k) \\[2ex] c\left(\left(gE(k) + A(k)\right)^{\alpha} - A^{\alpha}(k)\right), & E_{THRQ}(k) \le E(k) \le 10^{10} \\[2ex] c\left(\dfrac{E(k)}{1.04 \times 10^{6}}\right)^{0.5}, & E(k) > 10^{10} \end{cases} \tag{15}$$

Accordingly, determining the specific loudness from the excitation pattern as in step 106 of FIG. 1 may involve solving any of Equations (11)-(15).
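The three regimes can be sketched as a single piecewise function, following the conditions stated for Equations (12), (13) and (14). The function name and the example values of the excitation and of $E_{THRQ}$ are hypothetical; the constants $c$ and $\alpha$ are those given in the text.

```python
C = 0.047      # constant c from the text
ALPHA = 0.2    # compression exponent alpha from the text


def specific_loudness(e, e_thrq, g=1.0):
    """Piecewise specific loudness for one detector.

    e is the excitation E(k), e_thrq the threshold excitation E_THRQ(k),
    and g the low-frequency gain (1.0 at and above 500 Hz).
    """
    if e > 1e10:
        # Equation (14): very high levels.
        return C * (e / 1.04e6) ** 0.5
    a = 2.0 * e_thrq  # A(k) = 2 * E_THRQ(k)
    compressed = C * ((g * e + a) ** ALPHA - a ** ALPHA)  # Equation (12)
    if e < e_thrq:
        # Equation (13): extra attenuation below absolute threshold.
        return (e / (e + e_thrq)) ** 1.5 * compressed
    return compressed


# Hypothetical threshold excitation of 1.0 (linear units), for illustration.
below = specific_loudness(0.5, 1.0)
middle = specific_loudness(10.0, 1.0)
high = specific_loudness(4e10, 1.0)
```

The branch order checks the high-level case first so that the threshold-dependent terms are only computed where they are used.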
The total loudness is computed by integrating the specific loudness pattern S(k) over the ERB scale, or computing the area under the loudness pattern. While implementing the model with a discrete number of detectors, the computation of the area under the specific loudness pattern can be performed by evaluating the area of trapezia formed by successive points on the pattern along with the x-axis (which is the ERB scale). The loudness can then be computed using Equation (16) and Equation (17):
$$L = \sum_{k=1}^{D-1} \left[ S(k)\,\delta_d + \tfrac{1}{2}\bigl(S(k+1) - S(k)\bigr)\,\delta_d \right] \tag{16}$$

$$L = \delta_d \left[ \sum_{k=2}^{D-1} S(k) + \tfrac{1}{2}\bigl(S(1) + S(D)\bigr) \right] \tag{17}$$

Accordingly, determining the total loudness from the specific loudness as in step 108 of FIG. 1 may involve solving Equations (16) and (17). The loudness computed in this manner quantifies the loudness perceived when a stimulus is presented to one ear (the monaural loudness). The binaural loudness can be computed by summing the monaural loudness of each ear.
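As a minimal sketch of Equations (16) and (17), the following Python fragment integrates a specific loudness pattern sampled at D detector locations using the trapezoidal rule. The function names, the sample values, and the 0.1 spacing are illustrative assumptions rather than part of the described method.

```python
def total_loudness_eq16(S, delta_d):
    """Sum the trapezoid areas between successive detectors, as in Equation (16)."""
    return sum(
        S[k] * delta_d + 0.5 * (S[k + 1] - S[k]) * delta_d
        for k in range(len(S) - 1)
    )


def total_loudness_eq17(S, delta_d):
    """Equivalent closed form of the same sum, as in Equation (17)."""
    return delta_d * (sum(S[1:-1]) + 0.5 * (S[0] + S[-1]))


# Hypothetical specific loudness samples at five detectors spaced 0.1 ERB apart.
S = [0.0, 0.2, 0.5, 0.3, 0.1]
monaural = total_loudness_eq17(S, 0.1)
# Binaural loudness, assuming the same stimulus at both ears:
# the sum of the two monaural values.
binaural = monaural + monaural
```

Both functions return the same area; Equation (17) simply collects the interior samples once and halves the two end samples, which avoids the per-interval multiplications.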
The measure of loudness derived above is also referred to as the instantaneous loudness, as it is the loudness for a short segment of an auditory stimulus. This measure of loudness is constant only when the input sound has a steady spectrum over time. Signals in reality are time-varying in nature. Such sounds exhibit temporal masking, which results in fluctuating values of the instantaneous loudness. Hence, it is important to derive metrics of loudness that are steadier for time-varying sounds.
Loudness estimation for time-varying sounds has been performed by suitably capturing variations in the signal power spectrum to account for the temporal masking. The power spectrum is computed over segments of the signal windowed with different lengths (e.g., 2, 4, 6, 8, 16, 32 and 64 milliseconds). Then, particular frequency components are selected from the obtained spectra to get the best trade-off between time resolution and frequency resolution. The spectrum is updated every 1 ms, by shifting the windowing frame by 1 ms each time. The steady-state spectrum thus derived is processed with the Moore-Glasberg method described above and the instantaneous loudness is computed.
The short-term loudness is calculated by averaging the instantaneous loudness using a one-pole averaging filter. The long-term loudness is calculated by further averaging the short-term loudness using another one-pole filter. The short-term loudness smoothes the fluctuations in the instantaneous loudness, and the long-term loudness reflects the memory of loudness over time. The filter time constants are different for rising and falling loudness. This models the non-linearity of accumulation of loudness perception over time. During an attack (i.e., a sudden increase in loudness), loudness rapidly accumulates, unlike reducing loudness, which is more gradual. If L(n) denotes the instantaneous loudness of the nth frame, then the short-term loudness Ls(n) at the nth frame is given by Equation (18) and Equation (19), where αa and αr are the attack and release parameters respectively:
$$L_s(n) = \begin{cases} \alpha_a\, L(n) + (1 - \alpha_a)\, L_s(n-1), & L(n) > L_s(n-1) \\ \alpha_r\, L(n) + (1 - \alpha_r)\, L_s(n-1), & L(n) \le L_s(n-1) \end{cases} \tag{18}$$

$$\alpha_a = 1 - e^{-T_i / T_a}, \qquad \alpha_r = 1 - e^{-T_i / T_r} \tag{19}$$

where the value Ti denotes the time interval between successive frames, and Ta and Tr are the attack and release time constants respectively. Accordingly, determining the short-term loudness from the total loudness as in step 110 of FIG. 1 may involve solving Equations (18) and (19). Similarly, the long-term loudness Ll(n) can be computed from Equation (20):
$$L_l(n) = \begin{cases} \alpha_{l_a}\, L_s(n) + (1 - \alpha_{l_a})\, L_l(n-1), & L_s(n) > L_l(n-1) \\ \alpha_{l_r}\, L_s(n) + (1 - \alpha_{l_r})\, L_l(n-1), & L_s(n) \le L_l(n-1) \end{cases} \tag{20}$$

Accordingly, determining the long-term loudness from the short-term loudness as in step 112 of FIG. 1 may involve solving Equation (20).
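The attack/release averaging of Equations (18) through (20) can be sketched as a single one-pole filter applied twice, first to the instantaneous loudness and then to the short-term loudness. The function names, the zero initial state, and all time-constant values below are assumptions chosen only for illustration.

```python
import math


def smoothing_coefficient(frame_interval, time_constant):
    """Equation (19): alpha = 1 - exp(-Ti / T)."""
    return 1.0 - math.exp(-frame_interval / time_constant)


def smooth(values, alpha_attack, alpha_release, initial=0.0):
    """One-pole attack/release averaging, the form shared by Equations (18) and (20)."""
    out = []
    prev = initial
    for x in values:
        # Use the attack coefficient when the input exceeds the previous output.
        alpha = alpha_attack if x > prev else alpha_release
        prev = alpha * x + (1.0 - alpha) * prev
        out.append(prev)
    return out


# Hypothetical values: 1 ms frame interval and illustrative time constants.
Ti = 0.001
instantaneous = [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
short_term = smooth(instantaneous,
                    smoothing_coefficient(Ti, 0.02),  # assumed attack constant Ta
                    smoothing_coefficient(Ti, 0.05))  # assumed release constant Tr
long_term = smooth(short_term,
                   smoothing_coefficient(Ti, 0.1),    # assumed long-term attack constant
                   smoothing_coefficient(Ti, 2.0))    # assumed long-term release constant
```

With a shorter attack constant than release constant, the output rises toward a sudden onset faster than it decays after the input drops, which is the asymmetry the text describes.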
While the Moore-Glasberg method discussed above often provides a relatively accurate estimation of loudness, the complexity of the calculations requires a significant amount of processing power. Given a frame of N samples of an input signal x(n), the computation of the N-point FFT, and hence of the power spectrum of the signal {Sx(ωi)}, i = 1, ..., N, has a complexity of Θ(N log N), where N is the size of the FFT. The effective power spectrum reaching the inner ear Sxc(ωi) is computed by filtering the spectrum Sx(ωi) through the outer-middle ear filter M(ωi). In the dB scale, this reduces to additions of the magnitudes of the signal power spectrum and the filter response, which has a complexity of Θ(N). The determination of the intensity pattern I(k) has a complexity of Θ(D), where D is the number of detectors. The subsequent computation of the auditory filter slopes pk also has a complexity of Θ(D). The computation of the auditory filter responses {W(k,i)}, k = 1, ..., D, i = 1, ..., N, has a complexity of Θ(ND). Then, the auditory filters operate on the effective spectrum to determine the excitation pattern E(k), which also has a complexity of Θ(ND). The computation of the specific loudness pattern S(k) from the excitation pattern has a complexity of Θ(D). The step of integrating the specific loudness pattern to estimate the total instantaneous loudness L also has a complexity of Θ(D). The final steps of computing the short-term and long-term loudness require a constant number of operations and hence have a complexity of Θ(1).
It can be seen from the above analysis that the steps of computing the auditory filter responses and filtering the effective spectrum with the auditory filters have the highest complexity, Θ(ND). Accordingly, computing the excitation pattern according to conventional methods is computationally expensive. Several applications such as sinusoidal selection based analysis-synthesis, speech enhancement, bandwidth extension, and rate determination make use of auditory patterns. It is therefore beneficial to reduce the complexity of estimating excitation patterns and auditory patterns. Although there have been attempts to reduce the complexity of estimating excitation patterns and auditory patterns, such methods generally come at the expense of accuracy.
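To make the dominance of the Θ(ND) stages concrete, the following sketch tallies leading-order operation counts per stage with all constant factors ignored. The frame size N = 1024 and detector count D = 400 are hypothetical values, and the stage labels are shorthand for the steps listed above.

```python
import math


def stage_costs(N, D):
    """Leading-order operation counts per stage, with constant factors ignored."""
    return {
        "power spectrum (FFT)": N * math.log2(N),
        "outer-middle ear filtering": N,
        "intensity pattern": D,
        "auditory filter slopes": D,
        "auditory filter responses": N * D,
        "excitation pattern": N * D,
        "specific loudness": D,
        "total loudness": D,
        "short-term and long-term loudness": 1,
    }


costs = stage_costs(N=1024, D=400)  # hypothetical frame size and detector count
# Fraction of the tally attributable to the two N*D stages.
share = (costs["auditory filter responses"] + costs["excitation pattern"]) / sum(costs.values())
```

Under these assumed sizes the two Θ(ND) stages account for nearly all of the tally, which is why reducing either the number of frequency components or the number of detectors is the natural target.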
In an effort to reduce the computational load of the Moore-Glasberg method, approaches such as frequency pruning and detector pruning have been proposed. Frequency pruning involves reducing the number of frequency components in an auditory stimulus to approximate the spectrum with only a few components such that the total loudness is preserved. That is, one can choose to retain only a subset of the frequencies {fi}, i = 1, ..., N, for computing the excitation pattern. Alternatively, the set of detectors {dk}, k = 1, ..., D, can be pruned to choose only a subset of detector locations for evaluating the excitation pattern {E(k)}, k = 1, ..., D. This approach is referred to as detector pruning, and is equivalent to non-uniformly sampling the excitation pattern along the basilar membrane to capture its shape.
Pruning the frequency components in the spectrum can be performed by using a quantity called the averaged intensity pattern. The average intensity pattern Y(k) is computed by filtering the intensity pattern, as shown in Equation (21), where the average intensity pattern is a measure of the average intensity per ERB:
$$Y(k) = \frac{1}{11} \sum_{i=-5}^{5} I(k - i) \tag{21}$$

This allows the spectrum to be divided into tonal bands and non-tonal bands. Tonal bands are ERBs in which only a dominant spectral peak is present. The intensity pattern in these bands is quite flat, with a sudden drop at the edge of the ERB around the tone. The tonal bands can be represented by just the dominant tone, ignoring the remaining components. These tonal bands are identified as the locations of the maxima of the average intensity pattern Y(k), as shown in FIGS. 5A and 5B. Specifically, FIG. 5A shows an intensity pattern determined from an effective power spectrum of an auditory stimulus as discussed above and the average intensity pattern determined therefrom. FIG. 5B shows the effective power spectrum of the auditory stimulus and a number of tonal bands identified therein, which correspond to the maxima of the average intensity pattern shown in FIG. 5A.
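A minimal sketch of Equation (21) and of locating the maxima of Y(k) is given below. Treating samples beyond the ends of the pattern as zero and using a strict neighbor comparison to detect maxima are assumptions of this sketch; the function names and the test pattern are hypothetical.

```python
def average_intensity_pattern(I, half_width=5):
    """Equation (21): moving average of I over 2 * half_width + 1 detectors.

    Samples outside the pattern are treated as zero (an edge-handling assumption).
    """
    D = len(I)
    span = 2 * half_width + 1
    Y = []
    for k in range(D):
        total = 0.0
        for i in range(-half_width, half_width + 1):
            if 0 <= k - i < D:
                total += I[k - i]
        Y.append(total / span)
    return Y


def local_maxima(Y):
    """Indices whose value is strictly greater than both neighbors.

    Flat-topped peaks are not reported by this simple test.
    """
    return [k for k in range(1, len(Y) - 1) if Y[k - 1] < Y[k] > Y[k + 1]]


# Hypothetical intensity pattern: flat over 11 detectors (about one ERB at a
# spacing of 0.1), zero elsewhere, as might arise around a single dominant tone.
I = [0.0] * 41
for k in range(15, 26):
    I[k] = 1.0
Y = average_intensity_pattern(I)
tonal_band_centers = local_maxima(Y)
```

For this pattern the averaged curve is triangular with a single peak at the center of the flat region, which is the detector index the sketch reports as the tonal band location.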
The portions of the spectrum that do not qualify as tonal bands are labeled as non-tonal bands. Each non-tonal band is further divided into smaller bins B1:Q of width 0.25 ERB units (Cam), where Q is the number of sub-bands in the non-tonal band. Each sub-band Bp is assumed to be approximately white. Under this assumption, each sub-band Bp is represented by a single frequency component Ŝp, which is equal to the total intensity within that band. If Mp is the set of indices of the frequency components within Bp, then Ŝp is given by Equation (22):
$$\hat{S}_p = \sum_{j \in M_p} S_x^c(\omega_j) \tag{22}$$

This method of dividing the spectrum into smaller bands and representing each band with a single equivalent spectral component is justified, as it preserves the energy within each critical band and consequently preserves the auditory filter shapes and their responses. Spectral bins smaller than 0.25 ERB may also be chosen for non-tonal bands, but this would result in less efficient frequency pruning.
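The sub-band reduction of Equation (22) can be sketched as follows. The representation of spectral components as (ERB position, linear power) pairs, the function name, and the placement of each equivalent component at the center of its bin are assumptions of this sketch; the text above specifies only that the component carries the total intensity of the bin.

```python
import math


def collapse_non_tonal_band(components, band_start, band_end, bin_width=0.25):
    """Replace the components in [band_start, band_end) by one component per bin.

    components: list of (erb_position, power) pairs in linear power units.
    Each output power is the sum over the bin, as in Equation (22).
    Placing the component at the bin center is an assumption of this sketch.
    """
    n_bins = max(1, math.ceil((band_end - band_start) / bin_width - 1e-9))
    sums = [0.0] * n_bins
    for position, power in components:
        if band_start <= position < band_end:
            p = min(int((position - band_start) / bin_width), n_bins - 1)
            sums[p] += power
    return [
        (band_start + (p + 0.5) * bin_width, s)
        for p, s in enumerate(sums)
        if s > 0.0
    ]


# Hypothetical non-tonal band spanning 10.0 to 10.5 on the ERB scale.
components = [(10.0, 1.0), (10.1, 2.0), (10.2, 3.0), (10.3, 4.0), (10.4, 5.0)]
pruned = collapse_non_tonal_band(components, 10.0, 10.5)
```

Five components are reduced to two while the summed power of the band is unchanged, which is the property the energy-preservation argument relies on.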
The excitation at a detector location is the energy of the signal filtered by the bandpass filter at that detector location. Since the intensity pattern at a detector defined in Equation (4) is the energy within the bandwidth of the detector, the intensity pattern would have some correlation with the excitation pattern. This is illustrated by the plots shown in FIGS. 6A through 6C. It can be observed that for the given auditory stimulus in FIG. 6A, the shape of the excitation pattern in FIG. 6B is, to a significant extent, dictated by the intensity pattern in FIG. 6C, wherein the peaks and valleys of the excitation pattern largely follow the peaks and valleys in the intensity pattern.
Detector pruning has conventionally been accomplished by choosing detectors from salient points based on the averaged intensity pattern. Accordingly, FIG. 7A shows an intensity pattern determined from an effective power spectrum of an auditory stimulus as discussed above and the average intensity pattern determined therefrom. The detectors at the locations of the peaks and valleys of the averaged intensity pattern are chosen for explicit computation. If the reference set of detectors is $L_T = \{ d_k \mid |d_k - d_{k-1}| = 0.1,\ k = 1, 2, \ldots, D \}$, then the pruning scheme produces a smaller subset of detectors

$$L_e = \left\{ d_k \;\middle|\; \frac{\partial Y(k)}{\partial k} = 0,\ k = 1, 2, \ldots, D \right\}.$$

The points on the excitation pattern are computed for the detectors in Le. The rest of the points in the excitation pattern are computed through linear interpolation.
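The conventional detector pruning described above can be sketched in two steps: select the detectors where the averaged intensity pattern has a peak or valley, then linearly interpolate the excitation between them. Detecting extrema through a sign change of the discrete slope, always keeping the first and last detectors, and all names and sample values are assumptions of this sketch. It requires at least two detectors.

```python
def pruned_detectors(Y):
    """Indices of local maxima and minima of Y, a discrete analogue of dY/dk = 0.

    The first and last detectors are always kept so that interpolation covers
    the whole range (an assumption of this sketch). Requires len(Y) >= 2.
    """
    keep = [0]
    for k in range(1, len(Y) - 1):
        left = Y[k] - Y[k - 1]
        right = Y[k + 1] - Y[k]
        if left * right < 0:  # slope changes sign at k
            keep.append(k)
    keep.append(len(Y) - 1)
    return keep


def interpolate_pattern(keep, values, D):
    """Linearly interpolate values known at the kept indices onto all D detectors."""
    E = [0.0] * D
    for j in range(len(keep) - 1):
        k0, k1 = keep[j], keep[j + 1]
        v0, v1 = values[j], values[j + 1]
        for k in range(k0, k1 + 1):
            t = (k - k0) / (k1 - k0)
            E[k] = v0 + t * (v1 - v0)
    return E


# Hypothetical averaged intensity pattern and full excitation pattern.
Y = [0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 3.0, 2.0]
E_full = [0.0, 1.5, 2.0, 1.2, 0.0, 0.5, 3.0, 2.0]
keep = pruned_detectors(Y)
# In practice the excitation would be computed explicitly only at the kept
# detectors; here the values are read from the hypothetical full pattern.
E_est = interpolate_pattern(keep, [E_full[k] for k in keep], len(Y))
```

The estimate matches the full pattern at the kept detectors, while the interpolated values at the remaining detectors can differ from what an explicit computation would give (for example 1.0 instead of 1.5 at index 1 in this made-up data).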
FIG. 7B shows a reference excitation pattern corresponding with a full computation from the intensity pattern shown in FIG. 7A (as would be done according to the Moore-Glasberg model). Further, FIG. 7B shows a number of pruned detector locations obtained by choosing the locations of maxima and minima on the averaged intensity pattern, and the estimated excitation pattern, which is interpolated from the pruned detector locations. It can be seen that many detectors critical to accurately reproducing the original excitation pattern are not chosen. For the purposes of loudness estimation, the accumulation of errors during integration of specific loudness results in a significant error in the loudness estimate. Accordingly, detector pruning as discussed above may result in inaccurate loudness estimations.
FIG. 8 is a flow diagram illustrating the Moore-Glasberg method including frequency pruning and/or detector pruning to reduce the computational complexity thereof. The flow diagram shown in FIG. 8 is substantially similar to that shown above with respect to FIG. 1, except that in step 204, the determination of the excitation pattern is accomplished using frequency pruning and/or detector pruning. FIG. 9 shows details of step 204 when a frequency pruning approach is used. First, the intensity pattern is determined from the effective power spectrum (step 204A). An average intensity pattern is then determined from the intensity pattern (step 204B). The number of frequency components in the effective power spectrum is then reduced based on the average intensity pattern to obtain a frequency pruned power spectrum (step 204C). Specifically, the maxima of the average intensity pattern are used to identify tonal bands and non-tonal bands, which are then processed as described above to obtain the frequency pruned power spectrum. The excitation pattern is then determined from the frequency pruned power spectrum using a large number of equally spaced detector locations and interpolation (step 204D). Because the effective power spectrum must be processed at each one of the detector locations, reducing the number of frequency components in the effective power spectrum may reduce the complexity of the calculations for each one of the detector locations. However, due to the large number of detectors used in the conventional Moore-Glasberg approach, the computational complexity may still remain relatively high.
FIG. 10 shows details of step 204 when a detector pruning approach is used. First, the intensity pattern is determined from the effective power spectrum (step 204A). An average intensity pattern is then determined from the intensity pattern (step 204B). A set of pruned detector locations is then determined based on the average intensity pattern (step 204C). Specifically, the minima and maxima of the average intensity pattern define the set of pruned detector locations. The excitation pattern is then determined from the effective power spectrum using each one of the set of pruned detector locations (step 204D). Reducing the number of detector locations significantly reduces the computational complexity of the Moore-Glasberg method. However, such a reduction in complexity comes at the expense of accuracy, which may be severely reduced in some cases.
Accordingly, there is a present need for an auditory analysis technique with reduced complexity and high accuracy.