The MPEG-4 parametric audio coding tools ‘Harmonic and Individual Lines plus Noise’ (HILN) permit coding of general audio signals at bit rates of 4 kbps and above using a parametric representation of the audio signals (see Heiko Purnhagen, HILN-The MPEG-4 Parametric Audio Coding Tools, IEEE International Conference on Circuits and Systems, May 2000 and Heiko Purnhagen, Advances in Parametric Audio Coding, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, October 1999). FIG. 1 shows a block diagram of a HILN parametric audio encoder. The input signal is first decomposed into different components, and the model parameters for the components' source models are then estimated such that:
        An individual sinusoid is described by its frequency and amplitude.
        A harmonic tone is described by its fundamental frequency, amplitude and the spectral envelope of its partial harmonics.
        A noise signal is described by its amplitude and spectral envelope.
Due to the low target bit rates (e.g. 6-16 kbps), only the parameters for a small number of components can be transmitted. Therefore a perception model is employed to select those components that are most important for the perceptual quality of the signal. The quantization of the selected components is also done using the perceptual importance criteria.
A slightly different approach was adopted by Goodwin (M. Goodwin, Adaptive Signal Models: Theory, Algorithms, and Audio Applications, PhD thesis, University of California, Berkeley, 1997) for the atomic decomposition of audio signals. Consider an additive signal model of the form:
$$x[n] = \sum_{i=1}^{I} a_i \, g_i[n]$$
wherein a signal is represented as a weighted sum of basic components ($g_i[n]$). These building blocks or basic components are picked from an existing dictionary of many such components. Because the dictionary is over-complete, the same signal can be represented by non-identical sets of basic components. The preferred representation is the one that uses the fewest basic components. This is the concept of compact representation, and is the theme behind most advanced signal representation techniques such as wavelets. Traditional transform coders use a set of complex exponentials (analogous to words in the dictionary) as the basis for encoding input signals, and that basis is complete. Therefore there is only one possible representation of the encoded signal, because there is a unique Fourier Transform for a given signal. In the over-complete case, more than one representation is possible, and an efficient coding scheme attempts to determine which is most compact.
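The additive model and the idea of a compact representation can be illustrated with a small sketch. The dictionary below (eight unit impulses plus one sampled sinusoid) and the names `synthesize` and `count_atoms` are assumptions chosen for illustration only; they do not come from the cited work. The sketch shows the same signal being reproduced either with many impulse atoms or with a single sinusoidal atom.

```python
import math

N = 8

# Complete basis: N unit impulses (one atom per sample position).
impulses = [[1.0 if n == k else 0.0 for n in range(N)] for k in range(N)]

# One extra atom (a sampled sinusoid) makes the dictionary over-complete.
sinusoid = [math.sin(2 * math.pi * n / N) for n in range(N)]
dictionary = impulses + [sinusoid]


def synthesize(weights, atoms):
    """Evaluate x[n] = sum_i a_i * g_i[n] for every sample n."""
    length = len(atoms[0])
    return [sum(a * g[n] for a, g in zip(weights, atoms)) for n in range(length)]


def count_atoms(weights, tol=1e-12):
    """Number of atoms that contribute with a non-negligible weight."""
    return sum(1 for a in weights if abs(a) > tol)


# Target signal: a scaled copy of the sinusoidal atom.
x = [2.0 * s for s in sinusoid]

weights_impulses = x + [0.0]         # one weight per sample, sinusoid unused
weights_compact = [0.0] * N + [2.0]  # a single weight on the sinusoidal atom

for name, w in (("impulses", weights_impulses), ("compact", weights_compact)):
    error = max(abs(a - b) for a, b in zip(synthesize(w, dictionary), x))
    print(name, "atoms used:", count_atoms(w), "max error:", error)
```

Both weight vectors reconstruct the signal, but the second does so with one atom, which is the representation a compact coding scheme would prefer.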
Sinusoidal modeling is best suited to stationary tonal signals. Transient signals (such as beats) can be modeled well only by using a large number of such sinusoids with the original phase preserved, as presented by Purnhagen in Advances in Parametric Audio Coding. This is certainly not a compact representation of transient signals.
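The contrast between tonal and transient signals can be made concrete with a rough numerical sketch. The test signals, the 99% energy threshold, and the helper names `dft` and `bins_for_energy` are assumptions for illustration and are not taken from the cited papers. The sketch counts how many stationary sinusoids (DFT bins) are needed to capture most of the energy of a steady tone versus a sharply decaying burst.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (adequate for short signals)."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples) for n in range(n_samples))
        for k in range(n_samples)
    ]


def bins_for_energy(x, fraction=0.99):
    """Smallest number of DFT bins whose energy reaches `fraction` of the total."""
    energies = sorted((abs(c) ** 2 for c in dft(x)), reverse=True)
    total = sum(energies)
    running = 0.0
    for count, e in enumerate(energies, start=1):
        running += e
        if running >= fraction * total:
            return count
    return len(energies)


N = 64

# Stationary tone: exactly four cycles in the analysis frame.
tone = [math.sin(2 * math.pi * 4 * n / N) for n in range(N)]

# Transient: silence, then an abrupt onset that decays quickly.
transient = [
    0.0 if n < 32 else math.exp(-0.5 * (n - 32)) * math.cos(2 * math.pi * 4 * (n - 32) / N)
    for n in range(N)
]

print("bins needed for tone:     ", bins_for_energy(tone))
print("bins needed for transient:", bins_for_energy(transient))
```

The stationary tone concentrates its energy in a conjugate pair of bins, whereas the transient spreads its energy across many bins, which is why a purely sinusoidal description of it is not compact.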
Goodwin [M. Goodwin, Matching Pursuit with Damped Sinusoids, IEEE International Conference on Acoustics, Speech and Signal Processing, 1997] recommended the scheme of damped sinusoids to model transients. However, his approach of matching pursuit is relatively computationally expensive. It is desired to provide a simpler approach that produces good results.
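To indicate where the computational cost of matching pursuit arises, the following is a minimal sketch of a generic greedy matching pursuit over a small dictionary of real damped sinusoids. It is not Goodwin's implementation: the parameter grids, the use of real-valued atoms in cosine and sine phase, and the function names (`damped_atom`, `build_dictionary`, `matching_pursuit`, `energy`) are all assumptions made for illustration.

```python
import math


def damped_atom(length, decay, freq, phase):
    """Unit-norm damped sinusoid exp(-decay*n) * cos(2*pi*freq*n + phase)."""
    raw = [math.exp(-decay * n) * math.cos(2 * math.pi * freq * n + phase) for n in range(length)]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]


def build_dictionary(length, decays, freqs):
    """One cosine-phase and one sine-phase atom per (decay, frequency) pair."""
    atoms = []
    for d in decays:
        for f in freqs:
            for p in (0.0, -math.pi / 2):
                atoms.append(((d, f, p), damped_atom(length, d, f, p)))
    return atoms


def energy(v):
    return sum(s * s for s in v)


def matching_pursuit(signal, atoms, n_iter):
    """Greedy loop: pick the best-correlated atom, subtract its projection, repeat."""
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        best_params, best_atom, best_coef = None, None, 0.0
        for params, atom in atoms:  # every atom is tested in every iteration
            coef = sum(r * g for r, g in zip(residual, atom))
            if abs(coef) > abs(best_coef):
                best_params, best_atom, best_coef = params, atom, coef
        if best_atom is None:
            break
        residual = [r - best_coef * g for r, g in zip(residual, best_atom)]
        picks.append((best_params, best_coef))
    return picks, residual


N = 64
decays = [0.02, 0.05, 0.1, 0.2]          # hypothetical decay grid (per sample)
freqs = [k / 32 for k in range(1, 16)]   # hypothetical frequency grid (cycles per sample)
atoms = build_dictionary(N, decays, freqs)

strong = damped_atom(N, 0.1, 4 / 32, 0.0)
weak = damped_atom(N, 0.02, 9 / 32, -math.pi / 2)
signal = [3.0 * s - 1.5 * w for s, w in zip(strong, weak)]

picks, residual = matching_pursuit(signal, atoms, 2)
for params, coef in picks:
    print("decay, freq, phase:", params, "coefficient:", coef)
print("residual energy:", energy(residual), "of", energy(signal))
```

Each iteration computes one inner product per dictionary atom, so the cost grows with the product of the dictionary size, the frame length, and the number of atoms extracted. This is the expense that motivates looking for a simpler approach.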
Moreover, the prevailing assumption appears to be that the decay of a transient signal can be modeled as a single exponential. FIG. 2 shows, however, that the envelope generated by a single exponential has significant error relative to the true envelope. Accordingly, the single exponential model is not sufficiently accurate. For a small increase in the number of parameters, it is possible to describe the exact nature of the decay function more accurately.
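The trade-off between parameter count and envelope accuracy can be sketched numerically. Everything in the example below is an assumption for illustration: the synthetic envelope (a fast initial decay plus a slow tail), the decay grid, and the choice of a two-exponential model as the richer alternative. It does not reproduce the data of FIG. 2. The sketch compares the least-squares error of a single-exponential fit (two parameters) with that of a two-exponential fit (four parameters).

```python
import math

N = 100

# Synthetic envelope (assumed for illustration): fast initial decay plus a slow tail.
envelope = [0.7 * math.exp(-0.3 * n) + 0.3 * math.exp(-0.03 * n) for n in range(N)]

# Candidate decay rates searched by both fits (hypothetical grid).
decay_grid = [0.01 * k for k in range(1, 51)]


def basis(decay):
    return [math.exp(-decay * n) for n in range(N)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def squared_error(model):
    return sum((m - e) ** 2 for m, e in zip(model, envelope))


def fit_single():
    """Best amplitude * exp(-decay * n): least-squares amplitude per grid decay."""
    best = None
    for d in decay_grid:
        e = basis(d)
        amp = dot(e, envelope) / dot(e, e)
        err = squared_error([amp * v for v in e])
        if best is None or err < best[0]:
            best = (err, amp, d)
    return best


def fit_double():
    """Best sum of two exponentials: 2x2 normal equations per pair of grid decays."""
    best = None
    bases = [basis(d) for d in decay_grid]
    for i in range(len(decay_grid)):
        for j in range(i + 1, len(decay_grid)):
            e1, e2 = bases[i], bases[j]
            a11, a12, a22 = dot(e1, e1), dot(e1, e2), dot(e2, e2)
            b1, b2 = dot(e1, envelope), dot(e2, envelope)
            det = a11 * a22 - a12 * a12
            if det <= 0.0:
                continue  # numerically collinear pair, skip
            amp1 = (b1 * a22 - b2 * a12) / det
            amp2 = (a11 * b2 - a12 * b1) / det
            err = squared_error([amp1 * u + amp2 * v for u, v in zip(e1, e2)])
            if best is None or err < best[0]:
                best = (err, (amp1, decay_grid[i]), (amp2, decay_grid[j]))
    return best


single_fit = fit_single()
double_fit = fit_double()
print("single exponential: squared error", single_fit[0], "decay", single_fit[2])
print("two exponentials:   squared error", double_fit[0])
```

For this synthetic envelope, the single exponential cannot follow both the fast onset decay and the slow tail, while two additional parameters remove most of the residual error.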