1. Field of the Invention
Embodiments of the present invention are related to audio processing, and more particularly to the analysis of audio signals.
2. Related Art
There are numerous solutions for splitting an audio signal into sub-bands and deriving frequency-dependent amplitude and phase characteristics that vary over time. Examples include windowed fast Fourier transform/inverse fast Fourier transform (FFT/IFFT) systems as well as parallel banks of finite impulse response (FIR) and infinite impulse response (IIR) filters. These conventional solutions, however, all suffer from deficiencies.
Disadvantageously, windowed FFT systems provide only a single, fixed bandwidth for each frequency band. Typically, a bandwidth is chosen that gives fine resolution at low frequencies, and that same bandwidth is then applied from low frequency to high frequency. For example, at 100 Hz, a filter (bank) with a 50 Hz bandwidth is desired. This means, however, that at 8 kHz, a 50 Hz bandwidth is also used, where a wider bandwidth such as 400 Hz may be more appropriate. Therefore, these systems cannot provide the flexibility needed to match human perception.
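The arithmetic behind this limitation can be illustrated with a minimal sketch. The 48 kHz sample rate and 960-point FFT size below are assumptions chosen only so that the bin spacing works out to 50 Hz; the function names are hypothetical and do not come from any described system.

```python
def fft_bin_spacing_hz(sample_rate_hz, fft_size):
    """Spacing between adjacent FFT bins; identical for every bin."""
    return sample_rate_hz / fft_size


def relative_bandwidth(bandwidth_hz, center_hz):
    """Bandwidth expressed as a fraction of the band's center frequency."""
    return bandwidth_hz / center_hz


# Assumed example values: 48 kHz sampling, 960-point FFT.
spacing = fft_bin_spacing_hz(48000.0, 960)

# The same fixed spacing is half the center frequency at 100 Hz,
# but well under one percent of the center frequency at 8 kHz.
low_band_ratio = relative_bandwidth(spacing, 100.0)
high_band_ratio = relative_bandwidth(spacing, 8000.0)
```

Under these assumed values, the single fixed spacing is coarse relative to the lowest bands and very fine relative to the highest ones, which is the mismatch described above.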
Another disadvantage of windowed FFT systems is that inadequate fine frequency resolution of sparsely sampled windowed FFT systems at high frequencies can result in objectionable artifacts (e.g., “musical noise”) if modifications are applied (e.g., for noise suppression). The number of artifacts can be reduced to some extent by dramatically reducing the number of samples between successive windowed frames, the “FFT hop size” (i.e., increasing oversampling). Unfortunately, computational costs of FFT systems increase as oversampling increases. Similarly, the FIR subclass of filter banks is also computationally expensive due to the convolution with the sampled impulse response in each sub-band, and can result in high latency. For example, a system with a window of 256 samples will require 256 multiplies and a latency of 128 samples, if the window is symmetric.
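The cost figures in this paragraph can be reproduced with a rough sketch. It assumes a direct-form FIR filter (one multiply per tap for each output sample) and takes the latency of a symmetric window as half its length, following the 256/128 figures given above. The 16 kHz sample rate and the hop sizes are illustrative assumptions, and all function names are hypothetical.

```python
def fir_multiplies_per_output(window_len):
    """Direct-form FIR: one multiply per tap for each output sample."""
    return window_len


def symmetric_fir_latency_samples(window_len):
    """Latency of a symmetric window, taken as half the window length."""
    return window_len // 2


def oversampling_factor(window_len, hop_size):
    """How many windowed frames cover each input sample."""
    return window_len / hop_size


def frames_per_second(sample_rate_hz, hop_size):
    """Number of FFT frames computed per second of audio."""
    return sample_rate_hz / hop_size


# Figures from the text: a 256-sample symmetric window.
multiplies = fir_multiplies_per_output(256)
latency = symmetric_fir_latency_samples(256)

# Assumed values: halving the hop size doubles the number of FFTs per second.
rate_hop_128 = frames_per_second(16000.0, 128)
rate_hop_64 = frames_per_second(16000.0, 64)
```

The frame-rate comparison shows why a smaller hop size raises the computational load: every halving of the hop size doubles the number of transforms computed per second.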
The IIR subclass is computationally less expensive due to its recursive nature, but implementations employing only real-valued filter coefficients present difficulties in achieving near-perfect reconstruction, especially if the sub-band signals are modified. Further, phase and amplitude compensation as well as time-alignment for each sub-band are required in order to produce a flat frequency response at the output. The phase compensation is difficult to perform with real-valued signals, since they lack the quadrature component needed for straightforward computation of amplitude and phase with fine time-resolution. The most common way to determine amplitude and frequency is to apply a Hilbert transform on each stage output. In real-valued filter banks, however, calculating the Hilbert transform requires an extra computation step, which is computationally expensive.
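The role of the quadrature component can be shown with a small sketch. It assumes a pure tone whose in-phase (cosine) and quadrature (sine) parts are both available, in which case amplitude and phase follow directly from each sample pair. The tone parameters and function names are hypothetical and purely illustrative.

```python
import cmath
import math


def amplitude_and_phase(in_phase, quadrature):
    """Per-sample amplitude and phase from an in-phase/quadrature pair."""
    z = complex(in_phase, quadrature)
    return abs(z), cmath.phase(z)


def tone_envelopes(amplitude=0.5, freq_hz=1000.0,
                   sample_rate_hz=16000.0, num_samples=16):
    """Compare the envelope from I/Q pairs with the magnitude of the
    real-valued signal alone, for an assumed test tone."""
    complex_envelope = []
    real_magnitude = []
    for n in range(num_samples):
        theta = 2.0 * math.pi * freq_hz * n / sample_rate_hz
        i = amplitude * math.cos(theta)   # real-valued signal
        q = amplitude * math.sin(theta)   # quadrature component
        a, _ = amplitude_and_phase(i, q)
        complex_envelope.append(a)
        real_magnitude.append(abs(i))
    return complex_envelope, real_magnitude


complex_env, real_mag = tone_envelopes()
```

With both components present, the computed amplitude stays at the tone's amplitude on every sample, whereas the magnitude of the real-valued signal alone oscillates with the waveform. Recovering the missing quadrature part from a real-valued signal is what the Hilbert transform step provides.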
Therefore, there is a need for systems and methods for analyzing and reconstructing an audio signal that are computationally less expensive than existing systems, while providing low end-to-end latency and the necessary degrees of freedom for time-frequency resolution.