1. Field
This disclosure relates to audio signal processing and, in particular, to methods for decomposing audio signals into direct and diffuse components.
2. Description of the Related Art
Audio signals commonly consist of a mixture of sound components with varying spatial characteristics. As a simple example, the sounds produced by a solo musician on a stage may be captured by a plurality of microphones. Each microphone captures a direct sound component that travels directly from the musician to the microphone, as well as other sound components such as reverberation of the musician's sound, audience noise, and background sounds emanating from extended or diffuse sources. The signal produced by each microphone may therefore be considered to contain a direct component and a diffuse component.
In many audio signal processing applications, it is beneficial to separate a signal into distinct spatial components so that each component can be analyzed and processed independently. In particular, separating an arbitrary audio signal into direct and diffuse components is a common task. For example, spatial format conversion algorithms may process direct and diffuse components independently so that direct components remain highly localizable while diffuse components preserve a desired sense of envelopment. Similarly, binaural rendering methods may apply independent processing to direct and diffuse components, rendering direct components as virtual point sources and diffuse components as a diffuse sound field. In this patent, separating a signal into direct and diffuse components will be referred to as “direct-diffuse decomposition”.
The terminology used in this patent may differ slightly from terminology employed in the related literature. In related papers, direct and diffuse components are commonly referred to as primary and ambient components or as nondiffuse and diffuse components. This patent uses the terms “direct” and “diffuse” to emphasize the distinct spatial characteristics of direct and diffuse components; that is, direct components generally consist of highly directional sound events and diffuse components generally consist of spatially distributed sound events. Additionally, in this patent, the terms “correlation” and “correlation coefficient” refer to a normalized cross-correlation measure between two signals evaluated with a time-lag of zero.
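The zero-lag correlation coefficient defined above can be illustrated with a short sketch. It assumes discrete-time signals x1[n] and x2[n] of equal length and computes rho = sum(x1[n]*x2[n]) / sqrt(sum(x1[n]^2) * sum(x2[n]^2)), without mean removal (audio signals are assumed to be approximately zero-mean). The function names, the 440 Hz test tone, the 48 kHz sample rate, and the Gaussian "diffuse" noise are illustrative assumptions, not part of this disclosure.

```python
import math
import random
from typing import Sequence


def correlation_coefficient(x1: Sequence[float], x2: Sequence[float]) -> float:
    """Normalized cross-correlation of two signals at a time-lag of zero.

    Assumption: no mean removal; signals are treated as zero-mean audio.
    """
    if len(x1) != len(x2):
        raise ValueError("signals must have equal length")
    cross = sum(a * b for a, b in zip(x1, x2))
    energy1 = sum(a * a for a in x1)
    energy2 = sum(b * b for b in x2)
    denom = math.sqrt(energy1 * energy2)
    if denom == 0.0:
        return 0.0  # hypothetical convention for an all-zero (silent) input
    return cross / denom


def demo_direct_plus_diffuse(num_samples: int = 48000,
                             noise_std: float = 0.5,
                             seed: int = 0) -> float:
    """Hypothetical two-channel example: a shared direct component (a sine)
    plus independent noise standing in for each channel's diffuse component."""
    rng = random.Random(seed)
    fs = 48000.0  # assumed sample rate
    direct = [math.sin(2.0 * math.pi * 440.0 * n / fs) for n in range(num_samples)]
    ch1 = [d + rng.gauss(0.0, noise_std) for d in direct]
    ch2 = [d + rng.gauss(0.0, noise_std) for d in direct]
    return correlation_coefficient(ch1, ch2)
```

Identical signals give a coefficient of 1.0, a signal and its negation give -1.0, and the two-channel demo gives a value near 0.5 / (0.5 + 0.25), roughly 0.67, because the shared direct component has power 0.5 and each independent diffuse component has power 0.25. In this simple model, adding more uncorrelated diffuse energy pulls the coefficient toward zero.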
Throughout this description, elements appearing in figures are assigned three-digit reference designators, where the most significant digit is the figure number where the element is introduced and the two least significant digits are specific to the element. An element that is not described in conjunction with a figure may be presumed to have the same characteristics and function as a previously-described element having the same reference designator.