Various means exist for recording and then reproducing a sound field using microphones and loudspeakers (or headphones). The focus of this disclosure is accurate sound field reconstruction and/or reproduction, as opposed to artistic sound field reproduction, in which creative modifications are allowed. Currently, there are two primary state-of-the-art techniques for accurately recording and reproducing a sound field: higher order ambisonics (HOA) and wave-field synthesis (WFS). The WFS technique generally requires a spot microphone for each sound source. In addition, the location of each sound source must be determined and recorded. The recording from each spot microphone is then rendered using the mathematical machinery of WFS. Sometimes spot microphones are not available for each sound source, or they may not be convenient to use. In such cases, one generally uses a more compact microphone array, such as a linear, circular, or spherical array. Currently, the best available technique for reconstructing a sound field from a compact microphone array is HOA. However, HOA suffers from two major problems: (1) a small sweet spot and (2) degradation of the reconstruction when the mathematical system is under-constrained (for example, when too many loudspeakers are used). The small sweet spot phenomenon refers to the fact that the reproduced sound field is accurate only within a small region of space.
Several terms relating to this disclosure are defined below.
“Reconstructing a sound field” refers not only to reproducing a recorded sound field but also to using a set of analysis plane-wave directions to determine a set of plane-wave source signals and their associated source directions. Typically, the analysis is done with a dense set of plane-wave source directions to obtain a vector, g, of plane-wave source signals in which each entry of g is unambiguously matched to its associated source direction.
“Head-related transfer functions” (HRTFs) or “Head-related impulse responses” (HRIRs) refer to transfer functions that mathematically specify, as a linear system, the directional acoustic properties of the human auditory periphery, including the outer ear, head, shoulders, and torso. HRTFs express the transfer functions in the frequency domain and HRIRs express them in the time domain.
“HOA-domain” and “HOA-domain Fourier Expansion” refer to any mathematical basis set that may be used for analysis and synthesis in Higher Order Ambisonics, such as the Fourier-Bessel system, circular harmonics, and so forth. Signals can be expressed in terms of their components in this basis set; when they are, the signals are said to be expressed in the “HOA-domain”. Signals in the HOA-domain can be represented in both the frequency and time domains, in the same way as other signals.
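As an illustration of the HOA-domain and of the plane-wave analysis described under “Reconstructing a sound field”, the following is a minimal sketch assuming a two-dimensional (horizontal-only) sound field, unnormalised real circular harmonics with the radial (Bessel) terms omitted, and a single unit-amplitude plane wave at one frequency. The helper circular_harmonics, the order, and the 72-direction grid are hypothetical choices, and the minimum-norm pinv solution is used only for illustration; it is not presented as the analysis method of this disclosure.

```python
import numpy as np


def circular_harmonics(phi, order):
    """Real circular harmonics [1, cos(phi), sin(phi), ..., cos(N phi), sin(N phi)].

    Returns an array of shape (2 * order + 1, len(phi)); one column per angle.
    """
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    rows = [np.ones_like(phi)]
    for m in range(1, order + 1):
        rows.append(np.cos(m * phi))
        rows.append(np.sin(m * phi))
    return np.vstack(rows)


order = 3                                   # HOA order N (hypothetical)
num_dirs = 72                               # dense analysis grid, 5 degree spacing
grid = np.linspace(0.0, 2.0 * np.pi, num_dirs, endpoint=False)
Y = circular_harmonics(grid, order)         # shape (7, 72): one column per direction

# HOA-domain components b of a unit plane wave arriving from grid[10] = 50 degrees.
source_index = 10
b = circular_harmonics(grid[source_index], order)[:, 0]

# Plane-wave source signals: entry g[k] belongs to direction grid[k].
g = np.linalg.pinv(Y) @ b
estimated_deg = np.degrees(grid[np.argmax(np.abs(g))])
print(f"strongest plane wave at {estimated_deg:.1f} degrees")
```

Here g has one entry per grid direction and reproduces b (Y @ g equals b up to rounding), but it is not sparse: the minimum-norm solution spreads the single source over neighbouring directions, with its largest entry at the true direction of 50 degrees.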
“HOA” refers to Higher Order Ambisonics, which is a general term encompassing sound field representation and manipulation in the HOA-domain.
“Compressive Sampling” or “Compressed Sensing” or “Compressive Sensing” all refer to a set of techniques that analyse signals in a sparse domain (defined below).
“Sparsity Domain” or “Sparse Domain” is a compressive sampling term that refers to the fact that a vector of sampled observations y can be written as a matrix-vector product, y = Ψx, where Ψ is a basis of elementary functions and nearly all coefficients in x are null. If S coefficients in x are non-null, we say the observed phenomenon is S-sparse in the sparsity domain Ψ.
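The following minimal sketch illustrates S-sparsity, assuming (hypothetically) that Ψ is an orthonormal DCT-II basis; no particular basis is specified here. The observations y are dense, yet in the domain Ψ they are described by only S = 3 non-null coefficients. Because this Ψ is orthonormal and square, x is recovered simply as Ψᵀy; compressive sampling addresses the harder case in which fewer observations than coefficients are available. The helper names are hypothetical.

```python
import numpy as np


def orthonormal_dct_basis(n):
    """n x n matrix whose columns are orthonormal DCT-II basis functions."""
    k = np.arange(n)[:, None]        # sample index (rows)
    m = np.arange(n)[None, :]        # basis-function index (columns)
    psi = np.sqrt(2.0 / n) * np.cos(np.pi * (k + 0.5) * m / n)
    psi[:, 0] /= np.sqrt(2.0)        # the constant function needs weight 1/sqrt(n)
    return psi


def sparsity_level(x, tol=1e-9):
    """Number S of non-null coefficients (magnitude above tol)."""
    return int(np.count_nonzero(np.abs(x) > tol))


n = 64
psi = orthonormal_dct_basis(n)
x = np.zeros(n)
x[[3, 17, 40]] = [1.0, -0.5, 2.0]   # S = 3 non-null coefficients
y = psi @ x                          # observations y = Psi x
x_hat = psi.T @ y                    # orthonormal Psi: its transpose is its inverse
print(sparsity_level(y), sparsity_level(x_hat))
```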
The function “pinv” refers to a pseudo-inverse, a regularised pseudo-inverse or a Moore-Penrose inverse of a matrix.
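A minimal sketch of these variants, assuming NumPy. The Moore-Penrose inverse is computed with numpy.linalg.pinv; the regularised form shown is Tikhonov regularisation, which is one common choice and an assumption here, since the type of regularisation is not specified. The helper name pinv and the parameter lam are hypothetical.

```python
import numpy as np


def pinv(a, lam=0.0):
    """Pseudo-inverse of a (hypothetical helper).

    lam == 0: Moore-Penrose inverse via SVD (numpy.linalg.pinv).
    lam > 0:  Tikhonov-regularised inverse A^H (A A^H + lam I)^-1,
              which equals (A^H A + lam I)^-1 A^H.
    """
    a = np.asarray(a)
    if lam == 0.0:
        return np.linalg.pinv(a)
    a_h = a.conj().T
    gram = a @ a_h
    return a_h @ np.linalg.inv(gram + lam * np.eye(gram.shape[0]))


rng = np.random.default_rng(0)
A = rng.standard_normal((3, 8)) + 1j * rng.standard_normal((3, 8))  # wide, underdetermined
P_mp = pinv(A)             # Moore-Penrose inverse, shape (8, 3)
P_reg = pinv(A, lam=0.1)   # regularised pseudo-inverse, shape (8, 3)
```

For a wide matrix such as A above, the A A^H form only requires inverting a 3 x 3 matrix rather than an 8 x 8 one.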
The L1-norm of a vector x is denoted ∥x∥1 and is given by
$$\|\mathbf{x}\|_1 = \sum_i |x_i|.$$
The L2-norm of a vector x is denoted by ∥x∥2 and is given by
$$\|\mathbf{x}\|_2 = \sqrt{\sum_i |x_i|^2}.$$
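Both norms can be computed directly from these definitions. A minimal NumPy sketch (the function names are hypothetical) that also handles complex-valued frequency-domain vectors by using magnitudes:

```python
import numpy as np


def l1_norm(x):
    """||x||_1: sum of component magnitudes."""
    return float(np.sum(np.abs(x)))


def l2_norm(x):
    """||x||_2: square root of the sum of squared component magnitudes."""
    return float(np.sqrt(np.sum(np.abs(x) ** 2)))


x = np.array([3.0, -4.0])
z = np.array([3j, 4.0])        # complex vector with the same magnitudes
print(l1_norm(x), l2_norm(x))  # 7.0 5.0
```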
The L1-L2 norm of a matrix A is denoted by ∥A∥1-2 and is given by
$$\|\mathbf{A}\|_{1\text{-}2} = \|\mathbf{u}\|_1, \quad \text{where} \quad u[i] = \sqrt{\sum_j |A[i,j]|^2},$$
u[i] is the i-th element of u, and A[i, j] is the element in the i-th row and j-th column of A.
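The L1-L2 norm takes the L2 norm of each row and then the L1 norm of the resulting vector u. In the minimal NumPy sketch below (the function name is hypothetical), an all-zero row contributes nothing, so for a fixed Frobenius norm the L1-L2 norm is smaller when the energy is concentrated in a few rows.

```python
import numpy as np


def l1_l2_norm(a):
    """||A||_{1-2}: L1 norm of the vector u of row-wise L2 norms."""
    a = np.asarray(a)
    u = np.sqrt(np.sum(np.abs(a) ** 2, axis=1))  # u[i] = L2 norm of row i
    return float(np.sum(u))                      # L1 norm of u (u is non-negative)


A = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])
print(l1_l2_norm(A))  # 6.0  (row norms 5 + 0 + 1)
```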
“ICA” is Independent Component Analysis, a mathematical method that provides, for example, a means to estimate a mixing matrix and an unmixing matrix for a given set of mixed signals. It also provides a set of separated source signals for the set of mixed signals.
The “sparsity” of a recorded sound field provides a measure of the extent to which a small number of sources dominate the sound field.
“Dominant components” of a vector or matrix refer to components of the vector or matrix that are much larger in relative value than some of the other components. For example, for a vector x, we can measure the relative value of component xi compared to xj by computing the ratio $x_i / x_j$ or the logarithm of the ratio, $\log(x_i / x_j)$. If the ratio or log-ratio exceeds some particular threshold value, say θth, xi may be considered a dominant component compared to xj.
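A minimal sketch of this pairwise test, assuming that magnitudes are compared (so that complex frequency-domain components can be handled) and that the logarithm is base 10; both are assumptions, as is the helper name is_dominant. Exact zeros are handled explicitly to avoid division by zero.

```python
import numpy as np


def is_dominant(x, i, j, theta_th, use_log=False):
    """True if component x[i] is dominant compared to x[j].

    Compares |x[i]| / |x[j]| (or log10 of that ratio) against theta_th.
    """
    xi, xj = abs(x[i]), abs(x[j])
    if xi == 0.0:
        return False               # a zero component never dominates
    if xj == 0.0:
        return True                # any non-zero component dominates an exact zero
    ratio = xi / xj
    value = np.log10(ratio) if use_log else ratio
    return bool(value > theta_th)


x = np.array([10.0, 1.0, 0.5])
print(is_dominant(x, 0, 1, theta_th=5.0))                # True  (ratio 10)
print(is_dominant(x, 1, 2, theta_th=5.0))                # False (ratio 2)
print(is_dominant(x, 0, 2, theta_th=1.0, use_log=True))  # True  (log10(20) is about 1.30)
```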
“Cleaning a vector or matrix” refers to searching for dominant components (as defined above) in the vector or matrix and then modifying the vector or matrix by removing or setting to zero some of its components which are not dominant components.
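One way to clean a vector is sketched below. The rule used to decide dominance is an assumption: each component's magnitude is compared with the median magnitude of the vector, components whose ratio exceeds θth are kept, and all others are set to zero. The helper name clean_vector is hypothetical.

```python
import numpy as np


def clean_vector(x, theta_th):
    """Zero every component whose magnitude does not exceed theta_th times the median.

    Returns the cleaned vector and a boolean mask of the dominant components.
    """
    x = np.asarray(x)
    mag = np.abs(x)
    # Floor the reference so that a median of zero does not cause a division by zero.
    reference = max(float(np.median(mag)), np.finfo(float).tiny)
    dominant = mag / reference > theta_th
    return np.where(dominant, x, 0), dominant


x = np.array([10.0, 0.1, 0.2, 8.0, 0.1])
cleaned, dominant = clean_vector(x, theta_th=10.0)
print(cleaned)   # [10.  0.  0.  8.  0.]
print(dominant)  # [ True False False  True False]
```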
“Reducing a matrix M” refers to an operation that may remove columns of M that contain all zeros and/or columns that do not have a Dominant Component. Alternatively, “Reducing a matrix M” may refer to removing columns of the matrix M depending on some vector x; in this case, the columns of M that do not correspond to Dominant Components of x are removed. Still further, “Reducing a matrix M” may refer to removing columns of the matrix M depending on some other matrix N. In this case, the columns of M must correspond in some defined way to the columns or rows of N, and “Reducing the matrix M” refers to removing the columns of M that correspond to columns or rows of N which do not have a Dominant Component.
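The first two forms of reduction can be sketched as follows, assuming NumPy and assuming that the indices of the kept columns are returned so that the matrix can later be expanded again. The helper names, the example vector x, and the threshold rule used to mark its dominant components are hypothetical. The case of a matrix N works the same way once a per-column mask has been derived from N.

```python
import numpy as np


def reduce_zero_columns(m):
    """Remove the all-zero columns of m; also return the kept column indices."""
    m = np.asarray(m)
    keep = np.flatnonzero(np.any(m != 0, axis=0))
    return m[:, keep], keep


def reduce_by_mask(m, dominant):
    """Keep only the columns of m whose entry in the boolean mask is True."""
    keep = np.flatnonzero(np.asarray(dominant, dtype=bool))
    return np.asarray(m)[:, keep], keep


M = np.array([[1.0, 0.0, 2.0, 0.0],
              [3.0, 0.0, 4.0, 0.0]])
M_red, kept = reduce_zero_columns(M)   # columns 0 and 2 survive

# Reduce M according to a vector x: column k of M corresponds to x[k].
x = np.array([5.0, 0.01, 0.02, 4.0])
dominant = np.abs(x) > 0.1 * np.abs(x).max()   # hypothetical dominance rule
M_red_x, kept_x = reduce_by_mask(M, dominant)  # columns 0 and 3 survive
```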
“Expanding a matrix M” refers to an operation that may insert into the matrix M a set of columns that contains all zeros. An example of when such an operation may be required is when the columns of matrix M correspond to a smaller set of basis functions and it is required to express the matrix M in a manner that is suited to a larger set of basis functions.
“Expanding a vector of time signals x(t)” refers to an operation that may insert into the vector of time signals x(t) signals that contain all zeros. An example of when such an operation may be required is when the entries of x(t) correspond to time signals that match a smaller set of basis functions and it is required to express the vector of time signals x(t) in a manner that is suited to a larger set of basis functions.
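Expansion is the inverse of the reduction operations defined above. The minimal sketch below assumes that the indices of the retained basis functions (keep) were recorded when reducing, and that x(t) is stored as an array with one time signal per row; the helper names are hypothetical.

```python
import numpy as np


def expand_matrix(m_reduced, keep, n_columns):
    """Insert all-zero columns so that column keep[k] holds column k of m_reduced."""
    m_reduced = np.asarray(m_reduced)
    out = np.zeros((m_reduced.shape[0], n_columns), dtype=m_reduced.dtype)
    out[:, keep] = m_reduced
    return out


def expand_signals(x_t, keep, n_signals):
    """Insert all-zero time signals so that signal keep[k] holds row k of x_t."""
    x_t = np.asarray(x_t)
    out = np.zeros((n_signals, x_t.shape[1]), dtype=x_t.dtype)
    out[keep, :] = x_t
    return out


keep = np.array([0, 2])                        # retained basis functions (out of 4)
M_full = expand_matrix([[1.0, 2.0], [3.0, 4.0]], keep, n_columns=4)
x_full = expand_signals([[0.5, -0.5, 0.25],
                         [1.0, 0.0, -1.0]], keep, n_signals=4)
```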
“FFT” means a Fast Fourier Transform.
“IFFT” means an Inverse Fast Fourier Transform.
A “baffled spherical microphone array” refers to a spherical array of microphones which are mounted on a rigid baffle, such as a solid sphere. This is in contrast to an open spherical array of microphones which does not have a baffle.
Several notations related to this disclosure are described below:
Time domain and frequency domain vectors are sometimes expressed using the following notation: A vector of time domain signals is written as x(t). In the frequency domain, this vector is written as x. In other words, x is the FFT of x(t). To avoid confusion with this notation, all vectors of time signals are explicitly written out as x(t).
Matrices and vectors are expressed using bold-type. Matrices are expressed using capital letters in bold-type and vectors are expressed using lower-case letters in bold-type.
A matrix of filters is expressed using a capital letter with bold-type and with an explicit time component such as M(t) when expressed in the time domain or with an explicit frequency component such as M(ω) when expressed in the frequency domain. For the remainder of this definition we assume that the matrix of filters is expressed in the time domain. Each entry of the matrix is then itself a finite impulse response filter. The column index of the matrix M(t) is an index that corresponds to the index of some vector of time signals that is to be filtered by the matrix. The row index of the matrix M(t) corresponds to the index of the group of output signals. As a matrix of filters operates on a vector of time signals, the “multiplication operator” is the convolution operator described in more detail below.
“” is a mathematical operator which denotes convolution. It may be used to express the convolution of a matrix of filters (represented as a general matrix) with a vector of time signals. For example, y(t)=M(t)x(t) represents the convolution of the matrix of filters M(t) with the corresponding vector of time signals x(t). Each entry of M(t) is a filter; the column index of M(t) selects the time signal in x(t) to which the filter is applied, and the row index selects the output signal in y(t) to which the filtered result contributes. As a concrete example, x(t) may correspond to a set of microphone signals, while y(t) may correspond to a set of HOA-domain time signals. In this case, the equation y(t)=M(t)x(t) indicates that the microphone signals are filtered with the set of filters given by one row of M(t) and then added together to give the time signal corresponding to one of the HOA-domain component signals in y(t).
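The filtering rule described above (output signal i is the sum over j of input signal j convolved with the filter in row i, column j) can be sketched as follows. This is a minimal NumPy sketch, assuming M(t) is stored as an array of shape (outputs, inputs, filter length) and x(t) as an array with one time signal per row; the helper name and the sizes (4 microphone signals, 9 HOA-domain signals) are hypothetical. The result is checked against the frequency-domain form y(ω) = M(ω)x(ω), computed with a zero-padded FFT so that the circular convolution implied by the FFT equals linear convolution.

```python
import numpy as np


def filter_matrix_convolve(m_t, x_t):
    """Apply a matrix of FIR filters M(t) to a vector of time signals x(t).

    m_t: shape (n_out, n_in, filter_len); x_t: shape (n_in, n_samples).
    Returns y_t with shape (n_out, n_samples + filter_len - 1), where
    y_t[i] = sum_j convolve(m_t[i, j], x_t[j]).
    """
    n_out, n_in, filter_len = m_t.shape
    if x_t.shape[0] != n_in:
        raise ValueError("number of columns of M(t) must equal number of signals in x(t)")
    y_t = np.zeros((n_out, x_t.shape[1] + filter_len - 1))
    for i in range(n_out):          # row index: output signal
        for j in range(n_in):       # column index: input signal
            y_t[i] += np.convolve(m_t[i, j], x_t[j])
    return y_t


rng = np.random.default_rng(1)
m_t = rng.standard_normal((9, 4, 16))   # 9 HOA-domain outputs, 4 microphones
x_t = rng.standard_normal((4, 100))     # 4 microphone signals, 100 samples each
y_t = filter_matrix_convolve(m_t, x_t)

# Frequency-domain check: y(w) = M(w) x(w) at every FFT bin.
n_fft = y_t.shape[1]
M_w = np.fft.rfft(m_t, n=n_fft, axis=-1)          # (9, 4, bins)
X_w = np.fft.rfft(x_t, n=n_fft, axis=-1)          # (4, bins)
Y_w = np.einsum("ijf,jf->if", M_w, X_w)           # matrix-vector product per bin
y_check = np.fft.irfft(Y_w, n=n_fft, axis=-1)
```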
Flow charts of signal processing operations are expressed using numbers to indicate a particular step number and letters to indicate one of several different operational paths. Thus, for example, Step 1.A.2.B.1 indicates that in the first step, there is an alternative operational path A, which has a second step, which has an alternative operational path B, which has a first step.