Spaced pressure microphone arrays allow the design of spatial filters that focus on one specific direction while suppressing noise or interfering sources from other directions, a technique also referred to as beamforming. The most basic beamforming approaches are the conventional delay-and-sum and filter-and-sum beamformers. The delay-and-sum algorithm estimates the time delays of the signals received by each microphone of an array and compensates for the time difference of arrival [5]. Narrow directivity patterns can be obtained, but this requires a large spacing between the microphones and a large number of microphones. An even frequency response across all audible frequencies can be obtained by using the filter-and-sum technique.
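The dependence of delay-and-sum directivity on array size can be illustrated with a minimal sketch. It assumes a uniform linear array, a single far-field plane wave, and a narrowband signal; the function name, the four-microphone geometry, the 5 cm spacing and the speed of sound are illustrative assumptions and are not taken from [5].

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def delay_and_sum_response(freq_hz, arrival_deg, look_deg, num_mics=4, spacing_m=0.05):
    """Magnitude response of a delay-and-sum beamformer to a unit plane wave.

    Angles are measured from broadside of a uniform linear array (assumed geometry).
    """
    total = 0j
    for m in range(num_mics):
        position = m * spacing_m
        # Relative arrival time of the plane wave at microphone m.
        arrival_delay = position * math.sin(math.radians(arrival_deg)) / SPEED_OF_SOUND
        # Delay that the beamformer compensates for when steered to look_deg.
        steering_delay = position * math.sin(math.radians(look_deg)) / SPEED_OF_SOUND
        # Residual phase after compensation; zero when arrival equals look direction.
        total += cmath.exp(-2j * math.pi * freq_hz * (arrival_delay - steering_delay))
    return abs(total) / num_mics


if __name__ == "__main__":
    for freq in (200.0, 2000.0):
        on_axis = delay_and_sum_response(freq, 0.0, 0.0)
        off_axis = delay_and_sum_response(freq, 90.0, 0.0)
        print(f"{freq:6.0f} Hz  on-axis {on_axis:.3f}  off-axis (90 deg) {off_axis:.3f}")
```

With this small aperture the off-axis response at 200 Hz stays close to the on-axis response, whereas at 2000 Hz it is clearly reduced, which is consistent with the statement that narrow patterns call for large spacing and many microphones.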
In the international patent application published under publication number WO 2007/106399 A2, a directional microphone array having at least two microphones generates forward and backward cardioid signals from two omnidirectional microphone signals. An adaptation factor is applied to the backward cardioid signal, and the resulting adjusted backward cardioid signal is subtracted from the forward cardioid signal to generate a first-order output audio signal corresponding to a beam pattern having no nulls for negative values of the adaptation factor. After low-pass filtering, it is proposed to apply spatial noise suppression to the output audio signal. Time-variant methods have been proposed to combine the microphones optimally so as to minimize the level of unwanted sources while retaining the signal arriving from the desired direction. One of the best-known techniques in adaptive beamforming is the Minimum Variance Distortionless Response (MVDR) beamformer, which minimizes the power of the output while preserving the signal from the look direction by employing a set of weights and placing nulls in the directions of the interferers [6]. Such beamformers still require a relatively high number of microphones in a spatial arrangement with considerable dimensions.
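The forward/backward cardioid combination described above for WO 2007/106399 A2 can be illustrated with a short sketch. It assumes ideal, frequency-independent cardioid patterns (the application derives them from two omnidirectional signals, which is not modelled here), and the function names and the symbol beta for the adaptation factor are illustrative assumptions.

```python
import math


def first_order_pattern(theta_deg, beta):
    """Response of the forward cardioid minus beta times the backward cardioid."""
    c = math.cos(math.radians(theta_deg))
    forward = 0.5 * (1.0 + c)   # ideal cardioid facing 0 degrees (assumed)
    backward = 0.5 * (1.0 - c)  # ideal cardioid facing 180 degrees (assumed)
    return forward - beta * backward


def null_angle_deg(beta):
    """Angle of the pattern null in degrees, or None if the pattern has no null."""
    if beta == -1.0:
        return None  # forward + backward is omnidirectional
    # Setting the pattern to zero gives cos(theta) = (beta - 1) / (beta + 1).
    cos_null = (beta - 1.0) / (beta + 1.0)
    if abs(cos_null) > 1.0:
        return None  # no real angle satisfies the null condition
    return math.degrees(math.acos(cos_null))


if __name__ == "__main__":
    for beta in (-0.5, 0.0, 0.5, 1.0):
        print(f"beta = {beta:+.1f}  null at {null_angle_deg(beta)}")
```

Under these assumptions the null moves from 180 degrees to 90 degrees as beta goes from 0 to 1, and no null exists for negative beta, which matches the behaviour stated for the adaptation factor.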
A closely spaced microphone array technique can also be used for beamforming, where microphone patterns of different orders are derived [7]. In that technique, the microphone signals are summed in the same or opposite phase with different gains and frequency equalization, typically with the aim of obtaining microphone signals whose directivity patterns follow the spherical harmonics of different orders. Unfortunately, the response typically has tolerable quality only in a limited frequency window: at low frequencies the system suffers from amplification of the self-noise of the microphones, and at high frequencies the directivity patterns are deformed.
These beamforming techniques do not assume anything about the signals of the sources. Recently, some techniques have been proposed which assume that the signals arriving at the microphone array from different directions are sparse in the time-frequency domain, i.e., that only one of the sources is dominant at any one time-frequency position [19]. Each time-frequency frame is then attenuated or amplified according to spatial parameters analyzed for the corresponding time-frequency position, which essentially forms the beam. Such methods may clearly introduce distortion into the output. The assumption, however, is that the distortion is most prominent in the weakest time-frequency slots of the signals, making the artifacts inaudible or at least tolerable.
Among such techniques, a microphone array consisting of two cardioid capsules facing opposite directions has been proposed in [15] and [16]. Correlation measures between the cardioid capsules are used, and Wiener filtering is applied to reduce the level of coherent sound in one of the microphone signals. This produces a directive microphone signal whose beam width can be controlled. An inherent consequence is that the beam width varies depending on the sound field. For example, with a few speech sources in relatively anechoic conditions, a prominent narrowing of the cardioid pattern is obtained. With many uncorrelated sources, or in a diffuse field, however, the method does not change the directivity pattern of the cardioid microphone at all. The method is still advantageous, as the number of microphones is low and the setup does not require a large spatial arrangement.
The assumption of the sparsity of the source signals is also utilized in another technique, Directional Audio Coding (DirAC) [11], which is a method to capture, process and reproduce spatial sound over different reproduction setups. The most prominent direction of arrival (DOA) and the diffuseness of the sound field are computed or measured as spatial parameters for each time-frequency position of the sound. The DOA is estimated as the direction opposite to the intensity vector, and the diffuseness is estimated by comparing the magnitude of the intensity vector with the total energy. In the original version of DirAC, the parameters are utilized in reproduction to enhance audio quality. A variant of DirAC has been used for beamforming [12], where each time-frequency position of the sound is amplified or attenuated depending on the spatial parameters and a specified spatial filter pattern. In practice, if the DOA of a time-frequency position is far from the desired direction, that position is attenuated. Additionally, if the diffuseness is high, the attenuation is made milder, as the DOA is considered to be less certain. However, when two sources are active in the same time-frequency position, the analyzed DOA is erroneous, and artifacts may occur. In Simeon Delikaris-Manias, Simulations of second order microphones in audio coding, 1 Jan. 2012, pages 1 to 6, XP055104330, as retrieved from the Internet under http://hal.archivesouvertes.fr/docs/00/61/67/63/PDF/report.pdf, a theoretical model for comparing higher-order with first-order inputs in DirAC analysis has been presented. In the theoretical model, the proposed gain is obtained by computing the cross-correlation between two signals, normalized with a normalization coefficient. The calculated virtual microphones that contain the signal information may be filtered with the DirAC gain and the proposed gain.
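The DOA and diffuseness analysis described above can be sketched for a single time-frequency position as follows. The sketch assumes a two-dimensional sound field, complex pressure and particle-velocity values with the air density and speed of sound normalized to 1, and a plain average over a few frames. All function names are illustrative, and the gain mapping in spatial_gain is a hypothetical example of a diffuseness-dependent attenuation, not the mapping used in [12].

```python
import math


def plane_wave_frame(pressure, doa_deg):
    """Return (p, ux, uy) for a plane wave arriving from doa_deg, with rho0 = c = 1."""
    a = math.radians(doa_deg)
    # The wave propagates away from its source, so the particle velocity
    # points opposite to the direction of arrival.
    return pressure, -pressure * math.cos(a), -pressure * math.sin(a)


def dirac_parameters(frames):
    """Estimate (doa_deg, diffuseness) from a list of (p, ux, uy) frames."""
    n = len(frames)
    ix = iy = energy = 0.0
    for p, ux, uy in frames:
        p, ux, uy = complex(p), complex(ux), complex(uy)
        # Active intensity: 0.5 * Re{conj(p) * u}, averaged over the frames.
        ix += 0.5 * (p.conjugate() * ux).real / n
        iy += 0.5 * (p.conjugate() * uy).real / n
        # Energy density with rho0 = c = 1, averaged over the frames.
        energy += 0.25 * (abs(p) ** 2 + abs(ux) ** 2 + abs(uy) ** 2) / n
    # The DOA is the direction opposite to the mean intensity vector.
    doa_deg = math.degrees(math.atan2(-iy, -ix))
    # Diffuseness compares the intensity magnitude with the total energy.
    diffuseness = 1.0 - math.hypot(ix, iy) / energy if energy > 0.0 else 1.0
    return doa_deg, min(max(diffuseness, 0.0), 1.0)


def spatial_gain(doa_deg, look_deg, diffuseness):
    """Hypothetical gain: cardioid-shaped in DOA, pulled toward 1 as diffuseness rises."""
    directional = 0.5 * (1.0 + math.cos(math.radians(doa_deg - look_deg)))
    return directional + diffuseness * (1.0 - directional)


if __name__ == "__main__":
    single = [plane_wave_frame(1.0, 60.0)] * 4
    opposing = [plane_wave_frame(1.0, 0.0), plane_wave_frame(1.0, 180.0)]
    print(dirac_parameters(single))
    print(dirac_parameters(opposing))
```

In this sketch a single plane wave yields a diffuseness near 0 and a DOA equal to its arrival angle, while two equally strong waves from opposite directions in alternating frames cancel in the averaged intensity and yield a diffuseness near 1, in which case the estimated DOA carries little meaning and the hypothetical gain approaches unity.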