Field of the Invention
The present invention relates to signal processing apparatuses that perform audio processing, and to signal processing methods.
Description of the Related Art
A technique for removing unnecessary noise from an audio signal is important for improving the audibility of a target sound included in the audio signal and for increasing the recognition rate in speech recognition. A representative technique for removing noise from an audio signal is the beamformer. A beamformer filters each of the microphone signals of a plurality of channels acquired by a plurality of microphone elements, and then adds the filtered signals to obtain a single output signal. This filtering and addition processing corresponds to forming, with the plurality of microphone elements, a spatial beam pattern having directivity, i.e., a direction-selectivity characteristic, and the technique is therefore called a beamformer.
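The filtering and addition processing described above can be illustrated with a minimal time-domain sketch. It is only an illustration under stated assumptions: the function names `fir_filter` and `filter_and_sum`, the use of FIR filters, and the sample values are hypothetical and are not taken from any particular apparatus.

```python
def fir_filter(signal, taps):
    """Causal FIR filtering: y[n] = sum over k of taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def filter_and_sum(channels, filters):
    """Filter each microphone channel with its own filter, then add the results."""
    filtered = [fir_filter(x, h) for x, h in zip(channels, filters)]
    return [sum(samples) for samples in zip(*filtered)]


# Hypothetical two-channel example: the same impulse reaches channel 1 one
# sample later than channel 0. Delaying channel 0 by one sample aligns the
# two channels so that they add coherently.
channels = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
filters = [[0.0, 0.5], [0.5]]
output = filter_and_sum(channels, filters)  # [0.0, 1.0, 0.0, 0.0]
```

In this example the filters are simply a delay and a gain, which makes it a delay-and-sum beamformer; longer filters would shape the beam pattern differently at each frequency.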
The portion of a beam pattern at which the sensitivity (gain) reaches its peak is called the main lobe. By configuring the beamformer such that the main lobe is oriented in the direction of the target sound, it is possible to emphasize the target sound and simultaneously suppress noise arriving from directions different from that of the target sound. However, the main lobe of a beam pattern is a wide, gentle curve, particularly in the case where the number of microphone elements is small. For this reason, even if the main lobe is oriented in the direction of the target sound, noise whose direction is close to that of the target sound cannot be sufficiently removed.
In this regard, a noise removal method has been proposed that uses not the main lobe but a null (dead angle), which is a portion at which the sensitivity of a beam pattern reaches a minimum. That is to say, by orienting a sharp null in the direction of noise, the noise alone can be sufficiently removed without losing target sound whose direction is close to the noise direction. A beamformer that forms a null in a specific direction in such a fixed manner is called a fixed beamformer. Here, if the direction in which the null is oriented is not accurate, noise removal performance deteriorates significantly, and accordingly estimation of the direction of a sound source is important.
In contrast with the fixed beamformer, a beamformer in which the null of a beam pattern is formed automatically is called an adaptive beamformer, and the adaptive beamformer can be used to estimate the sound source direction. If target sound and noise are regarded as directional sound sources whose power is spatially concentrated at one point, a filter coefficient with which a null is automatically formed in the sound source direction can be obtained using an adaptive beamformer based on a rule that minimizes output power. Accordingly, to find the sound source direction, it suffices to calculate the beam pattern formed by the filter coefficient of the adaptive beamformer and obtain its null direction. The beam pattern can be calculated by multiplying the filter coefficient by an array manifold vector, which is a transfer function between a sound source in each direction and each microphone element. For example, array manifold vectors for directions from −180° to 180° at 1° intervals are used to check the angle at which the beam pattern of the filter coefficient has a null, i.e., a dip in sensitivity.
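The null search described above can be sketched as follows for a single frequency. This is a minimal sketch under several assumptions that are not taken from the text: a free field, a far-field source, three hypothetical microphone positions `MICS`, a speed of sound of 343 m/s, an idealized spatial covariance R = a0 a0^H + eps I for a single directional source, and a minimum-output-power rule that fixes the first filter coefficient to 1. All function names are illustrative.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)
MICS = [(0.0, 0.0), (0.05, 0.0), (0.0, 0.05)]  # hypothetical positions in metres


def manifold(theta_deg, mics, freq):
    """Free-field, far-field array manifold vector for direction theta_deg."""
    theta = math.radians(theta_deg)
    ux, uy = math.cos(theta), math.sin(theta)  # unit vector towards the source
    k = 2.0 * math.pi * freq / SPEED_OF_SOUND  # wavenumber
    # A microphone displaced towards the source receives the wavefront earlier.
    return [cmath.exp(1j * k * (x * ux + y * uy)) for x, y in mics]


def min_power_weights(a0, eps=1e-6):
    """Weights minimizing output power with the first coefficient fixed to 1.

    Uses the closed form of R^-1 e1 for R = a0 a0^H + eps * I, so no
    matrix inversion is needed.
    """
    m = len(a0)
    v = [(1.0 if i == 0 else 0.0) - a0[i] * a0[0].conjugate() / (eps + m)
         for i in range(m)]
    return [vi / v[0] for vi in v]


def beam_pattern(weights, mics, freq, angles):
    """Sensitivity |w^H a(theta)| for each scanned angle."""
    return [abs(sum(w.conjugate() * a
                    for w, a in zip(weights, manifold(t, mics, freq))))
            for t in angles]


def null_direction(weights, mics, freq):
    """Scan -180..180 degrees at 1 degree intervals and return the dip angle."""
    angles = list(range(-180, 181))
    gains = beam_pattern(weights, mics, freq, angles)
    return angles[gains.index(min(gains))]


# A directional source at 35 degrees: the adaptive weights place a null there,
# and the scan recovers that direction.
source_vector = manifold(35, MICS, 2000.0)
weights = min_power_weights(source_vector)
estimated = null_direction(weights, MICS, 2000.0)
```

The scan locates the source only because the manifold vectors used for the scan match the ones that actually relate the source to the microphones; if they do not match, the dip appears at a wrong angle or is smeared.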
Here, in sound source separation such as that performed using the beamformer, an array manifold vector given by a theoretical formula for a free field is often used, on the assumption that the microphones are arranged in a free field. In a free field, where there is no obstruction, sound propagates ideally. Accordingly, for example, the difference in propagation delay time between microphone elements, i.e., the phase difference at each frequency between array manifold vector elements, is obtained geometrically by a theoretical formula with the microphone interval as a parameter. In contrast, in the case where a microphone is arranged not in a free field but in the vicinity of a housing or inside it, diffraction, blocking, scattering, or the like of sound occurs due to the housing, and accordingly the aforementioned phase difference deviates from the theoretical value in a free field. Furthermore, the difference in signal amplitude between microphone elements in each sound source direction is also affected by the housing in which the microphone elements are arranged.
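As an illustration of such a free-field theoretical formula, consider the simplest case of two microphone elements. The symbols below are assumptions introduced only for this sketch: d is the microphone interval, c is the speed of sound, f is the frequency, and θ is the direction of a far-field source measured from the broadside of the two elements. The delay difference τ, the phase difference Δφ, and the resulting array manifold vector a may then be written as:

```latex
\tau(\theta) = \frac{d \sin\theta}{c}, \qquad
\Delta\phi(f, \theta) = 2\pi f \, \tau(\theta) = \frac{2\pi f d \sin\theta}{c}, \qquad
\mathbf{a}(f, \theta) =
\begin{bmatrix} 1 \\ e^{-j \Delta\phi(f, \theta)} \end{bmatrix}
```

For example, with d = 0.02 m, c = 343 m/s, f = 1 kHz, and θ = 90°, the phase difference is about 0.37 rad (roughly 21°). Both elements of this vector have unit amplitude, which reflects the free-field assumption that no amplitude difference arises between the microphone elements.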
Since the amplitude difference and the phase difference between the microphone elements significantly change due to the influence of the housing in which the microphones are arranged as mentioned above, the array manifold vector, which is a transfer function between a sound source in each direction and each microphone element, also changes due to the influence of the housing. If the array manifold vector used to calculate a beam pattern does not follow such a change, the sound source direction cannot be accurately estimated. Japanese Patent Laid-Open No. 2011-199474 (hereinafter, Document 1) describes estimation of an array manifold vector that contains the influence of a housing, using independent component analysis. Japanese Patent Laid-Open No. 2010-278918 (hereinafter, Document 2) describes sequentially obtaining microphone position coordinates that change in accordance with an open/close state of a housing movable portion and using the microphone position coordinates as parameters in sound source separation processing, in the case where a microphone is attached to the housing movable portion of a foldable mobile phone or the like.
However, there are cases where the accuracy of the sound source direction estimation still cannot be maintained with the methods described in Documents 1 and 2. With the method in Document 1, for example, in the case of using a built-in microphone in a camcorder, it is conceivable that an array manifold vector that contains the influence of the housing of the camcorder can be estimated and used. However, in the case where the microphone used to obtain the audio signal is switched from the built-in microphone to an external microphone, the external microphone is separate from the camcorder and is therefore not easily affected by the housing of the camcorder. That is to say, the array manifold vector differs significantly between the built-in microphone and the external microphone. Document 1 gives no consideration at all to selecting the array manifold vector for such a case where the microphone is switched.
Regarding the method in Document 2, since the microphone position coordinates are the parameters in the sound source separation processing, it is conceivable that a free field is assumed. However, in actual audio processing in a camcorder or the like, the array manifold vector used in the audio processing is affected by diffraction or the like caused by the housing. Furthermore, even if the microphone position coordinates do not change, if the shape of the housing changes due to, for example, interchange or zooming of a lens of the camcorder, it is conceivable that the array manifold vector also changes accordingly. However, Document 2 gives no consideration to selecting the array manifold vector while taking into account the influence that such a change of the housing shape has on diffraction or the like.