1. Field of the Invention
The present invention relates to a noise suppression technique for suppressing noise from an audio signal.
2. Description of the Related Art
A technique for suppressing unnecessary noise from an audio signal is important for enhancing the perceptual quality of a target sound included in the audio signal and for improving the recognition rate in speech recognition.
As a representative technique for suppressing noise from an audio signal, a beamformer is known. The beamformer applies filtering to each of a plurality of microphone signals acquired by a plurality of microphones, and then adds up the filtered signals to obtain a single output signal. This technique is called a “beamformer” because the filtering and addition processes correspond to the formation, by the plurality of microphones, of a spatial beam pattern having directivity, that is, direction selectivity.
A portion where a gain of the beam pattern reaches a peak is called a main lobe, and when the beamformer is configured to be directed in a direction of a target sound, the target sound can be emphasized, and noise which exists in directions different from the target sound can be suppressed at the same time.
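The filter-and-add structure and the resulting beam pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not a configuration taken from any cited reference: the two microphones are assumed to be 3 cm apart, the speed of sound is assumed to be 343 m/s, the filter is assumed to be a simple delay-and-sum filter at a single frequency, and all function names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def steering_vector(theta_deg, freq_hz, mic_positions_m):
    """Relative phase of a far-field plane wave from theta at each microphone.

    The microphones lie on the line connecting -90 and 90 degrees,
    so 0 degrees is the broadside direction.
    """
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(-2j * math.pi * freq_hz * p * s / SPEED_OF_SOUND)
            for p in mic_positions_m]


def delay_and_sum_weights(look_deg, freq_hz, mic_positions_m):
    """Per-channel filter coefficients that phase-align sound from look_deg."""
    a = steering_vector(look_deg, freq_hz, mic_positions_m)
    return [x / len(a) for x in a]


def beam_gain(weights, theta_deg, freq_hz, mic_positions_m):
    """Magnitude of the filtered-and-added output for a unit plane wave."""
    a = steering_vector(theta_deg, freq_hz, mic_positions_m)
    return abs(sum(w.conjugate() * x for w, x in zip(weights, a)))


mics = [0.0, 0.03]  # two microphones 3 cm apart (assumed)
freq = 3300.0       # Hz
w = delay_and_sum_weights(90.0, freq, mics)  # main lobe directed to 90 degrees
pattern = {theta: beam_gain(w, theta, freq, mics)
           for theta in range(-90, 91, 30)}
for theta, g in pattern.items():
    print(f"{theta:4d} deg  gain {g:.3f}")
```

The gain is 1 in the look direction and stays close to 1 over a broad range of neighboring angles, which corresponds to the main lobe.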
However, the main lobe of the beam pattern is wide, especially when the number of microphones is small. A non-directional sound source such as outdoor wind noise can be regarded as a noise source distributed spatially in all directions. For this reason, when such a broad main lobe is directed to the target sound, non-directional noise such as wind noise cannot be sufficiently suppressed.
Thus, a noise suppression method has been proposed that uses, in place of the main lobe, a null, that is, a portion where the gain of the beam pattern reaches a dip.
FIG. 2A shows, on a polar coordinate system, an example of a beam pattern in a horizontal direction at about 3.3 kHz when the number of microphones is two. Assume that the two microphones are spaced apart from each other on a line segment which connects −90° and 90°. Note that the beam patterns in the semicircles on the 0° side and the 180° side of the line segment are symmetrical.
As can be seen from FIG. 2A, although the main lobe in the 90° direction has a very wide width, the gain drops sharply at the null in the −30° direction, so that sound arriving from this direction alone is almost not output. A voice is a representative target sound included in microphone signals. A voice uttered by a person is a directional sound source which is spatially concentrated on one point. Thus, the following two-step noise suppression method has been proposed (for example, Japanese Patent Laid-Open No. 2003-271191). That is, by directing the null of the beam pattern to a directional target sound, non-directional noise is extracted first, and then the extracted noise is subtracted from the microphone signals.
In FIG. 2A, a non-directional noise source such as wind noise is expressed by marks “˜” as a noise source distributed spatially in all directions. Also, a human voice as a directional target sound located in the −30° direction is expressed by a face mark. In this case, since the power per angle of the non-directional noise source is smaller than that of the human voice as the directional target sound, configuring a beamformer to minimize the output power automatically forms the null in the −30° target sound direction. A beamformer which automatically forms the null of the beam pattern by a rule such as output power minimization is called an “adaptive beamformer”. The adaptive beamformer is suited to extraction of non-directional noise since the beam pattern, the null of which is directed in the target sound direction as shown in FIG. 2A, can be obtained automatically.
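The automatic formation of the null by output power minimization can be sketched for two microphones as follows. The sketch rests on several assumptions that are not part of the description above: the output is modeled as y = x1 − w·x2 with the coefficient of the first microphone fixed to 1 so that the trivial zero solution is excluded, the noise is assumed to be uncorrelated between the microphones, the microphone spacing is assumed to be 3 cm, and the function names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)
MIC_SPACING = 0.03      # m (assumed)


def inter_mic_phase(theta_deg, freq_hz):
    """Phase lag of microphone 2 relative to microphone 1 for direction theta."""
    return (2 * math.pi * freq_hz * MIC_SPACING
            * math.sin(math.radians(theta_deg)) / SPEED_OF_SOUND)


def power_minimizing_weight(target_deg, freq_hz, target_power, noise_power):
    """Weight w that minimizes E|x1 - w*x2|^2.

    Assumed model: x1 = s + n1, x2 = s*exp(-j*phi) + n2, with n1 and n2
    uncorrelated. Then w = E[x1 conj(x2)] / E[|x2|^2].
    """
    phi = inter_mic_phase(target_deg, freq_hz)
    cross = target_power * cmath.exp(1j * phi)   # E[x1 conj(x2)]
    return cross / (target_power + noise_power)  # divided by E[|x2|^2]


def beam_gain(w, theta_deg, freq_hz):
    """Output magnitude of y = x1 - w*x2 for a unit plane wave from theta."""
    phi = inter_mic_phase(theta_deg, freq_hz)
    return abs(1 - w * cmath.exp(-1j * phi))


freq = 3300.0  # Hz
# Directional target at -30 degrees, 100 times stronger than the noise (assumed).
w = power_minimizing_weight(-30.0, freq, target_power=100.0, noise_power=1.0)
pattern = {theta: beam_gain(w, theta, freq) for theta in range(-90, 91, 10)}
null_direction = min(pattern, key=pattern.get)
print("null at", null_direction, "deg")
```

No target direction is supplied to the minimization itself; the null falls on the target direction only because the target dominates the statistics of the microphone signals.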
However, the adaptive beamformer suffers from the following problems.
For example, although wind noise is non-directional, its power per angle is very strong in a low-frequency range, so that in this range the power per angle has a magnitude comparable to that of the directional target sound, as illustrated in FIG. 2B. FIG. 2B illustrates the beam pattern, at about 470 Hz in a relatively low-frequency range, of the adaptive beamformer formed with respect to a human voice under wind noise. At this frequency, since the power in the target sound direction is not notably larger than those in other directions, the null is much shallower than that in FIG. 2A at about 3.3 kHz in a mid-to-high frequency range. For this reason, the target sound cannot be sufficiently removed and is mixed into the extracted noise, so that the target sound itself is reduced in the subsequent noise subtraction.
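The relation between the power ratio and the depth of the null can be made explicit with a short derivation. The model below is an assumption introduced only for illustration: two microphones, a target of power P_s arriving with an inter-microphone phase φ, noise of power P_n that is uncorrelated between the microphones, and an output y = x_1 − w x_2 whose power is minimized over w.

```latex
\begin{aligned}
x_1 &= s + n_1, \qquad x_2 = s\,e^{-j\varphi} + n_2, \\
y &= x_1 - w\,x_2, \qquad
w_{\mathrm{opt}} = \frac{E[x_1 x_2^{*}]}{E[|x_2|^{2}]}
                 = \frac{P_s\,e^{j\varphi}}{P_s + P_n}, \\
G_{\mathrm{target}} &= \left|\,1 - w_{\mathrm{opt}}\,e^{-j\varphi}\right|
                     = \frac{P_n}{P_s + P_n}.
\end{aligned}
```

Under this model, when P_s = 100 P_n the gain toward the target is 1/101, or about −40 dB, whereas when P_s = P_n it is 1/2, or about −6 dB. A frequency at which the noise power is comparable to the target power therefore yields only a shallow null, and a substantial part of the target sound remains in the extracted noise.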
In contrast to the adaptive beamformer, which automatically forms a null of a beam pattern, a beamformer which fixedly forms a null in a specific direction is called a “fixed beamformer”. Japanese Patent Laid-Open No. 2003-271191 discloses a method of selectively using the adaptive beamformer and the fixed beamformer for respective frequencies when noise is extracted, using a beamformer, from microphone signals acquired by a microphone array.
However, the method of Japanese Patent Laid-Open No. 2003-271191 suffers from the following problems.
As for the adaptive beamformer, a method using a Griffiths-Jim adaptive beamformer is disclosed. This method is based on the output power minimization rule, and a null of a beam pattern is automatically formed. However, the direction of a main lobe has to be designated as a constraint so that the filter coefficient vector of the beamformer is a non-zero vector. In non-directional noise extraction, only a null directed to the directional target sound is originally required; if the direction of the main lobe is explicitly designated, it may influence the beam pattern, thus lowering the target sound suppression performance.
Also, as for the fixed beamformer, a method based on simple differences between channels of microphone signals is disclosed. However, with this method, a null is formed in the direction of the perpendicular bisector of the line segment which connects the microphones, and is not directed in the target sound direction. Hence, the target sound is highly likely to be mixed into the extracted noise.
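The position of the null of a simple inter-channel difference can be checked with the following sketch. The microphone spacing of 3 cm, the speed of sound of 343 m/s, and the function name are assumptions for illustration; 0° denotes the direction of the perpendicular bisector of the line segment connecting the microphones.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)
MIC_SPACING = 0.03      # m (assumed)


def difference_beam_gain(theta_deg, freq_hz):
    """Output magnitude of y = x1 - x2 for a unit plane wave from theta."""
    phi = (2 * math.pi * freq_hz * MIC_SPACING
           * math.sin(math.radians(theta_deg)) / SPEED_OF_SOUND)
    return abs(1 - cmath.exp(-1j * phi))


gain_bisector = difference_beam_gain(0.0, 3300.0)   # perpendicular bisector
gain_target = difference_beam_gain(-30.0, 3300.0)   # target sound direction
print(f"gain at   0 deg: {gain_bisector:.3f}")
print(f"gain at -30 deg: {gain_target:.3f}")
```

A sound arriving along the perpendicular bisector reaches both microphones in phase and cancels in the difference, whereas a target in the −30° direction passes with a substantial gain.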
Furthermore, as for the method of selecting between the adaptive beamformer and the fixed beamformer, a method of selecting, for each frequency range, the beamformer having the smaller output power is disclosed. However, as described above, the null of the fixed beamformer is not always directed to the target sound direction, and only the output power is checked. Hence, this selection method is not always suitable for removing the target sound and extracting only noise.
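A selection rule of this kind can be sketched as follows. This is a hypothetical illustration of choosing the smaller output power for each frequency bin, not the implementation of the cited reference; the function names, the tie-breaking toward the adaptive beamformer, and the sample values are assumptions.

```python
def mean_power(frames):
    """Mean squared magnitude of the complex output frames of one bin."""
    return sum(abs(v) ** 2 for v in frames) / len(frames)


def select_per_bin(adaptive_bins, fixed_bins):
    """For each frequency bin, name the beamformer with the smaller output power.

    Each argument is a list with one entry per frequency bin; each entry is
    a list of complex output values over time frames.
    """
    choices = []
    for adaptive_frames, fixed_frames in zip(adaptive_bins, fixed_bins):
        if mean_power(adaptive_frames) <= mean_power(fixed_frames):
            choices.append("adaptive")  # ties go to the adaptive beamformer (assumed)
        else:
            choices.append("fixed")
    return choices


# Hypothetical outputs for two frequency bins and two time frames.
adaptive_bins = [[0.9 + 0.1j, 0.8 - 0.2j], [0.05 + 0j, 0.04 + 0.01j]]
fixed_bins = [[0.3 + 0j, 0.2 + 0.1j], [0.4 - 0.1j, 0.5 + 0j]]
choices = select_per_bin(adaptive_bins, fixed_bins)
print(choices)
```

The rule compares output powers only. It carries no information about where the null of either beamformer points, so a small output power does not by itself indicate that the target sound has been removed.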