The present invention relates to a sound source separating apparatus, a sound source separating program, a sound pickup apparatus, and a sound pickup program, and can be applied to a sound source separating apparatus, a sound source separating program, a sound pickup apparatus, and a sound pickup program that separate and pick up a sound source only in a specific direction in an environment in which a plurality of sound sources are present, for example.
As a technique to separate and pick up a sound (hereinafter, things including a voice and a sound, for example, are expressed as a sound) only in a specific direction in an environment in which a plurality of sound sources are present, there is a beamformer (hereinafter also referred to as a BF) employing a microphone array. The beamformer is a technique to form directionality by use of a temporal difference between signals which reach respective microphones (see Futoshi Asano, “Acoustical Technology Series 16: Array signal processing for acoustics: localization, tracking and separation of sound sources,” edited by the Acoustical Society of Japan, Corona Publishing Co., Ltd., Feb. 25, 2011). Beamformers are broadly classified into two kinds: an addition type and a subtraction type. In particular, the subtraction type BF has an advantage in that it can form directionality with a smaller number of microphones than the addition type BF.
FIG. 2 is a block diagram showing a configuration of the subtraction type BF in which the number of microphones is two. In the subtraction type BF, first, a sound present in a target direction (hereinafter referred to as a target sound) reaches each of microphones 1 and 2, and a delayer 91 calculates a temporal difference between signals that have reached the microphones 1 and 2. Then, by adding a delay to a signal from any one of the microphones, a phase of the target sound is adjusted.
The temporal difference is calculated using the following formula (1). Here, d represents a distance between the microphones, c represents the sound speed, and τL represents a delay. Further, θL represents an angle between the target direction and a perpendicular direction with respect to a straight line connecting the microphones 1 and 2.

$$\tau_L = \frac{d \sin \theta_L}{c} \tag{1}$$
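As an illustration only, the formula (1) can be evaluated as in the following sketch. The function name `steering_delay`, the microphone distance of 0.03 m, and the sound speed of 340 m/s are assumptions made for this example and are not values given in the description.

```python
import math

SOUND_SPEED_M_PER_S = 340.0  # assumed value; the description only names c


def steering_delay(mic_distance_m, theta_l_rad, sound_speed=SOUND_SPEED_M_PER_S):
    """Return tau_L = (d * sin(theta_L)) / c in seconds, as in formula (1)."""
    return mic_distance_m * math.sin(theta_l_rad) / sound_speed


# Hypothetical spacing of 3 cm between microphones 1 and 2.
tau_endfire = steering_delay(0.03, math.pi / 2)  # theta_L along the microphone axis
tau_broadside = steering_delay(0.03, 0.0)        # theta_L perpendicular to the axis
```

Under these assumed values, the delay is largest when θL=±π/2 and vanishes when θL=0, since sin θL is 0 in that case.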
Here, in a case where a dead angle direction is present in the direction of the microphone 1 with respect to the intermediate point between the microphones 1 and 2, a delay process is performed on an input signal x1(t) of the microphone 1. Then, a subtracter 92 performs a process in accordance with the following formula (2).

$$\alpha(t) = x_2(t) - x_1(t - \tau_L) \tag{2}$$
The subtraction process can be performed similarly in the frequency domain, in which case the formula (2) is changed as follows.

$$A(\omega) = X_2(\omega) - e^{-j\omega\tau_L} X_1(\omega) \tag{3}$$
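The frequency-domain subtraction of the formula (3) can be sketched for a single frequency bin as follows. This is a minimal sketch, not the implementation of the subtracter 92: the function name `subtractive_bf_bin`, the 1 kHz test frequency, the spacing d = 0.03 m, the sound speed c = 340 m/s, and the spectrum value at the microphone 1 are all assumptions chosen for illustration.

```python
import cmath
import math


def subtractive_bf_bin(x1_bin, x2_bin, omega, tau_l):
    """Apply formula (3) to one frequency bin: A = X2 - exp(-j*omega*tau_L) * X1."""
    return x2_bin - cmath.exp(-1j * omega * tau_l) * x1_bin


# Hypothetical plane wave at 1 kHz that reaches microphone 2 exactly tau_L
# after microphone 1, i.e. it arrives from the dead angle direction.
omega = 2.0 * math.pi * 1000.0
tau_l = 0.03 / 340.0              # assumed d = 0.03 m, c = 340 m/s, theta_L = pi/2
x1_bin = 1.0 + 0.5j               # arbitrary spectrum value at microphone 1
x2_bin = cmath.exp(-1j * omega * tau_l) * x1_bin

residual = subtractive_bf_bin(x1_bin, x2_bin, omega, tau_l)   # cancelled component
leakage = subtractive_bf_bin(x1_bin, x1_bin, omega, tau_l)    # wave with no delay
```

In this sketch, a component whose inter-microphone delay equals τL is cancelled, whereas a component that reaches both microphones simultaneously is not.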
Here, in a case where θL=±π/2, the formed directionality becomes a cardioid unidirectionality as shown in FIG. 3A, and in a case where θL=0 or π, the formed directionality becomes an eight-shaped bidirectionality as shown in FIG. 3B. Here, a filter that forms the unidirectionality from the input signal is referred to as a unidirectional filter and a filter that forms the bidirectionality is referred to as a bidirectional filter.
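The two directionality patterns can be checked numerically with the following sketch, which evaluates the magnitude of the formula (3) for a unit plane wave. The far-field plane wave model, the convention that a positive angle points toward the microphone 1, the function name `bf_gain`, and the values d = 0.03 m, c = 340 m/s, and 1 kHz are assumptions of this example only.

```python
import cmath
import math


def bf_gain(theta, theta_l, omega, mic_distance_m=0.03, sound_speed=340.0):
    """Magnitude of formula (3) for a unit plane wave arriving from angle theta."""
    tau = mic_distance_m * math.sin(theta) / sound_speed      # delay of the incoming wave
    tau_l = mic_distance_m * math.sin(theta_l) / sound_speed  # formula (1)
    x1 = 1.0 + 0.0j
    x2 = cmath.exp(-1j * omega * tau) * x1                    # x2(t) = x1(t - tau)
    return abs(x2 - cmath.exp(-1j * omega * tau_l) * x1)


omega = 2.0 * math.pi * 1000.0
half_pi = math.pi / 2

# theta_L = pi/2: a single dead angle, as in the unidirectional case.
uni_null = bf_gain(half_pi, half_pi, omega)
uni_back = bf_gain(-half_pi, half_pi, omega)

# theta_L = 0: dead angles at 0 and pi, with equal lobes at +pi/2 and -pi/2.
bi_nulls = (bf_gain(0.0, 0.0, omega), bf_gain(math.pi, 0.0, omega))
bi_lobes = (bf_gain(half_pi, 0.0, omega), bf_gain(-half_pi, 0.0, omega))
```

Under these assumptions, the case θL=π/2 yields one dead angle opposite a single lobe, and the case θL=0 yields two dead angles with two symmetric lobes, which agrees with the unidirectional and bidirectional filters described above.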
Further, by use of a spectral subtraction (hereinafter also referred to as an SS), a strong directionality can be formed in the dead angle direction of the bidirectionality. The directionality is formed by use of the SS in accordance with the following formula (4).

$$|Y(\omega)| = |X_1(\omega)| - \beta\,|A(\omega)| \tag{4}$$
Although the input signal X1 of the microphone 1 is used in the formula (4), the same effects can be obtained by using an input signal X2 of the microphone 2. Here, β is a coefficient for adjusting the intensity of the SS. When the result of the subtraction is negative, a flooring process is performed to replace the value by 0 or a value that is smaller than the original value. This technique makes it possible to emphasize the target sound by extracting a sound that is present in directions other than the target direction (hereinafter referred to as a non-target sound) through the bidirectional filter and by subtracting an amplitude spectrum of the extracted non-target sound from an amplitude spectrum of the input signal.
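The SS of the formula (4), including the flooring process, can be sketched per frequency bin as follows. The function name `spectral_subtraction`, the parameter `floor_ratio`, and the example spectra are hypothetical. In particular, replacing a negative result by a fixed fraction of |X1(ω)| is only one assumed way of choosing "a value that is smaller than the original value".

```python
def spectral_subtraction(x1_mag, a_mag, beta=1.0, floor_ratio=0.0):
    """Formula (4) per bin: |Y| = |X1| - beta * |A|, with flooring of negative results.

    floor_ratio = 0.0 replaces negative results by 0; a small positive value
    replaces them by that fraction of |X1| instead (an assumed flooring rule).
    """
    y_mag = []
    for x1, a in zip(x1_mag, a_mag):
        y = x1 - beta * a
        if y < 0.0:
            y = floor_ratio * x1
        y_mag.append(y)
    return y_mag


# Hypothetical amplitude spectra for three frequency bins.
x1_mag = [1.0, 0.5, 0.25]   # |X1|: input signal of microphone 1
a_mag = [0.25, 1.0, 0.25]   # |A|: non-target sound from the bidirectional filter
y_zero_floor = spectral_subtraction(x1_mag, a_mag, beta=1.0)
y_soft_floor = spectral_subtraction(x1_mag, a_mag, beta=1.0, floor_ratio=0.125)
```

In this example, the second bin is dominated by the non-target sound, so the subtraction result is negative and is replaced by the floor value, while the first bin retains most of its amplitude.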