1. Field of the Invention
The present invention relates to a sound processing device and a sound processing method.
2. Description of Related Art
It is known that the speech recognition rate decreases when speech is recognized in a noisy environment. It has therefore been proposed to record sound signals of multiple channels, separate the speech and the noise included in the recorded sound signals from each other, and recognize the speech separated from the noise. One known sound source separation technique estimates the directions of the sound sources and separates the sound signal of each sound source using directional filters having high sensitivity in the estimated directions.
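As an illustration of a directional filter having high sensitivity in an estimated direction, the following is a minimal sketch of a delay-and-sum filter for a single frequency. It is not taken from any cited publication. The uniform linear microphone array, the microphone spacing, the speed of sound, and the function names `steering_vector` and `directional_gain` are all assumptions introduced for this example.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def steering_vector(angle_deg, freq_hz, num_mics, spacing_m):
    """Phase factors of a plane wave arriving from angle_deg at a
    uniform linear array (an assumed geometry for this sketch)."""
    theta = math.radians(angle_deg)
    vec = []
    for m in range(num_mics):
        # Arrival delay at microphone m relative to microphone 0.
        delay = m * spacing_m * math.sin(theta) / SPEED_OF_SOUND
        vec.append(cmath.exp(-2j * math.pi * freq_hz * delay))
    return vec


def directional_gain(look_deg, source_deg, freq_hz, num_mics=4, spacing_m=0.05):
    """Magnitude response of a delay-and-sum filter steered toward
    look_deg for a plane wave arriving from source_deg."""
    w = steering_vector(look_deg, freq_hz, num_mics, spacing_m)
    a = steering_vector(source_deg, freq_hz, num_mics, spacing_m)
    # Align the channels for the look direction, then average them.
    out = sum(wi.conjugate() * ai for wi, ai in zip(w, a)) / num_mics
    return abs(out)


# A source in the look direction passes with unit gain, while a source
# from a different direction is attenuated.
gain_on_target = directional_gain(30.0, 30.0, 2000.0)
gain_off_target = directional_gain(30.0, -60.0, 2000.0)
```

In this sketch the filter is steered by compensating the inter-microphone delays of the look direction, so signals from that direction add coherently and signals from other directions partially cancel.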
For example, in a sound signal processing device disclosed in Japanese Unexamined Patent Application, First Publication No. 2012-234150, a direction and a section of a target sound are estimated based on sound signals of multiple channels acquired from multiple microphones disposed at different positions and a sound signal of a predetermined target sound is extracted from the estimated direction and section. Specifically, observation signals in the time and frequency domains are generated from the sound signals of multiple channels, and a direction of a target sound and a section in which the target sound appears are detected based on the observation signals. A reference signal corresponding to a time envelope indicating a sound volume variation in the time direction of the target sound is generated based on the detected direction and section of the target sound, a covariance matrix is calculated from the reference signal and the observation signals, and an extraction filter for extracting a sound signal of the target sound is generated from eigenvectors of the calculated covariance matrix.
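The last step described above, in which a covariance matrix is calculated from the reference signal and the observation signals and an extraction filter is generated from its eigenvectors, can be sketched roughly as follows for a single frequency bin. The description above does not state how the reference signal enters the covariance matrix or which eigenvector is selected, so this sketch assumes that the reference envelope is used directly as a per-frame weight and that the principal eigenvector, found by power iteration, is used as the filter. The function names and the toy two-channel data are hypothetical and do not reproduce the cited publication.

```python
import cmath


def weighted_covariance(frames, reference):
    """C = sum_t r[t] x[t] x[t]^H / sum_t r[t] for one frequency bin.
    Weighting by the reference envelope is an assumption of this sketch."""
    n = len(frames[0])
    total = sum(reference)
    cov = [[0j] * n for _ in range(n)]
    for x, r in zip(frames, reference):
        for i in range(n):
            for j in range(n):
                cov[i][j] += r * x[i] * x[j].conjugate()
    return [[c / total for c in row] for row in cov]


def principal_eigenvector(matrix, iterations=200):
    """Power iteration for a Hermitian positive semidefinite matrix."""
    n = len(matrix)
    v = [1 + 0j] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(c) ** 2 for c in v) ** 0.5
        v = [c / norm for c in v]
    return v


def apply_filter(weights, frames):
    """Extraction output y[t] = w^H x[t] for each frame."""
    return [sum(w.conjugate() * xi for w, xi in zip(weights, x)) for x in frames]


# Toy observation signals: the target occupies frames 0-2 and an
# interfering source occupies frames 3-5 (hypothetical data).
target_dir = [1 + 0j, cmath.exp(0.5j)]
interferer_dir = [1 + 0j, cmath.exp(-2.0j)]
frames = [target_dir] * 3 + [interferer_dir] * 3
reference = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]  # time envelope of the target

cov = weighted_covariance(frames, reference)
w = principal_eigenvector(cov)
out = apply_filter(w, frames)
```

With these toy inputs, the filter output has a larger magnitude in the frames where the target is active than in the frames containing only the interfering source, which is the behavior expected of an extraction filter.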
However, the sound signal processing device disclosed in Japanese Unexamined Patent Application, First Publication No. 2012-234150 estimates the sound source directions from the sound signals of multiple channels and separates the sound signal of each sound source from those signals regardless of the number of sound sources emitting sound, so the calculation cost is very high and the processing time is long. In addition, because the number of sound sources simultaneously emitting sound may vary, the estimation accuracy of the sound source directions is lowered. Furthermore, because the sound sources are not perfectly separated, the speech recognition rate is lowered.