Conventional noise removal methods include methods for determining, from among the frequency spectra of an acoustic signal containing a detected audio and noises, the time-point portions of the frequency spectrum of the detected audio that are only slightly influenced by the noises (for example, see Non-patent Reference 1).
Such time-point portions of the frequency spectrum of the detected audio with small noise influence are determined as the time portions each having an SN ratio equal to or greater than 0 dB. An SN ratio is the ratio of the power of the frequency spectrum of a sound (S) to the power of the frequency spectrum of noises (N). Here, the power of the frequency spectrum of the noises is calculated based on a time segment not containing the detected audio, and the power of the frequency spectrum of the detected audio is then calculated by subtracting the power of the frequency spectrum of the noises from the power of the frequency spectrum in which the detected audio and the noises are mixed. As post-processing, recognition of the detected audio (sound) is performed. Other methods for determining time-point portions of the frequency spectrum of a detected audio include a method that calculates, based on data for learning, a probability distribution of inputted detected audios and a probability distribution of inputted noises, and then determines the time portions of the detected audio by Bayes estimation. Here, the probability distributions use variables such as the SN ratio defined above and waveform information of the frequency spectra of the detected audio and the noises. In this way, it is possible to accurately determine time-point portions of the frequency spectrum of a detected audio, based on SN ratios and other information.
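The SN-ratio-based determination described above can be illustrated with a minimal sketch. It is not the implementation of Non-patent Reference 1. It assumes that power spectra have already been computed for each time frame and that the frames containing only noise are known in advance. The names `estimate_noise_power`, `snr_db`, and `reliable_mask`, as well as the `floor` parameter, are hypothetical and introduced only for illustration.

```python
import math


def estimate_noise_power(power_frames, noise_frame_indices):
    """Average the power spectrum over frames assumed to contain only noise."""
    n_bins = len(power_frames[0])
    noise = [0.0] * n_bins
    for i in noise_frame_indices:
        for k in range(n_bins):
            noise[k] += power_frames[i][k]
    return [p / len(noise_frame_indices) for p in noise]


def snr_db(mixed_power, noise_power, floor=1e-12):
    """SN ratio in dB; the sound power is the mixed power minus the noise power.

    The floor (an assumption of this sketch) keeps the logarithm defined
    when the subtraction yields zero or a negative value.
    """
    signal_power = max(mixed_power - noise_power, floor)
    return 10.0 * math.log10(signal_power / max(noise_power, floor))


def reliable_mask(power_frames, noise_frame_indices, threshold_db=0.0):
    """Mark each time-frequency element whose SN ratio is at least threshold_db."""
    noise = estimate_noise_power(power_frames, noise_frame_indices)
    return [
        [snr_db(p, n) >= threshold_db for p, n in zip(frame, noise)]
        for frame in power_frames
    ]
```

For example, with two noise-only frames followed by one frame containing the detected audio, `reliable_mask([[1.0, 1.0], [1.0, 1.0], [5.0, 1.5]], [0, 1])` marks only the first frequency bin of the third frame, because only there does the subtracted sound power exceed the noise power.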
Conventional audio source direction detecting methods include a method for calculating an audio source direction by: segmenting each of the acoustic signals received by first and second microphones arranged at an interval into signals having different frequency bands (obtaining the frequency spectrum of each of the segmented signals); calculating, for each frequency band, a difference between the arrival time of the acoustic signal at the first microphone and the arrival time of the acoustic signal at the second microphone, based on the cross-correlations (the degrees of similarity) of the segmented signals received by the first and second microphones; and calculating the audio source direction based on the arrival time differences and the distance between the microphones (for example, see Patent Reference 1).
[Non-patent Reference 1] "Missing-Feature Approaches in Speech Recognition", Bhiksha Raj and Richard M. Stern, IEEE Signal Processing Magazine, pp. 101-116, 2005
[Patent Reference 1] Japanese Unexamined Patent Application Publication 2002-62348 (Claim 1, FIG. 1)
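The direction calculation described above for two microphones can be sketched as follows for a single frequency band. This is a minimal illustration, not the method claimed in Patent Reference 1. It assumes a far-field source, a speed of sound of 343 m/s, and an angle measured from the direction perpendicular to the line joining the microphones. The names `cross_correlation_lag` and `direction_from_lag` are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def cross_correlation_lag(x1, x2, max_lag):
    """Return the lag in samples that maximizes sum(x1[i] * x2[i + lag]).

    A positive lag means the signal reaches the second microphone later
    than the first microphone.
    """
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(x1)):
            j = i + lag
            if 0 <= j < len(x2):
                score += x1[i] * x2[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_from_lag(lag_samples, sample_rate, mic_distance):
    """Convert an arrival time difference into a direction in degrees."""
    tau = lag_samples / sample_rate  # arrival time difference in seconds
    # Clamp so that measurement error cannot push the argument outside [-1, 1].
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_distance))
    return math.degrees(math.asin(s))
```

In a band-wise arrangement such as the one described, the same two steps would be applied to each frequency band separately. How the per-band results are combined is not specified here and is left outside this sketch.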