Several audio signal processing techniques are known for reducing noise components included in a recorded sound signal obtained by recording the voice of a speaker with a microphone or the like. Examples are described in Japanese Unexamined Patent Application Publication Nos. 10-003297, 2007-318528, 2004-341339, and 2000-172283.
In a first technique, an output signal having a different noise elimination characteristic is selected depending on whether the human voice component included in an input audible signal is a voiced sound or an unvoiced sound, which makes it possible to eliminate background noise. In the first technique, a short-time average and a long-time average of the input audible signal are calculated on the time axis, and if the difference between the calculated short-time average and long-time average is greater than a first threshold value, it is determined that the audible signal includes a voice component. Alternatively, whether the input audible signal includes a voice component is determined by comparing a signal-to-noise ratio of the input audible signal with the first threshold value. Further, whether the voice component included in the input audible signal is a voiced sound or an unvoiced sound is determined from two magnitude relationships: one between the signal-to-noise ratio of the input audible signal and a second threshold value, and one between a third threshold value and the power ratio of the maximum value on the frequency axis of the input audible signal to an estimated background noise.
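The decisions described for the first technique can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method of the cited publications: the function names, window lengths, and threshold values are hypothetical, the averages are assumed to be taken over absolute amplitudes, and the voiced/unvoiced rule assumes that "voiced" means both quantities exceed their thresholds, which the description above does not specify.

```python
def detect_voice(samples, short_len=4, long_len=16, first_threshold=0.1):
    """Flag samples where the short-time average exceeds the long-time
    average by more than first_threshold.

    Assumption: both averages are taken over absolute amplitudes in
    windows that end at the current sample.
    """
    flags = []
    for n in range(len(samples)):
        short_win = samples[max(0, n - short_len + 1): n + 1]
        long_win = samples[max(0, n - long_len + 1): n + 1]
        short_avg = sum(abs(x) for x in short_win) / len(short_win)
        long_avg = sum(abs(x) for x in long_win) / len(long_win)
        flags.append(short_avg - long_avg > first_threshold)
    return flags


def is_voiced(snr_db, peak_to_noise_db,
              second_threshold=10.0, third_threshold=15.0):
    """Classify a voice component as voiced (True) or unvoiced (False).

    Assumption: voiced when both the signal-to-noise ratio and the
    spectral-peak-to-background-noise power ratio exceed their
    thresholds. The direction of each comparison is hypothetical.
    """
    return snr_db > second_threshold and peak_to_noise_db > third_threshold


# Usage: a quiet stretch followed by a louder burst.
quiet = [0.01] * 32
loud = [0.5] * 8
flags = detect_voice(quiet + loud)
```

With these example values, the flags stay false throughout the quiet stretch and turn true shortly after the burst begins, once enough loud samples have entered the short window.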
In a second known technique, an audio signal originating from a sound source in a certain direction is emphasized and surrounding noise is suppressed. In the second technique, when audio signals including voices, noise, and the like that originate from sound sources in a plurality of directions are input through a plurality of microphones, whether each audio signal is coming from the direction of the speaker is determined for each frequency on the basis of the phase differences among the microphones.
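A per-frequency direction decision of the kind described for the second technique might look like the sketch below for the two-microphone case. Everything specific here is an assumption made for illustration: the function names, the far-field plane-wave model, the sign convention (a positive phase difference means the sound reaches microphone A first), the tolerance, and the energy floor. A direct DFT is used only to keep the example self-contained.

```python
import cmath
import math


def dft_bin(frame, k):
    """Return the k-th DFT coefficient of a real-valued frame."""
    n = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n)
               for i, x in enumerate(frame))


def bins_from_target(frame_a, frame_b, sample_rate, mic_distance,
                     target_angle_deg=0.0, tolerance=0.3,
                     speed_of_sound=340.0, energy_floor=1e-9):
    """For each frequency bin, decide whether the inter-microphone phase
    difference matches the one expected for the target direction.

    Assumption: far-field source, so the expected delay between the
    microphones is mic_distance * sin(angle) / speed_of_sound.
    """
    n = len(frame_a)
    delay = (mic_distance * math.sin(math.radians(target_angle_deg))
             / speed_of_sound)
    decisions = {}
    for k in range(1, n // 2):
        a = dft_bin(frame_a, k)
        b = dft_bin(frame_b, k)
        if abs(a) * abs(b) < energy_floor:
            # Phase is meaningless without energy in this bin.
            decisions[k] = False
            continue
        observed = cmath.phase(a * b.conjugate())
        expected = 2 * math.pi * (k * sample_rate / n) * delay
        # Wrap the phase error into (-pi, pi] before comparing.
        error = cmath.phase(cmath.exp(1j * (observed - expected)))
        decisions[k] = abs(error) <= tolerance
    return decisions


# Usage: a 500 Hz tone (bin 4 of a 64-sample frame at 8 kHz).
N, RATE, BIN = 64, 8000, 4
mic_a = [math.cos(2 * math.pi * BIN * i / N) for i in range(N)]
in_phase = list(mic_a)  # same phase at both microphones
lagging = [math.cos(2 * math.pi * BIN * i / N - 1.0) for i in range(N)]

front = bins_from_target(mic_a, in_phase, RATE, mic_distance=0.05)
side = bins_from_target(mic_a, lagging, RATE, mic_distance=0.05)
```

Here the in-phase pair is accepted at the tone's bin for a target direction of 0 degrees, while the pair with a 1.0 rad phase lag is rejected at that bin.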
In a third known technique, the spectral shapes of an audio signal divided into a plurality of frequency bands are analyzed for each frequency band, and each band is classified as voice, noise, or voice-like noise. Noise suppression processing best suited to the classification is then selected and performed for each band.
As another known technique, it is determined whether a signal is in a state that includes a voice signal or a state that does not, in order to perform efficient audio coding. For example, an element value serving as the basis for determining whether a frame-divided signal includes a voice signal is calculated for each of the sections obtained by further dividing a frame, which is the processing unit of the audio coding processing, into shorter sections. The determination is then made on the basis of the magnitude of the calculated values and their degree of change.
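The section-wise determination described above could be sketched as follows. The description does not say which element value is used or how magnitude and change are combined, so this sketch makes several assumptions: the element value is the mean power of each section, the frame counts as containing voice only when both the largest value and the largest change between adjacent sections exceed their thresholds, and all names and threshold values are hypothetical.

```python
def section_powers(frame, num_sections):
    """Split a frame into equal sections and return each section's mean power."""
    size = len(frame) // num_sections
    return [
        sum(x * x for x in frame[i * size:(i + 1) * size]) / size
        for i in range(num_sections)
    ]


def frame_has_voice(frame, num_sections=4,
                    level_threshold=0.01, change_threshold=0.005):
    """Decide whether a coding frame contains voice.

    Assumption: mean power is the element value, and both its peak and
    its largest section-to-section change must exceed their thresholds.
    Requires num_sections >= 2.
    """
    powers = section_powers(frame, num_sections)
    peak = max(powers)
    largest_change = max(abs(b - a) for a, b in zip(powers, powers[1:]))
    return peak > level_threshold and largest_change > change_threshold


# Usage: a 160-sample frame with an onset in its last quarter,
# and a frame of steady low-level background.
onset_frame = [0.0] * 120 + [0.5] * 40
steady_frame = [0.01] * 160

onset_result = frame_has_voice(onset_frame)
steady_result = frame_has_voice(steady_frame)
```

Evaluating sections shorter than the coding frame lets an onset that occupies only part of the frame influence the decision, which is the point of the subdivision in the description above.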