Demands for separating and extracting a specific signal component from a given input signal in which a plurality of signal components are mixed are encountered in a variety of situations in daily life. An example of such a situation is recognition of conversation or desired voice in a noisy environment. In such a situation, conversation and/or desired voice are generally captured using an electroacoustic transducer element, such as a microphone, at a point in space. The captured conversation and/or desired voice are converted into an electric signal, and manipulated as an input signal.
One conventionally known system applied to an input signal containing a plurality of signal components comprising desired voice and background noise is a noise suppression system (which will be referred to as a noise suppressor hereinbelow), which enhances the desired voice by suppressing the background noise. The noise suppressor is a system for suppressing noise superposed over a desired acoustic signal. In general, the noise suppressor uses an input signal transformed into a frequency domain to estimate a power spectrum of a noise component, and subtracts the estimated power spectrum of the noise component from the input signal. Alternatively, a widely used method multiplies the input signal by a gain less than one to obtain a result equivalent to that of subtraction. Noise mixed into a desired acoustic signal is thus suppressed. Moreover, such a noise suppressor may be applied to suppression of non-stationary noise by continuously estimating the power spectrum of noise components. A technique related to such a noise suppressor is disclosed in Patent Document 1, for example (which will be referred to as the first related technique).
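The equivalence between subtraction and multiplication by a gain can be illustrated with a minimal sketch. It is not the method of Patent Document 1. It assumes that the per-bin power spectrum of the input and the estimated noise power spectrum are already available, and that the gain is applied to power values (for an amplitude spectrum, the square root of the gain would be used instead). The function names, the zero clamp, and the `gain_floor` parameter are hypothetical and introduced only for illustration.

```python
def suppression_gains(input_power, noise_power, gain_floor=0.0):
    """Per-bin gain G[k] = max(1 - N[k] / P[k], gain_floor), never above one."""
    gains = []
    for p, n in zip(input_power, noise_power):
        if p <= 0.0:
            gains.append(gain_floor)  # empty bin: nothing to preserve
        else:
            gains.append(max(1.0 - n / p, gain_floor))
    return gains


def subtract_noise(input_power, noise_power):
    """Direct form: P[k] - N[k], clamped at zero so power stays non-negative."""
    return [max(p - n, 0.0) for p, n in zip(input_power, noise_power)]


def apply_gains(input_power, gains):
    """Gain form: G[k] * P[k]."""
    return [g * p for g, p in zip(gains, input_power)]


# Hypothetical power values for four frequency bins (illustration only).
input_power = [4.0, 9.0, 1.0, 16.0]
noise_power = [1.0, 3.0, 2.0, 4.0]

gains = suppression_gains(input_power, noise_power)
by_gain = apply_gains(input_power, gains)
by_subtraction = subtract_noise(input_power, noise_power)
```

In this sketch, `by_gain` and `by_subtraction` agree up to floating-point rounding in every bin. In the third bin the noise estimate exceeds the input power, so both forms clamp the result to zero. Raising `gain_floor` above zero leaves more residual noise in such bins in exchange for suppressing less of the signal.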
Generally, the noise suppressor, which is the first related technique, has a tradeoff between residual noise left from suppression, i.e., a degree of separation of desired voice from background noise, and distortion involved in enhanced output voice. A higher degree of separation to reduce residual noise results in increased distortion, while reduced distortion causes the degree of separation to decrease and residual noise to increase. Particularly, for a smaller power ratio of desired voice to noise, the distortion contained in the output becomes more significant even when only a minimal noise suppression effect is obtained.
On the other hand, the fact that the human auditory organ has the ability to discriminate between differently localized signals is disclosed in Non-patent Document 1. Perception of localization requires multi-channel signals. Therefore, in a case where a monophonic signal is input, it must be converted into a multi-channel signal. One method of controlling signal localization is rendering processing for manipulating the amplitude and phase of a given signal. A technique related to the rendering processing is disclosed in Patent Document 2. In a case where at least two channels of signals are input, the human auditory organ uses the difference in amplitude and phase (a relative delay at a reception point) between these signals to spatially localize these signals. Based on this principle, rendering controls a localized position by manipulating the amplitude and phase of an input signal. For example, there is a rendering system for convolving an unlocalizable monophonic signal with a plurality of transfer functions, defined by amplitudes and phases having a specific relationship, to generate a multi-channel output. Such a rendering system is shown in FIG. 20 (which will be referred to as the second related technique).
As shown in FIG. 20, a rendering system according to the second related technique receives monophonic input 0 at a rendering section 9, and outputs Mo-channel signals consisting of output 0 to output Mo−1. The rendering section 9 applies rendering to input 0 based on rendering information, and outputs the result as output 0 to output Mo−1. In a case where input 0 contains a plurality of signal components, all the signal components are localized at the same point in space, because the same rendering processing is applied to all signal components.
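The rendering described above, in which a monophonic signal is convolved with one transfer function per output channel, can be sketched as follows. This is an illustrative simplification and not the implementation of Patent Document 2. It assumes that the rendering information for each channel is just a gain and an integer sample delay, so each transfer function reduces to a scaled, shifted impulse. The names `convolve`, `delay_gain_filter`, `render`, and `rendering_info` are hypothetical.

```python
def convolve(signal, impulse_response):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def delay_gain_filter(gain, delay_samples):
    """Impulse response that scales by `gain` and delays by `delay_samples`."""
    return [0.0] * delay_samples + [gain]


def render(mono, rendering_info):
    """Apply one filter per output channel to the same monophonic input."""
    filters = [delay_gain_filter(g, d) for g, d in rendering_info]
    longest = max(len(h) for h in filters)
    channels = []
    for h in filters:
        y = convolve(mono, h)
        # Zero-pad so that every output channel has the same length.
        y += [0.0] * (len(mono) + longest - 1 - len(y))
        channels.append(y)
    return channels


# Hypothetical input and rendering information: two output channels,
# the second attenuated by half and delayed by two samples.
mono = [1.0, 0.5, -0.25]
rendering_info = [(1.0, 0), (0.5, 2)]
outputs = render(mono, rendering_info)
```

Because `render` applies the same set of filters to the whole of `mono`, any signal components mixed into that input receive identical inter-channel amplitude and delay differences. This mirrors the point made above: all components end up localized at the same position.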
Patent Document 1: JP-P2002-204175A
Patent Document 2: JP-P1999-46400A
Non-patent Document 1: “Mechanism of Calculation by Brain—Dynamics in Bottom-up/Top-down—,” Asakura Publishing Co., Ltd. (2005), Pages 203-216