1. Field of the Invention
The present invention relates to blind source separation, and more particularly, to a method of and apparatus for separating mixture signals received through two microphones into N source signals while eliminating noise from the received signals in real time.
2. Description of the Related Art
Recently, mobile robots have attracted particular interest in areas such as health, security, home networking, and entertainment. Operating a mobile robot requires human-robot interaction; that is, the mobile robot must be able to assess its surroundings using a vision system, detect the presence of humans and possible obstacles, and understand commands given by an operator.
Accordingly, a sound input system is indispensable to a mobile robot for human-robot interaction and autonomous traveling. Important factors affecting the performance of the sound input system of a mobile robot in an indoor environment include noise, reverberation, and the distance to the operator. In an indoor environment, various noise sources are present, and walls and other objects cause reverberation. In addition, the low-frequency components of a sound are attenuated more than its high-frequency components. The sound input system required for human-robot interaction in an indoor environment must therefore be constructed so that an autonomously traveling mobile robot can receive the operator's voice command from several meters away and use the received command directly for sound recognition, including speech recognition.
Generally, the sound input system uses a microphone array of at least two microphones to improve sound quality and the sound recognition rate, including the speech recognition rate. The sound input system also eliminates noise components from the sound signals received by the microphone array, using methods such as single-channel speech enhancement, adaptive acoustic noise canceling, generalized sidelobe canceling, or a blind signal separation algorithm.
The single-channel speech enhancement method uses one microphone and is applicable only when the statistical properties of the noise do not vary with time, for example, stationary background noise. The acoustic noise elimination technique disclosed in the paper “Adaptive Noise Canceling: Principles and Applications,” by B. Widrow et al., Proceedings of the IEEE, vol. 63, no. 12, pp. 1692-1716, 1975, uses two microphones, one of which is a reference microphone intended to receive only a predetermined noise. However, when the reference microphone also receives noise other than the predetermined noise, the noise elimination performance deteriorates greatly. The generalized sidelobe canceling method disclosed in the paper “A Robust Adaptive Beamformer For Microphone Arrays With A Blocking Matrix Using Constrained Adaptive Filters,” by O. Hoshuyama et al., in IEEE Trans. Signal Processing, vol. 47, no. 10, pp. 2677-2684, 1999, has the disadvantages that a voice activity detector is required and that source signals are eliminated along with the noise signals.
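The two-microphone arrangement of the adaptive noise canceller can be sketched with a least-mean-squares (LMS) adaptive filter: the filter learns to predict, from the reference microphone, the noise that reaches the primary microphone, and the prediction error is taken as the enhanced output. This is a minimal illustration under stated assumptions, not the cited paper's exact formulation; the tap count, the step size `mu`, the tone standing in for speech, and the noise path `path` are hypothetical choices, and the reference is assumed to contain only the noise.

```python
import math
import random


def lms_noise_canceller(primary, reference, num_taps=8, mu=0.01):
    """Adaptive noise canceller driven by the LMS weight update.

    primary:   primary-microphone samples (desired signal plus filtered noise)
    reference: reference-microphone samples (assumed here to contain noise only)
    Returns e[n] = primary[n] - y[n], where y[n] is the filter's noise estimate;
    e[n] is the noise-reduced output.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference samples, newest first
    output = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))  # noise estimate
        e = d - y                                           # enhanced sample
        # LMS update: w <- w + 2 * mu * e * x
        weights = [w + 2 * mu * e * h for w, h in zip(weights, history)]
        output.append(e)
    return output


def power(xs):
    """Mean-square value of a sequence."""
    return sum(v * v for v in xs) / len(xs)


random.seed(0)
n = 5000
# Hypothetical test signals: a slow tone stands in for speech, white noise for the noise source.
speech = [0.5 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
noise = [random.gauss(0.0, 1.0) for _ in range(n)]
path = [0.8, -0.4, 0.2]  # hypothetical acoustic path from noise source to primary mic
noise_at_primary = [
    sum(path[k] * noise[i - k] for k in range(len(path)) if i >= k) for i in range(n)
]
primary = [s + v for s, v in zip(speech, noise_at_primary)]

enhanced = lms_noise_canceller(primary, noise)

# Compare residual noise before and after, ignoring the initial adaptation period.
tail = slice(n // 2, n)
residual_before = power([p - s for p, s in zip(primary[tail], speech[tail])])
residual_after = power([e - s for e, s in zip(enhanced[tail], speech[tail])])
```

In this idealized setup the residual noise in `enhanced` falls well below that in `primary`. The filter can cancel only what is correlated with the reference signal, which is why the method relies on the reference microphone picking up the intended noise.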
Meanwhile, conventional techniques related to the Degenerate Unmixing Estimation Technique (DUET), a type of blind signal separation algorithm, include “Blind Separation of Disjoint Orthogonal Signals: Demixing n Sources from 2 Mixtures,” by A. Jourjine, S. Rickard, and O. Yilmaz, in Proc. Int. Conf. on Acoust., Speech, Signal Processing, 2000, vol. 5, pp. 2985-2988; “Real-time Time-Frequency based Blind Source Separation,” by S. Rickard, R. Balan, and J. Rosca, in Proc. Int. Conf. on Independent Component Analysis and Blind Signal Separation, 2001, pp. 651-656; and “On the Approximate W-Disjoint Orthogonality of Speech,” by S. Rickard and O. Yilmaz, in Proc. ICASSP 2002, pp. 529-532. These techniques are based on W-disjoint orthogonality, under which the windowed Fourier transforms of two sound signals s1(t) and s2(t) do not overlap; that is, at most one source signal is active at any given time-frequency point. However, when noise is mixed into the mixture signal so that W-disjoint orthogonality is not satisfied, for example, white Gaussian noise whose spectrum covers the entire frequency band or fan noise from a fan motor whose spectrum covers a relatively wide frequency band, the signal separation performance deteriorates greatly.
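The estimation step at the core of DUET-style methods can be illustrated on a single frame: when only one source is active in a frequency bin, the ratio of the two microphones' spectra in that bin equals that source's relative attenuation times a phase term set by its relative delay, so both parameters can be read off directly. The sketch below is a simplified illustration under stated assumptions, not the cited algorithms: it uses one DFT frame instead of a short-time Fourier transform, circular (wrap-around) delays so the shift theorem holds exactly, and hypothetical sources, mixing parameters, noise level, and an energy threshold of 1.0; it omits DUET's clustering and masking stages. Adding white Gaussian noise shows how broadband noise breaks the assumption: the noise puts energy into nearly every bin, and the estimates in the source bins are no longer exact.

```python
import cmath
import math
import random

N = 64  # frame length in samples (hypothetical)


def dft(x):
    """Plain O(N^2) discrete Fourier transform of one frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def circular_delay(x, d):
    """Delay x by d samples with wrap-around so the DFT shift theorem is exact."""
    n = len(x)
    return [x[(t - d) % n] for t in range(n)]


def active_bins(X, threshold=1.0):
    """Positive-frequency bins (DC and Nyquist excluded) whose magnitude exceeds threshold."""
    return [k for k in range(1, len(X) // 2) if abs(X[k]) > threshold]


def duet_estimates(x1, x2, bins):
    """Read relative attenuation and delay from X2/X1 in each listed bin.

    Assumes one source dominates each listed bin and |2*pi*k*delay/N| < pi,
    so the phase does not wrap.
    """
    n = len(x1)
    X1, X2 = dft(x1), dft(x2)
    estimates = {}
    for k in bins:
        ratio = X2[k] / X1[k]
        attenuation = abs(ratio)
        delay = -cmath.phase(ratio) * n / (2 * math.pi * k)
        estimates[k] = (attenuation, delay)
    return estimates


# Hypothetical sources occupying disjoint bins 3 and 7 (W-disjoint by construction).
s1 = [math.cos(2 * math.pi * 3 * t / N) for t in range(N)]
s2 = [0.8 * math.sin(2 * math.pi * 7 * t / N) for t in range(N)]
a1, d1, a2, d2 = 0.9, 2, 0.5, -1  # hypothetical attenuations and delays

x1 = [u + v for u, v in zip(s1, s2)]
x2 = [a1 * u + a2 * v
      for u, v in zip(circular_delay(s1, d1), circular_delay(s2, d2))]

clean = duet_estimates(x1, x2, [3, 7])
# clean[3] is approximately (0.9, 2.0) and clean[7] approximately (0.5, -1.0).

# Independent white Gaussian noise at each microphone (hypothetical level).
random.seed(1)
x1_noisy = [v + random.gauss(0.0, 0.5) for v in x1]
x2_noisy = [v + random.gauss(0.0, 0.5) for v in x2]
noisy = duet_estimates(x1_noisy, x2_noisy, [3, 7])

clean_active = active_bins(dft(x1))        # only the two source bins
noisy_active = active_bins(dft(x1_noisy))  # noise fills most bins
```

Without noise, each source bin yields its mixing parameters exactly and only two bins carry energy; with white noise, almost every bin becomes active and the per-bin estimates are perturbed, so the one-source-per-bin premise no longer holds.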