1. Field of the Invention
The present invention relates to an adaptive beamformer, and more particularly, to a method and apparatus for adaptive beamforming using a feedback structure.
2. Description of the Related Art
Mobile robots have applications in health-related fields, security, home networking, entertainment, and so forth, and are the focus of increasing interest. Interaction between people and mobile robots is necessary when operating the mobile robots. Like people, a mobile robot with a vision system has to recognize people and surroundings, find the position of a person talking in the vicinity of the mobile robot, and understand what the person is saying.
A voice input system of the mobile robot is indispensable for interaction between man and robot and is an important factor affecting autonomous mobility. Important factors affecting the voice input system of a mobile robot in an indoor environment are noise, reverberation, and distance. There are a variety of noise sources and reverberation due to walls or other objects in the indoor environment. Low frequency components of a voice are more attenuated than high frequency components with respect to distance. Accordingly, for proper interaction between a person and an autonomous mobile robot within a house, a voice input system has to enable the robot to recognize the person's voice at a distance of several meters.
Such a voice input system generally uses a microphone array comprising at least two microphones to improve voice detection and recognition. In order to remove noise components contained in a speech signal input via the microphone array, a single channel speech enhancement method, an adaptive acoustic noise canceling method, a blind signal separation method, and a generalized sidelobe canceling method are employed.
The single channel speech enhancement method, disclosed in “Spectral Enhancement Based on Global Soft Decision” (IEEE Signal Processing Letters, Vol. 7, No. 5, pp. 108-110, 2000) by Nam-Soo Kim and Joon-Hyuk Chang, uses one microphone and performs well only when the statistical characteristics of the noise do not vary with time, as with stationary background noise. The adaptive acoustic noise canceling method, disclosed in “Adaptive Noise Cancelling: Principles and Applications” (Proceedings of the IEEE, Vol. 63, No. 12, pp. 1692-1716, 1975) by B. Widrow et al., uses two microphones. Here, one of the two microphones is a reference microphone for receiving only noise. Thus, if the reference microphone cannot receive noise alone, or if the noise it receives contains other noise components, the performance of the adaptive acoustic noise canceling method drops sharply. Also, the blind signal separation method is difficult to use in real environments and to implement in real-time systems.
FIG. 1 is a block diagram of a conventional adaptive beamformer using the generalized sidelobe canceling method. The conventional adaptive beamformer includes a fixed beamformer (FBF) 11, an adaptive blocking matrix (ABM) 13, and an adaptive multi-input canceller (AMC) 15. The generalized sidelobe canceling method is described in more detail in “A Robust Adaptive Beamformer For Microphone Arrays With A Blocking Matrix Using Constrained Adaptive Filters” (IEEE Trans. Signal Processing, Vol. 47, No. 10, pp. 2677-2684, 1999) by O. Hoshuyama et al.
Referring to FIG. 1, the FBF 11 uses a delay-and-sum beamformer. In other words, the FBF 11 obtains the correlation of the signals xm(k), where m is an integer between 1 and M, input via the microphones and calculates the time delays among these signals. Thereafter, the FBF 11 compensates the signals for the calculated time delays and then adds them in order to output a signal b(k) having an improved signal-to-noise ratio (SNR). The ABM 13 passes the signal b(k) output from the FBF 11 through adaptive blocking filters (ABFs) and subtracts each filtered signal from the corresponding delay-compensated microphone signal in order to maximize noise components. The AMC 15 filters the signals zm(k), where m is an integer between 1 and M, output from the ABM 13 through adaptive canceling filters (ACFs) and then adds the filtered signals, thereby generating the noise components received via the M microphones. Thereafter, the signal output from the AMC 15 is subtracted from the signal b(k), which is delayed for a predetermined period of time D, to obtain a signal y(k) in which noise components are cancelled.
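The delay-and-sum operation of the FBF 11 can be illustrated with a minimal Python sketch. It is not taken from the cited reference: the function names `estimate_delay` and `delay_and_sum`, the use of the first channel as the timing reference, the restriction to integer-sample lags, and the `max_lag` search bound are all assumptions made for illustration.

```python
def estimate_delay(ref, sig, max_lag):
    """Return the integer lag at which `sig` correlates best with `ref`.

    A positive lag means `sig` arrives later than `ref`.
    (Hypothetical helper; brute-force cross-correlation.)
    """
    n = len(ref)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for k in range(n):
            j = k + lag
            if 0 <= j < n:
                score += ref[k] * sig[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, max_lag=8):
    """Align every channel to channels[0] and average them.

    Returns (b, lags), where b plays the role of b(k) and lags holds
    the estimated delay of each channel in samples.
    """
    ref = channels[0]
    n = len(ref)
    lags = [estimate_delay(ref, ch, max_lag) for ch in channels]
    b = []
    for k in range(n):
        acc = 0.0
        for ch, lag in zip(channels, lags):
            j = k + lag  # read the sample that lines up with ref[k]
            if 0 <= j < n:
                acc += ch[j]
        b.append(acc / len(channels))
    return b, lags
```

Averaging the aligned channels adds the speech coherently while uncorrelated noise adds incoherently, which is the source of the SNR improvement attributed to b(k).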
The operations of the ABM 13 and the AMC 15 shown in FIG. 1 will be described in more detail with reference to FIG. 2. The operations of the ABM 13 and the AMC 15 are the same as in the adaptive acoustic noise canceling method.
Referring to FIG. 2, the sizes of the symbols S+N, S, and N denote the relative magnitudes of the speech and noise signals at specific locations, and the left and right symbols separated by a slash ‘/’ denote the ‘to-be’ and ‘as-is’ states, respectively.
An ABF 21 adaptively filters the signal b(k) output from the FBF 11 according to the signal output from a first subtractor 23 so that a characteristic of speech components of the filtered signal output from the ABF 21 is the same as that of speech components of a microphone signal x′m(k) that is delayed for a predetermined period of time. The first subtractor 23 subtracts the signal output from the ABF 21 from the microphone signal x′m(k), where m is an integer between 1 and M, to obtain and output a signal zm(k) which is generated by canceling speech components S from the microphone signal x′m(k).
An ACF 25 adaptively filters the signal zm(k) output from the first subtractor 23 according to the signal output from a second subtractor 27 so that a characteristic of noise components of the filtered signal output from the ACF 25 is the same as that of noise components of the signal b(k). The second subtractor 27 subtracts the signal output from the ACF 25 from the signal b(k) and outputs a signal y(k) which is generated by canceling noise components N from the signal b(k).
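The ABF 21 with the first subtractor 23 and the ACF 25 with the second subtractor 27 share one structure: an adaptive filter whose output is subtracted from a second signal, with the difference fed back to adjust the filter. The sketch below illustrates that structure under the assumption of a normalized least-mean-squares (NLMS) update, which the description above does not specify. The function name `adaptive_subtract`, the tap count, the step size `mu`, and the synthetic test signals are likewise hypothetical.

```python
import random


def adaptive_subtract(reference, primary, taps=4, mu=0.5, eps=1e-8):
    """Adaptively filter `reference` and subtract the result from `primary`.

    Returns (residual, weights). The residual is fed back as the error
    that drives the NLMS weight update (an assumed adaptation rule).
    """
    w = [0.0] * taps    # filter coefficients
    buf = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for r, p in zip(reference, primary):
        buf = [r] + buf[:-1]
        filtered = sum(wi * bi for wi, bi in zip(w, buf))
        e = p - filtered  # subtractor output
        norm = sum(bi * bi for bi in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w


# ABF-style use with synthetic data: the "microphone" signal is a scaled,
# one-sample-delayed copy of b(k), so the filter can learn to reproduce it
# and the residual zm(k) shrinks as the weights converge.
rng = random.Random(0)
b = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
x_m = [0.0] + [0.8 * v for v in b[:-1]]
z_m, w = adaptive_subtract(b, x_m)
```

For the ACF 25 the same routine would be called with zm(k) as the reference and b(k) as the primary signal, so that the residual corresponds to y(k).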
However, the above-described generalized sidelobe canceling method has the following drawbacks. The delay-and-sum beamformer of the FBF 11 has to generate the signal b(k) with a very high SNR so that only pure noise signals are input to the AMC 15. Because the delay-and-sum beamformer in fact outputs a signal whose SNR is not very high, the overall performance drops. As a result, since the ABM 13 outputs a noise signal containing a speech signal, the AMC 15, using the output of the ABM 13, regards speech components contained in the signal output from the ABM 13 as noise and cancels the noise. Therefore, the adaptive beamformer finally outputs a speech signal containing noise components. Also, because the filters used in the generalized sidelobe canceling method have a feedforward connection structure, finite impulse response (FIR) filters are employed. When such FIR filters are used in the feedforward connection structure, 1000 or more filter taps are needed in a reverberant room. In addition, if the ABF 21 and the ACF 25 are not properly trained, the performance of the adaptive beamformer may deteriorate. Thus, speech presence intervals and speech absence intervals are necessary for training the ABF 21 and the ACF 25, but such training intervals are generally unavailable in practice. Moreover, because adaptation of the ABM 13 and adaptation of the AMC 15 have to be performed alternately, a voice activity detector (VAD) is needed. In other words, for adaptation of the ABF 21, a speech component is the desired signal and a noise component is the undesired signal. Conversely, for adaptation of the ACF 25, a noise component is the desired signal and a speech component is the undesired signal.
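The alternating adaptation described above can be sketched for a single channel as follows. This is only an illustration under stated assumptions: the function name `gsc_channel` is hypothetical, the NLMS update is an assumed adaptation rule, the delay D is omitted for brevity, and the per-sample `vad` flags are taken as given rather than computed, since the VAD itself is outside the scope of the sketch.

```python
def gsc_channel(b, x, vad, taps=4, mu=0.5, eps=1e-8):
    """One-channel conventional structure with VAD-gated adaptation.

    b   : fixed beamformer output b(k)
    x   : delay-compensated microphone signal
    vad : per-sample booleans, True where speech is present
    Returns (y, abf_weights, acf_weights).
    """
    h = [0.0] * taps      # ABF coefficients (hypothetical name)
    g = [0.0] * taps      # ACF coefficients (hypothetical name)
    b_buf = [0.0] * taps  # recent b(k) samples, newest first
    z_buf = [0.0] * taps  # recent zm(k) samples, newest first
    y = []
    for b_k, x_k, speech in zip(b, x, vad):
        # Blocking stage: remove the speech estimate from the microphone signal.
        b_buf = [b_k] + b_buf[:-1]
        z_k = x_k - sum(hi * bi for hi, bi in zip(h, b_buf))
        # Canceling stage: remove the noise estimate from b(k).
        z_buf = [z_k] + z_buf[:-1]
        y_k = b_k - sum(gi * zi for gi, zi in zip(g, z_buf))
        if speech:
            # Speech present: adapt only the ABF, using zm(k) as the error.
            norm = sum(bi * bi for bi in b_buf) + eps
            h = [hi + mu * z_k * bi / norm for hi, bi in zip(h, b_buf)]
        else:
            # Speech absent: adapt only the ACF, using y(k) as the error.
            norm = sum(zi * zi for zi in z_buf) + eps
            g = [gi + mu * y_k * zi / norm for gi, zi in zip(g, z_buf)]
        y.append(y_k)
    return y, h, g
```

The sketch makes the dependence on the VAD explicit: a wrong flag adapts the wrong filter, so speech leaks into the noise path or noise into the speech path.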