1. Field of the Invention
The present invention relates to a microphone array method and system, and more particularly, to a microphone array method and system for effectively receiving a target signal among signals input into a microphone array, a method of decreasing the amount of computation required for a multiple signal classification (MUSIC) algorithm used in the microphone array method and system, and a speech recognition method and system using the microphone array method and system.
2. Description of the Related Art
With the development of multimedia technology and the pursuit of a more comfortable life, controlling household appliances such as televisions (TVs) and digital video disc (DVD) players with speech recognition has been increasingly researched and developed. To realize a human-machine interface (HMI), a speech input module receiving a user's speech and a speech recognition module recognizing the user's speech are needed. In an actual environment of a speech interface, interference signals such as music, TV sound, and ambient noise are present along with the user's speech. To implement a speech interface for an HMI in the actual environment, a speech input module capable of acquiring a high-quality speech signal regardless of ambient noise and interference is needed.
A microphone array method uses spatial filtering in which a high gain is given to signals from a particular direction and a low gain is given to signals from other directions, thereby acquiring a high-quality speech signal. A lot of research and development for increasing the performance of speech recognition by acquiring a high-quality speech signal using such a microphone array method has been conducted. However, a speech signal is a wideband signal, whereas array signal processing technology primarily assumes a narrowband signal, and an indoor environment causes problems such as various echoes. It is therefore difficult to actually use the microphone array method for a speech interface.
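As a minimal illustration of spatial filtering, the following Python sketch evaluates the gain of a simple delay-and-sum beamformer at a single frequency. The uniform linear array geometry, the 5 cm microphone spacing, the 343 m/s speed of sound, and the function name are assumptions made for illustration and are not taken from the description above.

```python
import cmath
import math

def delay_and_sum_gain(look_deg, arrival_deg, freq_hz,
                       num_mics=8, spacing_m=0.05, c=343.0):
    """Magnitude response of a delay-and-sum beamformer steered to look_deg
    for a far-field plane wave arriving from arrival_deg (assumed linear array)."""
    omega = 2.0 * math.pi * freq_hz
    total = 0j
    for m in range(num_mics):
        tau_look = m * spacing_m * math.sin(math.radians(look_deg)) / c
        tau_arr = m * spacing_m * math.sin(math.radians(arrival_deg)) / c
        # Compensate the delay of the look direction, then add the microphone signals.
        total += cmath.exp(1j * omega * tau_look) * cmath.exp(-1j * omega * tau_arr)
    return abs(total) / num_mics
```

Under these assumptions, the gain equals 1 when the arrival direction matches the look direction and drops for other directions, which is the behavior the paragraph above describes.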
To overcome these problems, an adaptive microphone array method based on a generalized sidelobe canceller (GSC) may be used. Such an adaptive microphone array method has the advantages of a simple structure and a high signal-to-interference-and-noise ratio (SINR). However, its performance deteriorates due to incidence angle estimation errors and indoor echoes. Accordingly, an adaptive algorithm robust to the estimation error and echoes is desired.
In addition, there are wideband minimum variance (MV) methods in which a minimum variance distortionless response (MVDR) is applied to wideband signals. Wideband MV methods are divided into MV methods and maximum likelihood (ML) methods according to the scheme used to configure the autocorrelation matrix of a signal. In each method, a variety of schemes for configuring the autocorrelation matrix have been proposed; for example, a microphone array based on a wideband MV method may be used.
The following description concerns a conventional microphone array method. Assume that D signal sources are incident on a microphone array having M microphones from directions $\theta = \{\theta_1, \ldots, \theta_D\}$, where $\theta_1$ is the direction of a target signal and the remaining directions are those of interference signals. The data input to the microphone array are discrete Fourier transformed, and the signal is modeled by expressing the vector of frequency components obtained by the discrete Fourier transformation as shown in Equation (1). Hereinafter, the vector of frequency components is referred to as a frequency bin.

$$x_k = A_k s_k + n_k \qquad (1)$$
Here, $x_k = [X_{1,k} \ldots X_{m,k} \ldots X_{M,k}]^T$, $A_k = [a_k(\theta_1) \ldots a_k(\theta_d) \ldots a_k(\theta_D)]$, $s_k = [S_{1,k} \ldots S_{d,k} \ldots S_{D,k}]^T$, $n_k = [N_{1,k} \ldots N_{m,k} \ldots N_{M,k}]^T$, and $k$ is a frequency index. $X_{m,k}$ and $N_{m,k}$ are discrete Fourier transform (DFT) values of a signal and background noise, respectively, observed at an m-th microphone, and $S_{d,k}$ is a DFT value of a d-th signal source. $a_k(\theta_d)$ is a directional vector of a k-th frequency component of the d-th signal source and can be expressed as Equation (2).

$$a_k(\theta_d) = [e^{-j\omega_k \tau_{k,1}(\theta_d)} \ldots e^{-j\omega_k \tau_{k,m}(\theta_d)} \ldots e^{-j\omega_k \tau_{k,M}(\theta_d)}]^T \qquad (2)$$

Here, $\omega_k$ is the angular frequency of the k-th frequency component, and $\tau_{k,m}(\theta_d)$ is the delay time taken by the k-th frequency component of the d-th signal source to reach the m-th microphone.
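The signal model of Equations (1) and (2) can be sketched in Python as follows. This is only an illustrative sketch: the uniform linear array, the far-field delay model, the 16 kHz sampling rate, the 512-point DFT, and the function names `steering_vector` and `frequency_bin` are assumptions that do not appear in the description.

```python
import cmath
import math

def steering_vector(theta_deg, k, num_mics, spacing_m=0.05,
                    fs_hz=16000.0, nfft=512, c=343.0):
    """Directional vector a_k(theta) of Equation (2) for an assumed uniform linear array."""
    omega_k = 2.0 * math.pi * k * fs_hz / nfft      # angular frequency of bin k
    sin_theta = math.sin(math.radians(theta_deg))
    # Far-field delay to microphone m, measured relative to microphone 0.
    return [cmath.exp(-1j * omega_k * (m * spacing_m * sin_theta / c))
            for m in range(num_mics)]

def frequency_bin(thetas_deg, sources, noise, k):
    """x_k = A_k s_k + n_k of Equation (1) for one frequency index k."""
    num_mics = len(noise)
    columns = [steering_vector(t, k, num_mics) for t in thetas_deg]  # columns of A_k
    return [sum(col[m] * s for col, s in zip(columns, sources)) + noise[m]
            for m in range(num_mics)]
```

For example, `frequency_bin([0.0, 30.0], [2 + 0j, 0j], [0.5j] * 4, 40)` builds the frequency bin for a four-microphone array with one active source at broadside.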
An incidence angle of a wideband signal is estimated by discrete Fourier transforming an array input signal, applying a MUSIC algorithm to each frequency component, and finding the average of MUSIC algorithm application results with respect to a frequency band of interest. A pseudo space spectrum of the k-th frequency component is defined as Equation (3).
$$P_k(\theta) = \frac{a_k^H(\theta)\, a_k(\theta)}{a_k^H(\theta)\, U_{n,k} U_{n,k}^H\, a_k(\theta)} \qquad (3)$$
Here, $U_{n,k}$ indicates a matrix consisting of noise eigenvectors with respect to the k-th frequency component, and $a_k(\theta)$ indicates a narrowband directional vector with respect to the k-th frequency component. When $\theta$ is identical to the incidence angle of a signal source, the denominator of the pseudo space spectrum becomes 0 because the directional vector $a_k(\theta)$ is orthogonal to the noise subspace. As a result, the pseudo space spectrum has an infinite peak. The angle corresponding to the infinite peak indicates the incidence direction. Here, an average pseudo space spectrum can be expressed as Equation (4).
$$\bar{P}(\theta) = \frac{1}{k_H - k_L} \sum_{k=k_L}^{k_H} P_k(\theta) \qquad (4)$$
Here, $k_L$ and $k_H$ respectively indicate the indexes of the lowest frequency and the highest frequency of the frequency band of interest.
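The averaging of Equations (3) and (4) can be sketched in Python as shown below. The description obtains the noise eigenvectors by eigenvalue decomposition. To stay within the standard library, this sketch substitutes power iteration and the identity $U_{n,k} U_{n,k}^H = I - u_s u_s^H$, which holds only under the assumption of a single dominant source. The array geometry, sampling parameters, and all function names are likewise assumptions for illustration.

```python
import cmath
import math

def steering_vector(theta_deg, k, num_mics, spacing_m=0.05,
                    fs_hz=16000.0, nfft=512, c=343.0):
    """a_k(theta) for an assumed uniform linear array (far-field model)."""
    omega_k = 2.0 * math.pi * k * fs_hz / nfft
    sin_theta = math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * omega_k * m * spacing_m * sin_theta / c)
            for m in range(num_mics)]

def principal_eigenvector(R, iterations=100):
    """Dominant eigenvector of a Hermitian matrix R by power iteration."""
    size = len(R)
    v = [complex(1.0, 0.1 * i) for i in range(size)]   # arbitrary start vector
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        v = [x / norm for x in w]
    return v

def pseudo_spectrum(signal_vec, k, theta_deg):
    """P_k(theta) of Equation (3), assuming one source so U_n U_n^H = I - u_s u_s^H."""
    a = steering_vector(theta_deg, k, len(signal_vec))
    numerator = sum(abs(x) ** 2 for x in a)                              # a^H a
    projection = sum(u.conjugate() * x for u, x in zip(signal_vec, a))   # u_s^H a
    denominator = numerator - abs(projection) ** 2                       # a^H U_n U_n^H a
    return numerator / max(denominator, 1e-12)     # clip the theoretically infinite peak

def estimate_direction(R_by_bin, k_low, k_high, grid_deg):
    """Angle on grid_deg that maximizes the average pseudo spectrum of Equation (4)."""
    signal_vecs = {k: principal_eigenvector(R_by_bin[k])
                   for k in range(k_low, k_high + 1)}

    def average(theta_deg):
        total = sum(pseudo_spectrum(signal_vecs[k], k, theta_deg)
                    for k in range(k_low, k_high + 1))
        return total / (k_high - k_low)   # normalization as written in Equation (4)

    return max(grid_deg, key=average)
```

Here `R_by_bin` is a hypothetical mapping from frequency index to a spatial covariance matrix stored as a list of lists. One subspace estimate is needed per frequency bin, so the cost grows with the width of the band of interest.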
In a wideband MV algorithm, a wideband speech signal is discrete Fourier transformed, and then a narrowband MV algorithm is applied to each frequency component. An optimization problem for obtaining a weight vector is derived from a beam-forming method using different linear constraints for different frequencies.
$$\min_{w_k} \; w_k^H R_k w_k \quad \text{subject to} \quad a_k^H(\theta_1)\, w_k = 1 \qquad (5)$$
Here, a spatial covariance matrix $R_k$ is expressed as Equation (6).

$$R_k = E[x_k x_k^H] \qquad (6)$$
When the optimization problem of Equation (5) is solved using a Lagrange multiplier, a weight vector $w_k$ is expressed as Equation (7).
$$w_k^{mv} = \frac{R_k^{-1}\, a_k(\theta_1)}{a_k^H(\theta_1)\, R_k^{-1}\, a_k(\theta_1)} \qquad (7)$$
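As a hedged sketch of Equation (7), the following Python code computes the weight vector for one frequency component from a given covariance matrix and directional vector. It solves $R_k z = a_k(\theta_1)$ by Gaussian elimination instead of forming the inverse explicitly. The function names and the list-of-lists matrix representation are assumptions for illustration.

```python
def solve(R, b):
    """Solve R x = b for complex R by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(R)]   # augmented matrix [R | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for i in range(n - 1, -1, -1):                           # back substitution
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x

def mv_weights(R, a):
    """Weight vector of Equation (7): R^{-1} a / (a^H R^{-1} a)."""
    z = solve(R, a)                                              # R^{-1} a
    denom = sum(ai.conjugate() * zi for ai, zi in zip(a, z))     # a^H R^{-1} a
    return [zi / denom for zi in z]
```

By construction the result satisfies the constraint of Equation (5), so a quick check is that the inner product of `a` (conjugated) with the returned weights is 1 up to rounding.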
Wideband MV methods are divided into two types of methods according to a scheme of estimating the spatial covariance matrix Rk in Equation (7): (1) MV beamforming methods in which a weight is obtained in a section where a target signal and noise are present together; and (2) SINR beamforming methods or Maximum Likelihood (ML) methods in which a weight is obtained in a section where only noise without a target signal is present.
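The two estimation schemes can be illustrated with a sample-average estimate of the spatial covariance matrix of Equation (6). In this Python sketch the expectation is replaced by an average over frames, and a per-frame speech flag selects the frames. The flag (as a speech detector might supply it), the scheme labels `"mv"` and `"ml"`, and the function names are assumptions for illustration.

```python
def sample_covariance(frames):
    """Average of x x^H over frames, each a length-M list of complex DFT values."""
    if not frames:
        raise ValueError("no frames selected")
    size = len(frames[0])
    R = [[0j] * size for _ in range(size)]
    for x in frames:
        for i in range(size):
            for j in range(size):
                R[i][j] += x[i] * x[j].conjugate()
    return [[value / len(frames) for value in row] for row in R]

def covariance_for_scheme(frames, is_speech, scheme):
    """Estimate R_k for one frequency index from the frames chosen by the scheme."""
    if scheme == "mv":      # section where target signal and noise are present together
        chosen = [x for x, flag in zip(frames, is_speech) if flag]
    elif scheme == "ml":    # section where only noise is present
        chosen = [x for x, flag in zip(frames, is_speech) if not flag]
    else:
        raise ValueError("scheme must be 'mv' or 'ml'")
    return sample_covariance(chosen)
```

The only difference between the two schemes in this sketch is which frames enter the average. The weight computation of Equation (7) is unchanged.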
FIG. 1 illustrates a conventional microphone array system. The conventional microphone array system integrates an incidence angle estimation method and a wideband beamforming method. The conventional microphone array system decomposes a sound signal input into an input unit 1 having a plurality of microphones into a plurality of narrowband signals using a discrete Fourier transformer 2 and estimates a spatial covariance matrix corresponding to each narrowband signal using a speech signal detector 3 and a spatial covariance matrix estimator 4. The speech signal detector 3 distinguishes a speech section from a noise section. A wideband MUSIC module 5 performs eigenvalue decomposition of the estimated spatial covariance matrix, thereby obtaining eigenvectors corresponding to a noise subspace, and calculates an average pseudo space spectrum using Equation (4), thereby obtaining direction information of a target signal. Thereafter, a wideband MV module 6 calculates a weight vector corresponding to each frequency component using Equation (7) and multiplies the weight vector by each corresponding frequency component. An inverse discrete Fourier transformer 7 restores the compensated frequency components to a sound signal.
The above-discussed conventional system operates reliably when the spatial covariance matrix is estimated in a section having only an interference signal without a speech signal. However, when the spatial covariance matrix is obtained in a section having a target signal, the conventional system removes the target signal as well as the interference signal. This result occurs because the target signal is transmitted along multiple paths as well as a direct path due to echoing. In other words, echoed target signals arriving from directions other than the direction of the direct target signal are treated as interference signals, and the direct target signal, which is correlated with the echoed target signals, is also removed.
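The target cancellation described above can be reproduced in a small numerical sketch. The Python code below models the target as a direct path plus one fully coherent echo and compares the gain applied to the target by the weights of Equation (7) when the covariance matrix includes the target and when it contains only noise. The four-microphone array, the echo gain of 0.5, the white noise covariance, and the closed-form inverse (Sherman-Morrison identity for a rank-one term plus a scaled identity) are all assumptions chosen to keep the example short.

```python
import cmath

def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

def mv_weights_rank_one(a, c, power, noise_var):
    """Equation (7) for R = power * c c^H + noise_var * I (Sherman-Morrison inverse)."""
    scale = power * inner(c, a) / (noise_var + power * inner(c, c).real)
    z = [(ai - ci * scale) / noise_var for ai, ci in zip(a, c)]   # R^{-1} a
    denom = inner(a, z)                                            # a^H R^{-1} a
    return [zi / denom for zi in z]

def target_response(echo_gain, power, noise_var=0.01, num_mics=4, echo_phase=1.0):
    """|w^H c|: gain applied to a target arriving as a direct path plus one echo."""
    a = [1 + 0j] * num_mics                                          # direct path (broadside)
    b = [cmath.exp(-1j * echo_phase * m) for m in range(num_mics)]   # echo direction
    c = [ai + echo_gain * bi for ai, bi in zip(a, b)]                # coherent sum at the array
    w = mv_weights_rank_one(a, c, power, noise_var)
    return abs(inner(w, c))

speech_section = target_response(echo_gain=0.5, power=1.0)   # R_k estimated with the target present
noise_section = target_response(echo_gain=0.5, power=0.0)    # R_k estimated from noise only
```

Under these assumptions `speech_section` is far below 1 while `noise_section` stays close to 1: the weights keep unit gain toward the direct path but use the correlated echo to cancel the target at the output.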
To overcome the above-discussed problem, a method or a system for effectively acquiring a target signal with less effect of an echo is desired.
In addition, a method of decreasing the amount of computation required for the MUSIC algorithm is also desired because the wideband MUSIC module 5 performs a MUSIC algorithm with respect to each frequency bin, which puts a heavy load on the system.