1. Field of the Invention
The present invention relates generally to estimating multiple electrical or acoustic source signals from only mixtures of these signals, and more specifically to obtaining in real-time estimates of the mixing parameters of such signals. Furthermore, in the present invention a demixing (or separation) of a mixture of the signals is accomplished by using estimated mixing parameters to partition a time-frequency representation of one of the mixture signals into estimates of its source signals.
2. Description of the Prior Art
Blind source separation (BSS) is a class of methods used extensively in areas where one needs to estimate individual original signals from a linear mixture of those signals. One area where these methods are important is the electromagnetic (EM) domain, such as wired or wireless communications, where nodes or receiving antennas typically receive a linear mixture of delayed and attenuated EM signals from a plurality of signal sources. A further area where these methods are important is the acoustic domain, where it is desirable to separate one voice or other useful signal from background noise received by one or more receivers, such as microphones in a telephone or hearing aid (where the mixtures of acoustic signals often contain undesirable signals such as other voices or environmental noise). Further areas of application of the BSS techniques of the present invention are surface acoustic wave processing, signal processing and radar signal processing.
The difficulty in separating the individual original signals from their linear mixtures is that in many practical applications little is known about the original signals or the way they are mixed. An exception, however, is in wireless communications where training sequences of signals known ahead of time are transmitted intermixed with the useful data signals, thereby allowing for transmission channel estimation. These training sequences typically take a large portion of the available channel capacity and therefore this technique is undesirable.
In order to demix blindly, some assumptions on the statistical nature of the signals are typically made. Some solutions to BSS are known in the art for the limited case of instantaneous mixing, i.e., when the signals arrive at the receivers without delay in propagation, and are non-degenerately mixed, that is, when the number of signal emitters is less than or equal to the number of signal receivers. Such solutions are typically cumbersome since they involve computation of statistical quantities of signals that are third or higher order moments of signal probability distributions. The demixing is most often done under the assumption of statistical independence of signals, which implies that all moments of the signals factorize. One useful feature of the present invention is that a considerably weaker assumption on the nature of the signals is used, namely, W-disjoint orthogonality. We assume that our sources are deterministic functions for which the Fourier transform is well defined and that W(t) is a function that is localized in time. Then we call two sources, $s_i(t)$ and $s_k(t)$, W-disjoint orthogonal if the supports of the Fourier transforms of the W-windowed sources, denoted $\hat{s}_i^W(\omega,t)$ and $\hat{s}_k^W(\omega,t)$, are disjoint for all t, where
$$\hat{s}_i^W(\omega,t) = \int_{-\infty}^{\infty} W(\tau - t)\, s_i(\tau)\, e^{-j\omega\tau}\, d\tau. \qquad (1)$$
In other words, let us denote the support of $\hat{s}_i^W(\omega,t)$ by $\Omega_i$. Then $s_i(t)$ and $s_k(t)$ are W-disjoint orthogonal if
$$\Omega_i \cap \Omega_k = \emptyset. \qquad (2)$$
Note that the above definition remains valid if the Fourier transforms of the signals are distributions.
A special case is if we choose W to be constant. Then two functions are W-disjoint orthogonal if their Fourier transforms have disjoint support. In this case, we will call them globally disjoint orthogonal, or simply, disjoint orthogonal. Two time dependent signals $s_i(t)$, $s_k(t)$ are disjoint orthogonal if
$$\int s_i(t)\,\overline{s_k(t+\tau)}\,dt = 0, \quad \forall \tau, \quad i \neq k, \quad i,k = 1, \ldots, N.$$
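The link between this correlation condition and disjoint Fourier supports can be sketched with Parseval's relation. The following is an illustrative derivation only, assuming finite-energy signals and the transform convention of equation (1) with W constant; here $\hat{s}_i(\omega)$ denotes the ordinary Fourier transform of $s_i(t)$.

```latex
\int s_i(t)\,\overline{s_k(t+\tau)}\,dt
  \;=\; \frac{1}{2\pi}\int \hat{s}_i(\omega)\,
        \overline{\hat{s}_k(\omega)}\,e^{-j\omega\tau}\,d\omega .
```

If the supports of $\hat{s}_i$ and $\hat{s}_k$ are disjoint, the integrand on the right vanishes at every $\omega$, so the correlation on the left is zero for every lag $\tau$.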
In practice condition (2) does not have to hold exactly. It is sufficient for the present invention if the sources are weakly W-disjoint orthogonal, which is defined formally as
$$|\hat{s}_i^W(\omega,t)| \gg 1 \;\Rightarrow\; |\hat{s}_k^W(\omega,t)| \ll 1, \quad \forall k \neq i,\ \forall t,\ \forall \omega. \qquad (3)$$
In order to differentiate (2) from (3), (2) is sometimes referred to as strong W-disjoint orthogonality. While we have used the Fourier transform in (1), it is important to note that any appropriate transform can be used (e.g., wavelet transforms) as long as condition (2) or (3) is satisfied.
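The disjointness test of conditions (2) and (3) can be illustrated numerically. The following is a minimal sketch, assuming sampled signals, a discrete windowed Fourier transform in place of equation (1), and a relative tolerance `rel_tol` as a stand-in for "non-negligible" magnitude; the function names, the rectangular window and the toy tones are hypothetical choices made only so that the result is easy to check.

```python
import numpy as np

def windowed_transform(s, window, hop):
    """Discrete analogue of equation (1): rows are time frames, columns are frequency bins."""
    n = len(window)
    frames = [s[i:i + n] * window for i in range(0, len(s) - n + 1, hop)]
    return np.fft.fft(np.array(frames), axis=1)

def overlap_fraction(si, sk, window, hop, rel_tol=1e-8):
    """Fraction of time-frequency points at which both sources are non-negligible.

    A value of 0 corresponds to disjoint supports (condition (2)) up to rel_tol,
    which is an assumed tolerance, not a value taken from the text.
    """
    Si = np.abs(windowed_transform(si, window, hop))
    Sk = np.abs(windowed_transform(sk, window, hop))
    support_i = Si > rel_tol * Si.max()
    support_k = Sk > rel_tol * Sk.max()
    return np.mean(support_i & support_k)

# Toy signals: tones placed exactly on DFT bins of a 64-sample rectangular window.
n = np.arange(256)
window = np.ones(64)
s1 = np.cos(2 * np.pi * 4 * n / 64)
s2 = np.cos(2 * np.pi * 10 * n / 64)
s3 = s1 + 0.5 * s2  # shares the frequency content of s1

print(overlap_fraction(s1, s2, window, hop=64))  # 0.0 (disjoint supports)
print(overlap_fraction(s1, s3, window, hop=64))  # 0.03125 (2 shared bins out of 64)
```

With a tapered window or tones that fall between bins, spectral leakage makes the supports overlap slightly, which is the situation the weak condition (3) is meant to tolerate; a larger `rel_tol` would then be needed.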
It is generally believed in the art that computation of third and higher degree moments is necessary to demix signals in convolutive non-degenerate mixtures. In fact it has been proved by Weinstein et al., in their article entitled “Multi-channel Signal Separation by Decorrelation”, IEEE Transactions on Speech and Audio Processing, Vol. 1, No. 4, October 1993, that signals cannot be recovered from mixtures of statistically stationary and orthogonal signals for an arbitrary convolutive mixture. One useful feature of the present invention is that it presents a practical method to obviate the restriction in Weinstein et al., and establishes a practical method to demix signals from convolutive mixtures. In the present invention, the restriction is overcome by reducing the class of signals to those that satisfy strong or weak W-disjoint orthogonality and by requiring the mixing of the signals either to have a parametric form or, in convolutive mixing, to have mixing filters that are sufficiently smooth in the frequency domain to allow extrapolation from the set of points on which the support of the signals is non-zero. In practice weak W-disjoint orthogonality of signals is very frequently observed, as in the examples of speech and environmental acoustic noise, or can be imposed, such as on the waveform of an emitted EM signal in the case of wireless communications.
Although a number of methods are known in the state of the art for blind demixing of signal mixtures, as noted above most of them use higher order statistics to effect the demixing, such as taught by Bell et al. in their article entitled “An information-maximization approach to blind separation and blind deconvolution”, Neural Computation, vol. 7, pp. 1129-1159, 1995. The least computationally intensive methods known in the state of the art use second order statistics. They typically work by simplifying the demixing problem to an optimization problem so that the optimization objective function is a function of signal auto- and cross-correlations, such as taught by Weinstein et al. in their forenoted article. However, even these methods are known to place an exceptional demand on computational resources, especially when there is need for real-time or on-line implementations. A useful feature of the present invention is that the necessary computations involve only first order moments, i.e., the signal values themselves, and no complicated computations of signal statistics are needed. This is particularly advantageous since computation of statistical properties of signals is typically prone to large measurement and estimation errors.
Yet another useful feature of the present invention is that instead of the statistical orthogonality most commonly used in second order statistical methods, e.g., as assumed in the forenoted article by Weinstein et al., the present invention uses a deterministic definition of orthogonality. The use of the W-disjoint orthogonality condition is advantageous, among other things, because it is a much simpler test than one for statistical orthogonality. To test for statistical orthogonality one needs to create a statistical ensemble of signals with associated probability distributions or, under the additional assumption of signal ergodicity, take an infinite time limit of the integral of the product of two functions. On the other hand, the test for W-disjoint orthogonality involves only computation of one integral, or one sum for discrete sampled signals, which in practice results in considerable computational savings, since the test is non-statistical in nature.
One method known in the state of the art was presented by Rickard in a talk entitled “Blind Separation of Acoustic Mixtures” presented at Princeton University on May 24, 1999, where it was proven that only second order moments of signals are needed to do instantaneous non-degenerate demixing. This method relies on second order statistics but does not translate directly to the domain where the signals are also delayed in arrival. A useful feature of the present invention is that even signals that are delayed can be extracted from their mixtures. A similar method is described in U.S. Ser. No. 60/134,655 entitled Fast Blind Source Separation Based on Delay and Attenuation Compensation, filed on May 18, 1999, where only the case in which the number of receivers equals the number of emitters was considered. In many practical cases, however, such as in wireless communications, it is desirable to do the demixing when the number of receivers is as small as possible, and in fact is less than the number of transmitters. One advantageous feature of the present invention is that it allows one to use only two receivers to demix received signals with an arbitrary number of mixed sources.
Although instantaneous mixing of signals can be assumed in some applications, in wireless communication applications delays in propagation have to be taken into account. Furthermore, in most applications the number of signal sources and their spatial positions with respect to the receiver are not known before the attempted demixing. Therefore a considerable literature exists concerning estimation of the number of sources and of their direction of arrival (DOA), generally known as the channel estimation problem (see, for example, B. Van Veen et al., “Beamforming: A Versatile Approach to Spatial Filtering”, IEEE ASSP Magazine, April 1988, pp. 4-24). Especially known in the prior art are methods called MUSIC and ESPRIT (see H. Krim et al., “Two Decades of Array Signal Processing Research, The Parametric Approach”, IEEE Signal Processing Magazine, July 1996, pp. 67-94). These and all other prior art methods assume that the number of sources is less than or equal to the number of receivers. In wireless communications applications, for example, this restriction places a fundamental limit on the channel capacity of the communication system, since the number of antennas has to be at least as large as the number of users.
A useful feature of the present invention is that it allows one to estimate the number of signal sources even when the number of receivers is less than the number of emitters. In fact with the present invention only two receivers are generically necessary to estimate an arbitrary number of sources from a broad class of signals. Yet another useful feature of the present invention is that no assumption is made that signals are narrow-band, as is frequently done for methods such as MUSIC and ESPRIT. In a number of wireless communications applications, the assumption that signals are narrow-band is not valid and therefore the present invention allows one to estimate signals and channel parameters for wide-band signals as well.
Methods are known in the wireless communications prior art where the number of mixture signals can be less than the number of sources, yet the sources can be demixed (see, for example, X. Wang et al., “Blind Multiuser Detection: A Subspace Approach,” IEEE Transactions on Information Theory, Vol. 44, No. 2, March 1998). However, such methods require that the waveforms be modulated via direct-sequence spread spectrum and that the signature waveforms already be known by the receiver. One advantageous feature of the present invention is that no specific waveform shape is needed for demixing of signal sources, which makes the class of signals that can be demixed considerably larger and hence could lead to increases in the practical information throughput of many communications channels.
In many applications, such as environmental noise reduction, it is important to determine the spatial distribution of signal sources in order to determine where each signal source comes from. One application of the present invention uses a determined spatial distribution for the reduction of environmental noise emanating from machinery containing many moving parts, such as a moving locomotive or a copier. Another application of the present invention is predicting operating failure in such machinery by detecting deviations of the spatial distribution of noise from its normal distribution. A useful feature of the present invention is that it allows one to perform radiation field mapping, whereby the intensity and spatial distribution of the sources can be determined precisely using only two receivers, provided the sources meet the W-disjoint orthogonality condition. In one embodiment of the present invention the radiation field is the field of electromagnetic waves, and another embodiment accomplishes acoustic field mapping.
One method for demixing of time delayed signals was described in the forenoted presentation by Rickard. Although the method, which relies on second order statistics, is less computationally intensive than comparable methods based on higher order statistics, it is still restricted to situations where the number of signal sources is less than or equal to the number of receivers. Furthermore, the method assumes that the number of signal sources is known in advance. One improvement provided by the present invention is that the number of sources for delayed signals can be estimated using mixture values only, without resorting to computation of the signal statistics. This provides a significant speed-up in signal processing.
Yet another complication for blind demixing of signal sources comes about when mixtures contain not only delayed and attenuated signals resulting from direct path propagation from emitters to receivers, but also reflected versions of those signals, which therefore arrive at the receivers with an additional delay and attenuation. Such multipath mixtures are common in wireless and acoustic telephony communications, as well as in hearing aid signal processing applications. To date no method is known in the state of the art that allows one to demix multipath mixtures of signals without using second or higher order moments of the signals. To the contrary, the current belief in the art is that such demixing is not possible (as noted above with reference to the article by Weinstein et al.). The present invention embodies a practical method for such demixing and hence represents a considerable advance in the state of the art. In one embodiment of the invention, weakly echoic mixtures can be demixed. In another embodiment multipath mixtures with sufficiently decorrelated echoes can be demixed. In yet another embodiment convolutive mixtures having smooth convolution filters can be demixed.
It is known in the art that making assumptions about the class of the signals to be demixed can make demixing of the signals easier. Various models, such as AR or ARMA, are used to effect demixing (see L. Parra et al., “Convolutive Source Separation and Signal Modeling with ML”, Sarnoff Corporation, Preprint Sep. 5, 1997 and H. Broman et al., “Source Separation: A TITO System Identification Approach”, Signal Processing, vol. 73, pp. 169-183, 1999). However, these models are very restrictive and do not model real world signals well. It is one advantageous feature of the present invention that the class of signals that allows demixing is much wider and more suitable for modeling acoustic and EM radiation than the AR and ARMA processes.
One example in the prior art where second order statistics of the signals are used for multipath mixtures is in the paper by Parra et al., entitled “Convolutive Blind Source Separation based on Multiple Decorrelation”, Sarnoff Corporation, Preprint, undated, published in NNSP-98. However, the method presented there relies explicitly on non-stationarity of the signals in order to achieve demixing. Additionally, the method assumes that a large number of potential multipath contributions have to be considered from the beginning. This leads to prohibitively computationally expensive algorithms for demixing. One useful feature of the present invention is that the signals are not assumed to be non-stationary. To the contrary, signal stationarity on a short time scale is required. This reduced requirement leads to implementations of methods that are capable of processing data in real time, since data processed over typical short time scales can realistically be assumed to be stationary. Yet another useful feature of the present invention is that for a broad class of signals, namely for signals for which the autocorrelation function decays sufficiently fast, one does not have to take into account all the multipath contributions. In order to demix, only the components of the mixtures that correspond to the direct path propagation contributions are needed, provided the remaining multipath contributions decorrelate with the direct path contributions. Such decorrelation is often a result of a fast decay of the signals' autocorrelation function, which in practice is frequently observed. As a result, a very simple and computationally inexpensive method can be used to demix multipath mixtures. No comparable prior art demixing method for use in the full multipath environment is known.
It is known in the prior art that if only one of the signals in the mixture is non-zero at a given value of signal argument, then demixing is possible (see van Hulle, “Clustering Approach to square and non-square blind source separation”, K.U. Leuven, Belgium, Preprint Dec. 30, 1998). The assumption used there, however, is that only one signal is non-zero at a given time and that the mixing is instantaneous. For demixing in acoustic environments or wireless communication applications, the constraint that the signal of only one source can be non-zero at a given time will often not be valid. Moreover, as the number of signal sources increases, this assumption is even less likely to be satisfied. In wireless communications, this one signal at a time assumption is only true for restrictive signaling schemes, such as time-division multiple access (TDMA), for which demixing is not of interest. The scalar (instantaneous) mixing assumption is not true for real acoustic mixtures and is only true for a restrictive class of wireless communications (e.g., indoor wireless). One advantageous feature of the present invention is that the W-disjoint orthogonality condition envelops a much larger class of signals. The vanishing of all but one signal Fourier transform at a given frequency is observed in practice much more often than the vanishing of all but one of the received signals at a given time, in either the acoustic or the wireless communications environment. Yet another advantage of the present invention over the prior art is that the present invention is designed for a more realistic time delay model (i.e., convolutive signal mixing), for which the prior art techniques would not work.
Another known example of demixing mixture signals with fewer mixtures than signal sources is given by Balan et al. in their publication entitled “A Particular Case of the Singular Multivariate AR Identification and BSS Problems”, 1st International Conference on Independent Component Analysis, Aussois, France, January 1999. However, in this case, only one mixture signal was available for demixing and hence the demixing problem was more difficult than when two or more mixture signals are available. As a result the method described by Balan et al. models the signals as AR processes. It is a further advantage of the present invention that two receivers are sufficient to avoid restricting the signal classes to those generated by AR processes.
Yet another example of demixing of signals with fewer received mixture signals than signal sources uses higher order statistics (see the publication by Comon entitled “Blind Channel identification and extraction of more sources than sensors”, Eurecom Institute, Preprint, 1998). However, the use of higher order statistics leads to excessive computational demands, and in fact this publication states that extension of the demixing method from two mixtures of three signal sources to a higher number of signal sources is computationally infeasible. It is a further advantageous feature of the present invention that an arbitrary number of signal sources can be demixed using only two mixture signals, provided that the sources are W-disjoint orthogonal and do not occupy the same spatial position.
In one aspect of the present invention, a method for blind channel estimation is provided, comprising acquiring two mixture signals of one or more source signals that are at least weakly W-disjoint orthogonal; calculating point-by-point ratios of a transform of a time-window of each of said mixture signals; determining channel parameter estimates from said ratios; constructing a histogram of said channel parameter estimates; repeating the calculating, determining and constructing steps for successive time windows of the mixture signals; and selecting as estimates of said channel parameters those estimates associated with identified peaks on said histogram.
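The estimation steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the disclosed implementation. It assumes a two-receiver model in which the first mixture is the sum of the sources and the second is the sum of attenuated, delayed copies, so that at a time-frequency point dominated by one source the ratio of transforms is approximately a·exp(−jωd). The function name `estimate_channel_parameters`, the rectangular non-overlapping frames, the bin edges and the `peak_frac` peak rule are hypothetical choices for a toy example, and a delay d is recoverable this way only while |ωd| < π.

```python
import numpy as np

def estimate_channel_parameters(x1, x2, frame_len, amp_edges, delay_edges, peak_frac=0.5):
    """Histogram (amplitude, delay) estimates from frame-wise transform ratios."""
    k = np.arange(1, frame_len // 2)          # positive-frequency bins, DC excluded
    omega = 2 * np.pi * k / frame_len         # radians per sample
    amps, delays = [], []
    for start in range(0, len(x1) - frame_len + 1, frame_len):
        X1 = np.fft.fft(x1[start:start + frame_len])[k]
        X2 = np.fft.fft(x2[start:start + frame_len])[k]
        active = np.abs(X1) > 1e-6 * np.abs(X1).max()   # skip points with no energy
        ratio = X2[active] / X1[active]                 # point-by-point ratio
        amps.append(np.abs(ratio))                      # amplitude estimate
        delays.append(-np.angle(ratio) / omega[active]) # delay estimate in samples
    amps, delays = np.concatenate(amps), np.concatenate(delays)

    hist, _, _ = np.histogram2d(amps, delays, bins=[amp_edges, delay_edges])
    peaks = np.argwhere(hist >= peak_frac * hist.max())  # assumed peak rule
    ia = np.digitize(amps, amp_edges) - 1
    idl = np.digitize(delays, delay_edges) - 1
    estimates = []
    for i, j in peaks:
        sel = (ia == i) & (idl == j)
        # Report the mean of the estimates that fell into each peak bin.
        estimates.append((amps[sel].mean(), delays[sel].mean()))
    return estimates

# Toy example: two sources made of bin-exact tones, mixed with assumed parameters.
N = 64
n = np.arange(4 * N)

def s1(t):
    return np.cos(2 * np.pi * 3 * t / N) + np.cos(2 * np.pi * 5 * t / N)

def s2(t):
    return np.cos(2 * np.pi * 7 * t / N) + np.cos(2 * np.pi * 9 * t / N)

x1 = s1(n) + s2(n)
x2 = 0.6 * s1(n - 1) + 1.4 * s2(n + 2)   # (a, d) = (0.6, 1) and (1.4, -2)

amp_edges = np.linspace(0.0, 2.0, 9)
delay_edges = np.linspace(-4.25, 4.25, 18)
for a, d in estimate_channel_parameters(x1, x2, N, amp_edges, delay_edges):
    print(round(float(a), 6), round(float(d), 6))
```

In this toy case each source occupies its own frequency bins exactly, so the histogram has two sharp peaks near (0.6, 1) and (1.4, −2). With real signals the estimates scatter around the true values and the peak-picking rule would need to be chosen with more care.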
In another aspect of the present invention, a method and apparatus for demixing of at least weakly W-disjoint orthogonal mixture signals are provided, comprising acquiring one or more channel parameter estimates; calculating point-by-point ratios of a transform of a time-window of each of said mixture signals; assigning the value for each point in the transform of a time-window of one of the mixture signals to a first signal source if the estimated channel parameters determined from the ratio are within a given threshold of one of the acquired channel parameter estimates or, if more than one estimate is provided, to the signal source corresponding to the closest of the acquired channel parameter estimates; repeating the above step for each point in the transform in successive time-windows of the mixture signal; and reconstructing time domain signals from the assigned values of signals by inverse transforming an accumulation of the assigned values for each signal.
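The demixing steps above can be sketched in the same spirit. This is a hedged illustration only: it assumes the same two-receiver attenuation and delay model, uses rectangular non-overlapping frames so that accumulation reduces to writing each inverse-transformed frame in place, and measures closeness by comparing the observed ratio with the ratio a·exp(−jωd) that each acquired estimate would predict, rather than by comparing amplitude and delay values directly. The function name `demix`, the activity tolerance and the toy signals are hypothetical, and no distance threshold is applied beyond nearest-estimate assignment.

```python
import numpy as np

def demix(x1, x2, params, frame_len):
    """Partition the frame-wise transform of x1 among sources by nearest channel parameters."""
    omega = 2 * np.pi * np.arange(frame_len // 2 + 1) / frame_len
    # Ratio X2/X1 that each (amplitude, delay) pair would produce at each frequency.
    expected = np.array([a * np.exp(-1j * omega * d) for a, d in params])
    outputs = np.zeros((len(params), len(x1)))
    for start in range(0, len(x1) - frame_len + 1, frame_len):
        X1 = np.fft.rfft(x1[start:start + frame_len])
        X2 = np.fft.rfft(x2[start:start + frame_len])
        active = np.abs(X1) > 1e-6 * np.abs(X1).max()   # ignore empty points
        ratio = np.where(active, X2 / np.where(active, X1, 1.0), 0.0)
        owner = np.argmin(np.abs(expected - ratio), axis=0)  # closest estimate per point
        for j in range(len(params)):
            mask = active & (owner == j)
            # Keep only the points assigned to source j and return to the time domain.
            outputs[j, start:start + frame_len] = np.fft.irfft(X1 * mask, n=frame_len)
    return outputs

# Toy example: bin-exact tones mixed with assumed parameters (0.6, 1) and (1.4, -2).
N = 64
n = np.arange(4 * N)

def s1(t):
    return np.cos(2 * np.pi * 3 * t / N) + np.cos(2 * np.pi * 5 * t / N)

def s2(t):
    return np.cos(2 * np.pi * 7 * t / N) + np.cos(2 * np.pi * 9 * t / N)

x1 = s1(n) + s2(n)
x2 = 0.6 * s1(n - 1) + 1.4 * s2(n + 2)

estimates = demix(x1, x2, [(0.6, 1.0), (1.4, -2.0)], N)
print(np.max(np.abs(estimates[0] - s1(n))))  # reconstruction error for source 1
print(np.max(np.abs(estimates[1] - s2(n))))  # reconstruction error for source 2
```

Because only one mixture is partitioned, the number of recovered sources is set by the number of channel parameter estimates supplied, not by the number of receivers. A practical version would typically use overlapping tapered windows with overlap-add reconstruction.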
In another aspect of the present invention, a method for voice activity detection is provided comprising the steps of setting a voice activity detection flag on; estimating the number of sources in a sound field map; demixing the sound source of interest; detecting absence of power in said sound source of interest; and setting the voice activity detection flag off.
In yet another aspect of the present invention, said voice activity detection is used in a hearing aid, and said source of interest is speech.
In yet another aspect of the present invention, a method for critical sound activity detection is provided comprising the steps of setting a critical sound activity detector flag off; computing a sound field map; estimating the number of sources in said sound field map; detecting additional sound sources that match one of the critical sounds in their spectral characteristics; and setting the critical sound activity detector flag on.
In yet another aspect, said critical sound source of interest is a voice sound produced as an alert to a violation of personal safety in a public or private living area.
In yet another aspect, said critical sound source of interest is sound discharged by firearms or other explosive devices in a public or private living area.
FIGS. 1, 2 and 3 illustrate a histogram of amplitude estimates, a histogram of delay estimates and amplitude/delay clustering for an example of amplitude/delay estimation of two source signals using two microphones;
FIGS. 4, 5 and 6 illustrate a histogram of amplitude estimates, a histogram of delay estimates and amplitude/delay clustering for an example of amplitude/delay estimation of five source signals using two microphones;
FIG. 7 illustrates clustering of the amplitude/delay estimates onto transformed coordinates so as to provide a radiation field pattern of the estimates;
FIG. 8 illustrates a method and apparatus for amplitude/delay estimation in accordance with the principles of the present invention; and
FIG. 9 illustrates demixing in accordance with the principles of the present invention.