(1) Field of the Invention
This invention relates to blind signal separation; more particularly it relates to blind signal separation of signals that have undergone convolutive mixing, and to a method, an apparatus and a computer program for implementing it.
(2) Description of the Art
Blind signal separation is known: it is a form of signal processing implemented by software running on a computer system and accepting data from sensors after conversion from analogue to digital form. The expression “blind” indicates that no assumptions are made about signal characteristics or about the processes which form signal mixtures, other than an assumption that signals in a mixture were statistically independent prior to mixing. There are many techniques for separating signals from one another which rely on foreknowledge of signal characteristics, such as a signal's arrival direction, frequency, waveform, timing or amplitude modulation. Blind signal separation, however, only requires signals to be statistically independent, a hypothesis that usually holds in practice: a set of signals is statistically independent if information about one of its signals cannot be obtained from the others, and information about a sub-set of the signals cannot be derived from knowledge of the values of the other signals in the set.
Two further assumptions are normally made in blind signal separation, stationarity and linearity, and these assumptions are also made in connection with the present invention. Stationarity means that signals and channels in which they mix do not change over a time interval during which mixed signals are sampled. Linearity means that mixtures of signals received by sensors are linear combinations of these signals. More complicated combinations featuring signal products and squares and higher order powers of signals are not considered.
The aim of blind signal separation is to recover signals as they were prior to mixing, i.e. original signals. The technique is also known as Independent Component Analysis (ICA), which will be treated as synonymous with blind signal separation for the purposes of this specification. As the objective is to separate mixed signals from one another, blind signal separation is sometimes referred to as “unmixing”.
A simple example of an application of blind signal separation involves two loudspeakers transmitting to two receiving microphones. The microphones receive and generate mixtures of signals from both loudspeakers, but the mixtures differ because paths from loudspeakers to receivers are different. In the case of both loudspeakers transmitting speech signals it is difficult or even impossible to make sense of the output of either receiver on its own.
A similar problem may be found in separating co-channel radio signals received by RF receivers, separating machine vibration signals measured by accelerometers or even finding underlying factors in closing stock market prices. In all these situations there may be several signals driving the sensors, or in the last example several varying factors affecting prices.
Statistical independence can be expressed as ability to factorise mixed signals' joint probability density function into a product of the signals' individual probability density functions. The simplest form of blind signal separation problem is referred to as the instantaneous mixing problem: here the propagation delay between each signal and each of a set of sensors can be represented as a simple phase shift applied to the same time samples of that signal.
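The instantaneous mixing model can be sketched numerically as follows; the mixing matrix and source distributions below are illustrative assumptions, not taken from this specification. Each sensor output is a fixed linear combination of the same time samples of the sources, and mixing introduces correlation between sensor outputs even though the sources are independent.

```python
import numpy as np

# Illustrative sketch of instantaneous mixing: two statistically independent
# sources combined by a constant, hypothetical mixing matrix A.
rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, size=(2, 10000))   # two independent uniform sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                    # hypothetical mixing matrix
x = A @ s                                     # each sensor sees a linear combination

# The sources are (nearly) uncorrelated, but mixing correlates the sensor outputs.
source_corr = np.corrcoef(s)[0, 1]
sensor_corr = np.corrcoef(x)[0, 1]
```

Recovering s from x alone, without knowledge of A, is the instantaneous blind separation problem addressed by the algorithms cited below.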
Many different algorithms for solving the instantaneous mixing problem (hereinafter “instantaneous algorithms”) can be found in the literature. Some of the better known instantaneous algorithms are referred to as JADE, SOBI, BLISS and fast ICA, and are defined in the references below:    “JADE”: J. F. Cardoso and A. Souloumiac, “Blind Beamforming for non-Gaussian signals”, IEE Proceedings-F, Vol 140, No 6, December 1993;    “SOBI”: A. Belouchrani, K. Abed-Meraim, J. F. Cardoso and E. Moulines, “A Blind Source Separation Technique Using Second Order Statistics”, IEEE Transactions on Signal Processing, Vol 45, No 2, February 1997;    “BLISS”: I. J. Clarke, “Direct Exploitation of Non-Gaussianity as a Discriminant”, EUSIPCO 1998, September 1998; and    “Fast ICA”: A. Hyvarinen and E. Oja, “A Fast Fixed-Point Algorithm for Independent Component Analysis”, Neural Computation 9, P1483-1492, 1997.
These instantaneous algorithms have a two-stage structure (although that is not essential) comprising a second order decorrelation stage followed by a unitary rotation stage. The second order decorrelation stage is intended to impose second order independence, and the unitary rotation stage is intended to impose higher order independence while leaving second order independence unaffected.
The second order decorrelation stage consists of decorrelating and normalising signals. Decorrelation is the process of removing all correlations or similarities between signal pairs in a set of signals, correlation being defined mathematically as an integral of the product of the signals over time. Normalisation is the process of forcing signals in a set of signals to have the same power level.
The combination of decorrelation and normalisation establishes second order statistical independence. After this, the signals undergo a rotation. In becoming mixed, signals undergo a process (whose effects must be removed to separate them) which is a complicated combination of rotation, stretching and shearing. Decorrelation removes stretching and shearing effects, so that only a rotation needs to be applied to separate the signals. Rotation cannot apply shearing or stretching, and thus cannot counteract decorrelation.
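The second order stage described above can be sketched as follows, assuming a simple eigendecomposition-based whitening; the source distributions, mixing matrix and variable names are illustrative, not taken from this specification. Decorrelation and normalisation together drive the covariance of the signals to the identity matrix, leaving only a rotation to be found.

```python
import numpy as np

# A minimal sketch of the second order stage: decorrelate and normalise
# (whiten) mixed signals so that their covariance becomes the identity.
rng = np.random.default_rng(1)
s = rng.laplace(size=(2, 20000))               # independent non-Gaussian sources
A = np.array([[0.9, 0.5],
              [0.3, 1.1]])                     # hypothetical mixing matrix
x = A @ s                                      # mixed, hence correlated, signals

cov = np.cov(x)
w, v = np.linalg.eigh(cov)                     # eigendecomposition of covariance
whitener = np.diag(w ** -0.5) @ v.T            # decorrelates and equalises power
z = whitener @ x                               # second order independent signals
```

Because the whitener is built directly from the eigendecomposition of the measured covariance, the whitened signals have identity covariance by construction; any subsequent rotation of z preserves this property.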
Sometimes it is impossible to find a rotation which is appropriate: e.g. for two mixed signals each having a Gaussian probability density function, second order independence implies total independence. This is because Gaussian distributions do not give rise to dependencies above second order. Thus two independent signals with Gaussian probability density functions have a joint probability density function which has total rotational symmetry, and which in consequence is completely unchanged by rotation through any angle.
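The rotational symmetry of the Gaussian case can be illustrated numerically; the data and angles below are illustrative assumptions, not from this specification. The excess kurtosis (a fourth order statistic) of a rotated Gaussian pair stays near zero at every angle, so no rotation is distinguishable, whereas a uniform pair's kurtosis changes with the angle, so a correct rotation can in principle be identified.

```python
import numpy as np

def excess_kurtosis(y):
    # Fourth order statistic; zero for a Gaussian distribution.
    y = (y - y.mean()) / y.std()
    return (y ** 4).mean() - 3.0

def rotate(pair, theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]]) @ pair

rng = np.random.default_rng(2)
gauss = rng.standard_normal((2, 100000))                  # independent Gaussians
unif = rng.uniform(-np.sqrt(3), np.sqrt(3), (2, 100000))  # independent uniforms

# Kurtosis of the first channel after rotation through several angles.
kg = [excess_kurtosis(rotate(gauss, t)[0]) for t in (0.0, 0.4, 0.8)]
ku = [excess_kurtosis(rotate(unif, t)[0]) for t in (0.0, 0.4, 0.8)]
```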
The second, higher order stage of prior art instantaneous algorithms therefore searches for a rotation, implemented by a unitary matrix, that restores higher order independence to the signals.
In one reference entitled “Extended Quasi-Newton Method for the ICA” (preprint available from http://mns.brain.riken.go.jp/~akuzawa/publ/html), T. Akuzawa suggests an algorithm that does not use decorrelation. Instead a gradient descent approach is suggested to minimise a fourth order measure of dependence. A. Yeredor, in “Approximate Joint Diagonalisation using Non-Orthogonal Matrices”, Proc ICA2000, P33-38, Helsinki, Finland, June 2000, also avoids decorrelation, but uses a method that minimises a measure based on both second and fourth order dependencies. This allows correlations to be treated in the same way as fourth order dependencies.
P. Comon, in “Independent Component Analysis, A new concept?”, Signal Processing 36, P287-314, 1994, discloses a closed form solution using decorrelation. Comon attempts to find the whole unitary matrix at once by repeatedly sweeping through 2 by 2 four-element sub-blocks of the unitary matrix; the objective is to maximise a fourth order measure of independence. In “Blind Beamforming for non-Gaussian signals”, IEE Proceedings-F, Vol 140, No 6, December 1993, J. F. Cardoso and A. Souloumiac disclose an algorithm referred to as “JADE”. JADE is similar to Comon's algorithm, but achieves higher speed by using joint approximate diagonalisation.
Belouchrani et al disclosed modifying the JADE algorithm to produce the SOBI algorithm in “A Blind Source Separation Technique Using Second Order Statistics”, IEEE Transactions on Signal Processing, Vol 45, No 2, February 1997. The SOBI algorithm differs from the JADE algorithm only in its objective function, which is a second order measure of independence that has to be maximised. It also has the speed advantages of the JADE algorithm from using joint diagonalisation. However, SOBI relies on the signals having different spectral information and can fail if this is not the case.
In “A Fast Fixed-Point Algorithm for Independent Component Analysis”, Neural Computation 9, P1483-1492, 1997, A. Hyvarinen and E. Oja disclose an algorithm referred to as the fast ICA algorithm. This algorithm uses signal decorrelation and then attempts to implement rotation by building up a unitary matrix one row at a time. To determine the correct rotation, it seeks a maximum in an objective function which is a fourth order measure of independence, or a measure of independence based on non-linearities.
A variant on the fast ICA algorithm has recently been suggested by A. Hyvarinen in “Complexity Pursuit: Combining Nongaussianity and Autocorrelations for Signal Separation”, ICA2000, P567-572, Helsinki, Finland, June 2000. To determine rotation, it seeks a maximum not of independence but of (Kolmogorov) complexity.
In “Direct Exploitation of Non-Gaussianity as a Discriminant”, EUSIPCO 1998, September 1998, I. J. Clarke discloses an algorithm referred to as BLISS. This uses signal decorrelation, and then carries out pairwise sweeps as in JADE and SOBI. To determine rotation, BLISS seeks a maximum in an objective function based upon alignment of an estimated joint probability density function with an axis of the coordinate system in which it is plotted: this finds the required rotation explicitly.
Unfortunately, an algorithm which is adequate for the instantaneous mixing problem cannot cope with more difficult problems. These more difficult problems occur when the output of a sensor must be expressed mathematically as a convolution, i.e. a combination of a series of replicas of a signal relatively delayed with respect to one another. This problem is therefore referred to as the “convolutive mixing” problem.
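Convolutive mixing of this kind can be sketched as follows, with hypothetical impulse responses standing in for the unknown channels; the names and coefficients are illustrative, not taken from this specification.

```python
import numpy as np

# Sketch of convolutive mixing: a sensor output is a sum of convolutions,
# i.e. a combination of relatively delayed replicas of each source, rather
# than a simple instantaneous linear combination.
rng = np.random.default_rng(3)
n = 1000
s1 = rng.standard_normal(n)            # source 1
s2 = rng.standard_normal(n)            # source 2
h11 = [1.0, 0.5, 0.2]                  # illustrative channel: source 1 -> sensor 1
h12 = [0.3, 0.4]                       # illustrative channel: source 2 -> sensor 1
x1 = (np.convolve(s1, h11)[:n] +       # delayed replicas of s1...
      np.convolve(s2, h12)[:n])        # ...plus delayed replicas of s2
```

Because each sample of x1 depends on several past samples of both sources, a constant unmixing matrix is insufficient: the unmixing system must itself be a filter.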
Blind signal separation of convolutively mixed signals is an area of much current research. Several techniques have been suggested, some of which are successful in some specific set of circumstances. The techniques tend to be slow, require extra assumptions to hold and often have step size/convergence problems.
The approach used in instantaneous algorithms has been extended to the convolutive mixing situation: from this approach it has been inferred that convolutively mixed signals could be unmixed by a two stage algorithm, a first stage imposing second order independence and a second stage imposing higher order independence but not affecting second order independence. This algorithm would accommodate time delays involved in mixing and unmixing. As a first stage, the mixed signals may be transformed by a multichannel lattice filter to obtain decorrelated and whitened signals: in this connection, signal whitening involves forcing a signal to have the same power at all frequencies. Whitening a set of signals means whitening all such signals individually.
Instead of the unitary matrix employed in instantaneous algorithms, the second stage of the convolutive unmixing algorithm employs a paraunitary matrix. As will be described later in more detail, a paraunitary matrix is a polynomial matrix which gives the identity matrix when multiplied by its paraconjugate matrix—a polynomial matrix equivalent of a Hermitian conjugate matrix. A possible approach to the convolutive unmixing problem is therefore to apply a multichannel lattice filter to impose second order independence and whitening, and then to search for a paraunitary matrix that maximises a measure of fourth or higher order independence.
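The paraunitary property can be illustrated with a small numerical sketch; the degree-1 polynomial matrix and rotation angles below are illustrative assumptions, not taken from this specification. An equivalent statement of paraunitarity is that the polynomial matrix evaluates to a unitary matrix at every point on the unit circle.

```python
import numpy as np

def rot(theta):
    # 2 by 2 Givens rotation.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Illustrative degree-1 paraunitary matrix H(z) = R1 . diag(1, z^-1) . R0,
# stored as a list of coefficient matrices of powers of z^-1.
r0, r1 = rot(0.3), rot(1.1)                   # arbitrary illustrative angles
coeffs = [r1 @ np.diag([1.0, 0.0]) @ r0,      # coefficient of z**0
          r1 @ np.diag([0.0, 1.0]) @ r0]      # coefficient of z**-1

def evaluate(coeffs, z):
    return sum(c * z ** -k for k, c in enumerate(coeffs))

# Deviation of H(z)~ H(z) from the identity at points on the unit circle.
errors = []
for w in np.linspace(0.0, 2 * np.pi, 7):
    h = evaluate(coeffs, np.exp(1j * w))
    errors.append(np.abs(h.conj().T @ h - np.eye(2)).max())
```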
Most prior art convolutive unmixing algorithms do not use a decorrelation procedure. Instead they tend to try to adjust coefficients of unmixing filters using gradient descent or neural network techniques. However some of these algorithms do use decorrelation. They have differing objective functions and differing methods of extracting the paraunitary matrix.
The following three papers disclose applying gradient descent algorithms to maximise entropy of outputs, but without using a full decorrelation step; the first of these papers also suggested using a limited form of decorrelation as a pre-processing step to initialise the method, but did not maintain decorrelation beyond initialisation:    T. Lee, A. Ziehe, R. Orglemeister, T. Sejnowski, “Combining Time-Delayed Decorrelation and ICA: Towards Solving the Cocktail Party Problem”, IEEE International Conference on Acoustics, Speech and Signal Processing, 1249-1252, Seattle, May 1998;    K. Torkkola, “Blind Separation of Convolved Sources based on Information Maximisation”, IEEE workshop on Neural Networks for Signal Processing, Kyoto, Japan, September 1996; and    T. Lee, A. J. Bell, R. H. Lambert, “Blind Separation of Delayed and convolved sources”, Advances in Neural Information Processing Systems, 9, 758-764 1997.
A similar algorithm is disclosed by J. K. Tugnait in “On Blind Separation of Convolutive Mixtures of Independent Linear Signals in Unknown Additive Noise”, IEEE Transactions on Signal Processing, Vol 46, No 11 November 1998. Here again the decorrelation step is not used, and a gradient descent method is used to adjust estimates of signals and mixing. Objective functions were used which were based upon fourth order measures of independence.
Another similar algorithm is disclosed by K. Rahbar and J. P. Reilly in “Blind Diagonalisation of Convolved Sources by Joint Approximate Diagonalisation of Cross-Spectral Density Matrices”, ICASSP2001, Salt Lake City, May 2001. A gradient descent method was used to adjust separating filter parameters taken from a frequency domain representation of signal mixing; the objective function used was minimisation of the signals' cross-spectral density. This is similar to the method suggested by L. Parra and C. Spence in “Convolutive Blind Separation of Non-Stationary Sources”, IEEE Transactions on Speech and Signal Processing, Vol 8, No 3, May 2000. This further method used separation into different frequency components, together with gradient descent and minimisation of an objective function consisting of a cross correlation. It relied on the assumption that the signals were non-stationary, but that the mixing was stationary.
In “Adaptive Paraunitary Filter Banks for Contrast-Based Multichannel Blind Deconvolution”, ICASSP2001 Salt Lake City May 2001, X. Sun and S. C. Douglas disclosed decorrelation followed by finding a paraunitary matrix. After decorrelation the order of the polynomial unmixing matrix being sought was fixed and then looked for by gradient descent. At every step the matrix was forced to be nearly paraunitary. An objective function was used for gradient descent which aimed at maximising fourth order independence.
A similar methodology was disclosed by K. Matsuoka, M. Ohata and T. Tokunari in “A Kurtosis-Based Blind Separation of Sources Using the Cayley Transform”, AS-SPCC 2000. This used decorrelation followed by gradient descent for the paraunitary matrix, with a fourth order measure of independence as an objective function. It differs from the foregoing in that it used a parameterisation based upon the Cayley transform, allowing gradient descent in a linear space that is a transformation of paraunitary matrix space.
In “An Algebraic Approach to Blind MIMO Identification”, ICA2000, P211-214, Helsinki, Finland, June 2000, L. De Lathauwer, B. De Moor and J. Vandewalle disclose decorrelation together with parameterisation of the paraunitary matrix as disclosed by Vaidyanathan (to be discussed later). The objective was to find the series of delays and rotations of the parameterisation by minimising a fourth order measure of independence based upon the number of blocks still to be found. This relied on the assumption that the order of the paraunitary matrix was known, or guessed correctly, in advance.
The last three of the above prior art methods rely on the assumption that there is prior knowledge of the degree of the paraunitary matrix being sought. In practice the degree is guessed, and if the guess is wrong the methods are incapable of correcting for it. If the guess is too large a solution may still be found, but with much unnecessary processing and degraded performance. If the guess is too small the algorithm will simply fail to produce a useful result.
Gradient descent methods that aim to adjust all the parameters of a paraunitary matrix or of an unmixing filter at once have another difficulty: the parameters are linked to any useful measure of independence in a very complex way which does not factorise easily. Adjusting all parameters at once therefore leads to very slowly converging algorithms.
In “Multirate Systems and Filter Banks”, Prentice Hall: Signal Processing Series, 1993, P. P. Vaidyanathan discloses parameterisation of paraunitary matrices in a stage-by-stage decomposition of a paraunitary matrix in z^-1: here z^-1 is an operator implementing a unit delay. Vaidyanathan shows that a product matrix built up from a series of pairs of paraunitary matrix blocks is paraunitary: here one block represents a delay and the other a 2 by 2 unitary matrix implementing a Givens rotation (see U.S. Pat. No. 4,727,503). It is proved in Vaidyanathan that a paraunitary matrix of degree N is the product of N+1 rotations and N one-channel delay operators all implementing the same unit delay.
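Vaidyanathan's decomposition can be sketched numerically as follows; the rotation angles and degree are illustrative assumptions. The sketch builds a degree-N paraunitary matrix as N+1 Givens rotations interleaved with N one-channel unit delays, and checks that the product is unitary everywhere on the unit circle.

```python
import numpy as np

def rot(theta):
    # 2 by 2 Givens rotation.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def polymat_mul(a, b):
    # Multiply polynomial matrices given as lists of z**-k coefficient matrices.
    out = [np.zeros((2, 2)) for _ in range(len(a) + len(b) - 1)]
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = out[i + j] + ai @ bj
    return out

delay = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]   # diag(1, z**-1)

N = 3
angles = [0.2, 0.7, 1.3, 2.1]                        # N + 1 illustrative angles
h = [rot(angles[0])]
for theta in angles[1:]:
    h = polymat_mul(h, delay)                        # insert a unit delay...
    h = polymat_mul(h, [rot(theta)])                 # ...then the next rotation

degree = len(h) - 1                                  # equals N by construction

# Paraunitarity check: H(z) is unitary at every point on the unit circle.
errors = []
for w in np.linspace(0.0, 2 * np.pi, 9):
    hz = sum(c * np.exp(-1j * w * k) for k, c in enumerate(h))
    errors.append(np.abs(hz.conj().T @ hz - np.eye(2)).max())
```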
The difficulty with using Vaidyanathan's parameterisation is that the first step in unmixing signals is to look for a rotation to apply, even if one is unnecessary. This superfluous rotation is very difficult for later parameterisation blocks to undo; moreover, it mixes signals to a further degree—e.g. in a two channel case each mixed signal is now a sum of four original signals instead of two. The signals become closer to a Gaussian distribution, and therefore correct rotations are more difficult to find. Thus the superfluous rotation makes the problem more difficult to solve in a way that is difficult to correct. It can lead to failure of the method for even moderately sized problems.