Signal extraction (or enhancement) algorithms generally aim at creating favorable versions of received signals while at the same time attenuating or cancelling other unwanted source signals received by a set of transducers/sensors. The algorithms may operate on single-sensor data or on multiple-sensor data, in either case producing one or several output signals. A signal extraction system can be either a fixed, non-adaptive system that maintains the same properties regardless of input signal variations, or an adaptive system that may change its properties based on the properties of the received data. The filtering operation, when adaptation of the structural parameters is halted, may be either linear or non-linear. Furthermore, the operation may depend on two states, signal active and signal non-active, i.e. the operation relies on signal activity detection.
In speech extraction, for instance, several physical domains are recognized and have to be considered when reconstructing speech in a noisy environment. The time selectivity domain appears, for instance, in speech booster/spectral subtraction/TDMA (Time Division Multiple Access) and others. The frequency selectivity domain comprises Wiener filtering/notch filtering/FDMA (Frequency Division Multiple Access) and others. The spatial selectivity domain relates to Wiener BF (Beam Forming)/BSS (Blind Signal Separation)/MK (Maximum/Minimum Kurtosis)/GSC (Generalized Sidelobe Canceller)/LCMV (Linearly Constrained Minimum Variance)/SDMA (Space Division Multiple Access) and others. Another existing domain is the code selectivity domain, including for instance the CDMA (Code Division Multiple Access) method, which in fact is a combination of the above-mentioned physical domains.
No scientific research or findings have yet been able to combine time selectivity, frequency selectivity, and spatial selectivity in enhancing/extracting wanted signals in a noisy environment. In particular, such a combination has not been carried out without prior assumptions or special knowledge about the environment where signal extraction is accomplished. Hence, fully adaptive automatic signal extraction would be appreciated by those skilled in the art.
In particular, fully automatic signal extraction encounters the following problems: the sensor and source inter-geometry is unknown and changing; the number of desired sources is unknown; surrounding noise sources have unknown spectral properties; sensor characteristics are non-ideal and change due to ageing; complexity is restricted; and the system needs to operate also in high-noise scenarios.
A prior published work in the technical field of speech extraction is “BLIND SEPARATION AND BLIND DECONVOLUTION: AN INFORMATION-THEORETIC APPROACH” by Anthony J. Bell and Terrence J. Sejnowski, Computational Neurobiology Laboratory, The Salk Institute, 10010 N. Torrey Pines Road, La Jolla, Calif. 92037, 0-7803-2431-5/95 $4.00 © 1995 IEEE.
Blind separation and blind deconvolution are related problems in unsupervised learning. In blind separation, different sources, such as people speaking, music, etc., are mixed together linearly by a matrix. Nothing is known about the sources or the mixing process. What is received is the N superpositions of them, x1(t), x2(t), . . . , xN(t). The task is thus to recover the original sources by finding a square matrix W which is a permutation of the inverse of the unknown mixing matrix, A. The problem has also been called the ‘cocktail-party’ problem.
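The linear mixing model described above can be illustrated with a minimal sketch. The source signals, the mixing matrix A, and the function names are assumptions chosen for illustration only; in a truly blind setting A is unknown and W has to be estimated from the mixtures, a step this sketch does not perform.

```python
import numpy as np

def mix(sources, A):
    """Instantaneous linear mixing x(t) = A s(t); sources has shape (N, T)."""
    return A @ sources

def unmix(mixtures, W):
    """Apply a square unmixing matrix W to the observed mixtures."""
    return W @ mixtures

rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 1000))       # two hypothetical source signals
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])            # assumed mixing matrix (unknown in practice)
X = mix(S, A)                         # the N = 2 observed superpositions

# An ideal W is a permutation of the inverse of A, so the sources are
# recovered exactly but possibly in a different order.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])            # permutation that swaps the two outputs
W = P @ np.linalg.inv(A)
Y = unmix(X, W)

print(np.allclose(Y[0], S[1]), np.allclose(Y[1], S[0]))
```

The sketch only shows why a permuted inverse counts as a valid solution: the ordering of the recovered sources is inherently ambiguous.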
Another prior published work in the technical field of signal extraction relates to “Blind Signal Separation: Statistical Principles”, JEAN-FRANCOIS CARDOSO, PROCEEDINGS OF THE IEEE, VOL. 86, NO. 10, OCTOBER 1998.
Blind signal separation (BSS) and independent component analysis (ICA) are emerging techniques of array processing and data analysis that aim to recover unobserved signals or “sources” from observed mixtures (typically, the output of an array of sensors), exploiting only the assumption of mutual independence between the signals. The weakness of the assumptions makes it a powerful approach, but it requires venturing beyond familiar second-order statistics. The objectives of the paper are to review some of the approaches that have been recently developed to address this problem, to illustrate how they stem from basic principles, and to show how they relate to each other.
BSS based on ICA/PCA (ICA being equivalent to nonlinear PCA) relies on output independence/de-correlation. All signal sources need to be active simultaneously, and the number of sensors recording the signals must equal or exceed the number of signal sources. Moreover, existing BSS and its equivalents are operable only in low-noise environments.
Yet another prior published work in the technical field of signal extraction relates to “BLIND SEPARATION OF DISJOINT ORTHOGONAL SIGNALS: DEMIXING N SOURCES FROM 2 MIXTURES”, Jourjine, A.; Rickard, S.; Yilmaz, O.; Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 5, Page(s): 2985-2988, 5-9 Jun. 2000.
In this scientific article the authors present a novel method for blind separation of any number of sources using only two mixtures. The method applies when sources are (W-) disjoint orthogonal, that is, when the supports of the (windowed) Fourier transform of any two signals in the mixture are disjoint sets. It is shown that, for anechoic mixtures of attenuated and delayed sources, the method allows estimating the mixing parameters by clustering ratios of the time-frequency representations of the mixtures. Estimates of the mixing parameters are then used to partition the time-frequency representation of one mixture to recover the original sources. The technique is valid even in the case when the number of sources is larger than the number of mixtures. The general results are verified on both speech and wireless signals. Sample sound files can be found at: http://eleceng.ucd.ie/~srickard/bss.html.
BSS-Disjoint Orthogonal de-mixing relies on non-overlapping time-frequency energy, where the number of sensors may be smaller or larger than the number of sources. It introduces musical tones, i.e. severe distortion of the signals, and operates only in low-noise environments.
BSS-Joint cumulant diagonalization diagonalizes higher-order cumulant matrices, and the number of sensors has to equal or exceed the number of sources. Its drawbacks are slow convergence and that it operates only in low-noise environments.
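The idea of partitioning one mixture by clustering ratios of the two mixtures' frequency representations can be sketched as follows. This is a heavily simplified illustration and not the method of the cited article: it uses a single FFT frame instead of a windowed transform, considers attenuation only (no delays), and handles exactly two sources by splitting the ratios at their largest gap. The function name, the test tones, and the gains 0.5 and 2.0 are assumptions made for the example.

```python
import numpy as np

def separate_two_sources(x1, x2, energy_floor=1e-6):
    """Split mixture x1 into two sources using magnitude ratios |X2|/|X1|.

    Assumes the two sources occupy disjoint frequency bins and that at
    least two bins carry significant energy.
    """
    X1, X2 = np.fft.rfft(x1), np.fft.rfft(x2)
    active = np.abs(X1) > energy_floor * np.abs(X1).max()
    ratio = np.zeros(X1.shape)
    ratio[active] = np.abs(X2[active]) / np.abs(X1[active])

    # Two-cluster split: threshold at the largest gap between sorted ratios.
    r = np.sort(ratio[active])
    k = np.argmax(np.diff(r))
    threshold = 0.5 * (r[k] + r[k + 1])

    mask_a = active & (ratio < threshold)
    mask_b = active & (ratio >= threshold)
    est_a = np.fft.irfft(X1 * mask_a, n=len(x1))
    est_b = np.fft.irfft(X1 * mask_b, n=len(x1))
    attenuations = (ratio[mask_a].mean(), ratio[mask_b].mean())
    return est_a, est_b, attenuations

n = np.arange(256)
s1 = np.cos(2 * np.pi * 10 * n / 256)   # hypothetical source on FFT bin 10
s2 = np.cos(2 * np.pi * 40 * n / 256)   # hypothetical source on FFT bin 40
x1 = s1 + s2                            # mixture at sensor 1
x2 = 0.5 * s1 + 2.0 * s2                # mixture at sensor 2 (assumed gains)

est_a, est_b, att = separate_two_sources(x1, x2)
print(att)
```

Because the masks are binary, any bin where the sources overlap is assigned entirely to one of them, which is one way to see where the distortion noted above comes from.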
A still further prior published work in the technical field of signal extraction relates to “ROBUST SPEECH RECOGNITION IN A HIGH INTERFERENCE REAL ROOM ENVIRONMENT USING BLIND SPEECH EXTRACTION”, Koutras, A.; Dermatas, E.; Proceedings in 2002 14th International Conference on Digital Signal Processing, Volume 1, Page(s): 167-171, 2002.
This paper presents a novel Blind Signal Extraction (BSE) method for robust speech recognition in a real room environment under the coexistence of simultaneous interfering non-speech sources. The proposed method is capable of extracting the target speaker's voice based on a maximum kurtosis criterion. Extensive phoneme recognition experiments have proved the proposed network's efficiency when used in a real-life situation of a talking speaker with the coexistence of various non-speech sources (e.g. music and noise), achieving a phoneme recognition improvement of about 23%, especially under high interference. Furthermore, comparison of the proposed network to known Blind Source Separation (BSS) networks, commonly used in similar situations, showed lower computational complexity and better recognition accuracy of the BSE network making it ideal to be used as a front-end to existing ASR (Automatic Speech Recognition) systems.
The maximum kurtosis criterion extracts the single source with the highest kurtosis, and the number of sensors may be smaller or larger than the number of sources. Its difficulties relate to handling several speakers, and it operates only in low-noise environments.
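The kurtosis measure underlying this criterion can be illustrated with a short sketch. It only computes the sample excess kurtosis of a few candidate signals and picks the largest; it is not the extraction network of the cited paper. The synthetic signals (Laplacian as a stand-in for a speech-like, heavy-tailed source) and the function name are assumptions for illustration.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: E[(x - mean)^4] / var^2 - 3 (zero for a Gaussian)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    return np.mean(centered ** 4) / variance ** 2 - 3.0

rng = np.random.default_rng(1)
candidates = [
    rng.laplace(size=20000),          # heavy-tailed, speech-like stand-in
    rng.normal(size=20000),           # Gaussian noise
    rng.uniform(-1, 1, size=20000),   # sub-Gaussian interference
]

scores = [excess_kurtosis(c) for c in candidates]
best = int(np.argmax(scores))         # index of the highest-kurtosis candidate
print(best, scores)
```

Since the criterion returns a single maximizer, it gives no direct way to pick out several speakers at once, which is consistent with the difficulty noted above.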
A still further prior published work in the technical field of signal recognition relates to “Robust Adaptive Beamforming Based on the Kalman Filter”, Amr El-Keyi, Thiagalingam Kirubarajan, and Alex B. Gershman, IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 53, NO. 8, AUGUST 2005.
The paper presents a novel approach to implement the robust minimum variance distortion-less response (MVDR) beam-former. This beam-former is based on worst-case performance optimization and has been shown to provide excellent robustness against arbitrary but norm-bounded mismatches in the desired signal steering vector. However, the existing algorithms to solve this problem do not have direct computationally efficient online implementations. In this paper a new algorithm for the robust MVDR beam-former is developed, which is based on the constrained Kalman filter and can be implemented online with a low computational cost. The algorithm is shown to have similar performance to that of the original second-order cone programming (SOCP)-based implementation of the robust MVDR beam-former. Also presented are two improved modifications of the proposed algorithm to additionally account for nonstationary environments. These modifications are based on model switching and hypothesis merging techniques that further improve the robustness of the beam-former against rapid (abrupt) environmental changes.
Blind Beam-forming relies on passive speaker localization together with conventional beam-forming (such as the MVDR), where the number of sensors may be smaller or larger than the number of sources. Its drawback is that it operates only in low-noise environments, due to the passive localization.
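For reference, the conventional (non-robust) MVDR beam-former mentioned above has the closed-form weights w = R^-1 a / (a^H R^-1 a), where R is the sensor covariance matrix and a is the steering vector of the desired source. The sketch below evaluates this formula for an assumed four-sensor uniform linear array with half-wavelength spacing, one interferer at 40 degrees, and hypothetical noise and interference powers. It does not implement the Kalman-filter-based robust variant of the cited paper, and it assumes the steering vector is known exactly, which is precisely what passive localization has to supply in practice.

```python
import numpy as np

def steering_vector(num_sensors, angle_deg):
    """Steering vector of a uniform linear array with half-wavelength spacing."""
    k = np.arange(num_sensors)
    return np.exp(-1j * np.pi * k * np.sin(np.deg2rad(angle_deg)))

def mvdr_weights(R, a):
    """MVDR weights w = R^-1 a / (a^H R^-1 a)."""
    Rinv_a = np.linalg.solve(R, a)
    return Rinv_a / np.vdot(a, Rinv_a)   # np.vdot conjugates its first argument

M = 4                                    # assumed number of sensors
a_target = steering_vector(M, 0.0)       # desired source at broadside
a_interf = steering_vector(M, 40.0)      # hypothetical interferer direction

# Assumed covariance: white sensor noise plus one strong interferer.
R = 0.01 * np.eye(M) + 10.0 * np.outer(a_interf, a_interf.conj())

w = mvdr_weights(R, a_target)
target_gain = abs(np.vdot(w, a_target))                        # |w^H a_target|
interf_gain = abs(np.vdot(w, a_interf))                        # |w^H a_interf|
conventional_interf_gain = abs(np.vdot(a_target / M, a_interf))
print(target_gain, interf_gain, conventional_interf_gain)
```

The distortion-less constraint keeps unit gain toward the assumed target direction while the interferer is attenuated far more than with plain delay-and-sum weights; an error in the assumed steering vector would break that guarantee.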