In a number of applications it is of interest to extract pure compounds from collections of their linear combinations, also called mixtures. Quantification and identification of the components present in a mixture is a traditional problem in NMR, IR, UV, EPR and Raman spectroscopy, mass spectrometry, etc. In the majority of cases, identification proceeds by matching the mixture's spectra against a library of reference compounds. This approach is of limited effectiveness because its accuracy depends strongly on whether the library contains the spectra of the pure components. In addition, for a number of compounds isolated from natural sources or obtained in proteomics- and metabolomics-related studies, no library of pure components is available yet.
As opposed to the library-based approach, it has been repeatedly demonstrated over the last ten years that the mixtures' spectra can be separated into pure component spectra by the methodology known as blind source separation (BSS), which uses only the measurements of the mixtures' spectra. Two widespread methods in this domain are independent component analysis (ICA) and nonnegative matrix factorization (NMF). ICA belongs to the group of statistical methods for solving blind linear inverse problems. The assumptions upon which ICA algorithms are built are that the unknown pure components are statistically independent and non-Gaussian, and that the number of linearly independent mixtures is greater than or equal to the number of pure components. NMF belongs to the group of algebraic methods for solving linear inverse problems. It also requires that the number of linearly independent mixtures be greater than or equal to the number of pure components, and in addition that the pure components be nonnegative and sparse. The nonnegativity and sparseness requirements are not satisfied simultaneously in the majority of spectroscopic applications. The general principle of blind extraction of pure components by the BSS approach is shown schematically in FIG. 1, which will be discussed below.
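For orientation, the linear mixture model implied by the description above can be written compactly. The symbols X, A, S, N, M and T are notation assumed here for illustration and are not taken from the text:

```latex
\mathbf{X} = \mathbf{A}\,\mathbf{S}, \qquad
\mathbf{X} \in \mathbb{R}^{N \times T},\quad
\mathbf{A} \in \mathbb{R}^{N \times M},\quad
\mathbf{S} \in \mathbb{R}^{M \times T}
```

Under this assumed notation, the N rows of X are the measured mixture spectra, the M rows of S are the unknown pure component spectra sampled at T points, and A holds the unknown concentrations. The requirement shared by ICA and NMF then reads rank(A) = M with N ≥ M.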
One of the best-known ICA algorithms is described in U.S. Pat. No. 5,706,402 (B2), in patent application WO 9617309 (A), and in the paper: A. J. Bell and T. J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation; vol. 7, pp. 1129-1159, 1995. Reference literature for the field of blind source separation and independent component analysis includes: A. Hyvärinen, J. Karhunen, E. Oja. Independent Component Analysis, John Wiley, 2001; A. Cichocki, S. Amari. Adaptive Blind Signal and Image Processing, John Wiley, 2002.
We point out here that two assumptions made by standard BSS methods, namely (i) that the number of linearly independent mixtures is greater than or equal to the unknown number of pure components and (ii) that the pure components are statistically independent, are not always easily met in real-world applications in spectroscopy and spectrometry. The first assumption implies that the concentrations of the pure components differ from mixture to mixture. This is not always easy to achieve in practice. Therefore, a methodology for blind extraction of pure components from as few mixtures as possible is of great practical importance. The second assumption implies a low level of overlap between the pure components. This is known not to be the case on a number of occasions. A few examples include 1H NMR spectroscopy, EPR spectroscopy, UV and IR spectroscopy, and also homo- and heteronuclear 2D NMR spectroscopy of complex chemical compounds and biomolecules such as proteins, enzymes, glycoproteins, nucleic acids, etc.
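The first assumption can be illustrated with a minimal sketch. It assumes a hypothetical concentration matrix whose rows are mixtures and whose columns are pure components; the helper `matrix_rank` and the numerical values are illustrative only. When two mixtures contain the components in the same relative concentrations, their rows are proportional and only one of them counts as linearly independent.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a small dense matrix by Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
        if rank == n_rows:
            break
    return rank


# Hypothetical concentrations: rows are mixtures, columns are pure components.
A_distinct = [[0.7, 0.3],
              [0.2, 0.8]]       # relative concentrations differ between mixtures
A_proportional = [[0.7, 0.3],
                  [1.4, 0.6]]   # second mixture is a scaled copy of the first

rank_distinct = matrix_rank(A_distinct)
rank_proportional = matrix_rank(A_proportional)
```

In this toy case the second pair of mixtures provides a single linearly independent measurement, so a method that needs as many independent mixtures as pure components would not apply.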
As described below in paragraphs [0009]-[0014], BSS methods, mostly ICA, are used to extract pure components from a plurality of spectroscopic or spectrometric signals. On a number of occasions it is emphasized that statistical independence among the pure components is not a correct assumption in spectroscopy and spectrometry. What the BSS methods elaborated below have in common is that the number of linearly independent mixtures is required to be greater than or equal to the unknown number of pure components.
A review of the application of ICA in signal processing for analytical chemistry is given in: G. Wang, Q. Ding, Z. Hou, “Independent component analysis and its applications in signal processing for analytical chemistry,” Trends in Analytical Chemistry, vol. 27, No. 4, 368-376, 2008.
The BSS-based approach to blind decomposition of NMR spectra is presented in: D. Nuzillard, S. Bourg and J.-M. Nuzillard, “Model-Free Analysis of Mixtures by NMR Using Blind Source Separation,” Journal of Magnetic Resonance 133, 358-363, 1998; D. Nuzillard, J.-M. Nuzillard, “Application of Blind Source Separation to 1D and 2D Nuclear Magnetic Resonance Spectroscopy,” IEEE Signal Processing Letters, vol. 5, No. 8, 209-211, 1998; K. Stadlthanner, et al. “Separation of water artifacts in 2D NOESY protein spectra using congruent matrix pencil,” Neurocomputing 69, 497-522, 2006. The BSS methodologies employed assume: (i) that the number of linearly independent mixtures is greater than or equal to the unknown number of pure components; (ii) that the pure components are statistically independent. The statistical independence assumption has been relaxed in: W. Naanaa, J.-M. Nuzillard, “Blind source separation of positive and partially correlated data,” Signal Processing 85, 1711-1722, 2005. However, it is still required that the number of linearly independent mixtures be greater than or equal to the unknown number of pure components.
The use of ICA and mean-field ICA in blind decomposition of signals in gas chromatography-mass spectrometry (GC-MS) is elaborated, respectively, in: X. Shao, G. Wang, S. Wang, Q. Su, “Extraction of Mass-Spectra and Chromatographic Profiles from Overlapping GC/MS Signal with Background,” Analytical Chemistry 76, 5143-5148, 2004; G. Wang, W. Cai, X. Shao, “A primary study on resolution of overlapping GC-MS signal using mean-field approach independent component analysis,” Chemometrics and Intelligent Laboratory Systems 82, 137-144, 2006. The latter reference elaborates a method for blind decomposition of statistically dependent spectrometric signals. However, it is still required that the number of linearly independent mixtures be greater than or equal to the unknown number of pure components.
Blind decomposition of EPR mixture spectra is introduced in: J. Y. Ren, et al., “Free radical EPR spectroscopy analysis using blind source separation,” Journal of Magnetic Resonance 166, 82-91, 2004. The standard ICA algorithm (FastICA) has been applied there for blind separation of the EPR spectra. The following reference, however, recognizes that pure components in EPR spectroscopy are not statistically independent and that EPR spectra are sparse: C. Chang et al., “Novel sparse component analysis approach to free radical EPR spectra decomposition,” Journal of Magnetic Resonance 175, 242-255, 2005. In this reference, sparseness is used to cope with the statistical dependence among the pure components, and a novel contrast function that measures the sparseness of the EPR spectra is proposed. However, the number of mixtures is still required to be greater than or equal to the number of pure components.
The use of latent variable analysis, specifically non-negative ICA, for blind decomposition of Raman spectra is elaborated in: V. A. Shashilov et al., “Latent variable analysis of Raman spectra for structural characterization of proteins,” Journal of Quantitative Spectroscopy & Radiative Transfer 102, 46-61, 2006. Non-negative ICA takes into account the non-negativity of the variables in the assumed linear mixture model, but the number of mixtures is still required to be greater than or equal to the unknown number of pure components.
ICA has been applied to IR spectral data analysis in: J. Chen, X. Z. Wang, “A New Approach to Near-Infrared Spectral Data Analysis Using Independent Component Analysis,” J. Chem. Inf. Comput. Sci. 41, 992-1001, 2001. It is, however, known that pure components in the spectral domain are statistically dependent: J. M. P. Nascimento, J. M. Bioucas Dias, “Does Independent Component Analysis Play a Role in Unmixing Hyperspectral Data?,” IEEE Transactions on Geoscience and Remote Sensing 43, 175-187, 2005. Since statistical independence among the pure components is a necessary condition for ICA to work, the ICA approach to IR spectra decomposition has limited accuracy. In addition, the number of spectral measurements (mixtures) is still required to be greater than or equal to the unknown number of pure components.
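A minimal sketch of why overlapping spectra conflict with the independence assumption is given here. It uses synthetic Gaussian-shaped peaks, which are an assumption made purely for illustration, and the hypothetical helpers `gaussian_peak` and `pearson`. A correlation clearly different from zero already rules out statistical independence, although a correlation near zero would not by itself prove independence.

```python
import math


def gaussian_peak(centre, width, n_points):
    """Synthetic single-peak spectrum sampled on the integer grid 0..n_points-1."""
    return [math.exp(-((t - centre) ** 2) / (2.0 * width ** 2))
            for t in range(n_points)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Two well-separated peaks versus two strongly overlapping peaks.
corr_separated = pearson(gaussian_peak(50, 5, 200), gaussian_peak(150, 5, 200))
corr_overlapping = pearson(gaussian_peak(50, 5, 200), gaussian_peak(53, 5, 200))
```

With these illustrative parameters the overlapping pair is strongly correlated while the separated pair is only weakly (and negatively) correlated, which mirrors the qualitative point made in the cited literature.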
Paragraphs [0015]-[0033] discuss patents and patent applications related to BSS concepts that fall into two categories: those that claim applications in spectroscopy and spectrometry, and those that solve the BSS problem using two mixtures only. The methods of the first category still require the number of mixtures to be greater than or equal to the number of pure components. The methods of the second category are based on assumptions about the structure of the source signals that are specific to the application domain (voice signals), which precludes their applicability in the fields of spectroscopy and spectrometry.
The US patent application 20040111220 “Methods of decomposing complex data” presents a method for blind decomposition of the mixture matrix that is a statistically based data mining technique. It claims applications in spectroscopy, spectrometry, genomics, proteomics, etc. However, it requires the number of mixtures to be greater than the number of unknown components. This is evident in the first stage of the algorithm, where principal component analysis (PCA) is used to remove outlier and noisy components from the data. This is done by inspecting the eigenvalues of the data covariance matrix, wherein the overall number of eigenvalues equals the number of mixtures. Thus, this method cannot work when the number of mixtures is smaller than the number of pure components.
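The eigenvalue argument can be sketched as follows. This is a toy illustration, not the algorithm of the cited application: the spectra `S`, the concentrations `A` and the helper functions are all hypothetical. Three pure components are mixed into two mixtures, and the covariance matrix of the mixtures is 2x2, so it can yield at most two eigenvalues regardless of how many components are present.

```python
import math


def mix(A, S):
    """Form mixtures X = A S from concentrations A (N x M) and spectra S (M x T)."""
    n_points = len(S[0])
    return [[sum(a * s[t] for a, s in zip(row, S)) for t in range(n_points)]
            for row in A]


def covariance(X):
    """Sample covariance matrix between the rows (mixtures) of X."""
    n, n_points = len(X), len(X[0])
    means = [sum(row) / n_points for row in X]
    return [[sum((X[i][t] - means[i]) * (X[j][t] - means[j])
                 for t in range(n_points)) / (n_points - 1)
             for j in range(n)] for i in range(n)]


def eig_sym_2x2(C):
    """Closed-form eigenvalues of a symmetric 2x2 matrix, largest first."""
    a, b, d = C[0][0], C[0][1], C[1][1]
    centre = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return [centre + radius, centre - radius]


# Hypothetical data: M = 3 pure components, N = 2 mixtures, T = 6 channels.
S = [[1, 0, 0, 2, 0, 0],
     [0, 3, 0, 0, 1, 0],
     [0, 0, 2, 0, 0, 4]]
A = [[0.5, 0.3, 0.2],
     [0.2, 0.3, 0.5]]

X = mix(A, S)
C = covariance(X)
eigenvalues = eig_sym_2x2(C)   # only two values, although three components exist
```

Counting significant eigenvalues can therefore never report more components than there are mixtures, which is the limitation noted above.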
The US patent application 20070252597 “Magnetic resonance spectroscopy with sparse spectral sampling and interleaved dynamic shimming” is related to 4D (three spatial dimensions and one spectral dimension) magnetic resonance spectroscopy and is characterized by sparse sampling across the spectral dimension. Here the sparseness of the components is a consequence of the multidimensionality of the data, i.e. of the sensing device.
The patent application WO2007138544 “Coding and decoding: seismic data modeling, acquisition and processing” presents a method for blind decomposition of seismic data. In said application the underdetermined BSS (uBSS) problem is converted into a determined problem by generating new equations by means of higher-order statistics. This approach is, however, specific to the seismic data processing domain.
The patent application CN1932849 “Initial method for image independent component analysis” exploits sparseness of the data in the wavelet domain in order to obtain a more accurate estimate of the mixing matrix. The estimate of the mixing matrix is then used as the initial condition for standard ICA algorithms. Thus, said application is essentially related to even- or over-determined BSS problems, which require the number of mixtures to be greater than or equal to the number of pure components.
The patent application WO2007112597 “Blind extraction of pure component mass spectra from overlapping mass spectrometric peaks” is related to blind extraction of the pure components from recorded multicomponent gas chromatography-mass spectrometric signals (mixtures) by means of an entropy minimization approach. It also estimates the unknown number of pure components by ranking the singular values of the sample data covariance matrix and discarding the small singular values that are attributed to chemical noise. Thus, said application ultimately requires the number of mixtures to be greater than the unknown number of pure components.
The U.S. Pat. No. 7,295,972 “Method and apparatus for blind source separation using two sensors” is related to a novel algorithm for blind extraction of multiple source signals from two mixtures only. The method transforms the mixtures into the frequency domain and employs a strategy similar to the well-known DUET algorithm (Blind Separation of Disjoint Orthogonal Signals: Demixing n sources from 2 mixtures, by A. Jourjine, S. Rickard, and O. Yilmaz, in Proc. Int. Conf. on Acoust., Speech, Signal Processing, 2000, vol. 5, pp. 2985-2988), in which a specific assumption of disjoint orthogonality is made. This assumption requires that only one source signal be present at any point in the time-frequency plane. The assumption is very restrictive and seems to be approximately true for voice signals only. Thus, said method is not applicable to the fields of spectroscopy and spectrometry, where pure components exist simultaneously in time and frequency (a few examples include 1H NMR and EPR signals).
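The disjoint orthogonality assumption can be made concrete with a simple check. The sketch below is an illustration only and is not taken from the cited patent or from the DUET paper: the hypothetical function `overlap_fraction` works on any sampled representation of the sources (for DUET this would be the time-frequency coefficients) and reports the fraction of sample points at which more than one source is active.

```python
def overlap_fraction(sources, threshold=0.0):
    """Fraction of sample points where two or more sources exceed the threshold.

    A value of 0.0 means the sources are disjoint in this representation.
    """
    n_points = len(sources[0])
    crowded = 0
    for t in range(n_points):
        active = sum(1 for s in sources if abs(s[t]) > threshold)
        if active > 1:
            crowded += 1
    return crowded / n_points


# Toy examples: each inner list is one source sampled at four points.
disjoint_sources = [[1, 0, 0, 0],
                    [0, 1, 0, 0]]
overlapping_sources = [[1, 1, 0, 0],
                       [0, 1, 1, 0]]

disjoint_score = overlap_fraction(disjoint_sources)        # no shared active point
overlapping_score = overlap_fraction(overlapping_sources)  # one shared point of four
```

Pure component spectra with overlapping peaks would give a nonzero score under such a check, which is the situation the text describes for spectroscopic signals.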
The U.S. Pat. No. 7,280,943 “Systems and methods for separating multiple sources using directional filtering,” is related to semi-blind extraction of multiple source signals from one or more received signals. The method is semi-blind because it assumes that each source signal can be represented by a set of known basis functions and directional filters that incorporate prior knowledge about the type of the sources and their directions of arrival. The last assumption does not hold when spectroscopy and spectrometry are considered as application domains. This is because the signals arising in spectroscopy and spectrometry do not have spatial structure, i.e. there are no distinct spatial locations with which the pure component signals can be associated and there are no distinct spatial locations of the receiving sensors (the multiple mixtures are acquired over different time slots or different wavelengths).
The U.S. Pat. No. 7,010,514 “Blind signal separation system and method, blind signal separation program and recording medium thereof” presents a solution of BSS problems, including the underdetermined BSS (uBSS) problem, using the probabilistic approach known as maximum likelihood (M. S. Lewicki et al., “Learning Overcomplete Representations,” Neural Computation, vol. 12, pp. 337-365, 2000). It is assumed in the patent that the number of sources (also called pure components) is known. This is a first significant limitation of said patent. The probabilistic maximum likelihood approach implies that the prior distribution of the unknown pure components is known, in order to obtain the learning equation for the unknown mixing matrix. Because the related uBSS problem can be solved only if the sources have a sufficient degree of sparseness, the problem must be transformed into a basis in which the sources are sparse enough. Then, in order to obtain a mathematically tractable learning rule for the mixing matrix, the Laplacian distribution is assumed for the prior distribution of the sources in the given basis. This is a second significant limitation of said patent. In practice we cannot dictate the distribution of the sources in the chosen basis, because the number of available bases is limited and the most frequently used bases, such as the Fourier or wavelet basis, do not represent all types of signals with the same degree of sparseness. Therefore the assumed Laplacian distribution of the sources will in reality deviate from the true distribution, and this will be a source of errors in the estimation of the mixing matrix.
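To make the role of the Laplacian prior explicit, the following relation may help. It is a standard consequence of that prior in the noise-free case, stated here under assumed notation (x for one mixture sample, A for the mixing matrix, s for the M source coefficients in the chosen basis) rather than quoted from the cited patent:

```latex
p(\mathbf{s}) \propto \prod_{m=1}^{M} \exp\left(-\lvert s_m \rvert\right)
\quad\Longrightarrow\quad
\hat{\mathbf{s}}
= \arg\max_{\mathbf{s}\,:\,\mathbf{A}\mathbf{s}=\mathbf{x}} \log p(\mathbf{s})
= \arg\min_{\mathbf{s}} \sum_{m=1}^{M} \lvert s_m \rvert
\quad \text{subject to} \quad \mathbf{A}\mathbf{s} = \mathbf{x}
```

The estimate is thus the sparsest solution in the sense of the sum of absolute values, which is appropriate only to the extent that the true source coefficients really follow such a distribution in the chosen basis.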
The U.S. Pat. No. 6,944,579 “Online blind source separation,” aims to extract multiple source signals from two mixtures only. The method transforms the mixtures into the time-frequency domain and employs the strategy of the algorithm published in: Blind Separation of Disjoint Orthogonal Signals: Demixing n sources from 2 mixtures, by A. Jourjine, S. Rickard, and O. Yilmaz, in Proc. Int. Conf. on Acoust., Speech, Signal Processing, 2000, vol. 5, pp. 2985-2988. The specific requirement of the patented algorithm is that the source signals be disjointly orthogonal in the time-frequency plane. It is empirically known that this assumption is fulfilled for voice signals. However, there is no rationale to believe that it will be fulfilled for arbitrary types of signals, such as those that arise in the fields of spectroscopy or spectrometry. The reason is that the pure components residing in spectroscopic mixture signals are active simultaneously in time and frequency. Hence, said method is not applicable to the fields of spectroscopy or spectrometry.
The U.S. Pat. No. 6,577,966 “Optimal ratio estimator for multisensor system,” aims to extract multiple source signals from two mixtures only. Separation based on optimal ratio estimation is possible provided that the source signals do not overlap in the time-frequency domain. As already noted, this assumption holds approximately for voice-type signals, and the purpose of said method is the separation of multiple voice signals from two-microphone recordings. As discussed in the previous paragraph, it is not realistic to expect arbitrary types of signals, such as those arising in the fields of spectroscopy or spectrometry, not to overlap in the time-frequency plane. The reason is that the pure components residing in spectroscopic mixture signals are active simultaneously in time and frequency. Hence, said method is not applicable to the fields of spectroscopy or spectrometry.
The US Patent Application 20070257840 “Enhancement Techniques for Blind Source Separation,” is related to improving the performance of BSS algorithms for separation of audio signals from two-microphone recordings. Decorrelation-based pre- and post-filtering (least mean squares filtering) is applied to the first and second microphone signals for the purpose of enhancement. The method assumes that a first microphone is in the proximity of a first source signal and a second microphone is in the proximity of a second source signal. In this sense the known method is very limited and cannot be applied to the fields of spectroscopy and spectrometry, where mixtures are obtained over time or wavelength (there is no plurality of physical sensors) and more than two sources (pure components) exist.
The US patent application 20060064299 “Device and method for analyzing an information signal,” is related to extraction of multiple audio signals from a single mixture. The method splits the mixture into a plurality of component signals and finds the information content of each component signal based on calculation of their features, wherein a feature is defined so that it is correlated with two source signals in two different subspaces. The features are specific to audio signals, which limits this patent application to the separation of audio signals only. Hence, the algorithm presented in the cited patent application is not applicable to the type of signals that arise in the fields of spectroscopy and spectrometry.
The US patent application 20060058983 “Signal separation method, signal separation device, signal separation program and recording medium,” presents a signal separation algorithm capable of separating multiple source signals from multiple mixtures, wherein the number of sources can be greater than the number of mixtures. The algorithm relies on the standard concept for dealing with uBSS problems: transforming the mixtures into the frequency domain, performing data clustering to estimate the number of sources, and performing frequency-domain ICA at those frequencies where two or more sources are active. Thus, the algorithm in the cited patent application has the following deficiencies: (i) the number of sensors must be greater than two if more than two sources are active at the same frequency; (ii) in relation to (i), the Fourier basis (frequency domain) used by the cited application is not optimal for the type of signals that arise in spectroscopy.
The US patent application 20050032231 “Identifying component groups with independent component analysis,” presents an ICA-based solution for blind decomposition of multivariate spectrometric data. The solution of the cited application has the following deficiencies: (i) since the blind decomposition problem is solved by ICA, the number of mixtures must be greater than or equal to the unknown number of pure components; (ii) since ICA is used to solve the blind decomposition problem, the pure components must be statistically independent, which is known not to be generally true for pure components arising in spectrometry: G. Wang et al., “A primary study on resolution of overlapping GC-MS signal using mean-field approach independent component analysis,” Chemometrics and Intelligent Laboratory Systems 82, 137-144, 2006; W. Naanaa, J.-M. Nuzillard, “Blind source separation of positive and partially correlated data,” Signal Processing 85, 1711-1722, 2005. Hence, the algorithm presented in the cited application cannot separate more than two statistically dependent spectroscopic signals using two mixtures only.
The US patent application 20030088384 “Chemical substance classification apparatus, chemical substance classification method, and program” presents an ICA based solution for blind decomposition of multivariate chemical substance data. The same comments apply as in relation to the previously cited US patent application 20050032231.
The patent application WO2008076680 (US2008147763) “Method and Apparatus for Using State Space Differential Geometry to Perform Nonlinear Blind Source Separation,” presents a quite general approach to nonlinear blind source separation based on state space differential geometry. The set of application domains covered by the claims is quite wide. The main assumptions of the algorithm proposed in the cited application are that the number of mixtures, which may contain nonlinear combinations of the pure component signals, is greater than or equal to the number of pure components, and that the pure component signals are statistically independent. Hence, the algorithm presented in the cited application cannot separate more than two statistically dependent spectroscopic signals using a smaller number of mixtures.
The patent application WO2007103037 (US2007004966) “System and Method for Generate a Separated Signal,” applies the concept of independent vector analysis to separate multiple source signals from multiple mixtures, wherein the number of mixtures must be greater than or equal to the number of source signals. Hence, the algorithm presented in the cited application cannot separate more than two spectroscopic signals using a smaller number of mixtures.
The patent application US2006256978 “Sparse signal mixing model and application to noisy blind source separation,” presents an algorithm for blind extraction of two or more signals from two mixtures only by transforming the measured signals into the time-frequency domain. The fundamental assumption made about the source signals is that they are disjointly orthogonal, i.e. that at each time-frequency location only one source signal exists. This assumption is quite restrictive, and even the cited application states that it holds approximately for voice signals only. The known method will not work in the case of spectroscopic signals, because the pure components are simultaneously active in time and frequency.
The patent application WO03090127 “Blind source separation utilizing a spatial fourth order cumulant matrix pencil,” relates to a novel method for blind separation of, again, statistically independent sources, relying on fourth-order cumulants and generalized eigen-analysis. Said method suffers from the same limitations as mentioned above, namely (i) the sources must be statistically independent and (ii) the number of mixtures must be greater than or equal to the number of sources.
The International patent application number PCT/HR2008/000037 relates to a method of and system for blind extraction of more than two pure components out of spectroscopic or spectrometric measurements of only two mixtures by means of sparse component analysis. Said known method is based upon the assumption that the pure components do not overlap either in the original recording domain or in some transformed domain. However, in the case of NMR spectroscopy it is practically impossible to satisfy the no-overlap assumption when the pure components represent complex chemical compounds such as those that arise in analyses of biological fluids (urine, blood plasma, cerebrospinal fluid, saliva, amniotic fluid, bile, tears, etc.) that include determination of certain metabolites or biomarkers.
Accordingly, it is the aim of the present invention to provide a method and system for blind extraction of more pure components than mixtures in 1D and 2D NMR spectroscopy and mass spectrometry, with particular emphasis on the cases in which the pure components represent complex chemical compounds, such as those that arise in analyses of biological fluids (urine, blood plasma, cerebrospinal fluid, saliva, amniotic fluid, bile, tears, etc.) that include determination of certain metabolites or biomarkers, or in which a great number (from a few hundred up to a few thousand) of pure components is contained in the mixtures.