The analysis and comparison of mixtures is an important task of analytical chemistry, especially in environmental sciences, biology, the food industry and process chemistry. For example, in the field of metabonomics, biofluids of animals and humans are characterized by the spectra obtained with established spectral methods such as Liquid Chromatography/Mass Spectroscopy (LC-MS) or Nuclear Magnetic Resonance (NMR) spectroscopy. It is often necessary to analyze and compare a whole set of spectra, e.g. a plurality of individual spectra obtained from a set of samples. To separate effects related to changes of overall concentrations of samples (changes of all analytes of samples, e.g. by dilution of samples) from effects influencing the compositions of samples (relative concentrations of components in the mixtures), it is necessary to use so-called normalization procedures. Normalization is also needed if the data of various samples were taken under different experimental conditions.
Up to now, it was a common procedure in metabonomics studies, e.g. with urine samples, to normalize the signal in a given NMR spectrum so as to obtain a constant overall integral of said spectrum. This means that every NMR spectrum in a set of spectra is scaled to the same predefined area under the curve. The underlying assumption is that the integral of each spectrum is mainly a function of overall urine concentration. Variations of the concentration of individual analytes due to metabonomic responses are assumed to be relatively small compared with variations of overall urine concentration, the latter of which affect the entire spectrum and, accordingly, said predefined area of the spectrum. However, animals in metabonomic studies can excrete extreme amounts of substances like sugars, which may dominate the spectrum and consequently will substantially influence the normalization. In addition, drug-related compounds that are excreted with urine may contribute significantly to the total integral of a spectrum through the integrals of their corresponding peaks and may thus also influence the normalization. The same type of problem arises in other analytical applications comparing mixtures, where the appearance of an unknown contaminant with a comparatively high concentration might significantly affect the total integral of a spectrum or said predefined area of the spectrum.
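The integral normalization described above, and its sensitivity to a single dominant signal, may be illustrated by the following minimal sketch. It assumes that each spectrum has already been reduced to a list of non-negative bin intensities; the function name `integral_normalize`, the target area of 100 and the numerical values are hypothetical choices made for illustration only.

```python
def integral_normalize(spectrum, target_area=100.0):
    """Scale a binned spectrum so that the sum of its intensities equals target_area."""
    total = sum(spectrum)
    if total <= 0:
        raise ValueError("spectrum must have a positive total integral")
    factor = target_area / total
    return [value * factor for value in spectrum]


# Hypothetical binned spectra (illustrative values, not measured data).
# Both samples have identical intensities in the first four bins; the second
# sample additionally shows one very large signal in the last bin, as might
# be caused by an excreted sugar or a drug-related compound.
control = [10.0, 20.0, 30.0, 40.0, 0.0]
dosed = [10.0, 20.0, 30.0, 40.0, 100.0]

control_norm = integral_normalize(control)
dosed_norm = integral_normalize(dosed)

# After normalization both spectra have the same total area, but the four
# unchanged bins of the dosed sample have been scaled down by the dominant peak.
print(control_norm)
print(dosed_norm)
```

In this example the first four bins of the second spectrum are halved by the normalization although the underlying concentrations are identical, which is the distortion discussed above.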
A method for quantification of components of chemical mixtures studied by mass spectroscopy is disclosed in US 2003/0111596 A1. As described particularly in Paragraph 0040 of said document, the known method relies on:
a) obtaining a set of sample spectra from a plurality of chemical samples, each spectrum comprising peaks having peak intensities;
b) selecting a reference spectrum;
c) for any of said sample spectra to be normalized, computing intensity ratios between the sample spectrum and the reference spectrum for all peaks or for a fraction of the total number of peaks; and
d) multiplying the sample spectrum with a normalization factor that is computed from said intensity ratios.
The above method relies on the fact that under many practical circumstances the majority of said intensity ratios will be substantially equal, representing components whose concentrations do not vary between the sample and reference spectra. The normalization factor may then be computed from said intensity ratios using a non-parametric measure. Preferably, the normalization factor is chosen to be the median of said intensity ratios.
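The known median-based normalization may be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of US 2003/0111596 A1: peaks are assumed to be already matched between the sample and the reference spectrum, each intensity ratio is assumed to be taken as reference intensity divided by sample intensity so that the median can be used directly as the multiplicative factor, and the function name `median_ratio_normalize` and all numerical values are hypothetical.

```python
from statistics import median


def median_ratio_normalize(sample, reference):
    """Normalize `sample` against `reference` using the median peak intensity ratio.

    Assumption: both lists hold intensities of the same peaks in the same order.
    Returns the normalized sample spectrum and the normalization factor.
    """
    if len(sample) != len(reference):
        raise ValueError("sample and reference must contain the same peaks")
    # Ratio reference/sample per peak; peaks absent in either spectrum are skipped.
    ratios = [r / s for s, r in zip(sample, reference) if s > 0 and r > 0]
    if not ratios:
        raise ValueError("no peaks with positive intensity in both spectra")
    factor = median(ratios)
    return [s * factor for s in sample], factor


# Hypothetical peak intensities (illustrative values, not measured data).
# The sample is a twofold dilution of the reference, except for the last
# peak, whose component has strongly increased.
reference = [10.0, 20.0, 30.0, 40.0, 5.0]
sample = [5.0, 10.0, 15.0, 20.0, 50.0]

normalized, factor = median_ratio_normalize(sample, reference)
print(factor)
print(normalized)
```

Because four of the five ratios are equal, the median is determined by the unchanged components and the single strongly changed peak does not affect the normalization factor.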
As further pointed out in Paragraph 0031 of US 2003/0111596 A1, the known normalization method is applicable to any type of spectroscopy or spectrometry yielding spectra containing signals (or peaks) whose intensities or areas are proportional to component concentrations. In particular, it should thus be applicable to NMR spectroscopy.
However, the method disclosed in US 2003/0111596 A1 does not address the problem of identifying and eliminating so-called “outliers”, which may notably be individual signals originating from or distorted by artifacts, but also entire spectra with some type of deviation, e.g. due to technical failure during acquisition. This problem is particularly important in quantitative analysis of large numbers of spectra, such as in metabonomics studies.