Field of the Invention
The present invention relates to a method for the simultaneous identification and quantification of two or more chemical compounds contained in a pool of two or more samples, wherein each sample of the pool comprises at least one of the said two or more chemical compounds, wherein the said two or more samples are subjected to a sample equalization before being pooled, the sample equalization being carried out in such a way that the total concentration of the chemical compounds is equal in each of the pooled samples, wherein the pool of samples is subjected to an analytical measurement in which each chemical compound generates at least one signal representative of the said chemical compound, the intensity of each signal being representative of the abundance of the said chemical compound, wherein the intensities of a first and a second signal are representative of the abundance of a first and a second chemical compound, respectively, in a first sample, and the intensities of a third and a fourth signal are representative of the abundance of a third and a fourth chemical compound, respectively, in a second sample, wherein the first and third compounds, and the second and fourth compounds, respectively, may be the same or different, according to the preamble of the first claim.
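The equalization step can be illustrated with a small mass-balance calculation. This is a minimal sketch that assumes equalization is done by diluting every sample down to the lowest measured total concentration; the helper `equalization_volumes`, its parameters and the numbers are hypothetical and do not describe a procedure prescribed by the method itself.

```python
def equalization_volumes(concentrations, sample_volume, target=None):
    """Hypothetical helper: diluent volume to add to each sample so that all
    samples end up at the same total compound concentration.

    concentrations -- measured total concentration per sample (consistent units)
    sample_volume  -- volume of each sample before dilution
    target         -- common concentration to reach; defaults to the lowest one
    """
    if target is None:
        target = min(concentrations)
    if target <= 0:
        raise ValueError("target concentration must be positive")
    diluent = []
    for c in concentrations:
        if c < target:
            raise ValueError("a sample cannot be diluted to a higher concentration")
        # Mass balance: c * V = target * (V + d)  =>  d = V * (c / target - 1)
        diluent.append(sample_volume * (c / target - 1.0))
    return diluent


measured = [2.0, 1.0, 4.0]  # invented total concentrations, e.g. in ug/uL
added = equalization_volumes(measured, sample_volume=10.0)
print(added)  # [10.0, 0.0, 30.0]
```

Diluting to the lowest concentration means no sample ever has to be concentrated; another common level can be chosen through the `target` argument, as long as it does not exceed any measured concentration.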
The present invention relates in particular to the field of omics, i.e. the simultaneous characterization and quantification of individual biological molecules present in a pool or mixture of two or more biological samples, for example of the proteins, of the lipids or of any other class of biological molecules present in such a pool or mixture. Omics includes, among others, metabolomics, lipidomics, genomics and proteomics. The results of such omics analyses reflect the structure, function and dynamics of the biological molecules and of the biological sample.
For the identification and quantification of the biological molecules present in the pool of samples, different analytical techniques exist, of which NMR spectroscopy, mass spectrometry, microarrays and next-generation sequencing are the most frequently used. To facilitate compound separation, identification and quantification, mass spectrometry may for example be coupled to liquid chromatography (LC), gas chromatography (GC) or capillary electrophoresis (CE). Each of these methods is typically able to identify a large number of different biomolecules or biomolecular features.
Description of the Related Art
The data generated in metabolomics, proteomics, lipidomics, genomics and other omics fields usually take the form of digitized spectra or of lists of the levels of the biomolecules involved in the respective omics technique. In its simplest form, a matrix is generated with rows corresponding to the subjects, i.e. the identified biomolecules of a certain class (for example peptides derived from sample proteins, or triglycerides among the lipids), and columns corresponding to the biomolecule levels. Statistical methods such as principal component analysis and least-squares regression, implemented in various statistical programs, are available for analysing these data. Once the molecular composition has been determined, data reduction techniques can be used to elucidate patterns and connections.
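A minimal sketch of the data matrix described above, followed by a principal component analysis of it. The peptide and sample names and all values are invented for illustration, and the first principal component is computed by plain power iteration on the covariance matrix so that the example needs only the standard library; a real analysis would normally rely on an established statistics package.

```python
import math

# Invented example matrix: rows are identified biomolecules, columns are their
# levels in each sample (names and numbers are illustrative only).
biomolecules = ["pep_A", "pep_B", "pep_C", "pep_D"]
samples = ["s1", "s2", "s3", "s4"]
levels = [
    [10.0, 11.0, 20.0, 21.0],
    [5.0, 5.5, 10.0, 10.5],
    [8.0, 8.2, 8.1, 7.9],
    [3.0, 2.9, 6.1, 6.0],
]


def first_principal_component(data, iterations=500):
    """Return (loadings, scores) of the first principal component of `data`
    (observations in rows, variables in columns), found by power iteration
    on the sample covariance matrix."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p  # starting vector, assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return v, scores


# To look for groupings of samples, treat samples as observations by
# transposing the biomolecule-by-sample matrix.
by_sample = [list(column) for column in zip(*levels)]
loadings, scores = first_principal_component(by_sample)
for name, weight in zip(biomolecules, loadings):
    print("loading", name, round(weight, 3))
for name, score in zip(samples, scores):
    print("score", name, round(score, 2))
```

In this toy matrix, s1 and s2 fall on one side of the first component and s3 and s4 on the other, driven mainly by pep_A, pep_B and pep_D; the overall sign of a principal component is arbitrary.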
In the above-mentioned analytical techniques, in particular mass spectrometry and NMR, several samples may be pooled and measured in one single experiment, so that the biological compounds of different samples can be identified and/or quantified simultaneously. This benefits direct statistical assessment, since all samples in the pool, in other words all measurements, are affected by the same instrument variability.
Where a relative quantification of, e.g., biological molecules is envisaged, labeling of the molecules prior to the analytical measurement has gained popularity, because labeling allows multiplexing of samples, in other words pooling of multiple biological samples, so that the biological molecules contained in multiple biological samples can be quantified simultaneously. For this purpose, several labeling methodologies have been developed, which can be subdivided into precursor labeling and isobaric labeling. Examples of precursor labeling include metabolic, enzymatic and chemical labeling strategies (Li et al 2012). Metabolic strategies, such as Stable Isotope Labeling by Amino acids in Cell culture (SILAC), are promising but still limited to cell cultures or small animals. As alternatives, enzymatic 16O/18O exchange as well as chemical isotope labeling approaches such as isotope-coded affinity tags (ICAT; see Lottspeich et al) have been developed.
The isobaric labeling strategy belongs to the chemical labeling subclass and is special in that the different labels, when intact, have an equal mass, hence the term “isobaric”. Isobaric labels are particularly popular in proteomic research, as these tags allow multiplexing of up to ten samples in one LC-MS run, which reduces measurement time and makes direct intra-experiment comparison possible. The two commercially available kits are Tandem Mass Tags (TMT) (6-plex or 10-plex) and isobaric Tags for Relative and Absolute Quantification (iTRAQ) (4-plex or 8-plex). Both TMT and iTRAQ tags contain a reporter group and an amine-reactive group, separated by a balancer group which ensures that all tags produce the same (isobaric) mass shift (Ross, 2004; Thompson 2003). The reactive group of the tag targets peptide N-termini and the free amino groups of lysine, so that nearly all digested peptides are labeled at least once. Relative quantification of the labeled and pooled peptides is achieved through the generation of a unique reporter ion for each tag upon fragmentation of the peptide precursor. Owing to this demultiplexing, the signal intensities of these reporter ions in the tandem mass spectra can be used to determine the relative expression differences of peptides across the multiplexed samples (Dayon 2008, Zhang 2010, Pichler 2011, Dephoure and Gygi 2012). This multiplexing not only reduces the LC-MS measurement time considerably, it also substantially reduces the variation in the quantification results (Gygi).
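The ratio computation behind reporter-ion quantification can be sketched for a single peptide. The channel labels follow the iTRAQ 4-plex reporter ions (m/z 114 to 117), but the intensities are invented and the choice of channel 114 as the reference is an assumption; in practice, reporter intensities are often aggregated over many spectra before ratios are formed, which this sketch omits.

```python
import math

# Invented reporter-ion intensities for one labeled peptide (iTRAQ 4-plex channels).
reporter_intensities = {"114": 1000.0, "115": 2000.0, "116": 500.0, "117": 1000.0}


def relative_ratios(intensities, reference="114"):
    """Ratio of each channel's reporter-ion intensity to the reference channel."""
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference channel must have a positive intensity")
    return {channel: value / ref for channel, value in intensities.items()}


def log2_ratios(intensities, reference="114"):
    """log2 ratios: 0 means no change, +1 a two-fold increase, -1 a two-fold decrease."""
    return {channel: math.log2(r)
            for channel, r in relative_ratios(intensities, reference).items()}


print(log2_ratios(reporter_intensities))
# {'114': 0.0, '115': 1.0, '116': -1.0, '117': 0.0}
```

Log2 ratios are used here because they make up- and down-regulation symmetric around zero, which is convenient for the normalization steps that usually follow.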
This labeling protocol, however, involves additional handling of the samples, which makes the isobaric labeling strategy, and labeling in general, prone to systematic effects at the wet-lab level. Among the most common handling errors are pipetting errors that occur when samples are pooled (Oberg and Mahoney, 2012) and errors in the determination of the protein concentration prior to digestion. Such inaccuracies can be corrected by data normalization.
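Why normalization can correct such errors is easiest to see in a small simulation. Assuming that a pipetting error means a fixed fraction too little of one sample ends up in the pool, every signal originating from that sample is scaled by the same factor, so every log-ratio is biased by the same constant. The abundances and the 0.8 factor below are invented for illustration.

```python
import math

# Invented true abundances of five peptides in samples A and B;
# peptide 3 is genuinely two-fold higher in B.
true_a = [100.0, 250.0, 40.0, 900.0, 15.0]
true_b = [100.0, 250.0, 80.0, 900.0, 15.0]

# Assumed pooling error: only 80% of the intended amount of sample B is pipetted,
# so every signal originating from sample B is scaled by the same factor.
pipetting_factor = 0.8
observed_b = [x * pipetting_factor for x in true_b]

true_log_ratios = [math.log2(b / a) for a, b in zip(true_a, true_b)]
observed_log_ratios = [math.log2(b / a) for a, b in zip(true_a, observed_b)]

# The bias is identical for every peptide: log2(0.8), about -0.32.
bias = [o - t for o, t in zip(observed_log_ratios, true_log_ratios)]
print([round(x, 4) for x in bias])  # [-0.3219, -0.3219, -0.3219, -0.3219, -0.3219]
```

Because the shift is the same for all peptides of the affected channel, it can be estimated from the data and removed, provided most compounds are assumed not to change between samples; median-based normalization relies on exactly this assumption.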
To correct for such systematic errors, a plethora of data normalization methods is available, many of which have been borrowed from microarray, LC-MS or NMR data analysis (Ejigu et al 2013, Oberg and Mahoney 2012, Bolstad 2003). Algorithms such as quantile normalization (Keshamouni 2005; Jagtap 2006) are often applied in isobaric-labeled proteomic studies. Several software packages suited for isobaric-labeled data, including Quant (Boehm 2007), IsobariQ (Arntzen 2010) and Isobar (Breitwieser 2011), use global normalization methods. Here, the intensity distribution of the measurements within each quantification channel is shifted by a constant amount such that the mean or median of the distribution is equal across the quantification channels. Another software package, i-Tracker, was developed to integrate quantitative information with peptide identification and to provide iTRAQ 4-plex reporter ion ratios (Shadforth et al 2005).
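The two families of methods mentioned above can be sketched in a few lines. Each quantification channel is assumed to be a list of intensities for the same features in the same order. `median_normalize` log2-transforms the channels and shifts each by a constant so that all channel medians coincide, here at the mean of the original medians (an arbitrary choice of target; a mean-based variant would replace the median). `quantile_normalize` gives every channel the same distribution in the style of Bolstad et al., without averaging tied values. Neither function is claimed to reproduce the exact behaviour of the software packages cited above.

```python
import math
import statistics


def median_normalize(channels):
    """Global normalization: log2-transform each channel, then shift it by a
    constant so that every channel median equals the mean of the medians."""
    logs = [[math.log2(x) for x in channel] for channel in channels]
    medians = [statistics.median(channel) for channel in logs]
    target = statistics.mean(medians)
    return [[v - m + target for v in channel] for channel, m in zip(logs, medians)]


def quantile_normalize(channels):
    """Quantile normalization: replace each value by the mean, across channels,
    of the values with the same rank, so all channels share one distribution.
    Tied values are not averaged in this simplified version."""
    n = len(channels[0])
    reference = [statistics.mean(rank_values)
                 for rank_values in zip(*(sorted(c) for c in channels))]
    result = []
    for channel in channels:
        order = sorted(range(n), key=lambda i: channel[i])
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = reference[rank]
        result.append(normalized)
    return result


# Invented intensities of four features; channel 2 was under-pipetted by 20%.
channels = [[100.0, 200.0, 400.0, 800.0],
            [80.0, 160.0, 320.0, 640.0]]
print([[round(v, 3) for v in c] for c in median_normalize(channels)])
```

After median normalization the two channels agree feature by feature, because the pipetting error was a pure channel-wide scale factor. Quantile normalization makes a stronger assumption, namely that all channels share the same underlying intensity distribution, and is often applied to log-transformed intensities.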