With the gradual development of mass spectrometry applications in the field of proteomics, proteomic quantification, in particular label-free proteomic quantification, has developed rapidly. The basic principle of label-free quantification is to use data obtained by liquid chromatography-mass spectrometry to represent the expression amount of a peptide/protein in a sample. Such methods are mainly of two types:
The first type performs quantification directly on LC-MS/MS data (also known as spectral counting). Since this type of method places relatively high demands on the resolution and sensitivity of the mass spectrometer, and current mass spectrometers cannot accurately represent the quantitative information of a peptide/protein in this way, it has not been widely used.
The second type performs quantification using the primary mass spectrogram of LC-MS (also known as counting). Such a method quantifies a peptide (protein) based on the area under the curve (or the intensity) of an extracted ion chromatogram (XIC) constructed from the primary mass spectrogram. After digestion, the same or similar peptides (isotope peaks) are first separated by liquid chromatography and therefore appear in regions having similar retention times. The higher the concentration of a peptide in the sample, the stronger the response intensity of its ion signal; thus the result obtained by such a method is relatively accurate.
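As an illustration of how an XIC can be reduced to an abundance value, the following minimal sketch computes the area under the curve by the trapezoidal rule and, alternatively, the apex intensity. The function names and the example data points are hypothetical, and trapezoidal integration is only one of several possible integration choices; it is not asserted to be the calculation used by any particular software.

```python
def xic_area(points):
    """Trapezoidal area under an XIC given (retention_time, intensity) pairs."""
    pts = sorted(points)  # order by retention time
    area = 0.0
    for (t0, i0), (t1, i1) in zip(pts, pts[1:]):
        area += 0.5 * (i0 + i1) * (t1 - t0)
    return area


def xic_apex(points):
    """Maximum intensity of the XIC, an alternative abundance measure."""
    return max(intensity for _, intensity in points)


# Hypothetical XIC: retention time in minutes, intensity in arbitrary units.
example_xic = [(10.0, 0.0), (10.5, 200.0), (11.0, 400.0), (11.5, 200.0), (12.0, 0.0)]

area = xic_area(example_xic)
apex = xic_apex(example_xic)
```

The area measure uses the whole elution profile, whereas the apex measure depends on a single scan and is therefore more sensitive to a noisy data point.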
The second type of method may be performed in two ways: quantification without identification results (also known as quantification prior to identification) and quantification with identification results (also known as identification prior to quantification).
Quantification without identification results subjects the primary mass spectrogram information of LC-MS directly to peptide quantification and identification, and mainly comprises the following steps: 1) signal preprocessing and peak detection; 2) constructing XICs; 3) aligning retention times; 4) data normalization; 5) sequence matching of peptides/proteins; 6) calculating a ratio of protein abundance values; and 7) statistical analysis. Such a method is able to quantify more peptides and proteins; however, the quantified peptides have a high false positive rate and suffer from substantial interference by noise peaks.
In recent years, quantification with identification results has been preferred. The basic principle of such a method is to first identify a peptide (protein) from the secondary mass spectrogram (LC-MS/MS), then map the identified peptide (protein) to the corresponding ion peak in the primary mass spectrogram (LC-MS), and then construct a corresponding XIC for label-free quantification. Such a method thus not only reduces the false positive rate but also improves the accuracy of quantification and decreases time consumption. Said method comprises the general steps of: 1) searching the secondary mass spectrogram against a database to identify peptides (proteins), which are then subjected to quality control; 2) for each identified peptide (protein), constructing a corresponding XIC from its primary mass spectrogram; 3) calculating a ratio of the abundance values of the same protein in different samples; and 4) subjecting the calculated result to statistical analysis. Fewer peptides (proteins) can be quantified using said method, because the peptides (proteins) are identified directly by identification software, and those peptide ions in the primary mass spectrogram that cannot be identified from the secondary mass spectrogram are not quantified. However, the quantified peptides (proteins) obtained by said method have a high accuracy, and the time consumed by the analysis is greatly reduced.
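Step 2) of the above workflow, constructing an XIC for an identified peptide from the primary mass spectrogram, may be sketched as follows. This is a minimal sketch under stated assumptions: the scan representation (a retention time paired with a list of (m/z, intensity) peaks), the function name build_xic, the 10 ppm mass tolerance, and the retention time window are all hypothetical choices made for illustration and are not taken from any particular software.

```python
def build_xic(ms1_scans, target_mz, ppm_tol=10.0, rt_window=None):
    """Collect (retention_time, summed_intensity) points for one precursor m/z.

    ms1_scans: iterable of (retention_time, [(mz, intensity), ...]) tuples.
    ppm_tol:   assumed mass tolerance in parts per million.
    rt_window: optional (start, end) retention time range around the
               MS/MS identification; scans outside it are skipped.
    """
    tol = target_mz * ppm_tol * 1e-6  # absolute m/z tolerance
    xic = []
    for rt, peaks in ms1_scans:
        if rt_window is not None and not (rt_window[0] <= rt <= rt_window[1]):
            continue
        # Sum every peak in this scan that falls within the tolerance.
        intensity = sum((i for mz, i in peaks if abs(mz - target_mz) <= tol), 0.0)
        xic.append((rt, intensity))
    return xic


# Hypothetical MS1 scans for a precursor identified at m/z 500.25.
scans = [
    (10.0, [(500.2500, 100.0), (600.0000, 50.0)]),
    (10.5, [(500.2510, 300.0)]),
    (11.0, [(500.3000, 999.0)]),   # outside the 10 ppm tolerance
    (20.0, [(500.2500, 5.0)]),     # outside the retention time window
]

xic = build_xic(scans, target_mz=500.25, ppm_tol=10.0, rt_window=(9.0, 12.0))
```

Restricting the search to the m/z and retention time of an identified peptide is what limits the influence of unrelated noise peaks, at the cost of ignoring ions that were never identified from the secondary mass spectrogram.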
Owing to the advantages of quantification with identification results, many software packages have already adopted said method, for example IDEAL-Q, pview, etc. Each of these software packages has its own advantages and disadvantages.
For example, the advantage of IDEAL-Q lies in using identification results for cross prediction, which significantly improves quantification coverage. However, the triple validation method (isotope peak pattern, charge state, signal-to-noise ratio) used by IDEAL-Q for quality control is not appropriate: for an advanced, high-accuracy mass spectrometer such as the LTQ-Orbitrap, filtering by signal-to-noise ratio is no longer suitable, because many peptide ions that genuinely exist in the primary mass spectrogram but have relatively low signal intensity would be filtered out, so that the result cannot reflect the real status of a given peptide in the sample. Besides, the method by which IDEAL-Q calculates the area under the curve of the constructed XIC is not accurate enough.
As another example, the advantage of pview mainly lies in being able to handle hundreds of samples simultaneously; however, it requires a large amount of memory (at least 4 GB) and lacks a cross prediction and verification step.
In proteomic quantification, a method of quantifying peptides (proteins) by mass spectrometry usually comprises two steps: first calculating the abundance value of each peptide, then calculating the abundance value of each protein.
The step of calculating the abundance value of a peptide mainly uses the relationship between the signal intensity of the peptide in the primary mass spectrogram and the abundance value of the peptide, i.e., the abundance value of the peptide is obtained from the mass spectrometry signal intensity.
The step of calculating the abundance value of a protein is to obtain the abundance value (or the ratio of abundance values) of the protein. The method used to map peptide abundance values to a protein abundance value directly affects the accuracy of this step.
Currently, commonly used algorithms for mapping peptide abundance values to a protein abundance value comprise: 1) for each protein, calculating the mean of the abundance values of all peptides of the protein as the abundance value of the protein, then calculating the ratio of the abundance values of the protein between samples using the calculated abundance values; 2) for each protein, calculating the mean of the n peptides having the highest abundance values in the protein in the sample (n being an integer greater than 1, for example the top 3) as the abundance value of the protein, then calculating the ratio of the abundance values of the protein between samples using the calculated abundance values; 3) for each protein, first calculating the between-sample ratio of the abundance values of each peptide of the protein, then calculating the mean or a weighted mean of these ratios as the ratio of the abundance values of the protein between samples.
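The three algorithms above can be sketched as follows. This is a minimal illustration, assuming peptide abundance values are held in a dictionary keyed by peptide sequence; the function names, the peptide labels, and the abundance values are hypothetical, and algorithm 3) is shown with an unweighted mean only.

```python
from statistics import mean


def protein_abundance_mean(peptides):
    """Algorithm 1: mean abundance of all peptides of the protein."""
    return mean(peptides.values())


def protein_abundance_top_n(peptides, n=3):
    """Algorithm 2: mean of the n most abundant peptides of the protein."""
    top = sorted(peptides.values(), reverse=True)[:n]
    return mean(top)


def protein_ratio_from_peptide_ratios(sample_a, sample_b):
    """Algorithm 3: mean of per-peptide ratios between two samples.

    Only peptides quantified in both samples with a non-zero abundance
    in sample_b contribute; no single-sample abundance is produced.
    """
    shared = [p for p in sample_a if p in sample_b and sample_b[p] > 0]
    return mean(sample_a[p] / sample_b[p] for p in shared)


# Hypothetical peptide abundance values for one protein in two samples.
sample_a = {"PEPA": 100.0, "PEPB": 400.0, "PEPC": 40.0, "PEPD": 20.0}
sample_b = {"PEPA": 50.0, "PEPB": 200.0, "PEPC": 20.0, "PEPD": 10.0}

ratio_1 = protein_abundance_mean(sample_a) / protein_abundance_mean(sample_b)
ratio_2 = protein_abundance_top_n(sample_a) / protein_abundance_top_n(sample_b)
ratio_3 = protein_ratio_from_peptide_ratios(sample_a, sample_b)
```

In this idealized example every peptide changes by the same factor, so the three ratios agree; with real data, in which peptides differ widely in signal response, the three algorithms generally give different results.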
Protein mass spectrometry measures peptides having different properties, and even under the same conditions the mass spectrometer generates different signals for different peptides. Thus, even for peptides digested from the same protein, which theoretically have the same concentration, the abundance values obtained by mass spectrometry may differ greatly.
The above algorithms 1) and 2) both take the mean of the abundance values of all peptides, or of the n peptides having the highest abundance values, as the abundance value of the protein, a strategy which obviously leads to a large deviation. The above algorithm 3) compares the abundance values of all peptides of the protein across different samples; its disadvantage is that it cannot provide the abundance value of the protein in a single sample, and, because it uses all peptides, it also shares the disadvantage of the above two algorithms.
Currently, in proteomic quantification (particularly label-free quantification), a method which simultaneously improves quantification coverage and/or accuracy is still urgently needed.