Mass spectrometry is commonly used in protein chemistry and proteomics to identify polypeptides and to determine their relative abundance. It is also used to test a sample for the presence, and the relative abundance, of a known polypeptide.
The application generally requires the following steps: (1) introduce a sample into a mass spectrometer (herein “MS”); (2) utilise the MS to scan the sample; and (3) compare the data acquired from the scan against a database containing information acquired from previous MS experiments, or against a database containing predicted sample mass information, to test for the presence and/or abundance of the known (“target”) polypeptide in the sample.
Generally speaking, there are four modes by which a MS can be configured to scan and acquire data.
A first mode is full scan acquisition. In this mode, the scan acquires information on the mass/charge ratio (herein “m/z”) of all polypeptides introduced into the MS. This is exemplified by the method known as peptide mass fingerprinting (PMF). In the case of a low complexity mixture, such as a purified polypeptide, PMF is often sufficient to identify the polypeptide analyte by matching observed m/z values against expected theoretical values. However, a problem arises where the sample is a complex mixture of polypeptides, such as serum or a cell/tissue lysate: the m/z values of many polypeptides are detected in the scan, making it very difficult to identify a target polypeptide. This is particularly the case where the target polypeptide has a low relative abundance in the sample. Also, the mass range over which the m/z of polypeptides can be accurately determined is limited, leading to overlapping signals in complex samples. In addition, suppression effects in the ionization process result in the loss of signal from some polypeptides.
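The matching of observed m/z values against theoretical values can be sketched as follows. This is a minimal illustration only, assuming unmodified peptides, standard monoisotopic residue masses and a parts-per-million tolerance; the function names `peptide_mz` and `match_peaks` and the default tolerance are hypothetical and are not taken from any particular instrument or search engine.

```python
# Monoisotopic residue masses in daltons (standard published values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free termini
PROTON = 1.007276   # mass of each charge-carrying proton


def peptide_mz(sequence, charge=1):
    """Theoretical m/z of an unmodified peptide at the given charge state."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral + charge * PROTON) / charge


def match_peaks(observed_mz, candidates, tol_ppm=20.0, charge=1):
    """Pair each observed peak with every candidate peptide within tol_ppm.

    The tolerance is an assumed value chosen for illustration.
    """
    matches = []
    for obs in observed_mz:
        for seq in candidates:
            theo = peptide_mz(seq, charge)
            if abs(obs - theo) / theo * 1e6 <= tol_ppm:
                matches.append((obs, seq))
    return matches
```

In this sketch a single observed peak can match more than one candidate. For instance, the illustrative sequences PEPTIDE and PEPTLDE have identical masses because leucine and isoleucine are isobaric, which is a small-scale instance of the ambiguity that arises in complex mixtures.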
To improve identification specificity, a second mode of MS known as tandem MS (MS/MS) can be conducted. In this case, an m/z ion obtained from a MS scan is selected and fragmented, for example by collision-induced dissociation (CID) with a gas. This produces a series of fragment ions that originate from the precursor ion. Coupling the m/z of the precursor ion with the m/z of the fragment ions increases identification specificity when the masses are compared against a sequence database as described above. Nonetheless, for complex samples this approach is limited to identifying approximately 5-15% of the spectra generated, and among these identifications are many false positives. [Keller, A., Nesvizhskii, A. I., Kolker, E., Aebersold, R. (2002) Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74, 5383-5392. Nielsen, M. L., Savitski, M. M., Zubarev, R. A. (2005) Improving polypeptide identification using complementary fragmentation techniques in Fourier transform mass spectrometry. Mol. Cell. Proteomics 4, 835-845.]
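To make the relationship between a precursor ion and its fragment ions concrete, the following sketch computes theoretical fragment m/z values for a peptide. It assumes the b- and y-ion series commonly observed with CID, singly charged fragments and no modifications; the function name `fragment_ions` is hypothetical.

```python
# Monoisotopic residue masses in daltons (standard published values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(sequence):
    """Singly charged b- and y-ion m/z values for an unmodified peptide.

    Returns two dicts keyed by ion index: b[i] covers the first i residues
    (N-terminal fragment), y[i] covers the last i residues (C-terminal fragment).
    """
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    n = len(masses)
    b = {i: sum(masses[:i]) + PROTON for i in range(1, n)}
    y = {i: sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)}
    return b, y
```

Each complementary pair b[i] and y[n - i] sums to the neutral peptide mass plus two protons, which is one reason why coupling precursor and fragment m/z values constrains the identification more tightly than the precursor m/z alone.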
If the MS/MS data contain an ion series representative of each amino acid of the analyte polypeptide, the amino acid sequence can be readily determined from the spectra via de novo sequence analysis. However, in practice, data of this quality occur at low frequency. To overcome the limitations of imperfect spectra, the accepted approach is to utilize the imperfect MS/MS spectrum as a signal and then filter the database for those sequences containing that signal. Two basic methods exist for this purpose: the first, proposed by Yates and Eng, is the cross-correlation method, and the second, proposed by Mann, is based on the related idea of sequence tag matching. The technical limitation of both these approaches, and of the larger methodologies that they have evolved into, is that they ultimately assign a polypeptide identity and a concomitant P-value. The P-value is a measure of confidence that a human investigator would assign the same identity on manual inspection (Nesvizhskii 2002 supra). Thus it is possible, and even probable, that the MS generates spectra that do not contain enough information to be uniquely matched to a polypeptide sequence but that nevertheless score well (false positives). In net terms, these signal filtering techniques are unable to determine when an MS/MS spectrum lacks sufficient information content to determine an identity; they are therefore incapable of returning a negative result and instead leave it to the user to choose a cut-off value of confidence in the database search result.
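The dependence of de novo sequence analysis on a complete ion series can be illustrated with a short sketch. It assumes that all supplied peaks belong to a single, singly charged ion series, and it reads residues from the gaps between consecutive peaks; the function name `read_ladder` and the 0.01 Da tolerance are hypothetical choices made for illustration, not a description of any published algorithm.

```python
# Monoisotopic residue masses in daltons (standard published values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks, tol=0.01):
    """Infer residues from gaps between consecutive ions of one series.

    Returns a list of residue labels (isobaric residues are joined with "/"),
    or None when any gap fails to match a single residue, that is, when the
    ladder is incomplete.
    """
    ordered = sorted(peaks)
    residues = []
    for low, high in zip(ordered, ordered[1:]):
        gap = high - low
        hits = [aa for aa, mass in RESIDUE_MASS.items() if abs(gap - mass) <= tol]
        if not hits:
            return None
        residues.append("/".join(hits))
    return residues
```

With a complete ladder the residues are read off directly, whereas removing a single peak leaves a gap that corresponds to no single residue and the sketch returns no sequence. Unlike this sketch, the database filtering approaches described above always return a scored candidate, which is the limitation at issue.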
A third mode is single ion monitoring (SIM). SIM scans are performed by configuring a MS to scan for polypeptides having a selected m/z. While polypeptides not having the selected m/z are excluded from detection, SIM scans detect all polypeptides having an m/z that is indistinguishable from the target polypeptide m/z. Accordingly, where the sample contains polypeptides having an m/z that is the same as that of the target polypeptide (again, this is common where the sample includes a complex mixture of polypeptides), multiple peaks are presented in a plot of relative intensity against m/z, thereby confounding polypeptide identity. Again, the sensitivity of this mode becomes an issue where the target polypeptide has a low abundance relative to other polypeptides having the same m/z.
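The selectivity limit of a SIM scan can be sketched as a simple window filter. The function name `sim_scan`, the tuple layout and the 0.7 m/z window width are assumptions made for illustration; actual window widths depend on the instrument and its configuration.

```python
def sim_scan(species, target_mz, window=0.7):
    """Return every species whose m/z lies inside the window centred on target_mz.

    species: iterable of (label, mz, intensity) tuples.
    window: full width of the transmission window in m/z units (assumed value).
    """
    half = window / 2.0
    return [s for s in species if abs(s[1] - target_mz) <= half]


# Hypothetical sample: a low-abundance target and an abundant species of
# nearly identical m/z both pass the filter; the unrelated species does not.
sample = [
    ("target", 800.37, 1.0e3),
    ("interferent", 800.42, 5.0e5),
    ("unrelated", 650.30, 2.0e5),
]
detected = sim_scan(sample, target_mz=800.37)
```

In this hypothetical sample the filter excludes the unrelated species but cannot separate the target from the far more abundant interferent.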
The fourth mode is selected reaction monitoring (SRM). In this mode, the MS is configured to scan for the presence of both a precursor m/z ion (typically known as a Q1 value) and a fragment ion (typically known as a Q3 value) that is generated when polypeptides having a particular precursor m/z are fragmented (e.g. by CID). Typically, both the Q1 and Q3 values are determined from a database containing information acquired from either previous MS experiments or theoretical calculations (MIDAS). The combination of Q1 and Q3 ion m/z values that map to a given polypeptide enables the monitoring of polypeptide abundance.
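A theoretical calculation of Q1 and Q3 values might look like the following sketch. It assumes an unmodified peptide, a doubly charged precursor by default and singly charged y-ion fragments; these choices and the function name `srm_transitions` are illustrative assumptions and do not describe the MIDAS workflow itself.

```python
# Monoisotopic residue masses in daltons (standard published values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def srm_transitions(sequence, precursor_charge=2):
    """List (Q1, Q3, label) tuples pairing the precursor with each y ion.

    Q1 is the precursor m/z at precursor_charge; Q3 is the m/z of a singly
    charged y ion (assumed fragment type).
    """
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    n = len(masses)
    neutral = sum(masses) + WATER
    q1 = (neutral + precursor_charge * PROTON) / precursor_charge
    return [
        (q1, sum(masses[n - i:]) + WATER + PROTON, f"y{i}")
        for i in range(1, n)
    ]
```

Each tuple is one candidate transition; in practice only a few transitions per polypeptide would be monitored, and the choice among them is outside the scope of this sketch.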
A limitation of the SRM approach with complex samples is that many different combinations of polypeptides can occupy the same mass transmission windows centred on the Q1 and Q3 values, thus compromising the technique for polypeptide identification purposes. Therefore, unless a definitive MS scan can be conducted (or has been previously conducted) that contains information in addition to Q1 and Q3 values (such as that obtained in a tandem MS scan), it is not possible to identify the analyte with any confidence using solely Q1 and Q3 values. This means that most, if not all, Q1/Q3 pairs for a given polypeptide will map to one or more other polypeptides, especially in the context of a complex mixture of polypeptides. For those polypeptides in a complex sample that are detectable, it is economically unattractive and experimentally cumbersome to perform MS experiments for every polypeptide to identify a fragment ion that will uniquely identify each polypeptide.
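The window overlap described above can be sketched as a check on whether another polypeptide's Q1/Q3 pair falls inside both transmission windows of the target transition. The function name `interfering_transitions`, the window widths and the example values are hypothetical and chosen only to illustrate the idea.

```python
def interfering_transitions(target, background, q1_window=0.7, q3_window=0.7):
    """Return background transitions that fall inside both windows of the target.

    target: (q1, q3) tuple for the target polypeptide.
    background: iterable of (label, q1, q3) tuples for other polypeptides.
    q1_window, q3_window: full window widths in m/z units (assumed values).
    """
    tq1, tq3 = target
    return [
        (label, q1, q3)
        for label, q1, q3 in background
        if abs(q1 - tq1) <= q1_window / 2 and abs(q3 - tq3) <= q3_window / 2
    ]


# Hypothetical values: only species "A" shares both windows with the target.
target = (400.69, 148.06)
background = [
    ("A", 400.80, 148.10),   # inside the Q1 and Q3 windows
    ("B", 400.80, 263.09),   # inside the Q1 window only
    ("C", 512.30, 148.06),   # inside the Q3 window only
]
shared = interfering_transitions(target, background)
```

Any species returned by this check would contribute signal to the same transition as the target, so the transition alone could not distinguish them.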
There is a need to be able to determine the presence and/or abundance of any given target polypeptide in a complex mixture of polypeptides, and especially those having low relative abundance.