Many molecules are fragmented in mass spectrometers by chemical, electrical (electron beam or field-induced collisions with neutral gas molecules), or optical (excimer laser) means so that the masses of the resulting ion fragments can be used to identify or reconstruct the original molecule. In other instances, molecules may coelute from separation processes and must then be further distinguished by mass spectrometry. In some instances a label is attached to the parent molecule, or to specific molecules in a mixture, to help distinguish the resulting labeled ions or ion fragments from other chemical noise in the mass spectrum. Typically, this label consists of elements, or isotopes of elements, already contained in the parent molecule. In this way two or more peaks of predetermined relative abundances can be found in the mass spectrum and used to confirm the identity of labeled fragments. However, when the label contains elements (or isotopes of these elements) already contained in the parent molecule, or in other ions generated from or otherwise contaminating the sample matrix, one or more of the labeled fragment peaks may overlap with unlabeled ion peaks in the spectrum, confounding identification of the labeled ions.
Historically, techniques such as Edman degradation have been used extensively for protein sequencing. However, sequencing by collision-induced dissociation mass spectrometry (MS) methods (MS/MS sequencing) has evolved rapidly and has proved faster than Edman techniques while requiring less protein.
MS sequencing is accomplished either by using higher voltages in the ionization zone of the MS to randomly fragment a single peptide isolated from a protein digest or, more typically, by tandem MS using collision-induced dissociation in an ion trap. Several techniques can be used to select the peptide for MS/MS sequencing, including accumulation of the parent peptide ion in the quadrupole MS unit, capillary electrophoretic separation coupled to electrospray time-of-flight (ES-TOF) MS detection, and liquid chromatographic separations. The amino acid sequence of the peptide is deduced from the mass differences observed in the resulting fragmentation pattern, using the published masses of the individual amino acid residues, and this approach has been codified into a semi-autonomous peptide sequencing algorithm.
For example, in the mass spectrum of a 1425.7 Da peptide (HSDAVFITDNYR) (SEQ ID NO: 25) isolated in an MS/MS experiment acquired in positive ion mode, the difference between the full peptide (1425.7 Da) and the next largest fragment (y11, 1288.7 Da) is 137 Da. This corresponds to the expected mass of an N-terminal histidine residue cleaved at the amide bond. For this peptide, complete sequencing is possible because high-abundance fragment ions are generated by cleavage at almost every residue along the peptide backbone. In this sequence, the generation of an essentially complete set of positively charged fragment ions that includes either end of the peptide results from the basicity of both the N-terminal (histidine) and C-terminal (arginine) residues. When a basic residue is located at the N-terminus and/or C-terminus, most of the ions produced in the collision-induced dissociation (CID) spectrum will contain that residue, since positive charge is generally localized at the basic site. A basic residue therefore typically simplifies the resulting spectrum by directing fragmentation into a limited series of specific daughter ions. Peptides that lack basic residues tend to fragment into a more complex mixture of ions, which makes sequence determination more difficult.
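The ladder reasoning described above can be illustrated with a short sketch, assuming standard monoisotopic residue masses, singly protonated y ions, and no isobaric ambiguity beyond the residues present in HSDAVFITDNYR. It builds the theoretical y-ion series and then reads residues back from the differences between adjacent ions. The helper names `y_ions` and `read_ladder` are hypothetical. Because the y series grows from the C-terminus, the recovered string has to be reversed.

```python
# Monoisotopic residue masses (Da) for the residues occurring in HSDAVFITDNYR.
RESIDUE = {
    "A": 71.03711, "D": 115.02694, "F": 147.06841, "H": 137.05891,
    "I": 113.08406, "N": 114.04293, "R": 156.10111, "S": 87.03203,
    "T": 101.04768, "V": 99.06841, "Y": 163.06333,
}
WATER = 18.01056   # added once for the free C-terminus
PROTON = 1.00728   # single positive charge

def y_ions(peptide):
    """Singly protonated y-ion masses, y1 (C-terminal residue) upward.

    The last entry is the protonated intact peptide.
    """
    ions, total = [], WATER + PROTON
    for aa in reversed(peptide):
        total += RESIDUE[aa]
        ions.append(total)
    return ions

def read_ladder(masses, tol=0.02):
    """Assign a residue to each gap between adjacent masses in an ascending ladder."""
    seq = []
    for low, high in zip(masses, masses[1:]):
        diff = high - low
        hits = [aa for aa, m in RESIDUE.items() if abs(m - diff) <= tol]
        seq.append(hits[0] if hits else "?")  # "?" marks an unexplained gap
    return "".join(seq)

ladder = y_ions("HSDAVFITDNYR")
print(read_ladder(ladder)[::-1])                 # HSDAVFITDNY
# The 137 Da step quoted in the text, with a loose tolerance for 0.1 Da data:
print(read_ladder([1288.7, 1425.7], tol=0.1))    # H
```

The C-terminal arginine is not recovered from a gap. It is the residue contained in y1 itself.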
Nucleic acid sequencing has historically been conducted through the synthesis of nucleic acid fragments containing random numbers of bases copied from a parent nucleic acid sequence, as in the methods described by Sanger and Coulson (Proc. Natl. Acad. Sci. (USA), 74:5463-5467 (1977)) and Maxam and Gilbert (Methods in Enzymology, 65:499-560 (1980)). A variation on the method described by Sanger and Coulson uses an incomplete polymerase chain reaction (PCR) method to synthesize the ladder of DNA fragments (Nakamaye et al., Nuc. Acids Res., 16(21):9947-9959 (1988)). Mass spectrometric methods have been developed for more rapid and multiplexed separation and identification of the DNA ladders, as described by Koster (U.S. Pat. No. 5,691,141 and U.S. Pat. No. 6,194,144), Monforte et al. (U.S. Pat. No. 5,700,642), and Butler et al. (U.S. Pat. No. 6,090,558). In these methods the nucleic acid fragments are introduced simultaneously into the mass spectrometer, and the sequence or number of "short tandem repeats" is deduced from the mass differences between individual elements of the synthesized mass fragment ladder. As described by Koster (U.S. Pat. No. 6,194,144), it is both possible and desirable to sequence several nucleic acids simultaneously in parallel by labeling the nucleic acid fragments synthesized from each unique parent template with a different tag of sufficiently unique mass. Even with labels of unique mass, care must be taken to avoid subfragmentation of the elements of the sequence ladder during ionization or ion transmission in the mass spectrometer, and to purify the nucleic acid fragments from extraneous nucleic acids and confounding matrix contaminants, so that an unambiguous sequence can be obtained from the resulting mass spectrum. These references are incorporated by reference in their entirety for all purposes.
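Multiplexed ladder reading with distinct mass tags might be sketched as follows. The sketch rests on several simplifying assumptions. Masses are neutral. Each ladder is complete and free of subfragments. The two tag-plus-primer starting masses (1000.0 and 1150.0 Da) are hypothetical values chosen only for illustration. The residue masses are the usual monoisotopic deoxynucleotide residue masses (nucleoside monophosphate minus water). The helpers `ladder_peaks` and `read_tagged_ladder` are hypothetical names. Each ladder is decoded by a greedy walk upward from its own tag mass.

```python
# Monoisotopic deoxynucleotide residue masses (Da) within a DNA chain.
DNA_RESIDUE = {"A": 313.0576, "C": 289.0464, "G": 329.0525, "T": 304.0460}

def ladder_peaks(start_mass, sequence):
    """Masses of a complete extension ladder grown from a tagged primer of mass start_mass."""
    peaks, mass = [], start_mass
    for base in sequence:
        mass += DNA_RESIDUE[base]
        peaks.append(mass)
    return peaks

def read_tagged_ladder(peaks, start_mass, tol=0.02):
    """Greedy walk from start_mass: at each step take the first peak one residue higher."""
    sequence, current = [], start_mass
    while True:
        step = next(
            ((base, p) for base, m in DNA_RESIDUE.items()
             for p in peaks if abs(p - (current + m)) <= tol),
            None,
        )
        if step is None:  # no peak extends this ladder any further
            return "".join(sequence)
        sequence.append(step[0])
        current = step[1]

TAG_A, TAG_B = 1000.0, 1150.0  # hypothetical tag + primer masses for two templates
spectrum = sorted(ladder_peaks(TAG_A, "ACG") + ladder_peaks(TAG_B, "TTA"))
print(read_tagged_ladder(spectrum, TAG_A))  # ACG
print(read_tagged_ladder(spectrum, TAG_B))  # TTA
```

The greedy walk works here only because the two ladders never produce peaks that fall within tolerance of each other's next step. That is the sense in which the tag masses must be "sufficiently unique".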
Polysaccharide sequencing methods that use mass tagging in the mass spectrometer have also been described by Rademacher et al. (U.S. Pat. No. 5,100,778) and Parekh and Prime (U.S. Pat. No. 5,667,984). In these methods a unique mass tag is attached to a purified polysaccharide sample, which is subsequently divided into aliquots that are subjected to different regimes of enzymatic and/or chemolytic cleavage to produce a series of labeled oligosaccharide fragments derived from the polysaccharide parent. These fragments are simultaneously introduced into a mass spectrometer, and the sequence of sugars in the parent polysaccharide is determined from the mass ladder that the random labeled oligosaccharide fragments generate in the mass spectrum. It is recognized that increased throughput may be obtained by processing several different samples simultaneously in parallel, using a different mass tag for each unique purified polysaccharide parent sample. Again, care must be taken with the oligosaccharide samples to avoid subfragmentation in the mass spectrometer and to purify the labeled fragments from unlabeled oligosaccharide contaminants to avoid sequencing ambiguities. These references are incorporated by reference in their entirety for all purposes.
Identification of the fatty acid composition and placement in lipids can be an important indicator of the state of a cell. For example, Oliver and Stringer (Appl. Environ. Microbiol., 4:461 (1984)) and Hood et al. (Appl. Environ. Microbiol., 52:788 (1986)) both report a 99.8% loss of phospholipids on starvation of Vibrio sp. Cronan (J. Bacteriol., 95:2054 (1968)) found that 50% of the phosphatidylglycerol content of Escherichia coli K-12 was converted to cardiolipin within 2 hours of the onset of phosphate starvation, and that the fatty acid composition also shifted significantly. The lipid composition of the cell membrane is also of medical interest because of its potential roles in drug and metabolite uptake, anchoring of transmembrane proteins, viral recognition of cell surfaces, tumor proliferation and metastasis, and arterial disease.
Similar mass tag approaches have been described for the identification of individual components of combinatorially synthesized chemical libraries by Sugarman et al. (U.S. Pat. No. 6,056,926) and Brenner et al. (Proc. Natl. Acad. Sci. (USA), 89:5381-5383 (1992)). In these approaches a unique mass tag label is synthesized concurrently with the chemical compound of interest on a solid surface and is later used to identify the processing steps applied to that surface. After cleavage from the solid surface, the mass label can be identified by mass spectrometry. The size of the library that can be produced by combinatorial approaches is limited by the number of unique mass labels that can be generated and by the ability to discriminate these labels from the compounds of interest. These references are incorporated by reference in their entirety for all purposes.
Ness et al. (U.S. Pat. No. 6,027,890), Schmidt et al. (WO99/32501), and Aebersold et al. (WO00/11208) all describe methods for labeling biological molecules obtained from different sources with a different mass tag for each source. After labeling, the samples may be combined and processed together through separation reactions or affinity enrichment, so that corresponding compounds from each sample are treated identically in the mixture. The relative concentrations of individual differentially labeled biological compounds are then determined from the relative abundances of the individual mass tags in the mass spectrum. One limitation of these methods is that the mass labels employed must behave virtually identically during every processing step applied to the sample mixture and during ionization and transport of the resulting ions in the mass spectrometer. For this reason, labels are typically chosen to be chemical analogs (e.g., stable isotope analogs or simple derivatives of one another). A second limitation is the number of samples that can be commingled for a single parallel analysis, which is bounded by the number of mass tag derivatives that can be synthesized with nearly identical separation behaviors and ionization and transmission efficiencies. A third limitation is the ability to distinguish the mass-labeled molecules or cleaved labels from unlabeled biomolecules and matrix contaminants that may also be present in the sample introduced into the mass spectrometer. This last limitation often means that the labeled sample must be extensively purified before mass spectral analysis and that subfragmentation of the labeled molecules in the mass spectrometer must be avoided.
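The relative quantification step described above can be sketched in a few lines. The sketch assumes a known light/heavy label mass shift, a known charge state, and the identical behavior of the two labels that the text identifies as a prerequisite. The peak list, the 8 Da shift, and the function name `isotope_pair_ratios` are all hypothetical. The function pairs each peak with any partner one label-spacing higher and reports the heavy/light intensity ratio.

```python
def isotope_pair_ratios(peaks, mass_shift, charge=1, tol=0.01):
    """Return (light m/z, heavy/light intensity ratio) for every light/heavy peak pair.

    peaks      -- list of (m/z, intensity) tuples from one spectrum
    mass_shift -- heavy-minus-light label mass in Da (assumed known in advance)
    charge     -- charge state; the expected m/z spacing is mass_shift / charge
    """
    spacing = mass_shift / charge
    ratios = []
    for mz_light, i_light in peaks:
        for mz_heavy, i_heavy in peaks:
            if abs(mz_heavy - mz_light - spacing) <= tol:
                ratios.append((mz_light, i_heavy / i_light))
    return ratios

# Hypothetical spectrum: one doubly charged pair 8 Da (4 m/z units) apart plus an unpaired peak.
peaks = [(500.0, 100.0), (504.0, 50.0), (620.0, 80.0)]
print(isotope_pair_ratios(peaks, mass_shift=8.0, charge=2))  # [(500.0, 0.5)]
```

The unpaired peak at 620.0 is simply ignored. In a real sample, however, an unlabeled contaminant that happens to sit one spacing away from a labeled peak would be reported as a false pair. This is the discrimination problem raised in the last limitation above.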
Schmidt et al. (WO 99/32501 (Jul. 1, 1999)) describe the use of fluorine (F) in place of hydrogen (H) as a distinguishable mass defect element in cleavable mass labels. The basis of this claim is the 0.009422 amu difference between the monoisotopic mass defects of these two elements (+0.007825 amu for 1H and -0.001597 amu for 19F). However, this claim has several critical limitations. First, this is a very small mass difference. It can be resolved only with very high resolution mass spectrometers, and only at the low end of their mass range. The resolution of a mass spectrometer depends on mass and is normally quoted in parts per million. For example, typical time-of-flight detectors common in the industry have a mass resolution of about 10 amu at a mass of 1 million amu (10 ppm). Therefore, as shown in Figure AA, the comparatively small mass difference between F and H cannot be resolved above a mass of about 940 amu (0.009422 amu divided by 10 ppm), and in practice not even at somewhat lower m/z.
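The resolution argument above reduces to a single division. The sketch below assumes, as the text does, a resolution that stays constant in ppm across the mass range. It takes the monoisotopic masses of 1H and 19F to eight decimal places. The function names are illustrative only.

```python
# Monoisotopic masses (Da) of 1H and 19F.
H_MASS = 1.00782503
F_MASS = 18.99840316

def mass_defect(mass):
    """Difference between an exact mass and its nearest integer (nominal) mass."""
    return mass - round(mass)

def max_resolvable_mass(delta_mass, resolution_ppm):
    """Highest mass at which delta_mass still exceeds a constant ppm resolution."""
    return delta_mass / (resolution_ppm * 1e-6)

delta = mass_defect(H_MASS) - mass_defect(F_MASS)
print(round(delta, 6))                        # 0.009422
print(round(max_resolvable_mass(delta, 10)))  # 942
```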
Schmidt et al. further note that the mass defect of perfluorinated hydrocarbons can be distinguished from that of simple hydrocarbons. For example, the monoisotopic mass of a polyfluorinated aryl tag with a maximum stoichiometry of C6F5 is 166.992015 amu. The closest monoisotopic hydrocarbon mass is 167.179975 amu, which corresponds to a stoichiometry of C12H23 and an easily resolvable mass difference of about 1125 ppm. The mass of the minimum polyfluorinated aliphatic tag is 68.995209 amu, which corresponds to a CF3 stoichiometry. The closest monoisotopic hydrocarbon mass to this is 69.070425 amu, corresponding to a C5H9 stoichiometry and a difference of 1089 ppm.
However, for organic molecules that include heteroatoms, such as N and O, which are typical in biological molecules, the mass defect of fluorine is not as easily distinguished. For example, any molecule that contains a stoichiometry of C3HO2 will have a monoisotopic mass that is only 35 ppm different from that of CF3, making it nearly indistinguishable even at 69 amu. Similarly, any molecule that contains a monoisotopic stoichiometry of C7H3O5 is only 36 ppm different from C6F5 at 167 amu.
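The comparisons in the two paragraphs above can be reproduced with a small monoisotopic mass calculator. A brute-force search then shows how quickly heteroatoms erode the fluorine mass defect. This is a sketch under stated assumptions. Only C, H, N, and O are searched. At least one carbon is required. Hydrogen is capped at the saturation limit 2C + N + 2. The search bounds (`max_c`, `max_n`, `max_o`) and the function names are arbitrary illustrative choices.

```python
from itertools import product

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462, "F": 18.99840316}

def mono_mass(formula):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONO[el] * n for el, n in formula.items())

def ppm(reference, other):
    """Mass difference expressed in parts per million of the reference mass."""
    return abs(other - reference) / reference * 1e6

def nearest_chno(target, max_c=6, max_n=4, max_o=4):
    """Brute-force the CcHhNnOo composition closest in mass to target.

    At least one carbon is required, and hydrogen is capped at 2C + N + 2
    so that no composition exceeds saturation.
    """
    best = None
    for c, n, o in product(range(1, max_c + 1), range(max_n + 1), range(max_o + 1)):
        for h in range(2 * c + n + 3):
            m = mono_mass({"C": c, "H": h, "N": n, "O": o})
            if best is None or abs(m - target) < abs(best[1] - target):
                best = ((c, h, n, o), m)
    return best

cf3 = mono_mass({"C": 1, "F": 3})
c5h9 = mono_mass({"C": 5, "H": 9})
c6f5 = mono_mass({"C": 6, "F": 5})
c12h23 = mono_mass({"C": 12, "H": 23})

print(round(ppm(c5h9, cf3)))      # 1089
print(round(ppm(c12h23, c6f5)))   # 1124
composition, mass = nearest_chno(cf3)
print(composition, round(ppm(cf3, mass)))  # (3, 1, 0, 2) 35
```

The search returns (3, 1, 0, 2), that is C3HO2, about 35 ppm from CF3. This agrees with the heteroatom example in the text. The ppm figure depends slightly on which of the two masses is taken as the reference.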
When the stable isotopes of C, N, and O are included in the calculations, the mass difference between C6F5 and a moiety of stoichiometry [12C]4[13C]2[15N]3[16O]3 shrinks to an indistinguishable 1.4 ppm. Similarly, the difference between CF3 and a moiety of stoichiometry [12C]2[13C][16O]2 is a mere 29 ppm. As the overall mass of the tag increases beyond 200 amu, the mass defect introduced even by multiple fluorines rapidly becomes indistinguishable from the combined defects of other heteroatoms and stable isotopes. Adding still more fluorines to the molecule is often impractical because of solubility constraints.
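The isotope-resolved comparison can be checked the same way, this time with per-isotope masses. The sketch uses three 16O atoms for the C6F5 comparison. That is the composition whose nominal mass (167) matches C6F5 and which reproduces the 1.4 ppm figure. The isotope masses are standard monoisotopic values rounded to eight decimals. The function names are illustrative only.

```python
# Exact masses (Da) of the individual isotopes used below.
ISOTOPE = {
    "12C": 12.0, "13C": 13.00335484, "15N": 15.00010890,
    "16O": 15.99491462, "19F": 18.99840316,
}

def isotopic_mass(counts):
    """Mass of a composition given as {isotope: count}."""
    return sum(ISOTOPE[iso] * n for iso, n in counts.items())

def ppm(reference, other):
    """Mass difference expressed in parts per million of the reference mass."""
    return abs(other - reference) / reference * 1e6

c6f5 = isotopic_mass({"12C": 6, "19F": 5})
cf3 = isotopic_mass({"12C": 1, "19F": 3})
mimic_c6f5 = isotopic_mass({"12C": 4, "13C": 2, "15N": 3, "16O": 3})
mimic_cf3 = isotopic_mass({"12C": 2, "13C": 1, "16O": 2})

print(round(ppm(c6f5, mimic_c6f5), 1))  # 1.4
print(round(ppm(cf3, mimic_cf3), 1))    # 29.4
```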
The general problem of deconvolving individual peaks of interest from complex mass spectral data has been previously described for complex mixtures of small molecules (see Mallard, G. W. and J. Reed, "Automated Mass Spectral Deconvolution & Identification System, AMDIS-User Guide" (US Department of Commerce, Gaithersburg, Md., 1997) and Stein, S. E., "An integrated method for spectrum extraction and compound identification from GC/MS Data," J Am Soc Mass Spect, 10:770-781 (1999)), particularly when coupled to time-resolved separation methods (e.g., GC/MS and LC/MS). However, these techniques have not been applied to biopolymer (e.g., protein, nucleic acid, and polysaccharide) fragmentation spectra for the purpose of sequence determination. In fact, these methods typically attempt to identify the intact chemical species and generally seek to avoid fragmenting conditions in the MS. Nor have they been coupled to the identification of labeled biomolecular ions containing unique mass tags.
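The core idea behind deconvolution of time-resolved data can be illustrated in a greatly simplified form. Ions that rise and fall together across a chromatographic separation probably belong to the same component. The sketch below is not the AMDIS algorithm, which models peak shapes and noise far more carefully. It only groups m/z channels whose extracted-ion chromatograms reach their maxima within `window` scans of each other. The data and the function name are hypothetical.

```python
def group_by_apex(xics, window=1):
    """Group m/z channels whose extracted-ion chromatograms peak at (nearly) the same scan.

    xics   -- {m/z: [intensity at scan 0, scan 1, ...]}
    window -- maximum apex offset, in scans, from the first member of a group
    """
    apex = {mz: max(range(len(trace)), key=trace.__getitem__) for mz, trace in xics.items()}
    groups = []
    for mz in sorted(apex, key=apex.get):  # visit channels in order of apex scan
        if groups and apex[mz] - apex[groups[-1][0]] <= window:
            groups[-1].append(mz)
        else:
            groups.append([mz])
    return [sorted(g) for g in groups]

# Hypothetical traces: two co-eluting ions and one later-eluting ion.
xics = {
    100.0: [0, 5, 9, 4, 0, 0, 0],
    150.0: [0, 4, 8, 5, 0, 0, 0],
    200.0: [0, 0, 0, 1, 3, 9, 2],
}
print(group_by_apex(xics))  # [[100.0, 150.0], [200.0]]
```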
Extending the concept of simplifying the CID spectrum of a peptide by placing a charge-concentrating moiety on either terminus, others have demonstrated that attaching a fixed positive charge to the N-terminus directs the production of a complete series of N-terminal fragment ions from a parent peptide in CID experiments, regardless of whether a basic residue is present at the N-terminus. In principle, all fragment ions are then produced by charge-remote fragmentation directed by the fixed-charge group.
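The effect of a fixed-charge N-terminal label on the fragment series can be sketched as follows. The sketch assumes the label contributes a constant cation mass, so that each N-terminal (b-type) fragment appears at label mass plus cumulative residue mass, with no added proton. It also assumes that only single backbone cleavages occur. `LABEL_CATION` (150.0 Da) is a purely hypothetical value and does not correspond to any specific reagent. The peptide GAVLE deliberately contains no basic residue.

```python
# Monoisotopic residue masses (Da) for the residues in the example peptide.
RESIDUE = {"G": 57.02146, "A": 71.03711, "V": 99.06841, "L": 113.08406, "E": 129.04259}

LABEL_CATION = 150.0  # hypothetical fixed-charge label mass, not a real reagent

def fixed_charge_n_series(peptide, label=LABEL_CATION):
    """N-terminal fragments b1..b(n-1), each carrying the label's fixed charge."""
    ions, total = [], label
    for aa in peptide[:-1]:
        total += RESIDUE[aa]
        ions.append(total)
    return ions

def read_n_series(ions, label=LABEL_CATION, tol=0.02):
    """Recover residues from successive differences, starting at the known label mass."""
    seq, previous = [], label
    for mass in ions:
        diff = mass - previous
        seq.append(next((aa for aa, m in RESIDUE.items() if abs(m - diff) <= tol), "?"))
        previous = mass
    return "".join(seq)

ions = fixed_charge_n_series("GAVLE")
print([round(m, 5) for m in ions])  # [207.02146, 278.05857, 377.12698, 490.21104]
print(read_n_series(ions))          # GAVL
```

The C-terminal residue does not appear as its own step in this series. It would be inferred from the precursor mass.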
Peptides have been labeled with several classes of fixed-charge groups, including dimethylalkylammonium, substituted pyridinium, quaternary phosphonium, and sulfonium derivatives. Characteristics of useful labels include ease of synthesis, increased ionization efficiency of labeled peptides, and formation of a specific fragment ion series from a labeled peptide with minimal unfavorable label fragmentation. Zaia reported that labels satisfying these criteria include those of the dimethylalkylammonium class and quaternary phosphonium derivatives. Moreover, substituted pyridinium derivatives have been reported to be useful in high-energy CID.
Despite some progress in analytical methodology, protein identification remains a major bottleneck in the field of proteomics. For example, it can take up to 18 hours to generate a protein sequence tag long enough to identify a single purified protein from its predicted genomic sequence. Moreover, although unambiguous protein identification can be attained by generating a protein sequence tag (PST), the limited ionization efficiency of larger peptides and proteins restricts the intrinsic detection sensitivity of MS techniques and inhibits the use of MS for identifying low-abundance proteins. Furthermore, limits on the mass accuracy of time-of-flight (TOF) detectors can also constrain the usefulness of current MS/MS sequencing methods, requiring that proteins be digested by proteolytic and/or chemolytic means into more manageable peptides before sequencing. In addition, previously described MS ladder sequencing algorithms fail on proteins, because the large number of peptide fragments generated during CID of such large molecules, together with the inability to identify an appropriate parent ion from which to start the sequence, effectively obscures the mass ladder.
Two basic strategies have been proposed for the MS identification of proteins after their separation from a protein mixture: 1) mass profile fingerprinting ('MS fingerprinting'); and 2) sequencing of one or more peptide domains by MS/MS ('MS/MS sequencing'). MS fingerprinting is achieved by accurately measuring the masses of several peptides generated by a proteolytic digest of the intact protein and searching a database for a known protein with that peptide mass fingerprint. MS/MS sequencing involves actual determination of one or more PSTs of the protein by generating sequence-specific fragment ions in the quadrupole of an MS/MS instrument.
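The MS fingerprinting strategy can be sketched as an in-silico digest followed by a database match. The sketch makes several assumptions. Trypsin is taken to cleave after K or R except before P, with no missed cleavages. Peptides are singly protonated. The score is simply the number of observed masses matched within a tolerance. This is far simpler than the probabilistic scores used by real search engines. The two-entry database and all function names are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728

def tryptic_peptides(protein):
    """Split after K or R unless the next residue is P (zero-width split, Python 3.7+)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]

def mh_plus(peptide):
    """Singly protonated peptide mass [M+H]+."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON

def fingerprint_score(observed, protein, tol=0.05):
    """Number of observed [M+H]+ values matched by at least one predicted tryptic peptide."""
    predicted = [mh_plus(p) for p in tryptic_peptides(protein)]
    return sum(any(abs(o - p) <= tol for p in predicted) for o in observed)

def best_match(observed, database, tol=0.05):
    """Database entry whose predicted fingerprint explains the most observed masses."""
    return max(database, key=lambda name: fingerprint_score(observed, database[name], tol))

database = {"X": "PEPTIDEKAAGR", "Y": "GGGGKSSSSR"}  # hypothetical sequences
observed = [mh_plus("PEPTIDEK"), mh_plus("AAGR")]
print(tryptic_peptides(database["X"]))  # ['PEPTIDEK', 'AAGR']
print(best_match(observed, database))   # X
```

A count-based score like this one also illustrates why fingerprinting degrades as the database grows. The more theoretical peptides are available, the more observed masses will be matched by chance.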
Clauser et al. have suggested that proteins can be unambiguously identified only through the determination of PSTs that allow reference to the theoretical sequences determined from genomic databases. Li et al. appear to have confirmed this assertion by finding that the reliable identification of individual proteins by MS fingerprinting degraded as the size of the comparative theoretical peptide mass database increased. Li et al. also reported that, because of the sensitivity limitations of the MS, they were able to obtain peptide maps only for the highest-abundance proteins in the gel, even though their matrix-assisted laser desorption/ionization (MALDI) methodology was demonstrated to improve detection sensitivity over previously reported methods. Clearly, rapid and cost-effective protein sequencing techniques will improve the speed and lower the cost of proteomics research. Similarly, as described by Koster, the preparation and purification of nucleic acids before sequencing, even by mass spectrometry, increases the time and cost of nucleic acid sequencing. Improving the discrimination ability of the mass spectrometer, so that multiple protein, nucleic acid, polysaccharide, or other sequences can be determined in parallel, or so that specific ions can be better differentiated from unlabeled organic material, has considerable utility over existing methods.