Genomic technology has advanced to a point at which, in principle, it has become possible to determine complete genomic sequences and to quantitatively measure the mRNA levels for each gene expressed in a cell. For some species the complete genomic sequence has now been determined, and for one strain of the yeast Saccharomyces cerevisiae, the mRNA levels for each expressed gene have been precisely quantified under different growth conditions (Velculescu et al., 1997). Comparative cDNA array analysis and related technologies have been used to determine induced changes in gene expression at the mRNA level by concurrently monitoring the expression level of a large number of genes (in some cases all the genes) expressed by the investigated cell or tissue (Shalon et al., 1996). Furthermore, biological and computational techniques have been used to correlate specific function with gene sequences. The interpretation of the data obtained by these techniques in the context of the structure, control and mechanism of biological systems has been recognized as a considerable challenge. In particular, it has been extremely difficult to explain the mechanism of biological processes by genomic analysis alone.
Proteins are essential for the control and execution of virtually every biological process. The rate of synthesis and the half-life of proteins and thus their expression level are also controlled post-transcriptionally. Furthermore, the activity of proteins is frequently modulated by post-translational modifications, in particular protein phosphorylation, and is dependent on the association of the protein with other molecules including DNA and proteins. Neither the level of expression nor the state of activity of proteins is therefore directly apparent from the gene sequence or even the expression level of the corresponding mRNA transcript. It is therefore essential that a complete description of a biological system include measurements that indicate the identity, quantity and the state of activity of the proteins which constitute the system. The large-scale (ultimately global) analysis of proteins expressed in a cell or tissue has been termed proteome analysis (Pennington et al., 1997).
At present no protein analytical technology approaches the throughput and level of automation of genomic technology. The most common implementation of proteome analysis is based on the separation of complex protein samples, most commonly by two-dimensional gel electrophoresis (2DE), and the subsequent sequential identification of the separated protein species (Ducret et al., 1998; Garrels et al., 1997; Link et al., 1997; Shevchenko et al., 1996; Gygi et al., 1999; Boucherie et al., 1996). This approach has been revolutionized by the development of powerful mass spectrometric techniques and the development of computer algorithms which correlate protein and peptide mass spectral data with sequence databases and thus rapidly and conclusively identify proteins (Eng et al., 1994; Mann and Wilm, 1994; Yates et al., 1995). This technology has reached a level of sensitivity which now permits the identification of essentially any protein which is detectable by conventional protein staining methods including silver staining (Figeys and Aebersold, 1998; Figeys et al., 1996; Figeys et al., 1997; Shevchenko et al., 1996). However, the sequential manner in which samples are processed limits the sample throughput, the most sensitive methods have been difficult to automate, and low abundance proteins, such as regulatory proteins, escape detection without prior enrichment, thus effectively limiting the dynamic range of the technique. In the 2DE/(MS)n method, proteins are quantified by densitometry of stained spots in the 2DE gels.
The development of methods and instrumentation for automated, data-dependent electrospray ionization (ESI) tandem mass spectrometry (MSn) in conjunction with microcapillary liquid chromatography (μLC) and database searching has significantly increased the sensitivity and speed of the identification of gel-separated proteins. As an alternative to the 2DE/MSn approach to proteome analysis, the direct analysis by tandem mass spectrometry of peptide mixtures generated by the digestion of complex protein mixtures has been proposed (Dongré et al., 1997). μLC-MS/MS has also been used successfully for the large-scale identification of individual proteins directly from mixtures without gel electrophoretic separation (Link et al., 1999; Opitek et al., 1997). While these approaches dramatically accelerate protein identification, the quantities of the analyzed proteins cannot be easily determined, and these methods have not been shown to substantially alleviate the dynamic range problem also encountered by the 2DE/MS/MS approach. Therefore, low abundance proteins in complex samples are also difficult to analyze by the μLC/MS/MS method without their prior enrichment.
It is therefore apparent that current technologies, while suitable to identify the components of protein mixtures, are capable of measuring neither the quantity nor the state of activity of the proteins in a mixture. Even evolutionary improvements of the current approaches are unlikely to advance their performance sufficiently to make routine quantitative and functional proteome analysis a reality.
This invention provides methods and reagents that can be employed in proteome analysis which overcome the limitations inherent in traditional techniques. The basic approach described can be employed for the quantitative analysis of protein expression in complex samples (such as cells, tissues, and fractions thereof), the detection and quantitation of specific proteins in complex samples, and the quantitative measurement of specific enzymatic activities in complex samples.
In this regard, a multitude of analytical techniques are presently available for clinical and diagnostic assays which detect the presence, absence, deficiency or excess of a protein or protein function associable with a normal or disease state. While these techniques are quite sensitive, they do not necessarily provide chemical speciation of products and may, as a result, be difficult to use for assaying several proteins or enzymes simultaneously in a single sample. Current methods may not distinguish among aberrant expression of different enzymes or their malfunctions which lead to a common set of clinical symptoms. The methods and reagents herein can be employed in clinical and diagnostic assays for simultaneous (multiplex) monitoring of multiple proteins and protein reactions.
This invention provides analytical reagents and mass spectrometry-based methods using these reagents for the rapid and quantitative analysis of proteins or protein function in mixtures of proteins. The analytical method can be used for qualitative and particularly for quantitative analysis of global protein expression profiles in cells and tissues, i.e. the quantitative analysis of proteomes. The method can also be employed to screen for and identify proteins whose expression level in cells, tissue or biological fluids is affected by a stimulus (e.g., administration of a drug or contact with a potentially toxic material), by a change in environment (e.g., nutrient level, temperature, passage of time) or by a change in condition or cell state (e.g., disease state, malignancy, site-directed mutation, gene knockouts) of the cell, tissue or organism from which the sample originated. The proteins identified in such a screen can function as markers for the changed state. For example, comparisons of protein expression profiles of normal and malignant cells can result in the identification of proteins whose presence or absence is characteristic and diagnostic of the malignancy.
In an exemplary embodiment, the methods herein can be employed to screen for changes in the expression or state of enzymatic activity of specific proteins. These changes may be induced by a variety of chemicals, including pharmaceutical agonists or antagonists, or potentially harmful or toxic materials. The knowledge of such changes may be useful for diagnosing enzyme-based diseases and for investigating complex regulatory networks in cells.
The methods herein can also be used to implement a variety of clinical and diagnostic analyses to detect the presence, absence, deficiency or excess of a given protein or protein function in a biological fluid (e.g., blood), or in cells or tissue. The method is particularly useful in the analysis of complex mixtures of proteins, i.e., those containing 5 or more distinct proteins or protein functions.
The inventive method employs affinity-labeled protein reactive reagents that allow for the selective isolation of peptide fragments or the products of reaction with a given protein (e.g., products of enzymatic reaction) from complex mixtures. The isolated peptide fragments or reaction products are characteristic of the presence of a protein or the presence of a protein function, e.g., an enzymatic activity, respectively, in those mixtures. Isolated peptides or reaction products are characterized by mass spectrometric (MS) techniques. In particular, the sequence of isolated peptides can be determined using tandem MS (MSn) techniques, and by application of sequence database searching techniques, the protein from which the sequenced peptide originated can be identified. The reagents also provide for differential isotopic labeling of the isolated peptides or reaction products which facilitates quantitative determination by mass spectrometry of the relative amounts of proteins in different samples. Also, the use of differentially isotopically-labeled reagents as internal standards facilitates quantitative determination of the absolute amounts of one or more proteins or reaction products present in the sample.
In general, the affinity labeled protein reactive reagents of this invention have three portions: an affinity label (A) covalently linked to a protein reactive group (PRG) through a linker group (L):
A-L-PRG
The linker may be differentially isotopically labeled, e.g., by substitution of one or more atoms in the linker with a stable isotope thereof. For example, hydrogens can be substituted with deuteriums or C12 with C13.
The affinity label A functions as a molecular handle that selectively binds, covalently or non-covalently, to a capture reagent (CR). Binding to CR facilitates isolation of peptides, substrates or reaction products tagged or labeled with A. In specific embodiments, A is a streptavidin or avidin. After affinity isolation of affinity tagged materials, some of which may be isotopically labeled, the interaction between A and the capture reagent is disrupted or broken to allow MS analysis of the isolated materials. The affinity label may be displaced from the capture reagent by addition of a displacing ligand, which may be free A or a derivative of A, or by changing solvent (e.g., solvent type or pH) or temperature conditions, or the linker may be cleaved chemically, enzymatically, thermally or photochemically to release the isolated materials for MS analysis.
Two types of PRG groups are specifically provided herein: (a) those groups that selectively react with a protein functional group to form a covalent or non-covalent bond tagging the protein at specific sites, and (b) those that are transformed by action of the protein, e.g., that are substrates for an enzyme. In specific embodiments, PRG is a group having specific reactivity for certain protein groups, such as specificity for sulfhydryl groups, and is useful in general for selectively tagging proteins in complex mixtures. A sulfhydryl specific reagent tags proteins containing cysteine. In other specific embodiments, PRG is an enzyme substrate that is selectively cleaved (leaving A-L) or modified (giving A-L-PRG′) by the action of an enzyme of interest.
Exemplary reagents have the general formula:
A-B1-X1-(CH2)n-[X2-(CH2)m]x-X3-(CH2)p-X4-B2-PRG
where:
A is the affinity label;
PRG is the protein reactive group;
X1, X2, X3 and X4, independently of one another, and X2 independently of other X2 in the linker group, can be selected from O, S, NH, NR, NRR′+, CO, COO, COS, S-S, SO, SO2, CO-NR′, CS-NR′, Si-O, aryl or diaryl groups or X1-X4 may be absent, but preferably at least one of X1-X4 is present;
B1 and B2, independently of one another, are optional moieties that can facilitate bonding of the A or PRG group to the linker or prevent undesired cleavage of those groups from the linker and can be selected, for example, from COO, CO, CO-NR′, CS-NR′ and may contain one or more CH2 groups alone or in combination with other groups, e.g. (CH2)q-CONR′, (CH2)q-CS-NR′, or (CH2)q;
n, m, p and q are whole numbers that can have values from 0 to about 100, where preferably one of n, m, p or q is not 0, and x is also a whole number that can range from 0 to about 100, where the sum n+xm+p+q is preferably less than about 100 and more preferably less than about 20;
R is an alkyl, alkenyl, alkynyl, alkoxy or aryl group; and
R′ is a hydrogen, an alkyl, alkenyl, alkynyl, alkoxy or aryl group.
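The preferred size limit on the linker can be illustrated with a short calculation. The following Python sketch is illustrative only: the class name `LinkerIndices`, the label strings, and the example index values are hypothetical, and the approximate bounds "about 100" and "about 20" are treated as hard thresholds purely for simplicity.

```python
from dataclasses import dataclass


@dataclass
class LinkerIndices:
    """Whole-number indices of the general formula (hypothetical container)."""
    n: int
    m: int
    p: int
    q: int
    x: int

    def methylene_sum(self) -> int:
        # The sum n + x*m + p + q named in the general formula.
        return self.n + self.x * self.m + self.p + self.q

    def classify(self) -> str:
        values = (self.n, self.m, self.p, self.q, self.x)
        if any(v < 0 for v in values):
            raise ValueError("indices must be whole numbers (0 or greater)")
        total = self.methylene_sum()
        # Assumption: "about 20" and "about 100" are applied as strict limits.
        if total < 20:
            return "more preferred"
        if total < 100:
            return "preferred"
        return "outside preferred range"


# Hypothetical linker: n=2, three repeats of X2-(CH2)2, p=2, q=0.
example = LinkerIndices(n=2, m=2, p=2, q=0, x=3)
print(example.methylene_sum(), example.classify())
```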
One or more of the CH2 groups of the linker can be optionally substituted with small (C1-C6) alkyl, alkenyl, or alkoxy groups, an aryl group or can be substituted with functional groups that promote ionization, such as acidic or basic groups or groups carrying permanent positive or negative charge. One or more single bonds connecting CH2 groups in the linker can be replaced with a double or a triple bond. Preferred R and Rxe2x80x2 alkyl, alkenyl, alkynyl or alkoxy groups are small having 1 to about 6 carbon atoms.
One or more of the atoms in the linker can be substituted with a stable isotope to generate one or more substantially chemically identical, but isotopically distinguishable reagents. For example, one or more hydrogens in the linker can be substituted with deuterium to generate isotopically heavy reagents.
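As an illustration of how isotopic substitution makes two otherwise identical reagents distinguishable, the following Python sketch computes the mass difference between a light and a heavy reagent and the resulting peak spacing on the m/z axis. The function names and the choice of eight deuterium substitutions are assumptions made for the example; the isotope masses are standard monoisotopic values.

```python
# Monoisotopic masses in daltons (standard values, rounded to 8 decimals).
MASS_H1 = 1.00782503
MASS_H2 = 2.01410178   # deuterium
MASS_C12 = 12.0
MASS_C13 = 13.00335484


def isotope_mass_shift(n_deuterium: int = 0, n_carbon13: int = 0) -> float:
    """Mass difference (Da) between the heavy and the light form of a reagent."""
    return (n_deuterium * (MASS_H2 - MASS_H1)
            + n_carbon13 * (MASS_C13 - MASS_C12))


def mz_spacing(mass_shift: float, charge: int) -> float:
    """Spacing of the light/heavy peak pair on the m/z axis for one charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mass_shift / charge


# Hypothetical heavy linker carrying eight deuterium atoms.
shift = isotope_mass_shift(n_deuterium=8)
print(round(shift, 4))                 # mass shift for a singly tagged peptide
print(round(mz_spacing(shift, 2), 4))  # spacing for a doubly charged ion
```

A peptide tagged at more than one site carries a correspondingly larger shift, so the spacing of a peak pair also indicates how many tags the peptide bears.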
In an exemplary embodiment the linker contains groups that can be cleaved to remove the affinity tag. If a cleavable linker group is employed, it is typically cleaved after affinity tagged peptides, substrates or reaction products have been isolated using the affinity label together with the CR. In this case, any isotopic labeling in the linker preferably remains bound to the protein, peptide, substrate or reaction product.
Linker groups include among others: ethers, polyethers, ether diamines, polyether diamines, diamines, amides, polyamides, polythioethers, disulfides, silyl ethers, alkyl or alkenyl chains (straight chain or branched and portions of which may be cyclic), aryl, diaryl or alkyl-aryl groups. Aryl groups in linkers can contain one or more heteroatoms (e.g., N, O or S atoms).
In one aspect, the invention provides a mass spectrometric method for identification and quantitation of one or more proteins in a complex mixture which employs affinity labeled reagents in which the PRG is a group that selectively reacts with certain groups that are typically found in peptides (e.g., sulfhydryl, amino, carboxy, homoserine lactone groups). One or more affinity labeled reagents with different PRG groups are introduced into a mixture containing proteins and the reagents react with certain proteins to tag them with the affinity label. It may be necessary to pretreat the protein mixture to reduce disulfide bonds or otherwise facilitate affinity labeling. After reaction with the affinity labeled reagents, proteins in the complex mixture are cleaved, e.g., enzymatically, into a number of peptides. This digestion step may not be necessary, if the proteins are relatively small. Peptides that remain tagged with the affinity label are isolated by an affinity isolation method, e.g., affinity chromatography, via their selective binding to the CR. Isolated peptides are released from the CR by displacement of A or cleavage of the linker, and released materials are analyzed by liquid chromatography/mass spectrometry (LC/MS). The sequence of one or more tagged peptides is then determined by MSn techniques. At least one peptide sequence derived from a protein will be characteristic of that protein and be indicative of its presence in the mixture. Thus, the sequences of the peptides typically provide sufficient information to identify one or more proteins present in a mixture.
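The effect of tagging, digestion and affinity isolation on sample complexity can be sketched in silico. The Python example below assumes a sulfhydryl-specific PRG (so that only cysteine-containing peptides are retained) and trypsin as the cleaving enzyme, modeled by the simplified rule of cleaving after K or R unless the next residue is P, with no missed cleavages. The function names and the protein sequence are hypothetical.

```python
def tryptic_digest(sequence: str) -> list[str]:
    """Cleave after K or R unless followed by P (simplified trypsin rule)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal peptide when the protein does not end in K or R.
        peptides.append(sequence[start:])
    return peptides


def affinity_isolated(peptides: list[str], tagged_residue: str = "C") -> list[str]:
    """Peptides that would carry the tag, assuming every target residue reacts."""
    return [p for p in peptides if tagged_residue in p]


protein = "MACDKGGRPLKCTWR"  # hypothetical sequence, for illustration only
all_peptides = tryptic_digest(protein)
isolated = affinity_isolated(all_peptides)
print(all_peptides)  # ['MACDK', 'GGRPLK', 'CTWR']
print(isolated)      # ['MACDK', 'CTWR']
```

Only the tagged subset proceeds to LC/MS, which is how the isolation step reduces the number of peptides that must be analyzed per protein.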
Quantitative relative amounts of proteins in one or more different samples containing protein mixtures (e.g., biological fluids, cell or tissue lysates, etc.) can be determined using chemically identical, affinity tagged and differentially isotopically labeled reagents to affinity tag and differentially isotopically label proteins in the different samples. In this method, each sample to be compared is treated with a different isotopically labeled reagent to tag certain proteins therein with the affinity label. The treated samples are then combined, preferably in equal amounts, and the proteins in the combined sample are enzymatically digested, if necessary, to generate peptides. Some of the peptides are affinity tagged and in addition tagged peptides originating from different samples are differentially isotopically labeled. As described above, affinity labeled peptides are isolated, released from the capture reagent and analyzed by LC/MS. Peptides characteristic of their protein origin are sequenced using MSn techniques allowing identification of proteins in the samples. The relative amounts of a given protein in each sample are determined by comparing the relative abundance of the ions generated from any differentially labeled peptides originating from that protein. The method can be used to assess relative amounts of known proteins in different samples. Further, since the method does not require any prior knowledge of the type of proteins that may be present in the samples, it can be used to identify proteins which are present at different levels in the samples examined. More specifically, the method can be applied to screen for and identify proteins which exhibit differential expression in cells, tissue or biological fluids. It is also possible to determine the absolute amounts of specific proteins in a complex mixture.
In this case, a known amount of internal standard, one for each specific protein in the mixture to be quantified, is added to the sample to be analyzed. The internal standard is an affinity tagged peptide that is identical in chemical structure to the affinity tagged peptide to be quantified except that the internal standard is differentially isotopically labeled, either in the peptide or in the affinity tag portion, to distinguish it from the affinity tagged peptide to be quantified. The internal standard can be provided in the sample to be analyzed in other ways. For example, a specific protein or set of proteins can be chemically tagged with an isotopically-labeled affinity tagging reagent. A known amount of this material can be added to the sample to be analyzed. Alternatively, a specific protein or set of proteins may be labeled with heavy atom isotopes and then derivatized with an affinity tagging reagent.
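The internal-standard calculation described above reduces to a simple proportion. The Python sketch below assumes that the analyte peptide and its isotopically labeled standard give the same signal per unit amount, which is the premise of using a chemically identical standard; the function name, the units and the numerical values are hypothetical.

```python
def absolute_amount(analyte_intensity: float,
                    standard_intensity: float,
                    standard_amount: float) -> float:
    """Amount of analyte, in the same units as the known standard amount.

    Assumes equal response for the analyte and its isotopic standard.
    """
    if standard_intensity <= 0:
        raise ValueError("internal standard was not detected")
    return standard_amount * analyte_intensity / standard_intensity


# Hypothetical measurement: 10 pmol of standard spiked into the sample.
amount_pmol = absolute_amount(analyte_intensity=3.0e5,
                              standard_intensity=1.5e5,
                              standard_amount=10.0)
print(amount_pmol)  # 20.0
```

Because one standard is added for each protein to be quantified, the same function would be applied once per analyte and standard pair.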
Also, it is possible to quantify the levels of specific proteins in multiple samples in a single analysis (multiplexing). In this case, the affinity tagging reagents used to derivatize proteins present in different samples are differentially isotopically labeled, so that affinity tagged peptides from different samples can be selectively quantified by mass spectrometry.
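The comparison of ion abundances across differentially labeled samples can be sketched as follows. In this Python example each peptide of a protein is represented by the ion intensity observed for each sample's isotopic form, and the protein-level ratio is taken as the median of the peptide ratios. The use of the median, the function names, the sample names and the intensities are all assumptions made for illustration; the description above specifies only that relative abundances of the differentially labeled ions are compared.

```python
from statistics import median


def peptide_ratios(intensities: dict[str, float], reference: str) -> dict[str, float]:
    """Ratio of each sample's ion intensity to that of the reference sample."""
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference form of the peptide was not detected")
    return {sample: value / ref for sample, value in intensities.items()}


def protein_ratios(peptides: list[dict[str, float]], reference: str) -> dict[str, float]:
    """Median of the per-peptide ratios for every sample (assumed summary statistic)."""
    per_peptide = [peptide_ratios(p, reference) for p in peptides]
    samples = per_peptide[0].keys()
    return {s: median(r[s] for r in per_peptide) for s in samples}


# Hypothetical intensities for three tagged peptides of one protein,
# each observed as a light ("control") and a heavy ("treated") form.
observed = [
    {"control": 1000.0, "treated": 2100.0},
    {"control": 500.0, "treated": 950.0},
    {"control": 800.0, "treated": 1600.0},
]
ratios = protein_ratios(observed, reference="control")
print(ratios)
```

With more than two samples, each dictionary would simply hold one entry per isotopic form of the reagent.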
In this aspect of the invention, the method provides for quantitative measurement of specific proteins in biological fluids, cells or tissues and can be applied to determine global protein expression profiles in different cells and tissues. The same general strategy can be broadened to achieve the proteome-wide, qualitative and quantitative analysis of the state of modification of proteins, by employing affinity reagents with differing specificity for reaction with proteins. The method and reagents of this invention can be used to identify low abundance proteins in complex mixtures and can be used to selectively analyze specific groups or classes of proteins such as membrane or cell surface proteins, or proteins contained within organelles, sub-cellular fractions, or biochemical fractions such as immunoprecipitates. Further, these methods can be applied to analyze differences in expressed proteins in different cell states. For example, the methods and reagents herein can be employed in diagnostic assays for the detection of the presence or the absence of one or more proteins indicative of a disease state, such as cancer.
In a second aspect, the invention provides an MS method for detection of the presence or absence of a protein function, e.g., an enzyme activity, in a sample. The method can also be employed to detect a deficiency or excess (over normal levels) of protein function in a sample. Samples that can be analyzed include various biological fluids and materials, including tissue and cells. In this case, the PRG of the affinity labeled reagent is a substrate for the enzyme of interest. Affinity labeled substrates are provided for each enzyme of interest and are introduced into a sample where they react to generate affinity labeled products, if the enzyme of interest is present in the sample. Products or unreacted substrate that are tagged with the affinity label are isolated by an affinity isolation method, e.g., affinity chromatography, via their selective binding to the CR. The isolated tagged substrates and products are analyzed by mass spectrometry. Affinity labeled products include those in which the substrate is entirely cleaved from the linker or in which the substrate is modified by reaction with a protein of interest. Detection of the affinity-labeled product indicates the protein function is present in the sample. Detection of little or no affinity labeled product indicates deficiency or absence, respectively, of the protein function in the sample.
The amount of selected protein, e.g., measured in terms of enzyme activity, present in a sample can be measured by introducing a known amount of an internal standard which is an isotopically labeled analog of the expected product of the enzymatic reaction of the reagent substrate. The internal standard is substantially chemically identical to the expected enzymatic reaction product, but is isotopically distinguishable therefrom. The level of protein function (e.g., enzymatic activity) in a given sample can be compared with activity levels in other samples or controls (either negative or positive controls). The procedure therefore can detect the presence, absence, deficiency or excess of a protein function in a sample. The method is capable of quantifying the velocity of an enzymatic reaction since it enables the amount of product formed over a known time period to be measured. This method can be multiplexed, by simultaneous use of a plurality of affinity labeled substrates selective for different protein functions and if quantitation is desired by inclusion of the corresponding internal standards for expected products, to analyze for a plurality of protein functions in a single sample.
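The velocity measurement described above can be sketched as a two-step calculation: the amount of product is obtained from its ion intensity relative to that of the isotopically labeled internal standard, and the velocity is that amount divided by the incubation time. The Python example below assumes equal response for product and standard; the function names, enzyme names, units and values are hypothetical.

```python
def product_amount(product_intensity: float,
                   standard_intensity: float,
                   standard_amount: float) -> float:
    """Amount of product formed, in the units of the known standard amount."""
    if standard_intensity <= 0:
        raise ValueError("internal standard was not detected")
    return standard_amount * product_intensity / standard_intensity


def reaction_velocity(product_intensity: float,
                      standard_intensity: float,
                      standard_amount: float,
                      incubation_time: float) -> float:
    """Average velocity: product formed per unit of incubation time."""
    if incubation_time <= 0:
        raise ValueError("incubation time must be positive")
    amount = product_amount(product_intensity, standard_intensity, standard_amount)
    return amount / incubation_time


# Hypothetical multiplexed assay: one substrate and one standard per enzyme,
# given as (product intensity, standard intensity, nmol of standard added).
assays = {
    "enzyme_A": (4.0e4, 2.0e4, 5.0),
    "enzyme_B": (0.0, 2.0e4, 5.0),  # no product detected
}
velocities = {
    name: reaction_velocity(prod, std, amt, incubation_time=30.0)  # minutes
    for name, (prod, std, amt) in assays.items()
}
print(velocities)  # values in nmol per minute
```

A velocity of zero for one enzyme alongside a measurable velocity for another in the same run corresponds to the absence of that protein function in the sample.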