The present invention relates generally to spectroscopic analysis of chemical and biological mixtures. More particularly, it relates to a method for relative quantification of proteins or other components in mixtures analyzed by mass spectrometry without using an internal standard, isotope label, or other chemical calibrant.
With the completion of the sequencing of the human genome, it has become apparent that genetic information alone cannot provide a comprehensive characterization of the biochemical and cellular functioning of complex biological systems. As a result, the focus of much molecular biological research is shifting toward proteomics and metabolomics, the systematic analysis of proteins and small molecules (metabolites) in a cell, tissue, or organism. Because proteins and metabolites are far more numerous, diverse, and fragile than genes, new tools must be developed for their discovery, identification, and quantification.
One important aspect of proteomics is the identification of proteins with altered expression levels. Differences in protein and metabolite levels over time or among populations can be associated with diseased states, drug treatments, or changes in metabolism. Identified molecular species may serve as biological markers for the disease or condition in question, enabling the development of new methods of diagnosis and treatment. To discover such biological markers, it is helpful to obtain accurate measurements of relative differences in protein and metabolite levels between different sample types, a process referred to as differential phenotyping.
Conventional methods of protein analysis combine two-dimensional (2D) gel electrophoresis, for separation and quantification, with mass spectrometric identification of proteins. Typically, proteins are separated first by isoelectric focusing, which separates them by isoelectric point, and then by SDS-PAGE, which separates them by molecular weight. After separation and staining, the mixture appears as a two-dimensional array of spots corresponding to the separated proteins. Spots are excised from the gel, enzymatically digested, and subjected to mass spectrometry for identification. The identified proteins can be quantified from the relative intensities of their spots by image analysis of the stained gel. Alternatively, proteins can be labeled isotopically before gel separation and their expression levels quantified by mass spectrometry or radiographic methods.
While 2D gels combined with mass spectrometry (MS) have been the predominant tool of proteomics research, they have a number of key drawbacks that have led to the development of alternative methods. Most importantly, they cannot be used to identify certain classes of proteins: very acidic or basic proteins, very large or small proteins, and membrane proteins are either excluded or underrepresented in 2D gel patterns. Low-abundance proteins, including regulatory proteins, are rarely detected when entire cell lysates are analyzed, reflecting the limited dynamic range of the technique. These deficiencies are detrimental to quantitative proteomics, which aims to detect every protein whose expression level changes.
In applications that do not require large-scale protein analysis, proteins can be quantified by fluorescent, chemiluminescent, or other labeling of the target proteins. Labeled antibodies are combined with a sample containing the desired protein, and the resulting protein-antibody complexes are counted using the appropriate detection technique. Such approaches are suitable only for known proteins for which antibodies are available, a fraction of the total number of proteins, and are not typically used for high-throughput applications. In addition, unlike mass spectrometric detection, antibody-protein binding is not fully molecularly specific and can yield inaccurate counts that include structurally similar or post-translationally modified proteins.
Because it can provide detailed structural information, mass spectrometry is currently believed to be a valuable analytical tool for biochemical mixture analysis and protein identification. For example, capillary liquid chromatography combined with electrospray ionization tandem mass spectrometry has been used for large-scale protein identification without gel electrophoresis. Qualitative differences between spectra can be identified, and proteins corresponding to peaks occurring in only some of the spectra serve as candidate biological markers. These studies are not quantitative, however. In most cases, quantification in mass spectrometry requires an internal standard, a compound introduced into a sample at known concentration. Spectral peaks corresponding to sample components are compared with the internal standard peak height or area for quantification. Ideal internal standards have elution and ionization characteristics similar to those of the target compound but generate ions with different mass-to-charge ratios. For example, a common internal standard is a stable isotopically-labeled version of the target compound.
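For illustration, internal-standard quantification can be sketched as a ratio of peak areas. This minimal Python sketch assumes a detector response that is linear in concentration and a relative response factor (the hypothetical parameter `response_factor`) determined separately; for an ideal isotopically labeled standard that ionizes like the target, that factor is 1.

```python
def concentration_from_internal_standard(
    target_area: float,
    standard_area: float,
    standard_concentration: float,
    response_factor: float = 1.0,
) -> float:
    """Estimate a target compound's concentration from an internal standard.

    Assumes a linear detector response, so that
        target_area / standard_area = response_factor * (c_target / c_standard).
    response_factor is a hypothetical calibration constant (1.0 for an ideal
    isotopically labeled standard with the same ionization efficiency).
    """
    if standard_area <= 0:
        raise ValueError("internal standard peak area must be positive")
    return (target_area / standard_area) * standard_concentration / response_factor


# Hypothetical example: target peak area 4.2e5, standard peak area 2.1e5,
# standard spiked in at 10 (arbitrary concentration units).
print(concentration_from_internal_standard(4.2e5, 2.1e5, 10.0))  # 20.0
```

The sketch makes the dependence explicit: without a standard of known concentration, the raw peak area alone does not fix the absolute scale.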
Using internal standards for complex biological mixtures is problematic. In many cases, the compounds of interest are not known a priori, so appropriate internal standards cannot be devised. The problem is compounded when there are many compounds of interest. In addition, biological samples are often available only in very low volumes, and adding an internal standard can dilute mixture components significantly. Low-abundance components, often the most relevant ones, may be diluted below the noise level and hence become undetectable. It can also be difficult to judge the proper amount of internal standard to use. Internal standards therefore do not provide a general solution to the problem of protein quantification.
Recently, Gygi et al. introduced a method for quantitative differential protein profiling based on isotope-coded affinity tags (ICAT(trademark)) [S. P. Gygi et al., "Quantitative analysis of complex protein mixtures using isotope-coded affinity tags," Nat. Biotechnol. 1999, 17: 994-999]. In this method, two samples containing (presumably) the same proteins at different concentrations are compared by incorporating a tag with a different isotope into each sample. In particular, cysteines are alkylated with either a heavy (deuterated) or light (undeuterated) reagent. The two samples, each containing a different isotope tag, are combined and proteolytically digested, and the combined mixture is subjected to mass spectrometric analysis. The ratio of the intensities of the lower- and higher-mass components of each identical peptide provides an accurate measure of the relative abundance of the corresponding protein in the original samples. The initial study reported mean differences of between 2 and 12% between the observed and expected protein ratios in the two samples.
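The ratio arithmetic behind ICAT quantification can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the procedure of Gygi et al.: it assumes the light tag marks the first sample and the heavy tag the second, and it takes a protein's ratio as the mean of its peptides' light/heavy ratios. The percent-error helper expresses the kind of observed-versus-expected difference reported in that study.

```python
from statistics import mean


def icat_ratio(light_intensity: float, heavy_intensity: float) -> float:
    """Relative abundance (light-tagged sample / heavy-tagged sample)
    from one light/heavy peptide pair."""
    if heavy_intensity <= 0:
        raise ValueError("heavy-tagged peak intensity must be positive")
    return light_intensity / heavy_intensity


def percent_error(observed: float, expected: float) -> float:
    """Absolute difference between observed and expected ratios,
    as a percentage of the expected ratio."""
    return abs(observed - expected) / expected * 100.0


# Hypothetical (light, heavy) intensity pairs for peptides of one protein.
pairs = [(2.0e5, 1.0e5), (3.3e5, 1.5e5), (1.9e5, 1.0e5)]
# Assumption: protein ratio = mean of peptide ratios (an illustrative choice).
protein_ratio = mean(icat_ratio(light, heavy) for light, heavy in pairs)
print(protein_ratio, percent_error(protein_ratio, expected=2.0))
```

Because the two samples are mixed before analysis, both members of each pair experience the same ionization conditions, which is what lets a simple intensity ratio stand in for a concentration ratio.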
The ICAT(trademark) technique has proven useful for many applications but has a number of drawbacks. First, the isotope tag is a relatively high-molecular-weight addition to the sample peptides, possibly complicating database searches for structural identification. The added chemical reaction and purification steps lead to sample loss and can degrade the quality of tandem mass spectra. Additionally, proteins that do not contain cysteine cannot be tagged and identified. To obtain accurate relative quantification using ICAT, different samples must be processed identically and then combined prior to mass spectrometric analysis; it is therefore impractical to compare samples acquired and processed at different times, or to compare unique samples. Furthermore, the method is not applicable to other molecular classes such as metabolites.
Existing protein and metabolite quantification techniques therefore require some type of chemical calibrant, which increases the number of sample handling steps and limits the nature and number of samples that can be compared. It would be beneficial to provide a method for quantifying proteins and low-molecular-weight components of chemical and biological mixtures that does not require an internal standard or other chemical calibrant.
Various embodiments of the present invention provide methods for estimation of relative concentrations of chemical sample components by mass spectrometry without the use of an internal standard.
In one embodiment, the present invention provides a method for processing spectral data containing peaks having peak intensities. A set of spectra is obtained from a plurality of chemical samples such as biological samples containing metabolites, proteins or peptides. The spectra can be mass spectra obtained by, for example, electrospray ionization (ESI), matrix-assisted laser desorption ionization (MALDI), or electron-impact ionization (EI). Peak intensities in each spectrum are scaled by a normalization factor to yield peak intensities that are proportional to the concentration of the responsible component. Based on scaled peak intensities, relative concentrations of a particular sample component can be estimated. The normalization factor is computed in dependence on chemical sample components whose concentrations are substantially constant in the chemical samples. In one embodiment, these components are not predetermined and are inherent components of the chemical samples. In another embodiment, the normalization factor is computed from ratios of peak intensities between two (e.g., first and second) spectra of the set and is a non-parametric measure of peak intensities such as a median.
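The median-of-ratios normalization described in this embodiment can be sketched in Python. This is an illustrative sketch, not the claimed implementation: it assumes peaks have already been matched across spectra and keyed by a hypothetical peak identifier, treats the first spectrum as the reference, and divides the second spectrum's intensities by the median of the per-peak ratios.

```python
from statistics import median


def normalization_factor(reference: dict[str, float], spectrum: dict[str, float]) -> float:
    """Median of the intensity ratios spectrum/reference over peaks found in both.

    Peaks are keyed by a hypothetical identifier, i.e. they are assumed to be
    matched across spectra already (for example by m/z and retention time).
    If most components have roughly constant concentration, most ratios
    cluster around the overall intensity scale difference, and the median
    is barely affected by the minority of peaks whose components change.
    """
    ratios = [
        spectrum[peak] / reference[peak]
        for peak in reference.keys() & spectrum.keys()
        if reference[peak] > 0
    ]
    if not ratios:
        raise ValueError("spectra share no usable peaks")
    return median(ratios)


def scale_spectrum(spectrum: dict[str, float], factor: float) -> dict[str, float]:
    """Divide every peak intensity by the normalization factor."""
    return {peak: intensity / factor for peak, intensity in spectrum.items()}


# Hypothetical data: the second spectrum is 2x more intense overall,
# and only the component behind peak "p5" has actually changed.
reference = {"p1": 100.0, "p2": 200.0, "p3": 50.0, "p4": 80.0, "p5": 10.0}
sample = {"p1": 200.0, "p2": 400.0, "p3": 100.0, "p4": 160.0, "p5": 60.0}

f = normalization_factor(reference, sample)
scaled = scale_spectrum(sample, f)
print(f, scaled["p5"] / reference["p5"])  # 2.0 3.0
```

The median is used here because it is a non-parametric measure: the ratio of the one changed peak (6.0) does not move it, whereas a mean ratio (2.8 for this data) would be pulled toward that outlier and distort every scaled intensity.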
In an alternative embodiment, the present invention provides a method for estimating relative concentrations of a particular component in at least two chemical samples, such as biological samples containing proteins or peptides. Mass spectra are acquired, e.g., by electrospray ionization, matrix-assisted laser desorption ionization, or electron-impact ionization of the samples, and peak intensities of peaks in the spectra are scaled by a normalization factor. The normalization factor is computed in dependence on chemical sample components whose concentrations are substantially constant in the chemical samples. In one embodiment, it is computed from ratios of peak intensities in two (e.g., first and second) of the spectra and is a non-parametric measure (e.g., median) of peak intensities. Based on scaled peak intensities of a peak corresponding to the particular component, relative concentrations of the particular component can be estimated.
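Once each spectrum has a normalization factor (for example the median ratio named in this embodiment), the relative concentration of a particular component follows from the scaled intensities of its peak. This minimal Python sketch assumes that scaled peak intensity is proportional to concentration and expresses every sample relative to the first; the function name and arguments are hypothetical.

```python
def relative_concentrations(
    peak_intensities: list[float],
    factors: list[float],
) -> list[float]:
    """Concentration of one component in each sample, relative to sample 0.

    peak_intensities[i]: raw intensity of the component's peak in spectrum i.
    factors[i]: normalization factor of spectrum i (1.0 for the reference).
    Assumes scaled peak intensity is proportional to concentration.
    """
    if not peak_intensities or len(peak_intensities) != len(factors):
        raise ValueError("need one normalization factor per spectrum")
    scaled = [intensity / factor for intensity, factor in zip(peak_intensities, factors)]
    if scaled[0] <= 0:
        raise ValueError("reference peak intensity must be positive")
    return [s / scaled[0] for s in scaled]


# Hypothetical: the same peak in three spectra with factors 1.0, 2.0, and 0.5.
print(relative_concentrations([10.0, 60.0, 5.0], [1.0, 2.0, 0.5]))  # [1.0, 3.0, 1.0]
```

Note that the third sample's raw intensity is half the reference's, yet its estimated concentration is unchanged; the difference is attributed entirely to the spectrum's overall intensity scale.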
Additionally, the present invention provides a method for detecting a component present in substantially different concentrations in at least two chemical samples, such as biological samples containing proteins or peptides. Mass spectra of the samples are obtained, e.g., using electrospray ionization, matrix-assisted laser desorption ionization, or electron-impact ionization. Peak intensities in each spectrum are scaled by a normalization factor computed in dependence on chemical sample components whose concentrations are substantially constant in the chemical samples. In one embodiment, the normalization factor is computed from ratios of peak intensities in two (e.g., first and second) of the spectra and is a non-parametric measure (e.g., median) of peak intensities. A peak is then identified that has substantially different scaled peak intensities in at least two of the mass spectra. In an additional embodiment, the component corresponding to the peak is identified. A relative concentration of the component in the samples can be computed based on the scaled peak intensities of the corresponding peak.
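Detecting a peak with substantially different scaled intensities can be sketched as a fold-change filter. The source does not define "substantially different"; this Python sketch assumes a hypothetical symmetric fold-change threshold `min_fold` (default 2) applied on a log scale, so that a twofold increase and a twofold decrease both qualify.

```python
import math


def differential_peaks(
    scaled_a: dict[str, float],
    scaled_b: dict[str, float],
    min_fold: float = 2.0,
) -> dict[str, float]:
    """Return {peak: fold change B/A} for peaks whose scaled intensities
    differ by at least min_fold in either direction.

    min_fold is a hypothetical threshold standing in for "substantially
    different". Peaks missing from either spectrum, or with non-positive
    intensity, are skipped.
    """
    threshold = math.log(min_fold)
    hits = {}
    for peak in sorted(scaled_a.keys() & scaled_b.keys()):
        a, b = scaled_a[peak], scaled_b[peak]
        if a <= 0 or b <= 0:
            continue
        fold = b / a
        # Symmetric test on the log scale: 2x up and 2x down both qualify.
        if abs(math.log(fold)) >= threshold:
            hits[peak] = fold
    return hits


# Hypothetical intensities, already scaled by each spectrum's normalization factor.
a = {"p1": 100.0, "p2": 200.0, "p5": 10.0, "p6": 40.0}
b = {"p1": 100.0, "p2": 190.0, "p5": 30.0, "p6": 10.0}
print(differential_peaks(a, b))  # {'p5': 3.0, 'p6': 0.25}
```

Peaks present in only one spectrum are skipped by this sketch; such qualitative differences would need separate handling. Each returned fold change is also the relative concentration of the corresponding component in the two samples.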
Another embodiment of the present invention is a program storage device accessible by a processor and tangibly embodying a program of instructions executable by the processor to perform method steps for the above-described methods. An additional embodiment is a computer readable medium storing a plurality of normalized peak intensities obtained by any of the methods described above.