In life science research it is often desirable to identify the constituent proteins in a sample. Typically, the sample is extracted from an organism or collection of living cells. Such samples, of which blood serum and cell lysates are representative, are generally composed of many thousands of proteins. In disease or pathway research it is often necessary to assess the protein composition of many such samples in order to correlate the presence, absence or amount of specific proteins to the state of the source organism.
Complex mixtures of proteins are typically separated by multiple mechanisms. Common examples of separation parameters are charge, hydrophobic interactions, affinity and molecular weight. After separation into constituent proteins, the identification of constituent proteins is often required. The most common and useful method of protein identification is peptide mass fingerprinting using mass spectrometry. FIG. 2 shows an exemplary prior art process flow diagram. This process uses one, two or more methods (61, 62) for separating constituent proteins in the sample mixture, breaking up the proteins in the sample 60 into peptides with proteolytic digestion 63, most commonly using the trypsin enzyme, and reading the mass spectra of the peptides on a mass spectrometer 64. Depending on the separation mechanisms used, digestion may be performed before either one of the two separations or just before the mass spectrometry measurement. Also, there may be more or fewer separation mechanisms than the two shown in the figure. The resulting mass spectra are compared with peptide spectra from theoretical digests of sequences of known proteins in a database 65 and a plurality of sample protein identifications 66 are produced by correlation of the measured peptide masses to the calculated sequence masses.
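The database comparison step described above can be illustrated with a minimal sketch. It is not the method of any particular search engine: the function names, the simplified trypsin rule (cleave after K or R unless followed by P), the 0.2 Da tolerance and the hit-count score are all assumptions chosen for illustration, and neutral monoisotopic masses are compared rather than instrument m/z values.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the residue sum of a free peptide


def tryptic_digest(sequence):
    """Theoretical digest: cleave after K or R unless the next residue is P
    (a simplified rule; missed cleavages are ignored)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def score_protein(sequence, observed_masses, tol=0.2):
    """Count observed masses that match any theoretical peptide mass within tol."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(m - t) <= tol for t in theoretical) for m in observed_masses
    )


def identify(observed_masses, database, tol=0.2):
    """Return the name of the database entry with the most matching masses.
    `database` is a hypothetical dict of protein name -> sequence."""
    return max(
        database,
        key=lambda name: score_protein(database[name], observed_masses, tol),
    )
```

For example, with a toy two-entry database `{"protA": "GAKSTR", "protB": "VVKLLR"}` and observed masses of 274.16 and 362.19, `identify` returns `"protA"`, since both masses match its theoretical peptides GAK and STR.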
Several types of mass spectrometer instruments are used for peptide mass fingerprinting. One type is the Matrix Assisted Laser Desorption Ionization-Time Of Flight (MALDI-TOF). Peptide samples are introduced into MALDI instruments by spotting the liquid solution onto a MALDI target plate, the target plate having been previously coated with a matrix substance that facilitates the ionization of compounds to be measured. The MALDI plate with one or more samples spotted upon one or more of its target areas is then inserted into the spectrometer. A laser beam ionizes the sample spots and ejects the ions into the driving electric fields of the mass spectrometer. An example of a MALDI mass spectrometer is the PerkinElmer prOTOF 2000 orthogonal MALDI which uses 96-, 384- or 1,536-sample MALDI plates with the form factor of flat, thin microplates.
Another type of mass spectrometer instrument used for peptide mass fingerprinting is the electro-spray ionization (ESI) mass spectrometer. Sample introduction into ESI instruments may be by a continuous or near-continuous flow of liquid, unlike the batch loading of discrete samples required by MALDI instruments. In this continuous flow case, measurements are taken serially at periodic time intervals against a continuous inflow of peptides to be characterized.
Separation of the protein mixture may be performed in a variety of separation matrices. A separation matrix is a support that has size, porosity and functionality characteristics in order to enable interaction with, and separation of, molecules. Typical supports for separation matrices include silica, alumina, agarose, acrylamide, styrene divinylbenzene, glass, dextran, polystyrene, acrylics, nylon, polyvinylidene difluoride, and combinations thereof. The separation matrix support can be in a form typically found for chromatography resins such as particles, gels, membranes or any other form that enables suitable separation characteristics. A flow-through vessel that holds a separation matrix is commonly called a column.
The functionality characteristics of the separation matrix support enable interactions with molecules. These functionalities can be cationic or anionic groups, to allow charge-based interactions; alkyl chains, usually three to eighteen carbons in length, to allow hydrophobic interactions; or affinity ligands, for specific binding interactions. The support may also have porosity characteristics that cause a molecular-weight-based separation as the molecules flow through it.
The most established analytical method of separating and identifying proteins is two-dimensional gel electrophoresis (2-D gel) followed by MALDI mass spectrometry. The major steps of this process are shown in the flow diagram of prior art FIG. 3. The complex sample mixture 67 is first separated by charge (pH) by an electrophoresis process called isoelectric focusing 68. This produces a linear strip of gel material with the proteins separated by charge along the length of the strip. The strip is placed in contact with the edge of a two-dimensional polyacrylamide gel sheet in the appropriate buffer and voltage is applied to separate the proteins according to molecular weight via gel electrophoresis 69. Individual proteins form spots of varying sizes and shapes across the gel. The proteins are labeled 70 either before or after separation, typically either through staining or fluorescent labeling. The labeled 2-D gel is imaged and the image is analyzed 71 to identify specific spots representing the location of specific proteins in the gel. The spots of proteins of interest are cut out of the gel 72 and digested to peptides 73, typically with trypsin. The peptide solution is then typically spotted onto a MALDI plate to facilitate peptide mass fingerprinting using a mass spectrometer 74 and a mass database 75 to produce protein identifications 76.
A common extension to the 2-D gel process is assessment of differential protein expression between two complex samples, samples from normal and diseased organisms for example. One typical process for differential separation on 2-D gels is to label all of the proteins in each sample with a different fluorescent dye (Patton et al. Current Opinion in Biotechnology 2001 6:63-69). The samples are then mixed, the 2-D gel separation is performed, and then the imaging is performed separately at the wavelengths of each of the two fluorescent dyes. In theory, proteins that exist in common in both samples will produce 2-D gel spots that are coincident. Proteins that exist in one sample but not in the other will produce spots at only one of the wavelengths. Further, proteins that exist in both samples but in different concentrations can be assessed by the ratio of their fluorescent intensities at the two wavelengths. Another typical process for differential measurements on 2-D gels is digital correlation of protein spots in images from two independent gels, and quantitating the differences in protein amount in each gel. This method suffers from its dependence on multiple 2-D gels producing protein spots in a reproducible manner.
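The two-dye comparison described above can be sketched as follows. The detection threshold, the two-fold change cutoff, the function name and the category labels are hypothetical choices for illustration; the text does not prescribe them, and real image analysis would also need background correction and dye normalization.

```python
def classify_spots(spots, min_signal=100.0, fold=2.0):
    """Classify gel spots from their intensities at the two dye wavelengths.

    spots: dict of spot id -> (intensity_dye1, intensity_dye2)
    min_signal: assumed intensity below which a spot counts as absent
    fold: assumed ratio cutoff for calling a concentration difference
    Returns dict of spot id -> (label, ratio or None).
    """
    results = {}
    for spot_id, (i1, i2) in spots.items():
        in_1 = i1 >= min_signal
        in_2 = i2 >= min_signal
        if in_1 and in_2:
            # Coincident spot: compare the two fluorescent intensities.
            ratio = i1 / i2
            if ratio >= fold:
                label = "higher_in_sample1"
            elif ratio <= 1.0 / fold:
                label = "higher_in_sample2"
            else:
                label = "unchanged"
            results[spot_id] = (label, ratio)
        elif in_1:
            results[spot_id] = ("sample1_only", None)
        elif in_2:
            results[spot_id] = ("sample2_only", None)
        else:
            results[spot_id] = ("not_detected", None)
    return results
```

A spot with intensities (900, 300) would be labeled as higher in the first sample with a ratio of 3.0, while a spot with intensities (500, 20) would be reported as present in the first sample only.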
Among the shortcomings of the 2-D gel process are the degree of skill required to perform the process, the large amount of manual manipulation of reagents and gels required, the lack of repeatability and reproducibility of results, and the length of time required for the process, which is often two or three days. Also, the assessment of differential protein expression using two dyes is limited by the dyes' ability to label all proteins to produce fluorescent signals proportional to their concentrations and by the fluorescent dyes' effects on the separation process, as well as limits to spot finding and quantitation at the image processing step. In an attempt to address these shortcomings another approach to the task called MUlti-Dimensional Protein Identification Technology (MUDpit) has been developed as depicted in prior art FIG. 4.
The MUDpit process utilizes liquid chromatography (LC) rather than gel electrophoresis as the separation modality. Referring to prior art FIG. 4, a protein sample 80 is first digested to peptides 81, then LC drives the peptide sample mixture through a flow-through column 84 containing a separation matrix while varying the concentration of the separation buffer, typically with a constant fluid flow rate and a linear concentration gradient with time. The buffer concentration gradient is typically produced by linearly varying the flow rates of two buffer solutions 82 and 83, one flow rate increasing while the other decreases, keeping the total flow rate through the LC column constant.
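The linear gradient described above can be expressed as a short calculation. This is a minimal sketch: the function and parameter names are hypothetical, and the start and end fractions of the second buffer are assumed inputs rather than values taken from the text.

```python
def gradient_flow_rates(t, duration, total_flow, start_frac_b=0.0, end_frac_b=1.0):
    """Flow rates of the two buffers at time t for a linear gradient.

    The fraction of the second buffer rises linearly from start_frac_b to
    end_frac_b over `duration`, and the first buffer supplies the remainder,
    so the two flow rates always sum to `total_flow`.
    Returns (flow_a, flow_b) in the same units as total_flow.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    # Clamp so times before the start or after the end hold the end values.
    progress = min(max(t / duration, 0.0), 1.0)
    frac_b = start_frac_b + (end_frac_b - start_frac_b) * progress
    flow_b = total_flow * frac_b
    flow_a = total_flow - flow_b
    return flow_a, flow_b
```

Halfway through a 60-minute gradient at a total flow of 1.0 mL/min, for instance, `gradient_flow_rates(30, 60, 1.0)` gives 0.5 mL/min for each buffer.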
Unlike 2-D gels which produce separations as physical spots with specific locations on a 2-D plane, LC produces a series of volumes of eluted solutions (fractions) that are typically sampled at uniform time increments from a flowing output stream at the output port of a column 85. The LC process is inherently serial in nature; the fractions are delivered out of a single column one after the other. The MUDpit process further utilizes two complete LC processes in series to produce two dimensions of separation analogous to the 2-D gel process. The first separation is generally performed on an ion-exchange column and the second on a reverse-phase column. Time increment fractions are collected from the output stream of the first column 88, then each of those fractions is run independently on the second separation column 89 to generate a second series of time-increment fractions 90. Often the output of the second column is directed continuously to the input of a mass spectrometer, typically an electro-spray tandem mass spectrometer 91. In this arrangement the continuous flow from the second column is directed to the mass spectrometer instrument and the time-increment fractions are generated by the mass spectrometer's sampling of the stream. Other variations of MUDpit utilize multi-modality columns, capillaries and other variations of detailed configuration but retain the significant operational details described here.
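The serial character of the two-column arrangement can be made concrete with a small scheduling sketch. The run times and fraction counts used here are hypothetical, and the sketch assumes that second-dimension runs start only after the first separation finishes, do not overlap, and need no column re-equilibration time.

```python
def serial_schedule(first_run_min, n_first_fractions, second_run_min):
    """List (fraction index, start minute, end minute) for each second-column
    run, with the fractions from the first column processed one after another."""
    schedule = []
    start = first_run_min
    for k in range(n_first_fractions):
        schedule.append((k, start, start + second_run_min))
        start += second_run_min
    return schedule


def total_elapsed_min(first_run_min, n_first_fractions, second_run_min):
    """Total elapsed time for one sample under the serial assumption."""
    return first_run_min + n_first_fractions * second_run_min
```

Under these assumptions, a 60-minute first separation yielding 12 fractions, each followed by a 90-minute second separation, occupies 1,140 minutes (19 hours) for a single sample.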
The MUDpit process can be adapted to differential analysis between two samples by labeling the proteins or peptides with mass tags prior to separation (Patton et al., Current Opinion in Biotechnology 2002 13:321-328). Mass tags are molecules of known, small molecular weight that can be resolved by the mass spectrometer but do not materially affect the separation process. Mass spectra of identical peptides from two mass tag-labeled samples will have the same form but will be shifted along the mass axis by the difference in mass of the tags, so their spectra can be differentiated. The ratio of the paired spectra's signal levels is representative of the relative concentrations of the protein in the two samples.
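The pairing of shifted peaks can be sketched as below. The function name, the peak-list format and the 0.02 Da tolerance are assumptions for illustration, and the sketch further assumes that each peptide carries exactly one tag and that peaks are given as neutral masses, so that paired peaks differ by exactly the tag mass difference.

```python
def find_tag_pairs(peaks, tag_delta, tol=0.02):
    """Find peak pairs separated by the mass difference between the two tags.

    peaks: list of (mass, intensity) tuples from one combined spectrum
    tag_delta: mass of the heavier tag minus mass of the lighter tag (Da)
    Returns a list of (light_mass, heavy_mass, light/heavy intensity ratio).
    """
    ordered = sorted(peaks)
    pairs = []
    for light_mass, light_int in ordered:
        for heavy_mass, heavy_int in ordered:
            shift = heavy_mass - light_mass
            if abs(shift - tag_delta) <= tol and heavy_int > 0:
                pairs.append((light_mass, heavy_mass, light_int / heavy_int))
    return pairs
```

With a 4.0 Da tag difference, peaks at 1000.50 (intensity 200) and 1004.50 (intensity 100) would be paired with a ratio of 2.0, suggesting that the peptide is about twice as abundant in the sample carrying the lighter tag.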
The use of mass tags for differential protein analysis has been described extensively in the literature. Mass tags can be stable isotopes of the constituent atoms of the proteins, such as N15, C13 or H2, or can be larger, such as a CH3 group replacing a hydrogen atom. Labeling proteins with mass tags can be performed biologically in cell culture by using a culture medium containing isotopic compounds, as has been described by Oda et al., PNAS Jun. 8, 1999; 96(12):6591-6596 and Chen et al., Anal. Chem. Feb. 16, 2000; 72, 1134-1143, for example. Mass tags can also be applied directly to proteins by chemical labeling as described by Weckwerth et al. (Rapid Commun. Mass Spectrom. 14, 1677-1681; 2000) and Kelleher et al. (Journal of Biological Chemistry, Vol. 272, Dec. 19 1997, 32215-32220).
An advantage of the MUDpit process over the 2-D gel process is the degree of automation that can be applied. The LC process is typically hands-free, and the output of the final LC column can be plumbed into an electro-spray mass spectrometer to deliver the samples to the measurement instrument automatically.
The MUDpit technique also has disadvantages. First, the proteins must be digested to peptides before any separation is performed. This limits the resolution and range of separations as it makes the peptide mixture for the first separation an extremely complex one with potentially millions of different peptides to be discriminated. Short peptides may even overlap between multiple proteins. Second, the dynamic range of the LC process on peptides is lower than that of 2-D gels on intact proteins, so the signals from peptides from high-abundance proteins are more likely to overwhelm signals from low-abundance proteins. These problems are more pronounced when using MUDpit for differential measurements on low-abundance proteins. Further, since the separation elements in MUDpit are inherently serial rather than parallel, the throughput of the process is limited, making the elapsed time to evaluate a sample long even though the process can be largely automated.
Thus, there exists a need for an automated method, system, apparatus and kit for separation and identification of proteins that are more reproducible than 2-D gels. The ability to avoid protein digestion prior to the first separation process and to perform separations in parallel would also prove beneficial, as would support for differential protein analysis when needed.