The identification of proteins in biological samples is an essential activity of biochemical analysis. Determining the sequence of a protein is particularly important, since the sequence determines the structure of a protein, which, in turn, determines the function of the protein. Traditional techniques for protein identification are cumbersome and relatively slow. The mainstay of protein identification techniques has been chemical sequencing of peptides using the Edman degradation, which can sequentially identify amino acids in a peptide from the N-terminus. This sequencing technique is typically used in conjunction with enzymatic digestion of a protein or polypeptide. Typically, an unidentified polypeptide is digested and its component peptides are separated from each other by chromatography. The individual peptides are then subjected to Edman degradation. The sequences of the peptides can be ordered by comparing the sequences of peptides from digestion of the polypeptide with different sequence-specific cleavage reagents. This process allows the complete sequence of a polypeptide to be determined. While this has been a highly successful technique for the identification of proteins, it is quite laborious.
New technologies, such as Matrix Assisted Laser Desorption Ionisation (MALDI) mass spectrometry, have made rapid protein identification more feasible.
This technique has permitted the development of peptide mass fingerprinting as a relatively rapid procedure for protein identification.
A typical peptide mass fingerprinting protocol involves determining the mass of the unidentified protein, followed by digestion of the protein with trypsin. Trypsin cleaves polypeptides selectively at arginine and lysine residues, leaving either arginine or lysine at the C-termini of the product peptides. The positions of lysine and arginine in the sequence of a polypeptide determine where the polypeptide is cut, giving rise to a characteristic series of peptides. The pattern of peptides can be easily detected by MALDI-TOF mass spectrometry. This mass spectrometric technique has a large mass range, can readily ionise large biomolecules and preferentially produces singly charged ions. Competition for ionisation is generally not severe with this technique, although it can be problematic in some cases. This means that there is generally one peak in the mass spectrum for each peptide, that the mass-to-charge ratio for each peak has essentially the same value as the mass of the peptide plus the proton added to ionise it, and that most (and sometimes all) of the peptides from the digest of an unidentified protein can be analysed simultaneously. In effect, the mass spectrum is a ‘bar-code’ in which the lines in the spectrum represent the masses of the characteristic cleavage peptides of the protein. For any given protein, there may be some peptides that have the same mass as a peptide from another protein, but it is very unlikely that two different proteins will give rise to peptides that all have identical masses. This means that the pattern of masses of the digest of a protein is a nearly unique identifier of that protein, and this pattern is called a Peptide Mass Fingerprint (PMF).
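By way of illustration only, the relationship between a sequence and its predicted fingerprint can be sketched in a few lines of Python. The function names are hypothetical, the residue masses are standard monoisotopic values, and the cleavage rule is the simplified one stated above (a cut after every lysine or arginine). Real digests also show missed cleavages and a commonly applied exception for cleavage sites followed by proline, neither of which is modelled here.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # added to give the singly charged [M+H]+ ion


def tryptic_digest(sequence):
    """Cut after every K or R (simplified rule; no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # The C-terminal peptide need not end in K or R.
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide):
    """Mass-to-charge ratio of the singly protonated peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


def peptide_mass_fingerprint(sequence):
    """Sorted list of predicted [M+H]+ values for the tryptic peptides."""
    return sorted(round(mh_plus(p), 4) for p in tryptic_digest(sequence))


# Hypothetical toy sequence, not taken from any real protein.
print(tryptic_digest("GASKPVRTL"))
print(peptide_mass_fingerprint("GASKPVRTL"))
```

Because each ion is assumed to carry a single charge, each predicted value corresponds to one expected line in the ‘bar-code’.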
The relative uniqueness of PMFs means that databases of predicted PMFs, determined from known protein sequences or from sequences that have been predicted from DNA or expressed sequence tags (ESTs), can be used to identify proteins in biological samples (Pappin D J C, Højrup P and Bleasby A J, Current Biology 3: 327–332, “Rapid identification of proteins by peptide-mass fingerprinting.” 1993; Mann M, Højrup P, Roepstorff P. Biol Mass Spectrom 22 (6): 338–345, “Use of mass spectrometric molecular weight information to identify proteins in sequence databases.” 1993; Yates J R 3rd, Speicher S, Griffin P R, Hunkapiller T, Anal Biochem 214 (2): 397–408, “Peptide mass maps: a highly informative approach to protein identification.” 1993). The PMF for an unknown protein can be compared with all of the PMFs in a database to find the best match, thereby identifying the protein. Searches of this kind can be constrained by determining the mass of the protein prior to digestion. In this way the pattern of masses of an unidentified polypeptide can be related to its sequence, which in turn can help to determine the role of a protein in a particular sample.
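A database search of this kind can be sketched as follows. This is a minimal illustration rather than the scoring scheme of any published search tool: the score is simply the number of observed peaks that fall within a mass tolerance of a predicted peptide mass, and the optional intact-mass constraint discards candidates whose mass differs from the measured protein mass by more than a chosen fraction. The function names, the tolerance values and the database entries are all hypothetical.

```python
def count_matches(observed, predicted, tol=0.2):
    """Number of observed peaks lying within tol Da of any predicted mass."""
    return sum(
        1 for m in observed
        if any(abs(m - p) <= tol for p in predicted)
    )


def best_match(observed, database, protein_mass=None, mass_window=0.05):
    """Return (name, score) of the best-scoring database entry.

    database maps a protein name to (intact_mass, predicted_peptide_masses).
    If protein_mass is given, entries whose intact mass differs from it by
    more than mass_window (as a fraction) are skipped.
    """
    best_name, best_score = None, 0
    for name, (intact_mass, predicted) in database.items():
        if protein_mass is not None:
            if abs(intact_mass - protein_mass) > mass_window * protein_mass:
                continue  # constrained out by the intact-mass measurement
        score = count_matches(observed, predicted)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical toy database and spectrum, for illustration only.
toy_database = {
    "protein_A": (25000.0, [500.3, 800.4, 1200.6]),
    "protein_B": (60000.0, [500.3, 950.5, 1500.7]),
}
observed_peaks = [500.31, 800.45, 1200.55]
print(best_match(observed_peaks, toy_database))
print(best_match(observed_peaks, toy_database, protein_mass=60000.0))
```

Counting matched peaks is the simplest possible score; it is used here only because it makes the effect of the intact-mass constraint easy to see.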
There are, however, many technical difficulties involved in determining the PMF for a protein. A typical protein will give rise to twenty to thirty peptides after cleavage with trypsin, but not all of these peptides will appear in the mass spectrum. The precise reasons for this are not fully understood. One factor that is believed to cause incomplete spectra is competition for protonation during the ionisation process, resulting in preferential ionisation of arginine-containing peptides (Krause E. & Wenschuh H. & Jungblut P. R., Anal Chem. 71 (19): 4160–4165, “The dominance of arginine-containing peptides in MALDI-derived mass fingerprints of proteins.” 1999). In addition, there are surface effects that result from the process of preparing MALDI targets. The targets are prepared by dissolving the peptide digest in a saturated solution of the matrix material. Small droplets of the peptide/matrix solution are dropped onto a metal target and left to dry. Differences in the solubility of peptides mean that some peptides will preferentially crystallise near the top surface of the matrix, where they will be desorbed more readily.
Sensitivity is also a problem with conventional protocols for identifying proteins from their PMF. For peptide mass fingerprinting to be an effective tool, it should be possible to determine a PMF from as small a sample of protein as possible, so as to improve the dynamic range of the analysis of protein samples.
Some attempts have been made to improve the ionisation of peptides that do not contain arginine. Conversion of lysine to homo-arginine is one approach that has met with some success (V. Bonetto et al., Journal of Protein Chemistry 16 (5): 371–374, “C-terminal Sequence Determination of Modified Peptides by MALDI MS”, 1997; Brancia et al., Electrophoresis 22: 552–559, “A combination of chemical derivatisation and improved bioinformatic tools optimises protein identification for proteomics”, 2001). The conversion of lysine to homo-arginine introduces guanidino functionalities into all of the peptides from a tryptic digest, with the exception of C-terminal peptides, greatly improving the representation of lysine-containing peptides in the MALDI-TOF mass spectra.
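The effect of this conversion on a fingerprint can be illustrated with a short sketch. Replacing the side-chain amino group of lysine with a guanidino group adds CH2N2, which corresponds to a monoisotopic mass increase of about 42.0218 Da per lysine. The function name is hypothetical, and the sketch assumes complete conversion and modification of lysine side chains only.

```python
# Monoisotopic mass added per lysine on conversion to homo-arginine
# (net addition of CH2N2): 12.00000 + 2 * 1.007825 + 2 * 14.003074.
GUANIDINATION_SHIFT = 42.0218


def guanidinated_mass(peptide, mass):
    """Shift a peptide mass by one guanidination per lysine residue.

    Assumes every lysine side chain is converted and nothing else reacts.
    """
    return mass + GUANIDINATION_SHIFT * peptide.count("K")


# Hypothetical peptides with placeholder masses, for illustration only.
print(guanidinated_mass("GASK", 1000.0))  # one lysine: shifted
print(guanidinated_mass("PVR", 1000.0))   # no lysine: unchanged
```

Any database of predicted PMFs used with derivatised samples would need the same shift applied to its lysine-containing peptides.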
Conventional techniques for determining the expression of proteins in biological samples depend on protein identification. The goal of protein expression profiling is to identify as many proteins in a sample as possible and, preferably, to determine the quantity of each protein in the sample. A typical method of profiling a population of proteins is two-dimensional electrophoresis (R. A. Van Bogelen, E. R. Olson, “Application of two-dimensional protein gels in biotechnology.”, Biotechnol Annu Rev, 1: 69–103, 1995).
In this method a protein sample extracted from a biological sample is separated by two independent electrophoretic procedures. The first separation usually separates proteins on the basis of their iso-electric point, using a gel-filled capillary or gel strip along which a pH gradient exists. Each protein migrates along the gradient until it reaches the pH at which it has no net charge, referred to as its iso-electric point, beyond which the protein can migrate no further. After all of the proteins in the sample have reached their iso-electric points, the proteins are separated further using a second electrophoretic procedure. To perform the second procedure, the entire iso-electric focussing gel strip is laid against one edge of a rectangular gel. The separated proteins in the strip are then separated in the second gel on the basis of their size. The proteins are thus resolved into a 2-dimensional array of spots in a rectangular slab of acrylamide.
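The notion of an iso-electric point can be made concrete with a rough numerical sketch. The net charge of a polypeptide is estimated from the Henderson-Hasselbalch relationship for each ionisable group, and the pH at which that charge crosses zero is located by bisection. The pKa values below are approximate textbook figures chosen for illustration; published pKa sets differ, and the function names are hypothetical.

```python
# Approximate pKa values; illustrative assumptions, not a definitive set.
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}           # protonated: +1
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}  # deprotonated: -1
N_TERM_PKA = 9.0
C_TERM_PKA = 2.3


def net_charge(sequence, ph):
    """Estimated net charge of the polypeptide at the given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))    # N-terminal amine
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))   # C-terminal carboxyl
    for residue in sequence:
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(sequence, low=0.0, high=14.0, tol=1e-4):
    """Bisect on pH; net charge falls monotonically as pH rises."""
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid    # still positive: the zero crossing is at higher pH
        else:
            high = mid
    return (low + high) / 2.0


# Hypothetical toy sequences, for illustration only.
print(isoelectric_point("KKKK"))
print(isoelectric_point("DDDD"))
```

In a pH gradient, a protein below its iso-electric point carries a net positive charge and one above it a net negative charge, which is why migration stops where the two balance.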
However, after separating the proteins in a sample from each other, there remains the problem of detecting and then identifying the proteins. The currently favoured approach to identifying proteins is to analyse the protein in specific spots on the gel by peptide mass fingerprinting using MALDI-TOF mass spectrometry (Jungblut P, Thiede B. “Protein identification from 2-DE gels by MALDI mass spectrometry.” Mass Spectrom Rev. 16: 145–162, 1997). 2-DE technology is therefore limited by the detection capabilities of the peptide mass fingerprinting methods used in the identification of proteins in gel spots.
The existing technology cannot easily compare protein expression levels between two or more samples. There are also sensitivity problems with such a complex process, due to sample losses during the separation of the proteins and their subsequent recovery from the 2-D gel. In addition, proteins extracted from a 2-D gel are generally in buffers containing solutes that are incompatible with mass spectrometric analysis.
It is an aim of this invention to solve the problems associated with the known methods described above. It is thus an aim of this invention to provide improved methods for producing peptide mass fingerprints, using labels (tags). It is a further aim of this invention to provide methods to determine peptide mass fingerprints using protein-reactive reagents that are stable in water, are selective for lysine, and work under mild reaction conditions without degradation of the reagents.