This invention relates to rapid and efficient methods for sequencing formed or forming polypeptides utilizing a mass spectrometer.
Polypeptides are a class of compounds composed of α-amino acid residues chemically bonded together by amide linkages with elimination of water between the carboxy group of one amino acid and the amino group of another amino acid. A polypeptide is thus a polymer of α-amino acid residues which may contain a large number of such residues. Peptides are similar to polypeptides, except that they are comprised of a lesser number of α-amino acids. There is no clear-cut distinction between polypeptides and peptides. For convenience, in this disclosure and claims, the term "polypeptide" will be used to refer generally to peptides and polypeptides.
Proteins are polypeptide chains folded into a defined three-dimensional structure. They are complex high polymers containing carbon, hydrogen, oxygen, nitrogen, and sulfur and are comprised of linear chains of amino acids connected by peptide links. They are similar to polypeptides, but of a much higher molecular weight.
For a complete understanding of physiological reactions involving proteins it is often necessary to understand their structure. There are a number of facets to the structure of proteins. These are the primary structure which is concerned with amino acid sequence in the protein chain and the secondary, tertiary and quaternary structures which generally relate to the three dimensional configuration of proteins. This invention is concerned with sequencing polypeptides to assist in determining the primary structure of proteins. It provides a facile and accurate procedure for sequencing polypeptides. It is also applicable to sequencing the amino acid residues at the termini of proteins.
Many procedures have been used over the years to determine the amino acid sequence, i.e. the primary structure, of polypeptides and proteins. At the present time, the best method available for such determinations is the Edman degradation. In this procedure, one amino terminal amino acid residue at a time is removed from a polypeptide to be analyzed. That amino acid is normally identified by reverse phase high performance liquid chromatography (HPLC), but recently mass spectrometric procedures have been described for this purpose (1). The Edman degradation cycle is repeated for each successive terminal amino acid residue until the complete polypeptide has been degraded. The procedure is tedious and time consuming. Each sequential removal of a terminal amino acid requires 20 to 30 minutes. Hence, with a polypeptide of even moderate length, say for example 50 amino acid residues, a sequence determination may require many hours. The procedure has been automated. The automated machines are available as sequenators, but it still requires an unacceptable amount of time to carry out a sequence analysis. Although the procedure is widely employed, one which required less time and which yielded information about a broader range of modified or unusual amino acid residues present in a polypeptide would be very useful to the art. A process which can be used to sequence individual members of mixtures of polypeptides would be particularly useful.
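As a rough illustration of the time cost described above, the following Python sketch multiplies the quoted 20 to 30 minutes per cycle by the number of residues. The function name is hypothetical, and the sketch assumes a constant cycle time with no setup or analysis overhead, which the text does not state.

```python
def edman_time_hours(n_residues, minutes_per_cycle):
    """Total time in hours for n_residues Edman cycles at a fixed cycle time.

    Assumption: every cycle takes the same time and there is no overhead.
    """
    return n_residues * minutes_per_cycle / 60.0


# The 50-residue example from the text, at both ends of the quoted range.
low = edman_time_hours(50, 20)
high = edman_time_hours(50, 30)
print(f"50 residues: about {low:.1f} to {high:.1f} hours")
```

Under these assumptions, a 50-residue polypeptide occupies the instrument for most of a day.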
Recent advances in the art of mass spectroscopy have made it possible to obtain characterizing data from extremely small amounts of polypeptide samples. It is, for example, presently possible because of the sensitivity and precision of available instruments to obtain useful data utilizing from picomole to subpicomole amounts of products to be analyzed. Further, the incipient ion-trap technologies promise even better sensitivities, and have already been demonstrated to yield useful spectra in the 10⁻¹⁵ to 10⁻¹⁶ sample range.
In general, both electrospray and matrix-assisted laser desorption ionization methods mainly generate intact molecular ions. The resolution of the electrospray quadrupole instruments is about 1 in 2,000 and that of the laser desorption time-of-flight instruments about 1 in 400. Both techniques give mass accuracies of about 1 in 10,000 to 20,000 (i.e. ±0.01% or better). There are proposed modifications of the time-of-flight analyzer that may improve the resolution by up to a factor of 10 and markedly improve the sensitivity of that technique.
These techniques yield mass measurements accurate to ±0.2 atomic mass units, or better. These capabilities mean that, by employing the process of this invention, the polypeptide itself whether already formed or as it is being formed can be sequenced more readily, with greater speed, sensitivity, and precision, than the amino acid derivative released by stepwise degradation techniques such as the Edman degradation. As will be explained in more detail below, the process of this invention employs a novel technique of sequence determination in which a mixture containing a family of "fragments", each differing by a single amino acid residue is produced and thereafter analyzed by mass spectroscopy.
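To relate the relative accuracy quoted earlier (0.01%) to an absolute figure in atomic mass units, the short Python sketch below converts one to the other for a few sample masses. This is arithmetic only: the function name and the sample masses are hypothetical, and the text does not say at which mass the ±0.2 figure applies.

```python
def absolute_error(mass_u, relative_accuracy):
    """Absolute mass error (in u) for a given mass and relative accuracy."""
    return mass_u * relative_accuracy


RELATIVE_ACCURACY = 1e-4  # 0.01%, the figure quoted in the text

# Hypothetical sample masses, chosen only to show the scaling.
for mass in (1000.0, 2000.0, 5000.0):
    err = absolute_error(mass, RELATIVE_ACCURACY)
    print(f"{mass:7.1f} u -> +/-{err:.2f} u")
```

At a relative accuracy of 0.01%, an absolute error of 0.2 u corresponds to a mass of about 2,000 u. Heavier species have proportionally larger absolute errors at the same relative accuracy.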
This invention provides a method for the sequential analysis of polypeptides which may be already formed or are being formed by producing under controlled conditions, from the formed polypeptide or from the segments of the polypeptide as it is being formed, a mixture containing a series of adjacent polypeptides in which each member of the series differs from the next adjacent member by one amino acid residue. The mixture is then subjected to mass spectrometric analysis to generate a spectrum in which the peaks represent the separate members of the series. The differences in molecular mass between such adjacent members, coupled with the position of the peaks in the spectrum for such adjacent members, are indicative of the identity of the said amino acid residue and of its position in the chain of the formed or forming polypeptide.
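The read-out step described above can be sketched in Python as follows. This is a minimal illustration and not the procedure of the invention itself. It assumes that every member of the series is an unmodified polypeptide, that the peak masses are already known, and that a ±0.2 u tolerance is used for matching. The names `RESIDUE_MASS`, `read_ladder`, and the example sequence are hypothetical. The residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (amino acid minus water), in u.
RESIDUE_MASS = {
    "Gly": 57.02146, "Ala": 71.03711, "Ser": 87.03203, "Pro": 97.05276,
    "Val": 99.06841, "Thr": 101.04768, "Cys": 103.00919, "Leu": 113.08406,
    "Ile": 113.08406, "Asn": 114.04293, "Asp": 115.02694, "Gln": 128.05858,
    "Lys": 128.09496, "Glu": 129.04259, "Met": 131.04049, "His": 137.05891,
    "Phe": 147.06841, "Arg": 156.10111, "Tyr": 163.06333, "Trp": 186.07931,
}
WATER = 18.01056  # mass of H2O, added once per free polypeptide chain


def read_ladder(peak_masses, tolerance=0.2):
    """For each adjacent pair of peaks (heaviest first), list the residues
    whose mass matches the mass difference within the tolerance."""
    peaks = sorted(peak_masses, reverse=True)
    calls = []
    for heavier, lighter in zip(peaks, peaks[1:]):
        delta = heavier - lighter
        matches = [name for name, mass in RESIDUE_MASS.items()
                   if abs(delta - mass) <= tolerance]
        calls.append(matches or ["?"])  # "?" marks an unassigned step
    return calls


# Hypothetical example: a five-residue polypeptide shortened one residue
# at a time from the amino terminus, giving a five-member series.
sequence = ["Ala", "Phe", "Gly", "Ser", "Val"]
ladder = [sum(RESIDUE_MASS[r] for r in sequence[i:]) + WATER
          for i in range(len(sequence))]

for step, candidates in enumerate(read_ladder(ladder), start=1):
    print(f"position {step}: {'/'.join(candidates)}")
```

The order of the peaks gives the position of each residue and the mass difference gives its identity. Residues with identical or nearly identical masses (Leu and Ile, or Gln and Lys) are returned together at this tolerance, so such a step yields more than one candidate.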
The process of this invention, which utilizes controlled cycling of reaction conditions to produce peptide ladders of predictable structure, is to be contrasted with previous methods employing mass spectroscopy, including exopeptidase digestion or uncontrolled chemical degradation. See references 2-5. Because of the uncontrolled nature of these previous methods, only incomplete sequence information could be obtained.