1. Field of the Invention
This invention relates generally to the field of polymer and synthetic nucleic acid chemistry. In particular, the invention concerns the degradation and sequence determination of polymers, particularly peptide nucleic acids (PNAs), which sequentially eliminate terminal residues.
2. Description of the Background Art
Peptide nucleic acids (PNAs) are recently discovered synthetic polyamides which are promising candidates for the sequence-specific regulation of DNA expression and for the preparation of gene targeted drugs. See European Patent applications EP 92/01219 and 92/01220. PNAs are biopolymer hybrids which possess a peptide-like backbone to which the nucleobases of DNA are attached. Specifically, PNAs are synthetic polyamides comprised of repeating units of the amino acid 2-aminoethylglycine, to which adenine, cytosine, guanine and thymine are attached through a methylene carbonyl group. The polymers are neutral in charge and water soluble. Complementary sequences of PNA bind to DNA with high specificity to give both double and triple helical forms. Surprisingly, PNA-DNA hybrids are more stable than the native double stranded DNA. Consequently, PNAs are promising candidates for application to the multifaceted field of DNA manipulation. See Nielsen, Peter E., et al., Science 254: 1497-1500 (1991). There is presently no known technique for sequencing PNA once it has been synthesized.
DNA, polypeptides and proteins are naturally occurring biopolymers which can be routinely sequenced by well understood methods. Because PNA is a hybrid biopolymer possessing both nucleic acid and polypeptide-like structure, it is logical to evaluate DNA and protein sequencing methods for application to the sequencing of PNA.
DNA may be sequenced by either the Maxam & Gilbert or Sanger sequencing methods. See Stryer, L., Biochemistry, W. H. Freeman and Co., San Francisco (1981) pp. 591-93. Additionally, short DNA oligomers have been sequenced by the mobility-shift (wandering spot) method. See Gait, M. J., Oligonucleotide Synthesis, IRL Press, NY (1990) pp. 135-36. Polypeptides and proteins are sequenced from their amine terminus by Edman degradation. See Stryer, L., Biochemistry, W. H. Freeman and Co., San Francisco (1981) pp. 24-27. Further, several new methods have been described for carboxy terminal sequencing of polypeptides. See Inglis, A. S., Anal. Biochem. 195:183-96 (1991). Moreover, the sequencing of polypeptides has also been described by generating a nested (sequence defining) set of fragments followed by mass analysis. See Chait, Brian T. et al., Science 257:1885-94 (1992). However, as discussed in more detail below, these techniques cannot be used to sequence PNA because it is a hybrid biopolymer containing a non-naturally occurring polyamide backbone.
DNA is a biopolymer comprised of a deoxyribose phosphate backbone to which adenine, cytosine, guanine and thymine are attached. Both the Maxam & Gilbert and Sanger sequencing methods require the generation of a nested set of polymer fragments which are separated and analyzed to determine sequence. However, the deoxyribose phosphate backbone does not sequentially degrade by any known chemical method. Consequently, the nested set of polymer fragments is generated by alternative methods.
"Nested set" is a term used in the art of sequencing to define a partially degraded/synthesized sample of polymer used to determine the polymer sequence. Ideally, the nested set contains measurable quantities of each of the possible fragments generated by a degradation/synthesis process whereby analysis of the set of fragments will define the sequence. Therefore, all of the fragments have a common terminus, but the opposite terminus is determined by the point of backbone cleavage or synthesis termination. Consequently, the relative difference in length/mass of all shortened polymers is dependent upon the point of backbone cleavage or synthesis termination.
The Maxam & Gilbert method uses chemical reagents to cleave the deoxyribose phosphate backbone thereby generating a nested set of fragments. Alternatively, Sanger uses enzymatic methods to synthesize the nested set. Using the DNA to be sequenced as a template, a primer is annealed and a polymerase reaction proceeds in four separate vessels. Each vessel contains a different dideoxy nucleotide which terminates chain extension upon incorporation thereby generating a nested set of polymer fragments. Finally, the mobility-shift (wandering spot) method for DNA sequencing involves the use of a phosphodiesterase enzyme which randomly cleaves the deoxyribose phosphate backbone, thereby generating a nested set of polymer fragments.
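As a purely illustrative sketch of the four-vessel termination scheme described above, the following Python fragment builds the fragments produced in each vessel and merges them into a nested set. The function name `terminated_fragments` and the example strand are hypothetical, and the sketch deliberately ignores the primer, template complementarity, and the relative abundance of each fragment.

```python
def terminated_fragments(strand, terminator):
    """Return fragments that share the strand's start and end at each
    position where `terminator` occurs (one vessel of the scheme)."""
    return [strand[:i + 1] for i, base in enumerate(strand) if base == terminator]

# Hypothetical newly synthesized strand, used only for illustration.
strand = "GATCGA"

# One vessel per terminating base.
vessels = {base: terminated_fragments(strand, base) for base in "ACGT"}

# Merging the four vessels and ordering by length gives the nested set.
nested = sorted((f for frags in vessels.values() for f in frags), key=len)

# The terminal base of each successively longer fragment spells the sequence.
read_sequence = "".join(fragment[-1] for fragment in nested)
print(nested)
print(read_sequence)
```

Every fragment shares a common terminus, and the terminal base of each successively longer fragment identifies the next position, which is why separation by length suffices to read the sequence.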
These methods cannot be adapted to sequencing PNA. Because PNA is comprised of a polyamide backbone, the chemical treatments described by Maxam & Gilbert will not cleave PNA. Likewise, PNA is not amenable to Sanger sequencing because it is not a substrate for the enzyme DNA polymerase. Moreover, PNA is not a substrate for the phosphodiesterase enzymes used to degrade DNA for mobility-shift (wandering spot) analysis. Thus, no known DNA sequencing method will generate a nested set of PNA fragments suitable for separation and analysis.
The related art of protein sequencing is also of no assistance in trying to sequence PNA. Proteins and polypeptides are polyamides formed from the 20 naturally occurring amino acids. The sequence may be determined using amino or carboxy terminal sequencing methods. Protein sequencing generally involves the chemically induced sequential removal and identification of the terminal amino acid residue. Because there is no requirement that a nested set of polymer fragments be generated, typical protein sequencing methods differ substantially from DNA sequencing techniques.
Edman degradation requires that the polyamide have a free amino group which is reacted with an isothiocyanate. See Stryer, L., Biochemistry, W. H. Freeman and Co., San Francisco (1981) pp. 24-27. The isothiocyanate is typically phenyl isothiocyanate. The adduct intramolecularly reacts with the nearest backbone amide group of the polymer thereby forming a five membered ring. This adduct rearranges and the terminal amino acid residue is then cleaved using strong acid. The released phenylthiohydantoin (PTH) of the amino acid is identified and the shortened polymer can undergo repeated cycles of degradation and analysis.
Carboxy terminal sequencing methods mimic Edman degradation but involve sequential degradation from the opposite end of the polymer. See Inglis, A. S., Anal. Biochem. 195:183-96 (1991). Like Edman degradation, the carboxy-terminal sequencing methods involve chemically induced sequential removal and identification of the terminal amino acid residue.
Though PNA contains a polyamide backbone, protein sequencing chemistries would not be expected to degrade the polymer. The PNA backbone is non-naturally occurring and the repeating unit is substantially different from that of a polypeptide backbone. Thus, the chemical methods employed for protein degradation and sequencing should not degrade or be useful to sequence PNA.
More recently, polypeptide sequencing has been described by preparing a nested set (sequence defining set) of polymer fragments followed by mass analysis. See Chait, B. T. et al., Science 257:1885-94 (1992). Sequence is determined by comparing the relative mass difference between fragments with the known masses of the amino acid residues. Though formation of a nested (sequence defining) set of polymer fragments is a requirement of DNA sequencing, this method differs substantially from the conventional protein sequencing method consisting of sequential removal and identification of each residue. Nonetheless, Edman chemistry is still required for the formation of the nested set. Consequently, PNA cannot be sequenced by this method because PNA will not sequentially degrade by Edman degradation.
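The mass-difference comparison described above can be sketched as follows. This is a minimal illustration, not the published procedure: the residue table holds approximate monoisotopic masses for only five amino acids, and the helper names `ladder_masses` and `read_ladder`, the terminal mass of 18.01 Da (water), and the matching tolerance are assumptions made for the example.

```python
# Approximate monoisotopic residue masses in Da (small illustrative subset).
RESIDUE_MASS = {"G": 57.02, "A": 71.04, "S": 87.03, "V": 99.07, "L": 113.08}

def ladder_masses(peptide, terminal_mass=18.01):
    """Masses of a nested set made by removing residues one at a time
    from the amino terminus (assumed terminal groups sum to water)."""
    return [sum(RESIDUE_MASS[r] for r in peptide[i:]) + terminal_mass
            for i in range(len(peptide))]

def read_ladder(fragment_masses, tolerance=0.02):
    """Infer the removed residues from successive mass differences."""
    ordered = sorted(fragment_masses, reverse=True)
    residues = []
    for heavier, lighter in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        matches = [r for r, m in RESIDUE_MASS.items() if abs(m - delta) <= tolerance]
        # Report "?" when the difference matches no residue or more than one.
        residues.append(matches[0] if len(matches) == 1 else "?")
    return "".join(residues)

print(read_ladder(ladder_masses("GASV")))
```

The final residue of the shortest fragment is not resolved by differences alone, so a four-residue example yields three assignments.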
Synthetic polymers also decompose by depolymerization (also known as "depropagation" or "chain unzipping"). See Rempp et al., Polymer Synthesis, Huthig and Wepf Verlag, New York, (1991) p. 202. The process is common to polymerizations occurring by a free radical mechanism. Depolymerization regenerates the monomer by the spontaneous reversal of the polymerization reaction. Though depolymerization is a stepwise degradation of the polymer, it is a random equilibrium process which is too rapid to control or manipulate for sequence analysis. Moreover, PNAs in particular are not assembled by free radical polymerization and have not been shown to degrade by this mechanism.
Synthetic polymers undergo intramolecular reactions known as "backbiting", but these processes do not degrade the polymer. See Bovey et al., Macromolecules; an introduction to polymer science, Academic Press, New York, (1979), pp. 48-49 & 238-41. During free radical polymerizations, backbiting occurs by the intramolecular transfer of a free radical from the terminus of the growing polymer chain to an internal atom. Chain propagation continues from this point and the polymer becomes branched. Nonetheless, since PNAs are not assembled by free radical polymerization, they will not degrade by this mechanism.
Reshuffling of formed polymers may occur by other backbiting processes. See Rempp et al., Polymer Synthesis, Huthig and Wepf Verlag, New York, (1991) p. 51. Polyesters which possess a hydroxyl terminus will intramolecularly attack at the ester groups of the polymer backbone. Additionally, polyamides having amino termini will attack at amide groups of the polymer backbone. Backbiting under these conditions results in polymer reshuffling. Because no fragment is lost, the mass of a polymer is not altered. Moreover, backbiting has been observed in PNA. Intramolecular attack of the terminal amino group of PNA upon a side chain amide group has been described. See Christensen, L. et al., Optimized Solid-Phase Synthesis of PNA Oligomers, Thirteenth American Peptide Symposium, Poster No. 7, Jun. 20-23, 1993, Edmonton, Canada. See also FIG. 1 of this application, attack 1. The result is a rearrangement known as an N-acyl shift. Though PNA does rearrange through backbiting, this process will not yield a sequencing method because the mass of the polymer remains unchanged.
Presently, sequence information for PNA is only available from the actual synthetic procedure. Methods have been described for the synthesis of PNA monomers. See the aforementioned European Patent applications EP 92/01219 and 92/01220. Using conventional boc/benzyl protection chemistries for peptide synthesis, polymers are assembled by the sequential reaction of the amino terminus of the growing polymer with the carboxyl terminus of the next monomer. See Gross, Erhard et al., The Peptide, Academic Press, NY (1980) pp. 100-118. The completely assembled polymer is deprotected and the crude sample analyzed. Analysis typically comprises a separations step followed by mass spectrometry of the isolated fragments. Separations analysis (e.g. High Performance Liquid Chromatography (HPLC), Capillary Liquid Chromatography or Capillary Electrophoresis) yields the number of components and relative amounts of each. For shorter oligomers, these techniques may be used to determine the length (number of repeating monomer units) of the polymer. However, no sequence information is generated. Thus, if the polymer is improperly assembled such that it is of proper length but the order of assembly is incorrect, separations analysis may not detect this aberration. Moreover, separations analysis cannot absolutely identify an isolated polymer.
Mass spectrometry is very effective when used to analyze PNA isolated from a separations process because it gives the exact mass of the purified polymer. The desired product is identified because other components have a mass differing from the expected mass. However, polymers having the correct monomer composition but differing in sequence cannot be differentiated because they will have the same mass as the desired product. This situation would occur if the order of assembly was incorrect. Thus, it is impossible to confirm the sequence of a PNA using only mass spectrometry.
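The limitation described above can be illustrated with a short sketch: every ordering of the same set of monomers has the same total mass. The residue masses below are placeholder nominal values chosen for the example, not measured PNA monomer masses, and the names `PLACEHOLDER_MASS` and `polymer_mass` are hypothetical.

```python
from itertools import permutations

# Placeholder nominal residue masses (Da); illustrative only, not measured values.
PLACEHOLDER_MASS = {"A": 275, "C": 251, "G": 291, "T": 266}

def polymer_mass(sequence):
    """Total mass as the sum of residue masses (end groups omitted)."""
    return sum(PLACEHOLDER_MASS[base] for base in sequence)

# All distinct orderings of the same four monomers.
isomers = sorted({"".join(p) for p in permutations("GATC")})

# The set of masses collapses to a single value.
masses = {polymer_mass(s) for s in isomers}
print(len(isomers), masses)
```

Because the mass depends only on composition, a single mass measurement of the intact polymer cannot distinguish among these orderings.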
It is critical that the sequence of a PNA be known with certainty; otherwise, experimental results may be misleading. Studies show that single base pair mismatches affect PNA-DNA binding. See Egholm, M. et al., J. Am. Chem. Soc. 114:9677-78 (1992). Because errors can be made during the chemical assembly of the polymer, methods suitable for the absolute confirmation of the sequence of the assembled PNA are desirable. Additionally, as a therapeutic agent, PNA must be fully characterized prior to acceptance for use in humans. Consequently, a sequencing method for PNA is required because it will give absolute confirmation of primary polyamide structure.