The present invention provides aptamers that recognize and bind to guanosine (GMP), deoxyguanosine (dGMP), adenosine (AMP), deoxyadenosine (dAMP), cytosine (CMP) and deoxycytosine (dCMP). The present invention also relates to a method for sequencing a polymeric biomolecule and a method for structurally characterizing the same comprising the use of aptamers. In a preferred embodiment of this invention, these methods relate to the sequencing or characterization of a single polymeric biomolecule. The invention also relates to a method for selecting aptamers useful for sequencing nucleic acids.
Knowing the primary structure and composition of polymeric biomolecules, such as DNA, RNA, polysaccharides, lipids and polypeptides, is important for scientific and medical research and the development of medical treatments. For example, information regarding the primary structure of certain polymeric biomolecules is important for studying the genetic basis of certain diseases, understanding the role that polysaccharides play in cellular recognition processes, determining the DNA sequence encoding a purified protein and producing recombinant proteins for assays for screening drugs. Thus, fast, accurate and efficient methods for determining the primary structure and composition of a polymeric biomolecule, especially a biomolecule that is long and/or is in short supply, are important for progress in research.
1.1 DNA Sequencing
Approaches to sequencing DNA have varied widely. The Maxam-Gilbert technique for sequencing (Maxam and Gilbert, 1977, PNAS USA 74:560) involves four separate chemical cleavage reactions using the same DNA molecules. The partial or total cleavage of the DNAs, which are end-labeled, produces DNAs of varying sizes, which are run on a gel electrophoresis apparatus. The sequence of the DNA molecule is determined from the migratory position of the bands in the gel. The dideoxy method of sequencing (Sanger et al., 1977, PNAS USA 74:5463) involves four enzymatic reactions using DNA polymerase to synthesize fragments of varying lengths due to the incorporation of a chain terminating dideoxy nucleotide into each fragment. Typically, radioactively-labeled nucleotide(s) are incorporated into the growing chains. Variations on the Sanger method comprise the use of fluorescent dye-labeled primers or nucleotide chain terminators. The reactions are then run on a gel electrophoresis apparatus, and the sequence of the DNA molecule is determined from the migratory position of the bands in the gel. Where dye labels are used, fluorescence emissions from the dyes are monitored. These gel-based, ladder-like output methods are disadvantageous, in part, because they (1) require substantial amounts of template DNA for the reactions to occur, (2) produce a relatively small number of resolvable, visual fragments per reaction, (3) require time for the separation of the fragments and generation of the ladder, and (4) require resequencing and overlapping sequencing reactions to determine the primary sequence of a long piece of DNA. A typical DNA sequencing reaction as described above may yield the sequence of 300-500 nucleotides of a desired nucleic acid.
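By way of illustration only, the reading of a sequence from a four-lane ladder may be sketched as follows. The function name read_ladder, the lane dictionary and the example fragment lengths are hypothetical and are not taken from the cited references; the sketch assumes an idealized gel in which every fragment length appears in exactly one lane.

```python
def read_ladder(lanes):
    """Reconstruct a sequence from fragment lengths observed in four lanes.

    `lanes` maps each base ("A", "C", "G", "T") to the lengths of the
    chain-terminated fragments seen in that base's reaction (hypothetical data).
    """
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in base_at_length:
                raise ValueError(f"two lanes show a band at length {length}")
            base_at_length[length] = base
    # Shorter fragments migrate farther, so reading the lengths in ascending
    # order gives the synthesized strand from its 5' end to its 3' end.
    return "".join(base_at_length[n] for n in sorted(base_at_length))


# Illustrative lanes only: fragment lengths terminated at each base.
example_lanes = {"A": [2, 5], "C": [3], "G": [1, 6], "T": [4]}
sequence = read_ladder(example_lanes)
print(sequence)
```

The number of bands that can be resolved on one gel bounds the length of the sequence that a single such reading can return, which is the limitation noted above.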
Alternatively, sequencing methods involving the use of an exonuclease to cleave off a terminal nucleotide of a single DNA molecule have been described. Jett et al. (U.S. Pat. No. 4,962,037) describe a method wherein a complementary strand of the DNA to be sequenced is synthesized with nucleotides covalently bonded to a fluorescent dye. Then, the labeled complementary strand of the desired DNA is sequenced using exonuclease cleavage. In practice, the exonuclease cleavage is hindered by the presence of dye on each nucleotide. Ishikawa (U.S. Pat. No. 5,528,046) describes the use of monoclonal antibodies against nucleotides A, G, T or C for detecting nucleotides freed from the DNA being sequenced. The monoclonal antibody in Ishikawa may be conjugated to a light emitting reagent, particularly a luminescent enzyme, to facilitate detection of the freed nucleotide. However, the use of monoclonal antibodies is disadvantageous, inter alia, because the production of monoclonal antibodies is labor intensive and requires considerable animal and cell culture resources for production and screening.
Thus, there is a need for alternative, sensitive methods for rapidly and accurately obtaining nucleic acid sequence information. This is especially true for nucleic acid sequences that are long (greater than 1000 bp) and/or in short supply (less than nanomolar range).
1.2 Protein Sequencing
Chemical protein sequencing has been and continues to be one of the most popular methods for determining the primary structure of proteins. See Stolowitz, "Chemical Protein Sequencing and Amino Acid Analysis," Curr. Opin. Biotech. 4:9-13 (1993) and Hunkapiller, M. W., "Contemporary Methodology for the Determination of the Primary Structure of Proteins," Macromol. Seq. and Synthesis, Ed. D. H. Schlesinger, pp. 45-58, Alan R. Liss: New York, N.Y. (1988).
Traditional chemical amino-terminal sequencing includes a degradation step such as Edman degradation and a detection step. Edman degradation typically includes a coupling step, a cleavage step, and a conversion step. For example, in an Edman degradation, the amino terminus of a target polypeptide is coupled to an isothiocyanate reagent and then the derivatized N-terminal amino acid is cleaved from the polypeptide with a strong organic acid. The reagents of the Edman process may be delivered to the target polypeptide in a vapor (gas-phase method) or in a liquid pulse (pulsed-liquid method). The target polypeptide may be covalently (e.g., with carbonyldiimidazole) or non-covalently (e.g., with polybrene) attached to a solid support. Solid supports used in protein sequencing include polyvinylidene difluoride (PVDF), glass beads or polystyrene beads. The cleaved amino acid is typically converted to a more stable phenylthiohydantoin (PTH) form by treatment with an aqueous solution of strong organic acid. The PTH amino acid may be detected, for example, by high pressure liquid chromatography (HPLC) with UV absorbance detectors or by mass spectrometry (Aebersold, R., et al., "Design, Synthesis, and Characterization of a Protein Sequencing Reagent Yielding Amino Acid Derivatives with Enhanced Detectability by Mass Spectrometry," Protein Science 1:494-503 (1992)).
In an alternative chemical sequencing method, the degradation step involves the thioacetylation of the amino-terminal amino acid, which is detected by gas chromatography/mass spectrometry (Stolowitz, M. L. et al., "Thioacetylation Method of Protein Sequencing: Gas Chromatography/Ion Trap Mass Spectrometric Detection of 5-acetoxy-2-Methylthiazoles," J. Protein Chem. 11:360-361 (1992)). In another chemical sequencing process, a peptide ladder generated by Edman degradation is analyzed using matrix-assisted, laser desorption, time-of-flight mass spectrometry (Chait, et al., "Protein Ladder Sequencing," Science 262:89-92 (1993)).
Chemical cleavage of carboxy-terminal amino acids has been achieved through a variety of methods (Inglis, A. S., "Chemical Procedures for C-Terminal Sequencing of Peptides and Proteins," Analytical Biochemistry 195:183-196 (1991)). For example, the carboxy-terminus of a polypeptide has been coupled to a thiocyanate salt or thiocyanic acid (HSCN) to form a thiohydantoin or a peptidyl isothiocyanate which may be cleaved to form a thiohydantoin. The thiohydantoin-carboxy terminal amino acid can be detected by its UV absorption. Other carboxy-terminal cleavage reactions which do not involve the formation of a thiohydantoin can be characterized by the formation of (1) an acyl urea; (2) an O-peptidyl amino alcohol; (3) an N-peptidyl-2-oxazolidone; (4) an oxazole; and (5) an azide which is converted into an isocyanate. See, supra, Table 1 in Inglis.
Enzymatic digestion of terminal amino acids has been used to sequence polypeptides. Some amino-terminal and carboxy-terminal specific exopeptidases known in the art are carboxypeptidases (e.g., Y, A, B, and P), aminopeptidase 1, LAP, proline aminodipeptidase, leucine aminopeptidase, microsomal peptidase and cathepsin C. Serine carboxypeptidases have proven to be useful in sequentially cleaving residue by residue from the C-terminus of a protein or a peptide. Carboxypeptidase Y (CPY), in particular, is an attractive enzyme because it non-specifically cleaves all residues from the C-terminus, including proline. See, e.g., Breddam et al. (1987) Carlsberg Res. Commun. 52:55-63; U.S. Pat. No. 5,869,240 (Patterson); U.S. Pat. No. 5,792,664 (Chait et al.); and Tsugita et al. (1992) "C-terminal Sequencing of Protein: A Novel Partial Acid Hydrolysis and Analysis by Mass Spectrometry," Eur. J. Biochem. 206:691-696.
The methods described above require at a minimum subfemtomole concentrations of polypeptide. They are also sensitive to the purity of the polypeptide sample. For example, the presence of a co-purifying protein contaminant during the sequencing of a target polypeptide may give rise to sequencing errors. Further, carryover of incomplete amino-terminal cleavage into the next cycle results in a steadily increasing proportion of a population of molecules being out of phase with the expected order of release. Finally, recovery and detection of the cleaved amino acid can be difficult under current methods.
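The loss of phase described above may be illustrated with a deliberately simplified model, offered as an assumption and not drawn from the cited references: if a constant fraction of the in-phase chains is successfully cleaved in each cycle, and chains that fail once are treated as out of phase thereafter, the in-phase fraction decays geometrically with the number of cycles. The function name and the 95% efficiency figure are hypothetical.

```python
def in_phase_fraction(cleavage_efficiency, cycles):
    """Fraction of chains still in phase after a number of degradation cycles.

    Simplified model (an assumption): each cycle cleaves a constant fraction
    of the in-phase chains, and a chain that misses a cycle stays out of phase.
    """
    if not 0.0 <= cleavage_efficiency <= 1.0:
        raise ValueError("cleavage_efficiency must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return cleavage_efficiency ** cycles


# Hypothetical per-cycle efficiency of 95%.
for n in (1, 10, 30):
    print(n, round(in_phase_fraction(0.95, n), 3))
```

Under this model, even a high per-cycle efficiency leaves only a minority of chains in phase after a few tens of cycles, which is consistent with the difficulty of sequencing longer polypeptides by repeated degradation.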
Thus, there is a need for alternative, sensitive methods for rapidly and accurately obtaining the primary amino acid sequence information of polypeptides, especially for longer chain polypeptides and/or for polypeptides that are in short supply.
1.3 Polysaccharide Sequencing
Polysaccharides play an important role in the regulation of biological processes in every life form from bacteria to plants to mammals. For example, carbohydrate moieties in glycoproteins have been shown to be involved in protein targeting, cell-cell recognition, and antigen-antibody reactions (J. C. Paulson, Trends Biochem. Sci., 14:272 (1989)).
Technologies for structurally characterizing target polysaccharides include the use of enzymes, gel permeation chromatography, high-performance anion exchange pulsed amperometric detection, electrospray or laser desorption mass spectrometry, capillary electrophoresis, hydrazinolysis, gas chromatography-mass spectrometry (GCMS), fast-atom bombardment and liquid secondary ion mass spectrometry and nuclear magnetic resonance (e.g., Geisow, M., "Shifting Gear in Carbohydrate Analysis," Bio/Technology 10:277-280). Methods for isolating and purifying polysaccharides from proteins or lipids are known (e.g., Welply, J., (1989) "Sequencing Methods for Carbohydrates and Their Biological Applications," TIBTECH 7:5-10; Pazur, J., "Neutral Polysaccharides," Carbohydrate Analysis: A Practical Approach, 2nd Ed., Eds. M. F. Chaplin and J. F. Kennedy, Oxford University Press, Inc.: New York, 1994).
Techniques for determining the sequence of target polysaccharides include proton NMR, fast atom bombardment mass spectroscopy, antibody or lectin-binding to the polysaccharide to confirm the presence of a particular oligosaccharide sequence, and enzymatic digestion. Exoglycosidases commonly used for oligosaccharide sequencing include mannosidases, hexosaminidases, galactosidases, fucosidases, neuraminidases, and glucosidases (e.g., A. Kobata, Anal. Biochem., 100:1-14 (1979)).
One approach to carbohydrate sequencing is sequential digestion of an oligosaccharide with an exoglycosidase of known specificity (e.g., A. Kobata, in Biology of Carbohydrates, vol. 2., Eds. V. Ginsburg et al., John Wiley and Sons: New York (1984); supra, A. Kobata, Anal. Biochem., 100:1-14 (1979)). For example, a tritiated polysaccharide would be digested with an exoglycosidase. The cleavage reaction would be monitored by comparing the uncleaved portion of the polysaccharide before and after exposure to the enzyme using paper chromatography, gel electrophoresis, or gel permeation chromatography. This technique is disadvantageous in that it requires the repeated isolation and determination of the oligosaccharide size before and after enzyme incubation. Consequently, this method requires much starting material and time and effort to isolate the uncleaved portion of the oligosaccharide.
Another method, the reagent array analysis method (RAAM), has been used to sequence polysaccharides (e.g., Prime, S. and T. Merry, "Exoglycosidase Sequencing of N-linked Glycans by the Reagent Array Analysis Method (RAAM)," in Methods in Molecular Biology, vol. 76: Glycoanalysis Protocols, Ed., E. F. Hounsell, Humana Press Inc.: New Jersey (1998); C. T. Edge et al., PNAS USA 89:6338 (1992); U.S. Pat. No. 5,100,778 (Dwek et al.)). This method involves the digestion of an aliquot of target polysaccharide with a defined mixture of exoglycosidases such that the polysaccharide in each aliquot is digested up to a certain point. This is repeated with other aliquots of the polysaccharide and different, defined mixtures of exoglycosidases. The uncleaved portion of the polysaccharide in each aliquot is analyzed to identify the sequence of the original polysaccharide. Consequently, this method also requires much starting material and time and effort to isolate the uncleaved portion of the polysaccharide.
Thus, there is a need for alternative, sensitive methods for rapidly and accurately obtaining the primary monosaccharide sequence of polysaccharides, especially for longer chain polysaccharides and/or for polysaccharide samples which are limited in supply.
1.4 Aptamers
Aptamers are small single stranded RNAs or DNAs approximately 40-100 nucleotides in length that form secondary and tertiary structures which bind to other biological molecules. Some aptamers having affinity to a specific protein, DNA, amino acid and nucleotides have been described (e.g., K. Y. Wang, et al., "A DNA Aptamer Which Binds to and Inhibits Thrombin Exhibits a New Structural Motif for DNA," Biochemistry 32:1899-1904 (1993); Pitner et al., U.S. Pat. No. 5,691,145; Gold, et al., "Diversity of Oligonucleotide Function," Ann. Rev. Biochem. 64:763-97 (1995); Szostak et al., U.S. Pat. No. 5,631,146). High affinity and high specificity binding aptamers have been derived from combinatorial libraries (supra, Gold, et al.). Aptamers may have high affinities, with equilibrium dissociation constants ranging from micromolar to sub-nanomolar depending on the selection used. Aptamers may also exhibit high selectivity, for example, showing a thousand fold discrimination between 7-methylG and G (Haller, A. A., and Sarnow, P., "In Vitro Selection of a 7-Methyl-Guanosine Binding RNA That Inhibits Translation of Capped mRNA molecules," PNAS USA 94:8521-8526 (1997)) or between D and L-tryptophan (supra, Gold et al.).
General methods for screening randomized oligonucleotides for aptamer activity have been described. For example, Gold, et al. (U.S. Pat. No. 5,270,163) describes the "SELEX" (Systematic Evolution of Ligands by Exponential Enrichment) method. In Gold et al., a candidate mixture of single stranded nucleic acids having regions of randomized sequence is contacted with a target molecule. Those nucleic acids having an increased affinity to the target are partitioned from the remainder of the candidate mixture. The partitioned nucleic acids are amplified to yield a ligand enriched mixture. Szostak et al. (U.S. Pat. No. 5,631,146) describes a method for producing a single stranded DNA molecule which binds adenosine or an adenosine-5′-phosphate. In Szostak, aptamers with affinity for adenosine or adenosine-5′-phosphate are partitioned away from aptamers with less affinity using affinity column chromatography. The ATP column of Szostak has ATP linked to the agarose through the C8 carbon of the adenine base. The resulting selected aptamers are unable to recognize portions of the adenine base, especially around the C8 region of the adenine base.
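The partition-and-amplify cycle described above may be sketched numerically as follows. This is a minimal illustration under stated assumptions and is not the procedure of the cited patents: the candidate names, their starting fractions and their retention probabilities are hypothetical, partitioning is modeled as retaining each candidate in proportion to a fixed probability, and amplification is modeled as rescaling the retained material back to a full pool without bias.

```python
def selex_round(pool, retention):
    """Perform one partition-and-amplify round on a candidate mixture.

    pool:      candidate name -> fraction of the mixture (fractions sum to 1)
    retention: candidate name -> probability of being retained on the target
    Both mappings are hypothetical inputs for illustration.
    """
    # Partition: keep each candidate in proportion to its retention probability.
    retained = {name: pool[name] * retention[name] for name in pool}
    total = sum(retained.values())
    if total == 0:
        raise ValueError("no candidate was retained on the target")
    # Amplify: rescale the retained material to a full pool (unbiased copying).
    return {name: amount / total for name, amount in retained.items()}


def enrich(pool, retention, rounds):
    """Repeat the partition-and-amplify round a given number of times."""
    for _ in range(rounds):
        pool = selex_round(pool, retention)
    return pool


# Hypothetical starting mixture: a rare strong binder among weak binders.
start = {"strong_binder": 0.001, "weak_binder": 0.999}
retention = {"strong_binder": 0.5, "weak_binder": 0.01}
final = enrich(start, retention, rounds=4)
print(final)
```

In this model the ratio of strong to weak binders is multiplied by the ratio of their retention probabilities in every round, which is why a binder that begins as a small fraction of the mixture can come to dominate it after a few rounds.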
Aptamers with good specificity and affinity for adenosine and the bases of other nucleotides are useful, inter alia, for DNA and RNA sequencing according to the methods of this invention. Thus, there exists a need for a method for obtaining an improved selection of aptamers for sequencing and characterizing nucleic acid molecules.
The methods of this invention satisfy several objectives. They provide an alternative, highly sensitive and rapid method for sequencing a polymeric biomolecule of extended length that does not require labeling of the target polymeric biomolecule before sequencing and avoids the repeated isolation and analysis of uncleaved portions of a polymeric biomolecule of past sequencing methods. They provide a method for sequencing or characterizing a single polymeric biomolecule or an amount of polymeric biomolecule below subfemtomolar range.
The invention provides methods for sequencing a polymeric biomolecule comprising the steps of separating a terminal monomer from the polymeric biomolecule and identifying the separated terminal monomer using an aptamer. The separation step comprises using a cleaving reagent to catalyze the hydrolysis of the terminal monomer from the polymeric biomolecule. The polymeric biomolecule may be attached to a solid support. In a preferred embodiment of this invention, the cleaving agent is an enzyme such as an exonuclease, an exoglycosidase or an exopeptidase. In a preferred embodiment of this invention, the cleaved monomer is deposited onto a surface in an orderly manner for detection by the aptamer. In a more preferred embodiment of this invention, the surface onto which the monomer is deposited is a patterned surface with regions of differing hydrophilicity and/or is passivated against non-specific adsorption of the recognition molecules. In a preferred embodiment of this invention, the aptamer is labeled with an optically detectable species. Preferred polymeric biomolecules for use with the methods of this invention are DNA, RNA, polypeptides or polysaccharides. Particularly preferred biomolecules of this invention are polynucleotides.
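The two steps recited above, separating a terminal monomer and identifying it with an aptamer, may be sketched as a simple loop. The sketch is illustrative only and rests on assumptions not stated in the text: the polymer is modeled as a list of monomer names, the cleaving reagent is assumed to release one monomer at a time from the 3' end, and each aptamer is modeled as a set of the monomers it binds. All names are hypothetical.

```python
# Hypothetical aptamer panel: label -> monomers that the aptamer binds.
APTAMERS = {
    "A": {"AMP", "dAMP"},
    "C": {"CMP", "dCMP"},
    "G": {"GMP", "dGMP"},
}


def identify(monomer, aptamers):
    """Return the label of the single aptamer that binds the monomer."""
    hits = [label for label, targets in aptamers.items() if monomer in targets]
    if len(hits) != 1:
        raise ValueError(f"{monomer!r} bound by {len(hits)} aptamers, expected 1")
    return hits[0]


def sequence_polymer(polymer, aptamers):
    """Sequence a polymer by repeated terminal cleavage and identification.

    `polymer` lists monomers from the 5' end to the 3' end. Cleavage is
    assumed to proceed from the 3' end, one monomer per step.
    """
    remaining = list(polymer)
    calls = []
    while remaining:
        terminal = remaining.pop()  # cleaving reagent releases the terminal monomer
        calls.append(identify(terminal, aptamers))  # aptamer identifies it
    # Monomers were released from the 3' end, so reverse to report 5' to 3'.
    return "".join(reversed(calls))


example = ["dGMP", "dAMP", "dCMP", "dCMP", "dAMP"]
print(sequence_polymer(example, APTAMERS))
```

Because each released monomer is identified as it is separated, this scheme does not require the uncleaved remainder of the polymer to be isolated and re-analyzed after each step.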
The present invention provides an improved method for producing aptamers with strong binding affinity and selectivity for their target monomer comprising the steps of separating the desired aptamer from a mixture of aptamers by exposing the mixture of aptamers to an affinity system comprising the target monomer at low temperature, amplifying the aptamer that bound to the affinity system, and repeating the separation and amplification steps until the aptamer(s) having the desired affinity and selectivity are obtained. The low temperature is a temperature below approximately 10° C. and above the freezing point. In a preferred embodiment, the low temperature is closer to the freezing point. The method of selection of this invention is particularly useful for developing aptamers useful for sequencing and characterizing DNA according to the methods of this invention.
The present invention also provides a method for producing an aptamer for recognizing a target nucleotide or nucleoside comprising the step of separating the aptamer from a mixture of aptamers using an affinity system, wherein the affinity system comprises the target nucleotide or nucleoside attached to a solid support through the 5′ carbon of the sugar ring. According to a preferred embodiment of the invention, the target nucleotide is attached to the solid support through the phosphate on the 5′ carbon of the sugar ring. In a further embodiment of this method, the separation step is carried out at low temperature, i.e., at a temperature below approximately 10° C. and above the freezing point. In a preferred embodiment, the temperature is closer to the freezing point.
The invention provides a single-stranded nucleic acid molecule that recognizes and binds to AMP and dAMP. The invention also provides a single-stranded nucleic acid molecule that recognizes and binds to CMP and dCMP. This invention further provides a single-stranded nucleic acid molecule that recognizes and binds to GMP and dGMP. The invention also provides several specific nucleic acid molecules that recognize AMP, dAMP, CMP, dCMP, GMP or dGMP. In one preferred embodiment of the invention, the binding of the nucleic acid molecule to the nucleotide has a dissociation constant that is less than 3 μM.
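The significance of a dissociation constant of this magnitude may be illustrated with the standard one-site equilibrium binding relation, in which the fraction of aptamer occupied equals the free nucleotide concentration divided by the sum of that concentration and the dissociation constant. The sketch assumes simple 1:1 binding with the nucleotide in excess over the aptamer; the function name and the example concentrations are hypothetical.

```python
def fraction_bound(nucleotide_conc_um, kd_um):
    """Fraction of aptamer bound at equilibrium for simple 1:1 binding.

    Assumes the free nucleotide concentration is not depleted by binding.
    Both arguments are in micromolar.
    """
    if nucleotide_conc_um < 0 or kd_um <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return nucleotide_conc_um / (kd_um + nucleotide_conc_um)


# With a dissociation constant of 3 micromolar (the upper bound stated above),
# half of the aptamer is bound when the nucleotide is at 3 micromolar.
for conc in (0.3, 3.0, 30.0):
    print(conc, round(fraction_bound(conc, 3.0), 3))
```

A lower dissociation constant shifts this curve toward lower nucleotide concentrations, so that a given fraction of aptamer is bound with less nucleotide present.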