The present invention relates to a nucleic acid sequencing reagent based on a combinatorial array of 4-8 specific bases, combined with one or more random bases or other spacer molecules, and a “generic” template capture moiety capable of binding to some common region of the template nucleic acid. Template nucleic acids, such as DNA amplified by PCR, can be sequenced or scanned for mutations using this array configuration through primer extension with labeled ddNTPs. This array can also be used to sequence templates without prior knowledge (de novo) of the wild type or “expected” sequence.
I. Nucleic Acid Sequencing
Initial attempts to determine the sequence of a DNA molecule were extensions of techniques which had been initially developed to permit the sequencing of RNA molecules (Sanger, F., J. Mol. Biol. 13:373 (1965); Brownlee, G. G. et al., J. Mol. Biol. 34:379 (1968)). Such methods involved the specific cleavage of DNA into smaller fragments by (1) enzymatic digestion (Robertson, H. D. et al., Nature New Biol. 241:38 (1973); Ziff, E. B. et al., Nature New Biol. 241:34 (1973)); (2) nearest neighbor analysis (Wu, R., et al., J. Mol. Biol. 57:491 (1971)), and (3) the “Wandering Spot” method (Sanger, F., Proc. Natl. Acad. Sci. (U.S.A.) 70:1209 (1973)).
The most commonly used methods of nucleic acid sequencing are the dideoxy-mediated chain termination method, also known as the “Sanger Method” (Sanger, F., et al., J. Molec. Biol. 94:441 (1975); Prober, J. et al., Science 238:336-340 (1987)) and the “chemical degradation method,” also known as the “Maxam-Gilbert method” (Maxam, A. M., et al., Proc. Natl. Acad. Sci. (U.S.A.) 74:560 (1977), both references herein incorporated by reference).
The Maxam-Gilbert method of DNA sequencing is a degradative method. In this procedure, a fragment of DNA is labeled at one end (or terminus) and partially cleaved in four separate chemical reactions, each of which is specific for cleaving the DNA molecule at a particular base (G or C) or at a particular type of base (A/G, C/T, or A>C). As in the dideoxy method, the effect of such reactions is to create a set of nested molecules whose lengths are determined by the locations of a particular base along the length of the DNA molecule being sequenced. The nested reaction products are then resolved by electrophoresis, and the end-labeled molecules are detected, typically by autoradiography when a 32P label is employed. Four single lanes are typically required in order to determine the sequence.
Significantly, in the Maxam-Gilbert method the sequence is obtained from the original DNA molecule, and not from an enzymatic copy. For this reason, the method can be used to sequence synthetic oligonucleotides, and to analyze DNA modifications such as methylation, etc. It can also be used to study both DNA secondary structure and protein-DNA interactions. Indeed, it has been readily employed in the identification of the binding sites of DNA binding proteins.
The Maxam-Gilbert method uses simple chemical reagents which are readily available. Nevertheless, the dideoxy-mediated method has several advantages over the Maxam-Gilbert method. The Maxam-Gilbert method is extremely laborious and requires meticulous experimental technique. In contrast, the Sanger method may be employed on larger nucleic acid molecules.
In the dideoxy-mediated or “Sanger” chain termination method of DNA sequencing, the sequence of a DNA molecule is obtained through the extension of an oligonucleotide primer which is hybridized to the nucleic acid molecule being sequenced. In brief, four separate primer extension reactions are conducted. In each reaction a DNA polymerase is added along with the four nucleotide triphosphates needed to polymerize DNA. Each of these reactions is carried out in the additional presence of a 2′,3′ dideoxy derivative of the A, T, C, or G nucleotide triphosphate. Such derivatives differ from conventional nucleotide triphosphates in that they lack a hydroxyl residue at the 3′ position of deoxyribose. Thus, although they can be incorporated by a DNA polymerase into the newly synthesized primer extension, the absence of the 3′ hydroxyl group causes them to be incapable of forming a phosphodiester bond with a succeeding nucleotide triphosphate. The incorporation of a dideoxy derivative results in the termination of the extension reaction.
Because the dideoxy derivatives are present in lower concentrations than their corresponding, conventional nucleotide triphosphate analogs, the net result of each of the four reactions is to produce a set of nested oligonucleotides each of which is terminated by the particular dideoxy derivative used in the reaction. By subjecting the reaction products of each of the extension reactions to electrophoresis, it is possible to obtain a series of four “ladders.” Since the position of each “rung” of the ladder is determined by the size of the molecule, and since such size is determined by the incorporation of the dideoxy derivative, the appearance and location of a particular “rung” can be readily translated into the sequence of the extended primer. Thus, through an electrophoretic analysis, the sequence of the extended primer can be determined.
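The translation of four ladders into a sequence can be illustrated with a minimal Python sketch. It assumes an idealized, error-free reaction in which every position yields exactly one terminated fragment; the function names and the example strand are hypothetical and are not taken from the cited references.

```python
# Illustrative sketch only: an idealized model of reading four dideoxy ladders.
# All names and the example strand are hypothetical.

def ladders(extended_strand, primer_length):
    """Return, for each ddNTP reaction, the lengths of the fragments it terminates."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(extended_strand, start=1):
        # A fragment ending at this position has the primer plus `position` bases.
        lanes[base].append(primer_length + position)
    return lanes

def read_ladders(lanes):
    """Read the rungs from shortest to longest fragment to recover the sequence."""
    rungs = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in rungs)

lanes = ladders("GATTACA", primer_length=20)
print(lanes)                # fragment lengths per lane
print(read_ladders(lanes))  # sequence of the extended primer
```

Sorting the rungs by fragment length across all four lanes plays the role of reading the gel from bottom to top.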
Methods for sequencing DNA using either the dideoxy-mediated method or the Maxam-Gilbert method are widely known to those of ordinary skill in the art. Such methods are, for example, disclosed in Maniatis, T. et al., Molecular Cloning, a Laboratory Manual 2nd Edition, Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1989), and in Zyskind, J. W. et al., Recombinant DNA Laboratory Manual, Academic Press, Inc., New York (1988), both of which are herein incorporated by reference.
Both the dideoxy-mediated method and the Maxam-Gilbert method of DNA sequencing require the prior isolation of the DNA molecule which is to be sequenced. The sequence information is obtained by subjecting the reaction products to electrophoretic analysis (typically using polyacrylamide gels). Thus, a sample is applied to a lane of a gel, and the various species of nested fragments are separated from one another by their migration velocity through the gel.
In response to the difficulties encountered in employing gel electrophoresis to analyze sequences, alternative methods have been developed. Existing methods for de novo sequencing on solid phase arrays consist primarily of hybridization of template nucleic acids to arrayed primers containing combinatorial sequences which hybridize to complementary sequences on the template strand. These methods combine the capture of the template, by formation of stable duplex structures, with sequence discrimination due to instability of mismatches between the template and the primer. Chetverin, A. B. et al. provide a general review of solid-phase oligonucleotide synthesis and hybridization techniques. Chetverin, A. B. et al., Bio/Technology 12:1093-1099 (1994).
Macevicz (U.S. Pat. No. 5,002,867), for example, describes a method for determining nucleic acid sequence via hybridization with multiple mixtures of oligonucleotide probes. In accordance with this method, the sequence of a target polynucleotide is determined by permitting the target to sequentially hybridize with sets of probes having an invariant nucleotide at one position, and variant nucleotides at other positions. The Macevicz method determines the nucleotide sequence of the target by hybridizing the target with a set of probes, and then determining the number of sites that at least one member of the set is capable of hybridizing to the target (i.e., the number of “matches”). This procedure is repeated until each member of a set of probes has been tested.
Beattie, W. G. et al. has described a protocol for the preparation of terminal amine-derivatized 9-mer oligonucleotide arrays on ordinary microscope slides. Beattie, W. G. et al., Molec. Biotech. 4:21-225 (1995). These oligonucleotide arrays can hybridize DNA target strands of up to several hundred bases in length and can discriminate against mismatches.
Drmanac, R. T. has described a method for sequencing nucleic acid by hybridisation using nucleic acid segments on different sectors of a substrate and probes which discriminate between a one base mismatch. Drmanac, R. T. (EP 797683). Gruber, L. S. describes a method for screening a sample for the presence of an unknown sequence using hybridization sequencing. Gruber, L. S. (EP 787183).
In contrast to the “Sanger Method” and the “Maxam-Gilbert method,” which identify the sequence of all of the nucleotides of a target polynucleotide, “microsequencing” methods determine the identity of only a single nucleotide at a “predetermined” site. Such methods have particular utility in determining the presence and identity of polymorphisms in a target polynucleotide.
Because single nucleotide polymorphisms constitute sites of variation flanked by regions of invariant sequence, their analysis requires no more than the determination of the identity of the single nucleotide present at the site of variation; it is unnecessary to determine a complete gene sequence for each patient. Several methods have been developed to facilitate the analysis of such single nucleotide polymorphisms.
Several primer-guided nucleotide incorporation procedures for assaying polymorphic sites in DNA have been described (Kornher, J. S. et al., Nucl. Acids Res. 17:7779-7784 (1989); Sokolov, B. P., Nucl. Acids Res. 18:3671 (1990); Syvänen, A.-C., et al., Genomics 8:684-692 (1990); Kuppuswamy, M. N. et al., Proc. Natl. Acad. Sci. (U.S.A.) 88:1143-1147 (1991); Prezant, T. R. et al., Hum. Mutat. 1:159-164 (1992); Ugozzoli, L. et al., GATA 9:107-112 (1992); Nyrén, P. et al., Anal. Biochem. 208:171-175 (1993); and Wallace, WO 89/10414). These methods differ from GBA™ in that they all rely on the incorporation of labeled deoxynucleotides to discriminate between bases at a polymorphic site. In such a format, since the signal is proportional to the number of deoxynucleotides incorporated, polymorphisms that occur in runs of the same nucleotide can result in signals that are proportional to the length of the run (Syvänen, A.-C. et al., Amer. J. Hum. Genet. 52:46-59 (1993)). Such a range of locus-specific signals could be more complex to interpret, especially for heterozygotes, compared to the simple ternary (2:0, 1:1, or 0:2) class of signals produced by the GBA™ method. In addition, for some loci, incorporation of an incorrect deoxynucleotide can occur even in the presence of the correct dideoxynucleotide (Kornher, J. S. et al., Nucl. Acids Res. 17:7779-7784 (1989)). Such deoxynucleotide misincorporation events may be due to the Km of the DNA polymerase for the mispaired deoxy- substrate being comparable, in some sequence contexts, to the relatively poor Km of even a correctly base paired dideoxy- substrate (Kornberg, A. et al., In: DNA Replication, Second Edition (1992), W. H. Freeman and Company, New York; Tabor, S. et al., Proc. Natl. Acad. Sci. (U.S.A.) 86:4076-4080 (1989)). This effect would contribute to the background noise in the polymorphic site interrogation.
The GBA™ Genetic Bit Analysis method disclosed by Goelet, P. et al. (WO 99/15712, herein incorporated by reference) is a particularly useful microsequencing method. In GBA™, the nucleotide sequence information surrounding a predetermined site of interrogation is used to design an oligonucleotide primer that is complementary to the region immediately adjacent to, but not including, the predetermined site. The target DNA template is selected from the biological sample and hybridized to the interrogating primer. This primer is extended by a single labeled dideoxynucleotide using DNA polymerase in the presence of at least two, and most preferably all four chain terminating nucleotide triphosphate precursors.
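The primer design and single-base extension described above can be sketched in Python as follows. This is a simplified model under stated assumptions: the template is written 5′ to 3′, the interrogated site is given as a zero-based index, base pairing is perfect, and the function names, primer length and example template are hypothetical rather than part of the disclosed method.

```python
# Illustrative sketch only: primer design next to a predetermined site and the
# single chain terminator expected to be incorporated. Names are hypothetical.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def design_primer(template, site, length):
    """Primer complementary to the `length` template bases immediately 3' of `site`.

    The primer does not cover the site itself, so its 3' end sits directly
    next to the interrogated base.
    """
    return reverse_complement(template[site + 1 : site + 1 + length])

def extend_once(template, site):
    """Labeled ddNTP a polymerase would add opposite the interrogated base."""
    return COMPLEMENT[template[site]]

template = "ACGTTGCAAGGCTTAC"  # hypothetical template, 5' to 3'
site = 5                       # zero-based index of the interrogated base (a G)

print(design_primer(template, site, 8))  # primer, written 5' to 3'
print(extend_once(template, site))       # terminator incorporated at the site
```

Because only one chain terminator is added per primer, the signal at a site does not depend on how many identical bases follow it in the template.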
Mundy, C. R. (U.S. Pat. No. 4,656,127) discusses alternative microsequencing methods for determining the identity of the nucleotide present at a particular polymorphic site. Mundy's method employs a specialized exonuclease-resistant nucleotide derivative. A primer complementary to the allelic sequence immediately 3′ to the polymorphic site is permitted to hybridize to a target molecule obtained from a particular animal or human. If the polymorphic site on the target molecule contains a nucleotide that is complementary to the particular exonuclease-resistant nucleotide derivative present, then that derivative will be incorporated by a polymerase onto the end of the hybridized primer. Such incorporation renders the primer resistant to exonuclease, and thereby permits its detection. Since the identity of the exonuclease-resistant derivative of the sample is known, a finding that the primer has become resistant to exonucleases reveals that the nucleotide present in the polymorphic site of the target molecule was complementary to that of the nucleotide derivative used in the reaction. Mundy's method has the advantage that it does not require the determination of large amounts of extraneous sequence data. It has the disadvantages of destroying the amplified target sequences and unmodified primer and of being extremely sensitive to the rate of polymerase incorporation of the specific exonuclease-resistant nucleotide being used.
Cohen, D. et al. (French Patent 2,650,840; PCT Appln. No. WO 91/02087) discuss a solution-based method for determining the identity of the nucleotide of a polymorphic site. As in the Mundy method of U.S. Pat. No. 4,656,127, a primer is employed that is complementary to allelic sequences immediately 3′ to a polymorphic site. The method determines the identity of the nucleotide of that site using labeled dideoxynucleotide derivatives, which, if complementary to the nucleotide of the polymorphic site, will become incorporated onto the terminus of the primer.
In contrast to the method of Cohen et al. (French Patent 2,650,840; PCT Appln. No. WO 91/02087), the GBA™ method of Goelet, P. et al. can be conducted as a heterogeneous phase assay, in which the primer or the target molecule is immobilized to a solid phase. It is thus easier to perform, and more accurate than the method discussed by Cohen. The method of Cohen has the significant disadvantage of being a solution-based extension method that uses labeled dideoxynucleoside triphosphates. In the Cohen method, the target DNA template is usually prepared by a DNA amplification reaction, such as the PCR, that uses a high concentration of deoxynucleoside triphosphates, the natural substrates of DNA polymerases. These monomers will compete in the subsequent extension reaction with the dideoxynucleoside triphosphates. Therefore, following the PCR reaction, an additional purification step is required to separate the DNA template from the unincorporated dNTPs. Because it is a solution-based method, the unincorporated dNTPs are difficult to remove and the method is not suited for high volume testing.
Cheesman, P. (U.S. Pat. No. 5,302,509) describes a method for sequencing a single stranded DNA molecule using fluorescently labeled 3′-blocked nucleotide triphosphates. An apparatus for the separation, concentration and detection of a DNA molecule in a liquid sample has been recently described by Ritterband, et al. (PCT Patent Application No. WO 95/17676). Dower, W. J. et al. (U.S. Pat. No. 5,547,839) describe a method for sequencing an immobilized primer using fluorescent labels.
Chee, M. et al. (WO95/11995) describes an array of primers immobilized onto a solid surface. Chee et al. further describes a method for determining the presence of a mutation in a target sequence by comparing against a reference sequence with a known sequence.
An alternative approach, the “Oligonucleotide Ligation Assay” (“OLA”) (Landegren, U. et al., Science 241:1077-1080 (1988)) has also been described as being capable of detecting single nucleotide polymorphisms. The OLA protocol uses two oligonucleotides which are designed to be capable of hybridizing to abutting sequences of a single strand of a target. One of the oligonucleotides is biotinylated, and the other is detectably labeled. If the precise complementary sequence is found in a target molecule, the oligonucleotides will hybridize such that their termini abut, and create a ligation substrate. Ligation then permits the labeled oligonucleotide to be recovered using avidin, or another biotin ligand. Nickerson, D. A. et al. have described a nucleic acid detection assay that combines attributes of PCR and OLA (Nickerson, D. A. et al., Proc. Natl. Acad. Sci. (U.S.A.) 87:8923-8927 (1990)). In this method, PCR is used to achieve the exponential amplification of target DNA, which is then detected using OLA. In addition to requiring multiple, and separate processing steps, one problem associated with such combinations is that they inherit all of the problems associated with PCR and OLA.
Boyce-Jacino et al. have described a method for sequencing a polynucleotide using nested GBA (U.S. patent application Ser. No. 08/616,906, herein incorporated by reference). In that method, an array of nested primer oligonucleotides is immobilized to a solid support. A target nucleic acid molecule is hybridized to the array of nested primer oligonucleotides and the hybridized array is sequenced using GBA.
Pastinen, T. et al. describe a method for the multiplex detection of mutations wherein the mutations are detected by extending immobilized primers, which anneal to the template sequences immediately adjacent to the mutant nucleotide positions, with a single labeled dideoxynucleotide using a DNA polymerase. Pastinen, T. et al., Genome Res. 7:606-614 (1997). In this method, the oligonucleotide arrays were prepared by coupling one primer per mutation to be detected on a small glass area. Pastinen, T. et al. have also described a method to detect multiple single nucleotide polymorphisms in an undivided sample. Pastinen, T. et al., Clin. Chem. 42:1191-1397 (1996). According to this method, the amplified DNA templates are first captured onto a manifold and then, with multiple minisequencing primers, single nucleotide extension reactions are carried out simultaneously with fluorescently labeled dideoxynucleotides.
Jalanko, A. et al. applied the solid-phase minisequencing method to the detection of a mutation causing cystic fibrosis. Jalanko, A. et al., Clin. Chem. 38:39-43 (1992). In the method of Jalanko et al., an amplified DNA molecule which is biotinylated at the 5′ terminus is bound to a solid phase and denatured. A detection primer, which hybridizes immediately before the mutation, is hybridized to the immobilized single stranded template and elongated with a single, labeled deoxynucleoside residue. Shumaker, J. M. et al. describe another solid phase primer extension method for mutation detection. Shumaker, J. M. et al., Hum. Mutation 7:346-354 (1996). In this method, the template DNA was annealed to an oligonucleotide array, extended with 32P dNTPs and analyzed with a phosphoimager. The grid position of the oligonucleotide identified the mutation site and the extended base identified the mutation.
Caskey, C. et al. describes a method of analyzing a polynucleotide of interest using one or more sets of consecutive oligonucleotide primers differing within each set by one base at the growing end thereof. Caskey, C. et al., WO 95/00669. The oligonucleotide primers are extended with a chain terminating nucleotide and the identity of each terminating nucleotide is determined.
Existing methods for de novo sequencing on solid phase arrays consist primarily of hybridization of template nucleic acids to arrayed primers containing combinatorial sequences which hybridize to complementary sequences on the template strand. These methods combine the capture of the template with the specific hybridization function. Therefore, these primers are typically at least 12 bases in length (a length at which there are over 16,000,000 different sequence combinations). Such arrays are very complex and time consuming to both construct and screen.
Therefore, it would be preferable to design a primer array system which separates the template capture function from the specific hybridization function of the arrayed primers to thereby simplify the array analysis. The present invention describes such a primer array system.
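The combinatorial figure quoted above follows from there being four possible bases at each position, so a primer of n bases has 4^n possible sequences. The short Python sketch below, in which the function name is hypothetical, compares a 12-base primer with the 4-8 base sequence specific regions contemplated by the present reagent.

```python
# Illustrative arithmetic only: number of distinct sequences for a primer of
# n bases, assuming four possible bases (A, C, G, T) at each position.

def array_size(n_bases):
    """Return the number of distinct n-base sequences, i.e. 4 ** n."""
    return 4 ** n_bases

for n in (4, 5, 6, 7, 8, 12):
    print(f"{n:2d} bases: {array_size(n):,} sequences")
```

For 12 bases this gives 16,777,216 sequences, whereas a complete set of 4-base regions needs only 256 and a complete set of 8-base regions needs 65,536.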
The present invention describes a novel nucleic acid sequencing reagent which consists of a capture moiety, a spacer region and a sequence specific hybridizing region of 4-8 bases. Preferably, the nucleic acid sequencing reagent is arranged into a nested array. This array configuration can then be used to sequence a given template without prior knowledge (de novo) of the wild type or expected sequence in conjunction with primer extension in the presence of a labeled chain terminating nucleotide.
The sequencing reagent of the present invention comprises:
i) a capture moiety which can form a stable complex with a region of a template nucleic acid molecule;
ii) a spacer region; and
iii) a sequence specific hybridizing region, wherein the sequence specific region comprises 4-8 bases which can hybridize to a complementary sequence on the template nucleic acid molecule.
Preferably, the sequencing reagent of the present invention further comprises a modification for attachment of the reagent to a solid surface. More preferably, the modification is at the 5xe2x80x2 terminus of the reagent.
Preferably, the sequencing reagents of the present invention are immobilized to a solid surface in an orderly array. More preferably, a plurality of unique sequencing reagents are organized into a combinatorial array. Such an array can be used to sequence a template nucleic acid even without prior knowledge (de novo) of the wild type or “expected” sequence.
Thus, the present invention also describes a method of sequencing a template nucleic acid molecule on a combinatorial array. The sequencing method employs the following steps:
A) immobilizing a sequencing reagent individually or in a group to a solid surface in a spatially distinct fashion, wherein the sequencing reagent comprises:
i) a capture moiety which can form a stable complex with a region of a template nucleic acid molecule;
ii) a spacer region; and
iii) a sequence specific hybridizing region, wherein the sequence specific region comprises 4-8 bases which can hybridize to a complementary sequence on the template nucleic acid molecule;
B) hybridizing the template nucleic acid to the capture moiety of the sequencing reagent;
C) hybridizing the sequence specific hybridizing region to a complementary region on the template strand;
D) extending the hybridized sequence specific region by a polymerase with a labeled chain terminating nucleotide; and
E) determining the identity of the extended sequence by detecting the incorporated labeled chain terminating nucleotide.
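Steps A) through E) can be illustrated, in a highly simplified form, by the following Python sketch. It makes several assumptions that are not part of the method as stated: the capture moiety and spacer are not modeled, hybridization is treated as exact matching, sequences are written in the coordinates of the strand being synthesized (the complement of the template), and the final chaining of array signals into a longer sequence is one possible way of interpreting the array readout rather than a procedure taken from this description. All names and the example sequence are hypothetical.

```python
# Illustrative sketch only: a combinatorial array of k-base sequence specific
# regions, each extended by one labeled chain terminator. Hypothetical names.
from itertools import product

def array_signals(synthesized, k):
    """For every k-base spot on the array, collect the terminators it incorporates.

    A spot gives a signal wherever its k bases match the strand being
    synthesized; the signal is the base that follows the match.
    """
    signals = {"".join(p): set() for p in product("ACGT", repeat=k)}
    for i in range(len(synthesized) - k):
        signals[synthesized[i : i + k]].add(synthesized[i + k])
    return signals

def chain(signals, start, max_len):
    """Chain spots together: a k-mer plus its single signal gives the next k-mer."""
    k = len(start)
    seq = start
    while len(seq) < max_len:
        bases = signals[seq[-k:]]
        if len(bases) != 1:  # no signal (end of template) or an ambiguous repeat
            break
        seq += next(iter(bases))
    return seq

signals = array_signals("GATTACAGGCTCA", k=4)  # hypothetical 13-base strand
print(len(signals))                             # number of spots on the array
print(chain(signals, "GATT", max_len=100))      # sequence recovered from signals
```

In this toy case every 4-base region occurs only once, so the chain is unambiguous; a region that occurs more than once with different following bases would give multiple signals at one spot and stop the simple chaining shown here.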