The present invention relates generally to organic chemistry, analytical chemistry, biochemistry, molecular biology, genetics, diagnostics and medicine. In particular, it relates to a method for analyzing polynucleotides; i.e., for determining the complete nucleotide sequence of a polynucleotide, for detecting variance in the nucleotide sequence between related polynucleotides and for genotyping DNA.
The following is offered as background information only and is not intended nor admitted to be prior art to the present invention.
DNA is the carrier of the genetic information of all living cells. An organism's genetic and physical characteristics, its genotype and phenotype, respectively, are controlled by precise nucleic acid sequences in the organism's DNA. The sum total of all of the sequence information present in an organism's DNA is termed the organism's "genome." The nucleic acid sequence of a DNA molecule consists of a linear polymer of four "nucleotides." The four nucleotides are tripartite molecules, each consisting of (1) one of the four heterocyclic bases, adenine (abbreviated "A"), cytosine ("C"), guanine ("G") and thymine ("T"); (2) the pentose sugar derivative 2-deoxyribose, which is bonded by its 1-carbon atom to a ring nitrogen atom of the heterocyclic base; and (3) a monophosphate monoester formed between a phosphoric acid molecule and the 5′-hydroxy group of the sugar moiety. The nucleotides polymerize by the formation of diesters between the 5′-phosphate of one nucleotide and the 3′-hydroxy group of another nucleotide to give a single strand of DNA. In nature, two of these single strands interact by hydrogen bonding between complementary nucleotides, A being complementary with T and C being complementary with G, to form "base-pairs," which results in the formation of the well-known DNA "double helix" of Watson and Crick. RNA is similar to DNA except that the base thymine is replaced with uracil ("U") and the pentose sugar is ribose itself rather than deoxyribose. In addition, RNA exists in nature predominantly as a single strand; i.e., two strands do not normally combine to form a double helix.
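The pairing rules just described (A with T, C with G, and U in place of T in RNA) can be sketched in a few lines of Python. The function names below are illustrative only and are not part of the method described here; the sketch assumes sequences are written as plain strings of the letters A, C, G and T.

```python
# Watson-Crick pairing: A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


def to_rna(strand: str) -> str:
    """Write a DNA sequence in RNA letters by replacing thymine with uracil."""
    return strand.replace("T", "U")


if __name__ == "__main__":
    print(complement("ACG"))
    print(reverse_complement("ACG"))
    print(to_rna("ACGT"))
```

The reverse complement is included because the two strands of a double helix run in opposite directions, so the partner of a strand is conventionally written reversed.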
When referring to sequences of nucleotides in a polynucleotide, it is customary to use the abbreviation for the base; i.e., A, C, G, and T (or U) to represent the entire nucleotide containing that base. For example, a polynucleotide sequence denoted as "ACG" means that an adenine nucleotide is bonded through a phosphate ester linkage to a cytosine nucleotide which is bonded through another phosphate ester linkage to a guanine nucleotide. If the polynucleotide being described is DNA, then it is understood that "A" refers to an adenine nucleotide which contains a deoxyribose sugar. If there is any possibility of ambiguity, the "A" of a DNA molecule can be designated "deoxyA" or simply "dA." The same is true for C and G. Since T occurs only in DNA and not RNA, there can be no ambiguity, so there is no need to refer to deoxyT or dT.
As a rough approximation, it can be said that the number of genes an organism has is proportional to the organism's phenotypic complexity; i.e., the number of genome products necessary to replicate the organism and allow it to function. The human genome, presently considered one of the most complex, consists of approximately 60,000-100,000 genes and about three billion three hundred million base pairs. Each of these genes codes for an RNA, most of which in turn encodes a particular protein which performs a specific biochemical or structural function. A variance, also known as a polymorphism or mutation, in the genetic code of any one of these genes may result in the production of a gene product, usually a protein or an RNA, with altered biochemical activity or with no activity at all. This can result from as little change as an addition, deletion or substitution (transition or transversion) of a single nucleotide in the DNA comprising a particular gene, which is sometimes referred to as a "single nucleotide polymorphism" or "SNP." The consequence of such a mutation in the genetic code ranges from harmless to debilitating to fatal. There are presently over 6700 human disorders believed to have a genetic component. For example, hemophilia, Alzheimer's disease, Huntington's disease, Duchenne muscular dystrophy and cystic fibrosis are known to be related to variances in the nucleotide sequence of the DNA comprising certain genes. In addition, evidence is being amassed suggesting that changes in certain DNA sequences may predispose an individual to a variety of abnormal conditions such as obesity, diabetes, cardiovascular disease, central nervous system disorders, auto-immune disorders and cancer. Variations in DNA sequence of specific genes have also been implicated in the differences observed among patients in their responses to, for example, drugs, radiation therapy, nutritional status and other medical interventions.
Thus, the ability to detect DNA sequence variances in an organism's genome is an important aspect of the inquiry into relationships between such variances and medical disorders and responses to medical interventions. Once an association has been established, the ability to detect the variance(s) in the genome of a patient can be an extremely useful diagnostic tool. It may even be possible, using early variance detection, to diagnose and potentially treat, or even prevent, a disorder before the disorder has physically manifested itself. Furthermore, variance detection can be a valuable research tool in that it may lead to the discovery of genetic bases for disorders the causes of which were hitherto unknown or thought to be other than genetic. Variance detection may also be useful for guiding the selection of an optimal therapy where there is a difference in response among patients to one or more proposed therapies.
While the benefits of being able to detect variances in the genetic code are clear, the practical aspects of doing so are daunting: it is estimated that sequence variations in human DNA occur with a frequency of about 1 in 100 nucleotides when 50 to 100 individuals are compared. Nickerson, D. A., Nature Genetics, 1998, 223-240. This translates to as many as thirty million variances in the human genome. Not all, in fact very few, of these variances have any measurable effect on the physical well-being of humans. Detecting these 30 million variances and then determining which of them are relevant to human health is clearly a formidable task.
In addition to variance detection, knowledge of the complete nucleotide sequence of an organism's genome would contribute immeasurably to the understanding of the organism's overall biology; i.e., it would lead to the identification of every gene product, its organization and arrangement in the organism's genome, and the sequences required for controlling gene expression (i.e., production of each gene product) and replication. In fact, the quest for such knowledge and understanding is the raison d'être for the Human Genome Project, an international effort aimed at sequencing the entire human genome. Once the sequence of a single genome is available, whatever the organism, it then becomes useful to obtain the partial or complete sequence of other organisms of that species, particularly those organisms within the species that exhibit different characteristics, in order to identify DNA sequence differences that correlate with the different characteristics. Such different characteristics may include, for microbial organisms, pathogenicity on the negative side or the ability to produce a particular polymer or to remediate pollution on the positive side. Differences in growth rate, nutrient content or pest resistance are potential differences which might be observed among plants. Even among human beings, a difference in disease susceptibility or response to a particular therapy might relate to a genetic, i.e., DNA sequence, variation. As a result of the enormous potential utility to be realized from DNA sequence information, in particular, identification of DNA sequence variances between individuals of the same species, the demand for rapid, inexpensive, automated DNA sequencing and variance detection procedures can be expected to increase dramatically in the future.
Once the DNA sequence of a DNA segment; e.g., a gene, a cDNA or, on a larger scale, a chromosome or an entire genome, has been determined, the existence of sequence variances in that DNA segment among members of the same species can be explored. Complete DNA sequencing is the definitive procedure for accomplishing this task. Thus, it is possible to determine the complete sequence of a copy of a DNA segment obtained from a different member of the species and simply compare that complete sequence to the one previously obtained. However, current DNA sequencing technology is costly, time consuming and, in order to achieve high levels of accuracy, must be highly redundant. Most major sequencing projects require a 5- to 10-fold coverage of each nucleotide to reach an acceptable error rate of 1 in 2,000 to 1 in 10,000 bases. In addition, DNA sequencing is an inefficient way to detect variances. For example, a variance between any two copies of a gene, such as when two chromosomes are being compared, may occur as infrequently as once in 1,000 or more bases. Thus, only a small portion of the sequence is of interest, that in which the variance exists. However, if full sequencing is employed, a tremendous number of nucleotides have to be sequenced to arrive at the desired information involving the aforesaid small portion. For example, consider a comparison of ten versions of a 3,000 nucleotide DNA sequence for the purpose of detecting, say, four variances among them. Even if only a 2-fold redundancy is employed (each strand of the double-stranded 3,000 nucleotide DNA segment from each individual is sequenced once), 60,000 nucleotides would have to be sequenced (10 × 3,000 × 2). In addition, it is more than likely that problem areas will be encountered in the sequencing requiring additional runs with new primers; thus, the project could engender the sequencing of as many as 100,000 nucleotides to determine four variances.
A variety of procedures have been developed over the past 15 years to identify sequence differences and to provide some information about the location of the variant sites (Table 1). Using such a procedure, it would only be necessary to sequence four relatively short portions of the 3,000 nt (nucleotide) sequence. Furthermore, only a few samples would have to be sequenced in each region because each variance produces a characteristic change (Table 1); if, for example, 22 of 50 samples exhibit such a characteristic change with a variation detection procedure, then sequencing as few as four samples of the 22 would provide information on the other 18. The length of the segments that require sequencing could, depending on the variance detection procedure employed, be as short as 50 to 100 nt. Thus, the scale of the sequencing project could be reduced to: 4 (sites) × 50 (nt per site) × 2 (strands from each individual) × 2 (individuals per site), or only about 800 nucleotides. This amounts to about 1% of the sequencing required in the absence of a preceding variance detection step.
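The arithmetic behind this comparison can be checked with a short Python sketch. The function and parameter names are illustrative; the numbers are those of the ten-sample, 3,000 nucleotide example discussed above.

```python
def full_sequencing_load(individuals: int, segment_nt: int, redundancy: int) -> int:
    """Nucleotides read when every sample is sequenced in full."""
    return individuals * segment_nt * redundancy


def targeted_sequencing_load(sites: int, nt_per_site: int, strands: int,
                             individuals_per_site: int) -> int:
    """Nucleotides read when only short regions flagged by a
    variance detection step are sequenced."""
    return sites * nt_per_site * strands * individuals_per_site


# Ten versions of a 3,000 nt segment, each strand sequenced once.
full = full_sequencing_load(individuals=10, segment_nt=3000, redundancy=2)

# Four variant sites, 50 nt per site, both strands, two individuals per site.
targeted = targeted_sequencing_load(sites=4, nt_per_site=50, strands=2,
                                    individuals_per_site=2)

if __name__ == "__main__":
    print(full, targeted, f"{100 * targeted / full:.1f}%")
```

Against the 60,000 nucleotide baseline the targeted approach requires a little over 1% of the sequencing; against the 100,000 nucleotide figure that allows for repeat runs, it is a little under 1%.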
As presently practiced, the technique for determining the full nucleotide sequence of a polynucleotide and that for detecting previously unknown variances or mutations in related polynucleotides end up being the same; that is, even when the issue is the presence or absence of a single nucleotide variance between related polynucleotides, the complete sequences of at least a segment of the related polynucleotides are determined and then compared. The only difference is that a variance detection procedure such as those described in Table 1 may be employed as a first step to reduce the amount of complete sequencing necessary in the detection of unknown variances.
The two classical methods for carrying out complete nucleotide sequencing are the Maxam and Gilbert chemical procedure (Proc. Nat. Acad. Sci. USA, 74, 560-564 (1977)) and the Sanger, et al., chain-terminating procedure (Proc. Nat. Acad. Sci. USA, 74, 5463-5467 (1977)). The Maxam-Gilbert method of complete nucleotide sequencing involves end-labeling a DNA molecule with, for example, 32P, followed by one of two discrete reaction sequences involving two reactions each; i.e., four reactions overall. One of these reaction sequences involves the selective methylation of the purine nucleotides guanine (G) and adenine (A) in the polynucleotide being investigated which, in most instances, is an isolated naturally-occurring polynucleotide such as DNA. The N7 position of guanine methylates approximately five times as rapidly as the N3 position of adenine. When heated in the presence of aqueous base, the methylated bases are lost and a break in the polynucleotide chain occurs. The reaction is more effective with methylated guanine than with methylated adenine so, when the reaction product is subjected to electrophoresis on polyacrylamide gel plates, G cleavage ladders are predominant. Under acidic conditions, on the other hand, both methylated bases are removed effectively. Treatment by piperidine cleaves DNA at these abasic sites, generating sequencing ladders that correspond to A+G.
Thus, four chemical reactions followed by electrophoretic analysis of the resulting end-labeled ladder of cleavage products will reveal the exact nucleotide sequence of a DNA molecule. It is key to the Maxam-Gilbert sequencing method that only partial cleavage, on the order of 1-2% at each susceptible position, occurs. This is because electrophoresis separates fragments by size. To be meaningful, the fragments produced should represent, on the average, a single modification and cleavage per molecule. Then, when the fragments of all four reactions are aligned according to size, the exact sequence of the target DNA can be determined.
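Why partial cleavage matters can be illustrated with a simple probability model. The sketch below is an idealization and not part of the Maxam-Gilbert procedure itself: it assumes every susceptible position is cleaved independently with the same probability p, and that the labeled fragment seen on the gel runs from the labeled end to the nearest cleavage. Under those assumptions the chance that the labeled fragment ends at the k-th susceptible position is p(1 - p)^(k - 1). The function names are hypothetical.

```python
def ladder_intensity(p: float, k: int) -> float:
    """Probability that the end-labeled fragment terminates at the k-th
    susceptible position, counted from the labeled end, when each position
    is cleaved independently with probability p."""
    return p * (1 - p) ** (k - 1)


def evenness(p: float, n_sites: int) -> float:
    """Relative intensity of the n-th rung of the ladder compared with the first."""
    return ladder_intensity(p, n_sites) / ladder_intensity(p, 1)


if __name__ == "__main__":
    # Light cleavage: the 100th rung is still clearly visible next to the first.
    print(evenness(0.01, 100))
    # Heavy cleavage: nearly all label ends up in the shortest fragments.
    print(evenness(0.5, 100))
```

With p = 0.01 the hundredth rung retains roughly 37% of the intensity of the first, whereas with heavy cleavage the longer fragments effectively vanish, which is consistent with the requirement that cleavage be limited to about one event per molecule.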
The Sanger method for determining complete nucleotide sequences consists of preparing four series of base-specifically chain-terminated labeled DNA fragments by enzymatic polymerization. As in the Maxam-Gilbert procedure, four separate reactions can be performed. In the Sanger method each of the four reaction mixtures contains the same oligonucleotide template (either a single- or a double-stranded DNA), the four nucleotides, A, G, C and T (one of which may be labeled), a polymerase and a primer, the polymerase and primer being present to effect the polymerization of the nucleotides into a complement of the template oligonucleotide. To one of the four reaction mixtures is added an empirically determined amount of the dideoxy derivative of one of the nucleotides. A small amount of the dideoxy derivative of one of the remaining three nucleotides is added to a second reaction mixture, and so on, resulting in four reaction mixtures each containing a different dideoxy nucleotide. The dideoxy derivatives, by virtue of their missing 3′-hydroxyl groups, terminate the enzymatic polymerization reaction upon incorporation into the nascent oligonucleotide chain. Thus, in one reaction mixture, containing, say, dideoxyadenosine triphosphate (ddATP), a series of oligonucleotide fragments is produced, all ending in ddA, which when resolved by electrophoresis produce a series of bands corresponding to the size of the fragment created up to the point that the chain-terminating ddA became incorporated into the polymerization reaction. Corresponding ladders of fragments can be obtained from each of the other reaction mixtures in which the oligonucleotide fragments end in C, G and T. The four sets of fragments create a "sequence ladder," each rung of which represents the next nucleotide in the sequence of bases comprising the subject DNA.
Thus, the exact nucleotide sequence of the DNA can simply be read off the electrophoresis gel plate after autoradiography or, in the case of an automated DNA sequencing instrument, by computer analysis of chromatograms. As mentioned above, the use of dye-labelled chain-terminating dideoxynucleotides and modified polymerases that efficiently incorporate modified nucleotides is an improvement on the basic chain-terminating sequencing method.
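Reading a sequence ladder amounts to sorting the fragments from all four reactions by length and noting which terminator produced each one. The following Python sketch illustrates that step under idealized assumptions: every fragment length is observed exactly once, with no stops or compressions. The function name `read_ladder` and the input layout are hypothetical.

```python
def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Reconstruct the newly synthesized strand from four base-specific lanes.

    `lanes` maps each terminating base (A, C, G or T) to the lengths of the
    chain-terminated fragments seen in that lane, counted in nucleotides
    added to the primer.
    """
    base_at_length: dict[int, str] = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in base_at_length:
                # Two lanes claiming the same rung would make the read ambiguous.
                raise ValueError(f"ambiguous rung at length {length}")
            base_at_length[length] = base
    # Shortest fragment first: each rung is the next base in the sequence.
    return "".join(base_at_length[n] for n in sorted(base_at_length))


if __name__ == "__main__":
    lanes = {"A": [1, 5], "C": [2], "G": [3, 6], "T": [4]}
    print(read_ladder(lanes))
```

The string returned is the strand synthesized by the polymerase, that is, the complement of the template.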
Both the Maxam-Gilbert and Sanger procedures have their shortcomings. They are both time-consuming, labor-intensive (particularly with regard to the Maxam-Gilbert procedure, which has not been automated like the Sanger procedure), expensive (e.g., the most optimized versions of the Sanger procedure require very expensive reagents) and require a fair degree of technical expertise to assure proper operation and reliable results. Furthermore, the Maxam-Gilbert procedure suffers from a lack of specificity of the modification chemistry, which can produce artifactual fragments and hence false ladder readings from the gel plate. The Sanger method, on the other hand, is susceptible to template secondary structure formation, which can cause interference in the polymerization reaction. This causes terminations of the polymerization at sites of secondary structure (called "stops"), which can result in erroneous fragments appearing in the sequence ladder, rendering parts of the sequence unreadable, although this problem is ameliorated by the use of dye-labelled dideoxy terminators. Furthermore, both sequencing methods are susceptible to "compressions," another result of DNA secondary structure, which can affect fragment mobility during electrophoresis, thereby rendering the sequence ladder unreadable or subject to erroneous interpretation in the vicinity of the secondary structure. In addition, both methods are plagued by uneven intensity of the ladder and by non-specific background interference. These concerns are magnified when the issue is variance detection. In order to discern a single nucleotide variance, the procedure employed must be extremely accurate; a "mistake" in reading one nucleotide can result in a false positive; i.e., an indication of a variance where none exists. Neither the Maxam-Gilbert nor the Sanger procedure is capable of such accuracy in a single run.
In fact, the frequency of errors in a "one pass" sequencing experiment is equal to or greater than 1%, which is on the order of ten times the frequency of actual DNA variances when any two versions of a sequence are compared. The situation can be ameliorated somewhat by performing multiple runs (usually in the context of a "shotgun" sequencing procedure) for each polynucleotide being compared, but this simply increases cost in terms of equipment, reagents, manpower and time. The high cost of sequencing becomes even less acceptable when one considers that it is often not necessary when looking for nucleotide sequence variances among related polynucleotides to determine the complete sequence of the subject polynucleotides or even the exact nature of the variance (although, as will be seen, in some instances even this is discernible using the method of this invention); detection of the variance alone may be sufficient.
While not avoiding all of the problems associated with the Maxam-Gilbert and Sanger procedures, several techniques have been devised to at least make one or the other of the procedures more efficient. One such approach has been to develop ways to circumvent slab gel electrophoresis, one of the most time-consuming steps in the procedures. For instance, in U.S. Pat. Nos. 5,003,059 and 5,174,962, the Sanger method is employed; however, the dideoxy derivative of each of the nucleotides used to terminate the polymerization reaction is uniquely tagged with an isotope of sulfur, 32S, 33S, 34S or 36S. Once the polymerization reactions are complete, the chain terminated sequences are separated by capillary zone electrophoresis, which, compared to slab gel electrophoresis, increases resolution, reduces run time and allows analysis of very small samples. The separated chain terminated sequences are then combusted to convert the incorporated isotopic sulfur to isotopic sulfur dioxides (32SO2, 33SO2, 34SO2 and 36SO2). The isotopic sulfur dioxides are then subjected to mass spectrometry. Since each isotope of sulfur is uniquely related to one of the four sets of base-specifically chain terminated fragments, the nucleotide sequence of the subject DNA can be determined from the mass spectrogram.
Another method, disclosed in U.S. Pat. No. 5,580,733, also incorporates the Sanger technique but eliminates gel electrophoresis altogether. The method involves taking each of the four populations of base-specific chain-terminated oligonucleotides from the Sanger reactions and forming a mixture with a visible laser light absorbing matrix such as 3-hydroxypicolinic acid. The mixtures are then illuminated with visible laser light and vaporized, which occurs without further fragmentation of the chain-terminated nucleic acid fragments. The vaporized molecules, which are charged, are then accelerated in an electric field and the mass to charge (m/z) ratio of the ionized molecules is determined by time-of-flight mass spectrometry (TOF-MS). The molecular weights are then aligned to determine the exact sequence of the subject DNA. By measuring the mass difference between successive fragments in each of the mixtures, the lengths of fragments terminating in A, G, C or T can then be inferred. A significant limitation of current MS instruments is that polynucleotide fragments greater than 100 nucleotides in length (with many instruments, 50 nucleotides) cannot be efficiently detected in routine use, especially if the fragments are part of a complex mixture. This severe limitation on the size of fragments that can be analyzed has limited the development of polynucleotide analysis by MS. Thus, there is a need for a procedure that adapts large polynucleotides, such as DNA, to the capabilities of current MS instruments. The present invention provides such a procedure.
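The idea of using mass differences between successive fragments can be illustrated with a simplified Python sketch. It is not the procedure of the cited patent: it assumes an idealized single ladder in which each fragment is exactly one nucleotide residue heavier than the one before it, singly charged ions, and no adducts or noise. The residue masses are approximate average values for unmodified DNA nucleotide residues and are stated here as assumptions; the function name and the starting mass are hypothetical.

```python
# Approximate average masses (Da) of DNA nucleotide residues within a chain.
# These values are assumptions for illustration only.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def bases_from_masses(masses: list[float], tolerance: float = 0.5) -> str:
    """Call one base per step of an idealized mass ladder.

    Each step between neighboring fragment masses is matched to the
    closest residue mass; a step that matches no residue within
    `tolerance` raises ValueError.
    """
    ordered = sorted(masses)
    called = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        base, error = min(
            ((b, abs(delta - m)) for b, m in RESIDUE_MASS.items()),
            key=lambda pair: pair[1],
        )
        if error > tolerance:
            raise ValueError(f"mass step {delta:.2f} matches no residue")
        called.append(base)
    return "".join(called)


if __name__ == "__main__":
    start_mass = 3000.0  # hypothetical mass of the shortest fragment
    ladder = [start_mass]
    for base in "GATC":
        ladder.append(ladder[-1] + RESIDUE_MASS[base])
    print(bases_from_masses(ladder))
```

The closest pair of residue masses in this table (A and T) differs by about 9 Da, so the mass accuracy required grows with fragment size, which is one way to see why the fragment length limit of the instrument matters.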
A further approach to nucleotide sequencing is disclosed in U.S. Pat. No. 5,547,835. Again, the starting point is the Sanger sequencing strategy. The four base-specific chain-terminated series of fragments are "conditioned" by, for example, purification, cation exchange and/or mass modification. The molecular weights of the conditioned fragments are then determined by mass spectrometry and the sequence of the starting nucleic acid is determined by aligning the base-specifically terminated fragments according to molecular weight.
Each of the above methods involves complete Sanger sequencing of a polynucleotide prior to analysis by mass spectrometry. To detect genetic mutations; i.e., variances, the complete sequence can be compared to a known nucleotide sequence. Where the sequence is not known, comparison with the nucleotide sequence of the same DNA isolated from another organism of the same kind which does not exhibit the abnormalities seen in the subject organism will likewise reveal mutations. This approach, of course, requires running the Sanger procedure twice; i.e., eight separate reactions. In addition, if a potential variance is detected, the entire procedure would in most instances be run again, sequencing the opposite strand using a different primer to make sure that a false positive had not been obtained. When the specific nucleotide variance or mutation related to a particular disorder is known, there are a wide variety of known methods for detecting a variance without complete sequencing. For instance, U.S. Pat. No. 5,605,798 describes such a method. The method involves obtaining a nucleic acid molecule containing the target sequence of interest from a biological sample, optionally amplifying the target sequence, and then hybridizing the target sequence to a detector oligonucleotide which is specifically designed to be complementary to the target sequence. Either the detector oligonucleotide or the target sequence is "conditioned" by mass modification prior to hybridization. Unhybridized detector oligonucleotide is removed and the remaining reaction product is volatilized and ionized. Detection of the detector oligonucleotide by mass spectrometry indicates the presence of the target nucleic acid sequence in the biological sample and thus confirms the diagnosis of the variance-related disorder.
Variance detection procedures can be divided into two general categories although there is a considerable degree of overlap. One category, the variance discovery procedures, is useful for examining DNA segments for the existence, location and characteristics of new variances. To accomplish this, variance discovery procedures may be combined with DNA sequencing.
The second group of procedures, variance typing (sometimes referred to as genotyping) procedures, are useful for repetitive determination of one or more nucleotides at a particular site in a DNA segment when the location of a variance or variances has previously been identified and characterized. In this type of analysis, it is often possible to design a very sensitive test of the status of a particular nucleotide or nucleotides. This technique, of course, is not well suited to the discovery of new variances.
As noted above, Table 1 is a list of a number of existing techniques for nucleotide examination. The majority of these are used primarily in new variance determination. There are a variety of other methods, not shown, for gene typing. Like the Maxam-Gilbert and Sanger sequencing procedures, these techniques are generally time-consuming, tedious and require a relatively high skill level to achieve the maximum degree of accuracy possible from each procedure. Even then, some of the techniques listed are, even at their best, inherently less accurate than would be desirable.
The methods of Table 1, though primarily devised for variance discovery, can also be used when a variant nucleotide has already been identified and the goal is to determine its status in one or more unknown DNA samples (variance typing or genotyping). Some of the methods that have been developed specifically for genotyping include (1) primer extension methods in which dideoxynucleotide termination of the primer extension reaction occurs at the variant site, generating extension products of different length or with different terminal nucleotides, which can then be determined by electrophoresis, mass spectrometry or fluorescence in a plate reader; (2) hybridization methods in which oligonucleotides corresponding to the two possible sequences at a variant site are attached to a solid surface and hybridized with probes from the unknown sample; (3) restriction fragment length polymorphism analysis, wherein a restriction endonuclease recognition site includes the polymorphic nucleotide in such a manner that the site is cleavable with one variant nucleotide but not another; (4) methods such as "TaqMan" involving differential hybridization and consequent differential 5′ endonuclease digestion of labelled oligonucleotide probes in which there is fluorescence resonance energy transfer (FRET) between two fluors on the probe that is abrogated by nuclease digestion of the probe; (5) other FRET based methods involving labelled oligonucleotide probes called molecular beacons which exploit allele specific hybridization; (6) ligation dependent methods that require enzymatic ligation of two oligonucleotides across a polymorphic site that is perfectly matched to only one of them; and (7) allele specific oligonucleotide priming in a polymerase chain reaction (PCR). U. Landegren, et al., 1998, Reading Bits of Genetic Information: Methods for Single-nucleotide Polymorphism Analysis, Genome Research 8(8):769-76.
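Of the genotyping approaches listed, restriction fragment length polymorphism analysis, item (3), is perhaps the easiest to illustrate in code. The Python sketch below uses the EcoRI recognition sequence GAATTC, which is cut after the first G, as an example enzyme; the two allele sequences are invented for illustration, and the function name `digest` is hypothetical. Only one strand is modeled.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return the fragment lengths produced by cutting `sequence` at every
    occurrence of `site`, `cut_offset` nucleotides into the site."""
    lengths = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        lengths.append(cut - start)
        start = cut
        position = sequence.find(site, position + 1)
    lengths.append(len(sequence) - start)
    return lengths


# Two hypothetical alleles that differ at a single nucleotide inside the site.
ALLELE_CUT = "TTTTGAATTCAAAA"    # contains GAATTC, so the enzyme cleaves it
ALLELE_UNCUT = "TTTTGAGTTCAAAA"  # A-to-G substitution destroys the site

if __name__ == "__main__":
    print(digest(ALLELE_CUT))
    print(digest(ALLELE_UNCUT))
```

The cleavable allele yields two fragments while the other remains full length, so the fragment pattern alone reports the status of the polymorphic nucleotide.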
When complete sequencing of large templates such as the entire genome of a virus, a bacterium or a eukaryote (e.g., higher organisms including man) or the repeated sequencing of a large DNA region or regions from different strains or individuals of a given species for purposes of comparison is desired, it becomes necessary to implement strategies for making libraries of templates for DNA sequencing. This is because conventional chain terminating sequencing (i.e., the Sanger procedure) is limited by the resolving power of the analytical procedure used to create the nucleotide ladder of the subject polynucleotide. For gels, this resolving power is approximately 500-800 nt at a time. For mass spectrometry, the limitation is the length of a polynucleotide which can be efficiently vaporized prior to detection in the instrument. Although larger fragments have been analyzed by highly specialized procedures and instrumentation, presently this limit is approximately 50-60 nt. However, in large scale sequencing projects such as the Human Genome Project, "markers" (DNA segments of known chromosomal location whose presence can be relatively easily ascertained by the polymerase chain reaction (PCR) technique and which, therefore, can be used as a point of reference for mapping new areas of the genome) are currently about 100 kilobases (kb) apart. The markers at 100 kb intervals must be connected by efficient sequencing strategies. If the analytical method used is gel electrophoresis, then to sequence a 100 kb stretch of DNA would require hundreds of sequencing reactions. A fundamental question which must be addressed is how to divide up the 100 kb segment (or whatever size is being dealt with) to optimize the process; i.e., to minimize the number of sequencing reactions and sequence assembly work necessary to generate a complete sequence with the desired level of accuracy.
A key issue in this regard is how to initially fragment the DNA in such a manner that the fragments, once sequenced, can be correctly reassembled to recreate the full length target DNA. Presently, two general approaches provide both sequence-ready fragments and the information necessary to recombine the sequences into the full-length target DNA: "shotgun sequencing" (see, e.g., Venter, J. C., et al., Science, 1998, 280:1540-1542; Weber, J. L. and Myers, E. W., Genome Research, 1997, 7:401-409; Andersson, B. et al., DNA Sequence, 1997, 7:63-70) and "directed DNA sequencing" (see, e.g., Voss, H., et al., Biotechniques, 1993, 15:714-721; Kaczorowski, T., et al., Anal. Biochem., 1994, 221:127-135; Lodhi, M. A., et al., Genome Research, 1996, 6:10-18).
Shotgun sequencing involves the creation of a large library of random fragments or "clones" in a sequence-ready vector such as a plasmid or phagemid. To arrive at a library in which all portions of the original sequence are relatively equally represented, DNA which is to be shotgun sequenced is often fragmented by physical procedures such as sonication, which has been shown to produce nearly random fragmentation. Clones are then selected at random from the shotgun library for sequencing. The complete sequence of the DNA is then assembled by identifying overlapping sequences in the short (approx. 500 nt) shotgun sequences. In order to assure that the entire target region of the DNA is represented among the randomly selected clones and to reduce the frequency of errors (incorrectly assigned overlaps), a high degree of sequencing redundancy is necessary; for example, 7 to 10-fold. Even with such high redundancy, additional sequencing is often required to fill gaps in the coverage. Even then, the presence of repeat sequences such as Alu (a 300 base-pair sequence which occurs in 500,000-1,000,000 copies per haploid genome) and LINES ("Long INterspersed DNA sequence Elements" which can be 7,000 bases long and may be present in as many as 100,000 copies per haploid genome), either of which may occur in different locations of multiple clones, can render DNA sequence re-assembly problematic. For instance, different members of these sequence families can be over 90% identical, which can sometimes make it very difficult to determine sequence relationships on opposite sides of such repeats. FIGS. 14A-C illustrate the difficulties of the shotgun sequencing approach in a hypothetical 10 kb sequence modeled after the sequence reported in Martin-Gallardo, et al., Nature Genetics, (1992) 1:34-39.
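The assembly step, identifying overlaps among short reads and merging them, can be sketched with a toy greedy algorithm in Python. This is a deliberately minimal illustration and not the assembly software used in actual shotgun projects: it assumes error-free reads from one strand only, and it repeatedly merges the pair of reads with the longest exact suffix-to-prefix overlap. The function names and the example reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`,
    or 0 if no such overlap of at least `min_len` exists."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads greedily by longest overlap until none remain to merge."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    reads = ["GTACGTTA", "ATGCGTAC", "GTTAGC"]
    print(greedy_assemble(reads))
```

A greedy merge of this kind is exactly what repeat families defeat: when two distant regions share a long, nearly identical stretch, the longest overlap may join reads that are not adjacent in the genome.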
Directed DNA sequencing, the second general approach, also entails making a library of clones, often with large inserts (e.g., cosmid, P1, PAC or BAC libraries). In this procedure, the location of the clones in the region to be sequenced is then mapped to obtain a set of clones that constitutes a minimum-overlap tiling path spanning the region to be sequenced. Clones from this minimal set are then sequenced by procedures such as “primer walking” (see, e.g., Voss, supra). In this procedure, the end of one sequence is used to select a new sequencing primer with which to begin the next sequencing reaction, the end of the second sequence is used to select the next primer and so on. The assembly of a complete DNA is easier by directed sequencing and less sequencing redundancy is required since both the order of clones and the completeness of coverage are known from the clone map. On the other hand, assembling the map itself requires significant effort. Furthermore, the speed with which new sequencing primers can be synthesized and the cost of doing so are often limiting factors with regard to primer walking. While a variety of methods for simplifying new primer construction have aided in this process (see, e.g., Kaczorowski, et al. and Lodhi, et al., supra), directed DNA sequencing remains a valuable but often expensive and slow procedure.
Most large-scale sequencing projects employ aspects of both shotgun sequencing and directed sequencing. For example, a detailed map might be made of a large insert library (e.g., BACs) to identify a minimal set of clones which gives complete coverage of the target region but then sequencing of each of the large inserts is carried out by a shotgun approach; e.g., fragmenting the large insert and re-cloning the fragments in a more optimal sequencing vector (see, e.g., Chen, C. N., Nucleic Acids Research, 1996, 24:4034-4041). The shotgun and directed procedures are also used in a complementary manner in which specific regions not covered by an initial shotgun experiment are subsequently determined by directed sequencing.
Thus, there are significant limitations to both the shotgun and directed sequencing approaches to complete sequencing of large molecules, such as that required in genomic DNA sequencing projects. However, both procedures would benefit if the usable read length of contiguous DNA were expanded from the current 500-800 nt which can be effectively sequenced by the Sanger method. For example, directed sequencing could be significantly improved by longer read lengths, which would permit greater distances between landmarks and thereby reduce the need for high-resolution maps.
A major limitation of current sequencing procedures is the high error rate (Kristensen, T., et al., DNA Sequencing, 2:243-346, 1992; Kurshid, F. and Beck, S., Analytical Biochemistry, 208:138-143, 1993; Fichant, G. A. and Quentin, Y., Nucleic Acids Research, 23:2900-2908, 1995). It is well-known that many of the errors associated with the Maxam-Gilbert and Sanger procedures are systematic; i.e., the errors are not random; rather, they occur repeatedly. To avoid this, two mechanistically different sequencing methods may be used so that the systematic errors in one may be detected and thus corrected by the second and vice versa. Since a significant fraction of the cost of current sequencing methods is associated with the need for high redundancy to reduce sequencing errors, the use of two procedures can reduce the overall cost of obtaining highly accurate DNA sequence.
The production and/or chemical cleavage of polynucleotides composed of ribonucleotides and deoxyribonucleotides has been previously described. In particular, mutant polymerases that incorporate both ribonucleotides and deoxyribonucleotides into a polynucleotide have been described; production of mixed ribo- and deoxyribo-containing polynucleotides by polymerization has been described; and generation of sequence ladders from such mixed polynucleotides, exploiting the well known lability of the ribo sugar to chemical base, has been described.
The use of such procedures, however, has been limited in that: (i) only polynucleotides in which one ribonucleotide and three deoxyribonucleotides are incorporated have been used; (ii) cleavage at ribonucleotides is effected using chemical base; (iii) only partial cleavage of the ribonucleotide-containing polynucleotides is pursued; and (iv) the utility of the procedure is confined to production of sequence ladders, which are resolved electrophoretically.
In addition, the chemical synthesis of polynucleotide primers containing a single ribonucleotide, which at a subsequent step is substantially completely cleaved by chemical base, has been reported. The size of a primer extension product is then determined by mass spectrometry or other methods.
It is clear from the foregoing that there exists a need for a simple, low cost, rapid, yet sensitive and accurate, method for analyzing polynucleotides such as, without limitation, DNA, to determine both complete nucleotide sequences and the presence of variance(s). Further, there is a need for methods to enable assembly of very long DNA sequences across repeat dense regions. The methods of the present invention fulfill each of these needs. In general, the present invention supplies new methods for genotyping, DNA sequencing and variance detection based on specific cleavage of DNA and other polynucleotides modified by enzymatic incorporation of chemically modified nucleotides.
Thus, in one aspect, this invention relates to a method for cleaving a polynucleotide, comprising:
a. replacing a natural nucleotide at substantially each point of occurrence in a polynucleotide with a modified nucleotide to form a modified polynucleotide wherein said modified nucleotide is not a ribonucleotide;
b. contacting said modified polynucleotide with a reagent or reagents which cleave(s) the modified polynucleotide at substantially each said point of occurrence.
In another aspect, this invention relates to the above-described method for use in detection of variance in nucleotide sequence in related polynucleotides by the additional steps of:
c. determining the masses of said fragments obtained from step b; and,
d. comparing the masses of said fragments with the masses of fragments expected from cleavage of a related polynucleotide of known sequence, or
e. repeating steps a-c with one or more related polynucleotides of unknown sequence and comparing the masses of said fragments of said polynucleotide with the masses of fragments obtained from the related polynucleotides.
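The comparison of fragment masses described in the steps above can be sketched computationally. This is an illustration under stated assumptions, not the method of the invention: the function names are hypothetical, cleavage is assumed to be complete and to fall immediately 5′ of each occurrence of the replaced nucleotide, and the masses used are approximate average residue masses of the natural deoxynucleotides, whereas actual modified nucleotides and cleavage end groups would shift these values.

```python
from collections import Counter

# Approximate average masses (Da) of natural deoxynucleotide residues in a
# DNA chain. Illustrative only: modified nucleotides and the end groups left
# by a given cleavage chemistry would change these numbers.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def cleave_at(sequence: str, base: str) -> list[str]:
    """Split `sequence` before every occurrence of `base` (complete cleavage).

    Assumption: the break falls on the 5' side of the replaced nucleotide.
    """
    fragments, current = [], ""
    for nt in sequence:
        if nt == base and current:
            fragments.append(current)
            current = ""
        current += nt
    if current:
        fragments.append(current)
    return fragments


def fragment_masses(sequence: str, base: str) -> list[float]:
    """Sorted approximate masses of the fragments from cleavage at `base`."""
    return sorted(
        round(sum(RESIDUE_MASS[nt] for nt in fragment), 2)
        for fragment in cleave_at(sequence, base)
    )


def mass_differences(reference: str, sample: str, base: str):
    """Masses expected from the reference but absent from the sample, and
    masses seen in the sample but not expected from the reference."""
    expected = Counter(fragment_masses(reference, base))
    observed = Counter(fragment_masses(sample, base))
    missing = sorted((expected - observed).elements())
    extra = sorted((observed - expected).elements())
    return missing, extra


reference = "GATTACAGGCTA"   # hypothetical known sequence
sample = "GATTACAGACTA"      # hypothetical related sequence, one G replaced by A
print(cleave_at(reference, "A"))
print(mass_differences(reference, sample, "A"))
```

In this toy case the single substitution creates a new cleavage site, so one expected fragment disappears and two smaller fragments appear in its place.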
A further aspect of this invention is the use of the first method above whereby the nucleotide sequence of a polynucleotide is determined, by the additional steps of:
c. determining the masses of said fragments obtained from step 1b;
d. repeating steps 1a, 1b and 1c, each time replacing a different natural nucleotide in said polynucleotide with a modified nucleotide until each natural nucleotide in said polynucleotide has been replaced with a modified nucleotide, each modified polynucleotide has been cleaved and the masses of the cleavage fragments have been determined; and,
e. constructing said nucleotide sequence of said polynucleotide from said masses of said fragments.
Another aspect of this invention is the use of the first mentioned method above whereby a polynucleotide known to contain a polymorphism or mutation is genotyped, by:
using as the natural nucleotide to be replaced, a nucleotide known to be involved in said polymorphism or mutation;
replacing the natural nucleotide by amplifying the portion of the polynucleotide using a modified nucleotide to form a modified polynucleotide;
cleaving the modified polynucleotide into fragments at each point of occurrence of the modified nucleotide;
analyzing the fragments to determine genotype.
In the method immediately above, analysis of the fragments by electrophoresis, mass spectrometry or FRET detection, is an aspect of this invention.
Another aspect of this invention is a method for cleaving a polynucleotide, comprising:
a. replacing a first natural nucleotide at substantially each point of occurrence in a polynucleotide with a modified nucleotide to form a once modified polynucleotide;
b. replacing a second natural nucleotide at substantially each point of occurrence in the once modified polynucleotide with a second modified nucleotide to form a twice modified polynucleotide; and,
c. contacting said twice modified polynucleotide with a reagent or reagents which cleave the twice modified polynucleotide at each point in said twice modified polynucleotide where said first modified nucleotide is followed immediately by, and linked by a phosphodiester or modified phosphodiester linkage to, said second modified nucleotide.
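The dinucleotide-specific cleavage described above can be sketched as follows. This is a minimal illustration with a hypothetical function name; it assumes complete cleavage and that the break falls at the linkage between the first and second modified nucleotides, read 5′ to 3′.

```python
def cleave_at_dinucleotide(sequence: str, first: str, second: str) -> list[str]:
    """Cleave between `first` and `second` wherever `first` is immediately
    followed by `second` in `sequence` (read 5' to 3').

    Assumption: cleavage is complete and occurs at the linkage joining the
    two modified nucleotides.
    """
    fragments, start = [], 0
    for i in range(len(sequence) - 1):
        if sequence[i] == first and sequence[i + 1] == second:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


# Hypothetical sequence; cleavage only where A is immediately followed by G.
print(cleave_at_dinucleotide("GATTACAGGCTAAG", "A", "G"))
```

Because a given dinucleotide occurs less often than either of its constituent nucleotides, this mode of cleavage yields fewer and, on average, longer fragments than cleavage at a single nucleotide.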
It is an aspect of this invention that, in the method immediately above, variance in nucleotide sequence of related polynucleotides is detected by the additional steps of:
d. determining the masses of said fragments obtained from step c;
e. comparing the masses of said fragments with the masses of fragments expected from cleavage of a related polynucleotide of known sequence, or
f. repeating steps a-d with one or more related polynucleotides of unknown sequence and comparing the masses of said fragments with masses of fragments obtained from cleavage of the related polynucleotides.
An aspect of this invention is a method for detecting variance in nucleotide sequence in related polynucleotides, comprising:
a. replacing three of four natural nucleotides at substantially each point of occurrence in a polynucleotide with three stabilizing modified nucleotides to form a modified polynucleotide having one remaining natural nucleotide;
b. cleaving said modified polynucleotide into fragments at substantially each point of occurrence of said one remaining natural nucleotide;
c. determining the masses of said fragments; and,
d. comparing the masses of said fragments with the masses of fragments expected from cleavage of a related polynucleotide of known sequence, or
e. repeating steps a-c with one or more related polynucleotides of unknown sequence and comparing the masses of said fragments with masses obtained from cleavage of the related polynucleotides.
Another aspect of this invention is, in the method immediately above, replacing the remaining natural nucleotide with a destabilizing modified nucleotide.
A further aspect of this invention is a method for detecting variance in nucleotide sequence in related polynucleotides, comprising:
a. replacing two or more natural nucleotides at substantially each point of occurrence in a polynucleotide with two or more modified nucleotides wherein each said modified nucleotide has a different cleaving characteristic from each other of said modified nucleotides, to form a modified polynucleotide;
b. cleaving said modified polynucleotide into first fragments at substantially each point of occurrence of a first of said two or more modified nucleotides;
c. cleaving said first fragments into second fragments at each point of occurrence of a second of said two or more modified nucleotides in said first fragments;
d. determining the masses of said first fragments and said second fragments; and,
e. comparing the masses of said first fragments and said second fragments with the masses of first fragments and second fragments expected from the cleavage of a related polynucleotide of known sequence, or
f. repeating steps a-d with one or more related polynucleotides of unknown sequence and comparing the masses of said first and second fragments with masses obtained from the cleavage of the related polynucleotides.
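The two-stage cleavage in the steps above can be sketched as below. The function names are hypothetical, and the sketch assumes that each cleavage is complete and falls immediately 5′ of the modified nucleotide; the actual position of the break depends on the cleavage chemistry used.

```python
def cleave_before(sequence: str, base: str) -> list[str]:
    """Split `sequence` before every occurrence of `base`.

    Assumption: complete cleavage on the 5' side of the modified nucleotide.
    """
    fragments, current = [], ""
    for nt in sequence:
        if nt == base and current:
            fragments.append(current)
            current = ""
        current += nt
    if current:
        fragments.append(current)
    return fragments


def two_stage_cleavage(sequence: str, first_base: str, second_base: str):
    """Return the first fragments (cleavage at `first_base`) and the second
    fragments (each first fragment further cleaved at `second_base`)."""
    first_fragments = cleave_before(sequence, first_base)
    second_fragments = [
        piece
        for fragment in first_fragments
        for piece in cleave_before(fragment, second_base)
    ]
    return first_fragments, second_fragments


first, second = two_stage_cleavage("GATTACAGGCTA", "A", "G")
print(first)
print(second)
```

Determining masses for both sets gives two related views of the same polynucleotide: the second fragments are nested within the first.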
It is an aspect of this invention that, in the above method, the steps are repeated using a modified polynucleotide obtained by replacing different pairs of natural nucleotides with modified nucleotides; that is, given four natural nucleotides, 1, 2, 3, and 4, replacing 1 and 3 in one experiment, 2 and 4 in another, 1 and 4 in yet another, 2 and 3 in another or 3 and 4 in a final experiment with modified nucleotides.
It is an aspect of this invention that the modified polynucleotides obtained by the methods just above can be cleaved in a mass spectrometer, in particular, a tandem mass spectrometer.
A further aspect of this invention is a method for determining nucleotide sequence in a polynucleotide, comprising:
a. replacing a natural nucleotide at a percentage of points of occurrence in a polynucleotide with a modified nucleotide to form a modified polynucleotide wherein said modified nucleotide is not a ribonucleotide;
b. cleaving said modified polynucleotide into fragments at substantially each point of occurrence of said modified nucleotide;
c. repeating steps a and b, each time replacing a different natural nucleotide in said polynucleotide with a modified nucleotide; and,
d. determining the masses of said fragments obtained from each cleavage; and,
e. constructing said sequence of said polynucleotide from said masses, or
f. analyzing a sequence ladder obtained from the fragments in step c.
An aspect of this invention is a method for determining nucleotide sequence in a polynucleotide, comprising:
a. replacing a natural nucleotide at a first percentage of points of occurrence in a polynucleotide with a modified nucleotide to form a modified polynucleotide wherein said modified nucleotide is not a ribonucleotide;
b. cleaving said modified polynucleotide into fragments at a second percentage of said points of occurrence of said modified nucleotide such that the combination of said first percentage and said second percentage results in partial cleavage of said modified polynucleotide;
c. repeating steps a and b, each time replacing a different natural nucleotide in said polynucleotide with a modified nucleotide;
d. determining the masses of said fragments obtained from each cleavage reaction; and,
e. constructing said sequence of said polynucleotide from said masses or,
f. analyzing a sequence ladder obtained from said fragments from steps a and b.
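How a first percentage (incorporation) and a second percentage (cleavage) combine to give partial cleavage can be illustrated numerically. The sketch below rests on assumptions not stated above: that incorporation and cleavage at each site are independent events, and that fragments are detected from a single labeled end. The function names and example values are hypothetical.

```python
def effective_cleavage(incorporation_fraction: float,
                       cleavage_efficiency: float) -> float:
    """Probability that a given site is cleaved, assuming the site must both
    carry the modified nucleotide and be cut, independently."""
    return incorporation_fraction * cleavage_efficiency


def ladder_yield(per_site_probability: float, site_rank: int) -> float:
    """Fraction of end-labeled molecules whose first cut, counted from the
    labeled end, falls at the `site_rank`-th candidate site (rank 1 is the
    site nearest the label). This is a geometric distribution."""
    p = per_site_probability
    return p * (1 - p) ** (site_rank - 1)


# Hypothetical values: 10% incorporation and 50% cleavage efficiency.
p = effective_cleavage(0.10, 0.50)
for rank in (1, 10, 50):
    print(rank, round(ladder_yield(p, rank), 4))
```

Under these assumptions a low combined probability keeps band intensities more uniform along the ladder, whereas a high combined probability depletes the longer fragments.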
An aspect of this invention is a method for determining nucleotide sequence in a polynucleotide, comprising:
a. replacing two or more natural nucleotides at substantially each point of occurrence in a polynucleotide with two or more modified nucleotides to form a modified polynucleotide;
b. separating said modified polynucleotide into two or more aliquots, the number of said aliquots being the same as the number of natural nucleotides replaced in step a; and,
c. cleaving said modified polynucleotide in each said aliquot into fragments at substantially each point of occurrence of a different one of said modified nucleotides such that each of said aliquots contains fragments from cleavage at a different modified nucleotide than each other said aliquot;
d. determining masses of said fragments; and,
e. constructing said nucleotide sequence from said masses; or,
f. cleaving said modified polynucleotide in each said aliquot into fragments at a percentage of points of occurrence of a different modified nucleotide such that each of said aliquots contains fragments from cleavage at a different modified nucleotide than each other said aliquot; and,
g. analyzing a sequence ladder obtained from said fragments in step f.
Furthermore, an aspect of this invention is a method for determining nucleotide sequence in a polynucleotide, comprising:
a. replacing a first natural nucleotide at a percentage of points of incorporation in a polynucleotide with a first modified nucleotide to form a first partially modified polynucleotide wherein said first modified nucleotide is not a ribonucleotide;
b. cleaving said first partially modified polynucleotide into fragments using a cleaving procedure of known cleavage efficiency to form a first set of nucleotide specific cleavage products;
c. repeating steps a and b replacing a second, a third and a fourth natural nucleotide with a second, third and fourth modified nucleotide to form a second, third and fourth partially modified polynucleotide which, upon cleavage, afford a second, third and fourth set of nucleotide specific cleavage products;
d. performing gel electrophoresis on said first, second, third and fourth set of nucleotide specific cleavage products to form a sequence ladder; and,
e. reading said sequence of said polynucleotide from said sequence ladder.
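Reading a sequence from the four sets of nucleotide-specific cleavage products can be sketched as follows. This is an idealized illustration with hypothetical function names: it assumes each fragment is measured from a single labeled 5′ end up to and including the cleaved nucleotide, and that every position yields a resolvable band.

```python
def ladder_lengths(sequence: str, base: str) -> list[int]:
    """Lengths of end-labeled fragments produced by partial cleavage at `base`.

    Assumption: each fragment runs from the labeled 5' end up to and
    including one occurrence of `base`.
    """
    return [i + 1 for i, nt in enumerate(sequence) if nt == base]


def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Read the sequence from four lanes by ordering bands from the shortest
    fragment to the longest and noting which lane each band is in."""
    bands = sorted(
        (length, base)
        for base, lengths in lanes.items()
        for length in lengths
    )
    return "".join(base for _, base in bands)


template = "GATTACA"  # hypothetical polynucleotide
lanes = {base: ladder_lengths(template, base) for base in "ACGT"}
print(lanes)
print(read_ladder(lanes))
```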
An aspect of this invention is a method for cleaving a polynucleotide during polymerization, comprising: mixing together four different nucleotides, one or two of which are modified nucleotides; and,
two or more polymerases, at least one of which produces or enhances cleavage at points where said modified nucleotide is being incorporated or, if two modified nucleotides are used, at points wherein said adjacent pair of modified nucleotides are being incorporated and are in a proper spatial relationship; provided that, when only one modified nucleotide is used, it does not contain ribose as its only modifying characteristic.
In the method just above, when two modified nucleotides are used, it is an aspect of this invention that one of them is a ribonucleotide and one of them is a 5′-amino-2′,5′-dideoxynucleotide.
Furthermore, in the method just above using the specific modified nucleotides, it is an aspect of this invention to use two polymerases, one being Klenow (exo-) polymerase and one being mutant E710A Klenow (exo-) polymerase.
In any of the above methods, it is an aspect of this invention that all natural nucleotides not being replaced with modified nucleotides can be replaced with mass-modified nucleotides.
It is also an aspect of all methods of this invention that the polynucleotide being modified is selected from the group consisting of DNA and RNA.
Another aspect of all of the above methods is detection of said masses of said fragments by mass spectrometry. Presently preferred types of mass spectrometry are electrospray ionization mass spectrometry and matrix-assisted laser desorption/ionization mass spectrometry (MALDI).
In the above methods requiring the generation of a sequence ladder, such generation can be accomplished using gel electrophoresis.
Furthermore, in the above method relating to determining a polynucleotide sequence by partially replacing a natural nucleotide with a modified nucleotide, cleaving said first, second, third and fourth partially modified polynucleotide obtained in step “a” with one or more restriction enzymes, labeling the ends of the restriction fragments obtained, and purifying the restriction fragments, prior to performing step “b” is another aspect of this invention.
An aspect of this invention is a method for cleaving a polynucleotide such that substantially all fragments obtained from the cleavage carry a label, comprising:
a. replacing a natural nucleotide partially or at substantially each point of occurrence in a polynucleotide with a modified nucleotide to form a modified polynucleotide;
b. contacting, in the presence of a phosphine covalently bonded to a label, said modified polynucleotide with a reagent or reagents which cleave(s) the modified polynucleotide partially or at substantially each said point of occurrence.
In a presently preferred embodiment of this invention, the phosphine in the above method is tris(carboxyethyl) phosphine (TCEP).
Also in the method just above, the label is a fluorescent tag or a radioactive tag in another aspect of this invention.
It is an aspect of this invention that the above methods can be used for diagnosing a genetically-related disease. The methods can also be used as a means for obtaining a prognosis of a genetically related disease or disorder. They can also be used to determine if a particular patient is eligible for medical treatment by procedures applicable to genetically related diseases or disorders.
An aspect of this invention is a method for detecting a variance in nucleotide sequence in a polynucleotide, for sequencing a polynucleotide or for genotyping a polynucleotide known to contain a polymorphism or mutation:
a. replacing one or more natural nucleotides in said polynucleotide with one or more modified nucleotides, one or more of which comprises a modified base;
b. contacting said modified polynucleotide with a reagent or reagents which cleave the modified polynucleotide into fragments at site(s) of incorporation of said modified nucleotide;
c. analyzing said fragments to detect said variance, to construct said sequence or to genotype said polynucleotide.
The modified base in the above method can be adenine in another aspect of this invention. It can also be 7-deaza-7-nitroadenine.
A polynucleotide modified as above can be cleaved into fragments by contact with chemical base in another aspect of this invention.
In the above method, cleaving said modified polynucleotide into fragments comprises contacting said modified polynucleotide with a phosphine in yet another aspect of this invention.
Using TCEP as the phosphine in the above method is another aspect of this invention.
The modified base in the above method can also be modified cytosine such as, without limitation, azacytosine or cytosine substituted at the 5-position with an electron withdrawing group wherein the electron withdrawing group is, also without limitation, nitro or halo.
Once again, polynucleotides modified as noted just above can be cleaved with chemical base.
Inclusion of TCEP in the cleaving reaction immediately above is another aspect of this invention.
The modified base in the above method can also be modified guanine such as, without limitation, 7-methyl-guanine and cleavage can be carried out with chemical base.
The modified guanine is N2-allylguanine in a further aspect of this invention. Cleaving this modified guanine by contacting said modified polynucleotide with an electrophile, such as, without limitation, iodine, is another aspect of this invention.
In another aspect of this invention, the modified base in the above method can also be modified thymine and modified uracil. A presently preferred embodiment of this invention is the use of 5-hydroxyuracil in place of either thymine or uracil. When 5-hydroxyuracil is used, cleavage is accomplished by:
a. contacting said polynucleotide with a chemical oxidant; and, then
b. contacting said polynucleotide with chemical base.
Another aspect of this invention is a method for detecting a variance in nucleotide sequence in a polynucleotide, sequencing a polynucleotide or genotyping a polynucleotide comprising replacing one or more natural nucleotides in said polynucleotide with one or more modified nucleotides, one or more of which comprises a modified sugar with the proviso that, when only one nucleotide is being replaced, said modified sugar is not ribose.
The modified sugar is a 2-ketosugar in a further aspect of this invention. The keto sugar can be cleaved with chemical base.
The modified sugar can also be arabinose which is also susceptible to chemical base.
The modified sugar can also be a sugar substituted with a 4-hydroxymethyl group which, likewise, renders a polynucleotide susceptible to cleavage with chemical base.
On the other hand, the modified sugar can be hydroxycyclopentane, in particular 1-hydroxy- or 2-hydroxycyclopentane. The hydroxycyclopentanes can also be cleaved with chemical base.
The modified sugar can be an azidosugar, for example, without limitation, a 2′-azido, 4′-azido or 4′-azidomethyl sugar. Cleaving an azido sugar can be accomplished in the presence of TCEP.
The sugar can also be substituted with a group capable of photolyzing to form a free radical such as, without limitation, a phenylselenyl or a t-butylcarboxy group. Such groups render the polynucleotide susceptible to cleavage with ultraviolet light.
The sugar can also be a cyanosugar. In a presently preferred embodiment, the cyanosugar is 2′-cyanosugar or 2″-cyanosugar. The cyanosugar-modified polynucleotides can be cleaved with chemical base.
A sugar substituted with an electron withdrawing group, such as, without limitation, fluorine, azido, methoxy or nitro in the 2′, 2″ or 4′ position of the modified sugar is another aspect of this invention. These modified sugars render the modified polynucleotide susceptible to cleavage with chemical base.
On the other hand, a sugar can be modified by inclusion of an electron-withdrawing element in the sugar ring. Nitrogen is an example of such a group. The nitrogen can replace the ring oxygen of the sugar or a ring carbon and the resultant modified sugar is cleavable with chemical base.
In yet another aspect of this invention, the modified sugar can be a sugar containing a mercapto group. The 2′ position of the sugar is a presently preferred embodiment, such a sugar being cleavable by chemical base.
In particular, the modified sugar can be a 5′-methylenyl-sugar, a 5′-keto-sugar or a 5′,5′-difluoro-sugar, all of which are cleavable with chemical base.
Another aspect of this invention is a method for detecting a variance in nucleotide sequence in a polynucleotide, sequencing a polynucleotide or genotyping a polynucleotide known to contain a polymorphism or mutation comprising replacing one or more natural nucleotides in said polynucleotide with one or more modified nucleotides, one or more of which comprises a modified phosphate ester.
The modified phosphate ester can be a phosphorothioate.
In one embodiment, the sulfur of the phosphorothioate is not covalently bonded to the sugar ring. In this case, cleaving said modified polynucleotide into fragments comprises:
a. contacting said sulfur of said phosphorothioate with an alkylating agent; and,
b. then contacting said modified polynucleotide with chemical base.
In a presently preferred embodiment of this invention, the alkylating agent is methyl iodide.
In another aspect of this invention the phosphorothioate-containing modified polynucleotide can be cleaved into fragments by contacting said sulfur of said phosphorothioate with β-mercaptoethanol in a chemical base such as, without limitation, sodium methoxide in methanol.
On the other hand, the sulfur atom of said phosphorothiolate can be covalently bonded to a sugar ring in another embodiment of this invention. Cleavage of a polynucleotide so modified can be carried out with chemical base.
The modified phosphate ester can also be a phosphoramidate. Cleavage of a phosphoramidate-containing polynucleotide can be performed using acid.
It is an aspect of this invention that the modified phosphate ester comprises a group selected from the group consisting of alkyl phosphonate and alkyl phosphorotriester wherein the alkyl group is preferably methyl. Such a modified polynucleotide can also be cleaved with acid.
Another aspect of this invention is a method for detecting a variance in nucleotide sequence in a polynucleotide, sequencing a polynucleotide or genotyping a polynucleotide known to contain a polymorphism or mutation, comprising replacing a first and a second natural nucleotide in said polynucleotide with a first and a second modified nucleotides such that said polynucleotide can be specifically cleaved at sites where the first modified nucleotide is followed immediately in the modified polynucleotide sequence by said second modified nucleotide.
In the above method, the first modified nucleotide is covalently bonded at its 5′ position to a sulfur atom of a phosphorothioate group and said second modified nucleotide, which is modified with a 2′-hydroxy group, is contiguous to, and 5′ of, said first modified nucleotide. This dinucleotide pair is cleavable with chemical base.
Also in the above method the first modified nucleotide can be covalently bonded at its 3′ position to a sulfur atom of a phosphorothioate group where said second modified nucleotide, which is modified with a 2′-hydroxy group, is contiguous to and 3′ of said first modified nucleotide. This modified nucleotide pair can also be cleaved with chemical base.
It is also an aspect of this invention that, in the above method, said first modified nucleotide is covalently bonded at its 5′ position to a first oxygen atom of a phosphorothioate group, said second modified nucleotide is substituted at its 2′ position with a leaving group and said second modified nucleotide is covalently bonded at its 3′ position to a second oxygen of said phosphorothioate group. Any leaving group can be used; fluorine, chlorine, bromine and iodine are examples. The polynucleotide so modified can be cleaved with chemical base. Sodium methoxide is an example, without limitation, of a useful chemical base.
In another embodiment of this invention, said first modified nucleotide is covalently bonded at its 5′ position to a first oxygen atom of a phosphorothioate group, said second modified nucleotide is substituted at its 4′ position with a leaving group and said second modified nucleotide is covalently bonded at its 3′ position to a second oxygen of said phosphorothioate group. Here, again, any good leaving group can be used of which fluorine, chlorine, bromine and iodine are non-limiting examples. These groups likewise render the modified polynucleotide susceptible to cleavage by chemical base such as, without limitation, sodium methoxide.
In a further embodiment of this invention, said first modified nucleotide is covalently bonded at its 5′ position to a first oxygen atom of a phosphorothioate group, said second modified nucleotide is substituted at its 2′ position with one or two fluorine atoms and said second modified nucleotide is covalently bonded at its 3′ position to a second oxygen of said phosphorothioate group. Such a modified polynucleotide can be cleaved by:
a. contacting said modified polynucleotide with ethylene sulfide or β-mercaptoethanol; and then,
b. contacting said modified polynucleotide with a chemical base such as, without limitation, sodium methoxide.
Another embodiment of this invention has said first modified nucleotide covalently bonded at its 5′ position to a first oxygen atom of a phosphorothioate group, said second modified nucleotide substituted at its 2′ position with a hydroxy group and said second modified nucleotide covalently bonded at its 3′ position to a second oxygen of said phosphorothioate group. Here, cleavage can be accomplished by:
a. contacting said modified polynucleotide with a metal oxidant; and then,
b. contacting said modified polynucleotide with a chemical base.
Non-limiting examples of metal oxidants are Cu′ and Fe′″ and equally non-limiting examples of useful bases are dilute hydroxide, piperidine and dilute ammonium hydroxide.
It is also an embodiment of this invention that said first modified nucleotide is covalently bonded at its 5′ position to a nitrogen atom of a phosphoramidate group and said second modified nucleotide, which is modified with a 2′-hydroxy group, is contiguous to and 5′ of said first modified nucleotide. This type of modification renders the modified polynucleotide susceptible to acid cleavage.
A still further embodiment of this invention is one in which said first modified nucleotide is covalently bonded at its 3′ position to a nitrogen atom of a phosphoramidate group and said second modified nucleotide, which is modified with a 2′-hydroxy group, is contiguous to and 3′ of said first modified nucleotide. Again, such a substitution pattern is cleavable with acid.
It also may be that said first modified nucleotide is covalently bonded at its 5′ position to an oxygen atom of an alkylphosphonate or an alkylphosphorotriester group and said second modified nucleotide, which is modified with a 2′-hydroxy group, is contiguous to said first modified nucleotide. This alternative dinucleotide grouping is also cleavable with acid.
Another cleavable dinucleotide grouping is one in which said first modified nucleotide has an electron-withdrawing group at its 4′ position and said second modified nucleotide, which is modified with a 2′-hydroxy group, is contiguous to and 5′ of said first modified nucleotide. Again, cleavage can be accomplished by contact with acid.
Another aspect of this invention is a method for detecting a variance in nucleotide sequence in a polynucleotide, for sequencing a polynucleotide or for genotyping a polynucleotide known to contain a polymorphism or mutation comprising:
a. replacing one or more natural nucleotides in said polynucleotide with one or more modified nucleotides wherein each modified nucleotide is modified with one or more modifications selected from the group consisting of a modified base, a modified sugar and a modified phosphate ester, provided that, if only one modified nucleotide is used, said modified nucleotide is not a ribonucleotide;
b. contacting said modified polynucleotide with a reagent or reagents which cleave the modified polynucleotide into fragments at site(s) of incorporation of said modified nucleotide;
c. analyzing said fragments to detect said variance, to construct said sequence or to genotype said polynucleotide.
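The three steps above can be illustrated with a minimal sketch. It assumes, purely for illustration, that every occurrence of one natural nucleotide has been replaced by a cleavable modified nucleotide and that cleavage occurs immediately 5′ of each such position, with the modified nucleotide retained at the 5′ end of the downstream fragment. The function names `cleave_at` and `fragment_differences` are hypothetical and are not part of this disclosure.

```python
from collections import Counter


def cleave_at(seq: str, base: str) -> list[str]:
    """Split seq immediately 5' of every occurrence of `base`.

    Assumption: complete substitution and complete cleavage, with the
    modified nucleotide kept at the start of the downstream fragment.
    """
    fragments = []
    start = 0
    for i, nt in enumerate(seq):
        if nt == base and i > start:
            fragments.append(seq[start:i])
            start = i
    fragments.append(seq[start:])
    return fragments


def fragment_differences(reference: str, sample: str, base: str):
    """Return (fragments lost, fragments gained) in sample versus reference."""
    ref = Counter(cleave_at(reference, base))
    smp = Counter(cleave_at(sample, base))
    return ref - smp, smp - ref


# A single A-to-G substitution removes one cleavage site, so two
# reference fragments are replaced by one longer fragment.
lost, gained = fragment_differences("GATTACA", "GATTGCA", "A")
print(dict(lost), dict(gained))
```

In this sketch a variance shows up as a change in the set of fragments; in practice the fragments would be distinguished by size or mass rather than read as strings.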
An aspect of this invention is a compound having the chemical structure: 
wherein R1 is selected from the group consisting of: 
A compound having the chemical structure: 
wherein said “Base” is selected from the group consisting of cytosine, guanine, inosine and uracil is another aspect of this invention.
Another aspect of this invention is a compound having the chemical structure: 
wherein base is selected from the group consisting of cytosine, guanine, inosine and uracil.
A still further aspect of this invention is a compound having the chemical structure: 
wherein said “Base” is selected from the group consisting of adenine, cytosine, guanine, inosine, thymine and uracil.
A polynucleotide comprising a dinucleotide sequence selected from the group consisting of: 
wherein each “Base” is independently selected from the group consisting of adenine, cytosine, guanine and thymine; W is an electron withdrawing group; X is a leaving group and R is an alkyl, preferably a lower alkyl, group is also an aspect of this invention. The electron withdrawing group is selected from the group consisting of F, Cl, Br, I, NO2, C≡N, -C(O)OH and OH in another aspect of this invention and, in a still further aspect, the leaving group is selected from the group consisting of Cl, Br, I and OTs.
An aspect of this invention is a method for synthesizing a polynucleotide comprising mixing a compound having the chemical structure: 
wherein R1 is selected from the group consisting of: 
with adenosine triphosphate, guanosine triphosphate, and thymidine or uridine triphosphate in the presence of one or more polymerases.
A method for synthesizing a polynucleotide comprising mixing a compound having the chemical structure: 
wherein R1 is selected from the group consisting of: 
with adenosine triphosphate, cytidine triphosphate and guanosine triphosphate in the presence of one or more polymerases is also an aspect of this invention.
A method for synthesizing a polynucleotide, comprising mixing a compound having the chemical structure: 
wherein R1 is selected from the group consisting of: 
with cytidine triphosphate, guanosine triphosphate, and thymidine triphosphate in the presence of one or more polymerases is a further aspect of this invention.
An aspect of this invention is a method for synthesizing a polynucleotide, comprising mixing a compound having the chemical structure: 
wherein R1 is selected from the group consisting of: 
with adenosine triphosphate, cytidine triphosphate and thymidine triphosphate in the presence of one or more polymerases.
Another aspect of this invention is a method for synthesizing a polynucleotide, comprising mixing a compound selected from the group consisting of:
a compound having the chemical structure: 
wherein said “Base” is selected from the group consisting of cytosine, guanine, inosine and uracil;
a compound having the chemical structure: 
wherein said “Base” is selected from the group consisting of adenine, cytosine, guanine, inosine and uracil; and
a compound having the chemical structure: 
wherein the “Base” is selected from the group consisting of adenine, cytosine, guanine or inosine, and thymine or uracil, with whichever three of the four nucleoside triphosphates, adenosine triphosphate, cytidine triphosphate, guanosine triphosphate and thymidine triphosphate, do not contain said base (or its substitute), in the presence of one or more polymerases.
Another aspect of this invention is a method for synthesizing a polynucleotide, comprising mixing one of the following pairs of compounds: 
wherein:
Base1 is selected from the group consisting of adenine, cytosine, guanine or inosine, and thymine or uracil;
Base2 is selected from the group consisting of the remaining three bases which are not Base1;
R3 is ⁻O-P(=O)(O⁻)-O-P(=O)(O⁻)-O-P(=O)(O⁻)-O-; and,
W is an electron withdrawing group;
X is a leaving group;
a second W or X shown in parentheses on the same carbon atom means that a single W or X group can be in either position on the sugar or both W or both X groups can be present at the same time; and,
R is a lower alkyl group;
with whichever two of the four nucleoside triphosphates, adenosine triphosphate, cytidine triphosphate, guanosine triphosphate and thymidine triphosphate, do not contain base-1 or base-2 (or their substitutes), in the presence of one or more polymerases.
An aspect of this invention is a mutant polymerase which is capable of catalyzing the incorporation of a modified nucleotide into a polynucleotide, wherein said modified nucleotide is not a ribonucleotide. In another aspect of this invention, said polymerase is obtained by a process comprising DNA shuffling.
The process including DNA shuffling can comprise the following steps:
a. selecting one or more known polymerase(s);
b. performing DNA shuffling;
c. transforming shuffled DNA into a host cell;
d. growing host cell colonies;
e. forming a lysate from said host cell colony;
f. adding a DNA template containing a detectable reporter sequence, the modified nucleotide or nucleotides whose incorporation into a polynucleotide is desired and the natural nucleotides not being replaced by said modified nucleotide(s); and,
g. examining the lysate for the presence of the detectable reporter.
The process including DNA shuffling can also comprise:
a. selecting a known polymerase or two or more known polymerases having different sequences or different biochemical properties or both;
b. performing DNA shuffling;
c. transforming said shuffled DNA into a host to form a library of transformants in host cell colonies;
d. preparing first separate pools of said transformants by plating said host cell colonies;
e. forming a lysate from each said first separate pool of host cell colonies;
f. removing all natural nucleotides from each said lysate;
g. combining each said lysate with:
i. a single-stranded DNA template comprising a sequence corresponding to an RNA polymerase promoter followed by a reporter sequence;
ii. a single-stranded DNA primer complementary to one end of said template;
iii. the modified nucleotide or nucleotides whose incorporation into said polynucleotide is desired;
iv. each natural nucleotide not being replaced by said modified nucleotide or nucleotides;
h. adding RNA polymerase to each said combined lysate;
i. examining each said combined lysate for the presence of said reporter sequence;
j. creating second separate pools of transformants in host cell colonies from each said first separate pool of host cell colonies in which the presence of said reporter is detected;
k. forming a lysate from each said second separate pool of host cell colonies;
l. repeating steps g, h, i, j, k and l to form separate pools of transformants in host cell colonies until only one host cell colony remains which contains said polymerase; and,
m. recloning said polymerase from said one host cell colony into a protein expression vector.
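The iterative pooling in steps d through l amounts to repeatedly subdividing a positive pool until a single colony remains. The following is a minimal sketch of that logic only. The lysate assay is represented by a caller-supplied predicate, and the names `split_into_pools`, `sib_select` and the pool count of four are assumptions made for illustration, not parameters taken from this disclosure.

```python
from typing import Callable, Optional, Sequence


def split_into_pools(colonies: Sequence[int], n_pools: int) -> list[list[int]]:
    """Distribute colonies round-robin into at most n_pools non-empty pools."""
    n_pools = min(n_pools, len(colonies))
    return [list(colonies[i::n_pools]) for i in range(n_pools)]


def sib_select(
    colonies: Sequence[int],
    pool_has_reporter: Callable[[list[int]], bool],
    n_pools: int = 4,
) -> tuple[Optional[int], int]:
    """Subdivide reporter-positive pools until one colony remains.

    Returns (colony, rounds); colony is None if no pool tests positive.
    """
    if n_pools < 2:
        raise ValueError("n_pools must be at least 2")
    current = list(colonies)
    rounds = 0
    while len(current) > 1:
        positive = [p for p in split_into_pools(current, n_pools)
                    if pool_has_reporter(p)]
        if not positive:
            return None, rounds
        current = positive[0]  # follow the first positive pool
        rounds += 1
    if current and pool_has_reporter(current):
        return current[0], rounds
    return None, rounds


# Hypothetical library of 100 colonies in which colony 37 carries the
# desired polymerase; the predicate stands in for the reporter assay.
colony, rounds = sib_select(range(100), lambda pool: 37 in pool)
print(colony, rounds)
```

With pools of this kind the number of assay rounds grows roughly logarithmically with library size, which is the practical reason for pooling rather than assaying each colony separately.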
A polymerase which is capable of catalyzing the incorporation of a modified nucleotide into a polynucleotide, wherein said modified nucleotide is not a ribonucleotide, and which is obtained by a process comprising cell senescence selection is another aspect of this invention.
The cell senescence selection process can comprise the following steps:
a. mutagenizing a known polymerase to form a library of mutant polymerases;
b. cloning said library into a vector;
c. transforming said vector into host cells selected so as to be susceptible to being killed by a selected chemical only when said cell is actively growing;
d. adding a modified nucleotide;
e. growing said host cells;
f. treating said host cells with said selected chemical;
g. separating living cells from dead cells; and,
h. isolating said polymerase or polymerases from said living cells.
Steps d to h of the above method can be repeated one or more times to refine the selection of the polymerase in another aspect of this invention.
The cell senescence procedure for obtaining a polymerase can also comprise the steps of:
a. mutagenizing a known polymerase to form a library of mutant polymerases;
b. cloning said library of mutant polymerases into a plasmid vector;
c. transforming with said plasmid vector bacterial cells that, when growing, are susceptible to an antibiotic,
d. selecting transfectants using said antibiotic;
e. introducing a modified nucleotide, as the corresponding nucleoside triphosphate, into the bacterial cells;
f. growing the cells;
g. adding an antibiotic which will kill bacterial cells that are actively growing;
h. isolating said bacterial cells;
i. growing said bacterial cells in fresh medium containing no antibiotic;
j. selecting live cells from growing colonies;
k. isolating said plasmid vector from said live cells;
l. isolating said polymerase; and,
m. assaying said polymerase.
Repeating steps c to k of the above process one or more additional times before proceeding to step l is another aspect of this invention.
A polymerase may also be obtained by a process comprising phage display.
The phage display process may comprise the steps of:
a. selecting a DNA polymerase;
b. expressing said polymerase in a bacteriophage vector as a fusion to a bacteriophage coat protein;
c. attaching an oligonucleotide to the surface of the phage;
d. forming a primer template complex either by addition of a second oligonucleotide complementary to the oligonucleotide of c or by formation of a self priming complex using intramolecular complementarity of the oligonucleotide of c;
e. performing a primer extension in the presence of the modified nucleotide or nucleotides whose incorporation into a polynucleotide is desired, and the natural nucleotides not being replaced by said modified nucleotide(s) where successful primer extension results in the presence of a detectable reporter sequence;
f. sorting the phage with the detectable reporter from those without the detectable reporter.
The detectable reporter sequence is formed by incorporation of one or more dye-labeled natural or modified nucleotides in the primer extension reaction in another aspect of this invention.
The indicated sorting procedure may comprise use of a fluorescence activated cell sorter in yet another aspect of this invention.
An aspect of this invention is that the detectable reporter in the above method is a restriction endonuclease cleavage site and the sorting procedure entails restriction endonuclease digestion.
That the polymerase obtained in the above methods be a thermostable polymerase is another aspect of this invention.
The polymerase obtained by any of the above methods wherein the modified nucleotide being incorporated is selected from the group consisting of:
a compound having the chemical structure: 
wherein R1 is selected from the group consisting of: 
a compound having the chemical structure: 
wherein said “Base” is selected from the group consisting of cytosine, guanine, inosine and uracil,
a compound having the chemical structure: 
wherein said “Base” is selected from the group consisting of adenine, cytosine, guanine, inosine and uracil;
a compound having the chemical structure: 
wherein said “Base” is selected from the group consisting of adenine, cytosine, guanine, inosine, thymine and uracil; and,
a compound selected from the group consisting of: 
wherein:
Base1 is selected from the group consisting of adenine, cytosine, guanine or inosine, and thymine or uracil;
Base2 is selected from the group consisting of the remaining three bases which are not Base1;
R3 is ⁻O-P(=O)(O⁻)-O-P(=O)(O⁻)-O-P(=O)(O⁻)-O-; and,
W is an electron withdrawing group;
X is a leaving group;
a second W or X shown in parentheses on the same carbon atom means that a single W or X group can be in either position on the sugar or both W or both X groups can be present at the same time; and,
R is a lower alkyl group;
A final aspect of this invention is a kit, comprising:
one or more modified nucleotides;
one or more polymerases capable of incorporating said one or more modified nucleotides in a polynucleotide to form a modified polynucleotide; and,
a reagent or reagents capable of cleaving said modified polynucleotide at each point of occurrence of said one or more modified nucleotides in said polynucleotide.
As used herein, a “chemical method” refers to a combination of one or more modified nucleotides and one or more reagents which, when the modified nucleotide(s) is incorporated into a polynucleotide by partial or complete substitution for a natural nucleotide and the modified polynucleotide is subjected to the reagent(s), results in the selective cleavage of the modified polynucleotide at the point(s) of incorporation of the modified nucleotide(s).
By “analysis” is meant either detection of variance in the nucleotide sequence among two or more related polynucleotides or, in the alternative, the determination of the full nucleotide sequence of a polynucleotide.
By “reagent” is meant a chemical or physical force which causes the cleavage of a modified polynucleotide at the point of incorporation of a modified nucleotide in place of a natural nucleotide; such a reagent may be, without limitation, a chemical or combination of chemicals, normal or coherent (laser) visible or uv light, heat, high energy ion bombardment and irradiation. In addition, a reagent may consist of a protein such as, without limitation, a polymerase.
“Related” polynucleotides are polynucleotides obtained from genetically similar sources such that the nucleotide sequence of the polynucleotides would be expected to be exactly the same in the absence of a variance or there would be expected to be a region of overlap that, in the absence of a variance, would be exactly the same, where the region of overlap is greater than 35 nucleotides.
A “variance” is a difference in the nucleotide sequence among related polynucleotides. The difference may be the deletion of one or more nucleotides from the sequence of one polynucleotide compared to the sequence of a related polynucleotide, the addition of one or more nucleotides or the substitution of one nucleotide for another. The terms “mutation,” “polymorphism” and “variance” are used interchangeably herein. As used herein, the term “variance” in the singular is to be construed to include multiple variances; i.e., two or more nucleotide additions, deletions and/or substitutions in the same polynucleotide. A “point mutation” refers to a single substitution of one nucleotide for another.
A “sequence” or “nucleotide sequence” refers to the order of nucleotide residues in a nucleic acid.
As noted above, one aspect of the chemical method of the present invention consists of modified nucleotides which can be incorporated into a polynucleotide in place of natural nucleotides.
A “nucleoside” refers to a base linked to a sugar. The base may be adenine (A), guanine (G) (or its substitute, inosine (I)), cytosine (C), or thymine (T) (or its substitute, uracil (U)). The sugar may be ribose (the sugar of a natural nucleotide in RNA) or 2-deoxyribose (the sugar of a natural nucleotide in DNA).
A “nucleoside triphosphate” refers to a nucleoside linked to a triphosphate group:
⁻O-P(=O)(O⁻)-O-P(=O)(O⁻)-O-P(=O)(O⁻)-O-nucleoside.
The triphosphate group has four formal negative charges which require counter-ions, i.e., positively charged ions. Any positively charged ion can be used, e.g., without limitation, Na+, K+, NH4+, Mg2+, Ca2+, etc. Na+ is one of the most commonly used counter-ions. It is accepted convention in the art to omit the counter-ion, which is understood to be present, when displaying nucleoside triphosphates and that convention will be followed in this application.
As used herein, unless expressly noted otherwise, the term “nucleoside triphosphate” or reference to any specific nucleoside triphosphate; e.g., adenosine triphosphate, guanosine triphosphate or cytidine triphosphate, refers to the triphosphate made using either a ribonucleoside or a 2′-deoxyribonucleoside.
A “nucleotide” refers to a nucleoside linked to a single phosphate group or, by convention, when referring to incorporation into a polynucleotide, a short-hand for the nucleoside triphosphate which is the species which actually polymerizes in the presence of a polymerase.
A “natural nucleotide” refers to an A, C, G or U nucleotide when referring to RNA and to dA, dC, dG and dT (the “d” referring to the fact that the sugar is a deoxyribose) when referring to DNA. A natural nucleotide also refers to a nucleotide which may have a different structure from the above, but which is naturally incorporated into a polynucleotide sequence by the organism which is the source of the polynucleotide.
As used herein, inosine (I) refers to a purine ribonucleoside containing the base hypoxanthine.
As used herein, a “substitute” for a nucleoside triphosphate refers to a molecule in which a different nucleoside may be naturally substituted for A, C, G or T. Thus, inosine is a natural substitute for guanosine and uridine is a natural substitute for thymidine.
As used herein, a “modified nucleotide” is characterized by two criteria. First, a modified nucleotide is a “non-natural” nucleotide. In one aspect, a “non-natural” nucleotide may be a natural nucleotide which is placed in non-natural surroundings. For example, in a polynucleotide which is naturally composed of deoxyribonucleotides, a ribonucleotide would constitute a “non-natural” nucleotide when incorporated into that polynucleotide. Conversely, in a polynucleotide which is naturally composed of ribonucleotides, a deoxyribonucleotide incorporated into that polynucleotide would constitute a non-natural nucleotide. In addition, a “non-natural” nucleotide may be a natural nucleotide which has been chemically altered, for example, without limitation, by the addition of one or more chemical substituent groups to the nucleotide molecule, the deletion of one or more chemical substituent groups from the molecule or the replacement of one or more atoms or chemical substituents in the nucleotide with other atoms or chemical substituents. Finally, a “modified” nucleotide may be a molecule that resembles a natural nucleotide little, if at all, but is nevertheless capable of being incorporated by a polymerase into a polynucleotide in place of a natural nucleotide.
The second criterion by which a “modified” nucleotide, as the term is used herein, is characterized is that it alters the cleavage properties of the polynucleotide into which it is incorporated. For example, without limitation, incorporation of a ribonucleotide into a polynucleotide composed predominantly of deoxyribonucleotides imparts a susceptibility to alkaline cleavage which does not exist in natural deoxyribonucleotides. This second criterion of a “modified” nucleotide may be met by a single non-natural nucleotide substituted for a single natural nucleotide (e.g., the substitution of ribonucleotide for deoxyribonucleotide described above) or by a combination of two or more non-natural nucleotides which, when subjected to selected reaction conditions, do not individually alter the cleavage properties of a polynucleotide but, rather, interact with one another to impose altered cleavage properties on the polynucleotide (termed “dinucleotide cleavage”).
When reference is made herein to the incorporation of a single modified nucleotide into a polynucleotide and the subsequent cleavage of the modified polynucleotide, the modified nucleotide cannot be a ribonucleotide.
“Having different cleavage characteristics” when referring to a modified nucleotide means that modified nucleotides incorporated into the same modified polynucleotide can be cleaved under reaction conditions which leave the sites of incorporation of each of the other modified nucleotides in that modified polynucleotide intact.
As used herein, a “stabilizing modified nucleotide” refers to a modified nucleotide that imparts increased resistance to cleavage at the site of incorporation of such a modified nucleotide. Most of the modified nucleotides described herein provide increased lability to cleavage when incorporated in a modified polynucleotide. However, the differential lability of modified nucleotides over natural nucleotides in a modified polynucleotide may not always be sufficient to allow complete cleavage at the modified nucleotide(s) while avoiding any cleavage at the natural nucleotides. Therefore there is a useful role for modified nucleotides that reduce lability (stabilizing nucleotides), in that the presence of stabilizing nucleotides in a polynucleotide which also contains nucleotides that increase lability to a particular cleavage procedure (labilizing nucleotides) can provide increased discrimination between cleaved and noncleaved nucleotides in a cleavage procedure. The preferred way to use stabilizing nucleotides in a polynucleotide is to substitute stabilizing nucleotides for all the nucleotides that are not labilizing nucleotides. In the case of mononucleotide cleavage this would entail use of three stabilizing nucleotides and one labilizing nucleotide; in the case of dinucleotide cleavage this would entail use of two stabilizing nucleotides and two (different) labilizing nucleotides.
As used herein the term “stabilizing nucleotide” refers to a modified nucleotide which, when incorporated in a polynucleotide and subjected to a cleavage procedure, reduces cleavage at the stabilizing nucleotides relative to mono- or dinucleotide cleavage at other (nonstabilizing) nucleotides of the polynucleotide, whether said other nucleotides are natural nucleotides or labilizing nucleotides.
As used herein a “destabilizing modified nucleotide” or a “labilizing modified nucleotide” refers to a modified nucleotide which imparts greater susceptibility to cleavage than a natural nucleotide at sites of incorporation of the destabilizing modified nucleotide in a polynucleotide.
As used herein “determining a mass” refers to the use of a mass spectrometer to determine the mass of a molecule. Mass spectrometers generally measure the mass to charge ratio (m/z) of analyte ions, from which the mass can be inferred. When the charge state of the analyte polynucleotide is +1 or −1 the m/z ratio and the mass are numerically the same after making a correction for the proton mass (an extra proton is added to positively charged ions and a proton is abstracted from negatively charged ions) but when the charge is greater than +1 or less than −1 the m/z ratio will usually be less than the actual mass. In some cases the software provided with a mass spectrometer computes the mass from m/z so the user does not need to be aware of the difference.
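The relationship between m/z and mass described above can be written out as a short calculation. This is a sketch under the stated assumption that each unit of charge arises from the gain or loss of one proton; the function names are hypothetical, and the proton mass is the commonly tabulated value of about 1.007276 Da.

```python
PROTON_MASS_DA = 1.007276  # approximate mass of a proton in daltons


def neutral_mass(mz: float, charge: int) -> float:
    """Infer the neutral mass from a measured m/z and a signed charge state.

    Positive ions carry |charge| extra protons, which are subtracted;
    negative ions have lost |charge| protons, which are added back.
    """
    if charge == 0:
        raise ValueError("charge must be non-zero")
    return abs(charge) * mz - charge * PROTON_MASS_DA


def mz_from_mass(mass: float, charge: int) -> float:
    """Inverse of neutral_mass: the m/z expected for a given neutral mass."""
    if charge == 0:
        raise ValueError("charge must be non-zero")
    return (mass + charge * PROTON_MASS_DA) / abs(charge)


# For a singly charged negative ion the mass differs from m/z by one proton;
# for a doubly charged ion m/z is roughly half the mass.
print(neutral_mass(1000.0, -1))
print(mz_from_mass(3000.0, -2))
```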
As used herein, a “label” or “tag” refers to a molecule that, when appended by, for example, without limitation, covalent bonding or hybridization, to another molecule, for example, also without limitation, a polynucleotide or polynucleotide fragment, provides or enhances a means of detecting the other molecule. A fluorescence or fluorescent label or tag emits detectable light at a particular wavelength when excited at a different wavelength. A radiolabel or radioactive tag emits radioactive particles detectable with an instrument such as, without limitation, a scintillation counter.
A “mass-modified” nucleotide is a nucleotide in which an atom or chemical substituent has been added, deleted or substituted but such addition, deletion or substitution does not create modified nucleotide properties, as defined herein, in the nucleotide; i.e., the only effect of the addition, deletion or substitution is to modify the mass of the nucleotide.
A “polynucleotide” refers to a linear chain of nucleotides connected by a phosphodiester linkage between the 3′-hydroxyl group of one nucleoside and the 5′-hydroxyl group of a second nucleoside which in turn is linked through its 3′-hydroxyl group to the 5′-hydroxyl group of a third nucleoside and so on to form a polymer comprised of nucleosides linked by a phosphodiester backbone. The polynucleotide may be, without limitation, single or double stranded DNA or RNA or any other structure known in the art.
A “modified polynucleotide” refers to a polynucleotide in which one or more natural nucleotides have been partially or substantially completely replaced with modified nucleotides.
A “modified DNA fragment” refers to a DNA fragment synthesized under Sanger dideoxy termination conditions with one of the natural nucleotides other than the one which is partially substituted with its dideoxy analog being replaced with a modified nucleotide as defined herein. The result is a set of Sanger fragments; i.e., a set of fragments ending in ddA, ddC, ddG or ddT, depending on the dideoxy nucleotide used with each such fragment also containing modified nucleotides (if, of course, the natural nucleotide corresponding to the modified nucleotide exists in that particular Sanger fragment).
As used herein, to “alter the cleavage properties” of a polynucleotide means to render the polynucleotide differentially cleavable or non-cleavable; i.e., resistant to cleavage, at the point of incorporation of the modified nucleotide relative to sites consisting of other non-natural or natural nucleotides. It is presently preferred to “alter the cleavage properties” by rendering the polynucleotide more susceptible to cleavage at the sites of incorporation of modified nucleotides than at any other sites in the molecule.
As used herein, the use of the singular when referring to nucleotide substitution is to be construed as including substitution at each point of occurrence of the natural nucleotide unless expressly noted to be otherwise.
As used herein, a “template” refers to a target polynucleotide strand, for example, without limitation, an unmodified naturally-occurring DNA strand, which a polymerase uses as a means of recognizing which nucleotide it should next incorporate into a growing strand to polymerize the complement of the naturally-occurring strand. Such DNA strand may be single-stranded or it may be part of a double-stranded DNA template. In applications of the present invention requiring repeated cycles of polymerization, e.g., the polymerase chain reaction (PCR), the template strand itself may become modified by incorporation of modified nucleotides, yet still serve as a template for a polymerase to synthesize additional polynucleotides.
A “primer” is a short oligonucleotide, the sequence of which is complementary to a segment of the template which is being replicated, and which the polymerase uses as the starting point for the replication process. By “complementary” is meant that the nucleotide sequence of a primer is such that the primer can form a stable hydrogen bond complex with the template; i.e., the primer can hybridize to the template by virtue of the formation of base-pairs over a length of at least ten base pairs.
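The complementarity requirement can be checked computationally in a simplified way. The sketch below assumes DNA sequences written 5′ to 3′ and treats a primer as complementary only if it pairs perfectly along its whole length, which is stricter than the hybridization criterion stated above; the ten-base-pair minimum is taken from the definition, while the names `reverse_complement` and `priming_site` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def priming_site(template: str, primer: str, min_pairs: int = 10):
    """Return the 0-based template position where the primer base-pairs.

    Simplifying assumption: only perfect matches over the full primer
    length are counted, and only the first such site is reported.
    Returns None if the primer is shorter than min_pairs or does not pair.
    """
    if len(primer) < min_pairs:
        return None
    position = template.upper().find(reverse_complement(primer))
    return position if position >= 0 else None


template = "TTGACCGTAGCTAGGCTAACGT"
print(priming_site(template, "GCCTAGCTACGG"))  # 12-nt primer, pairs at index 4
print(priming_site(template, "GCCTAGCTA"))     # 9-nt primer, too short
```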
As used herein, a “polymerase” refers, without limitation, to molecules such as DNA or RNA polymerases, reverse transcriptases, mutant DNA or RNA polymerases mutagenized by nucleotide addition, nucleotide deletion, one or more point mutations or the technique known to those skilled in the art as “DNA shuffling” (q.v., infra) or by joining portions of different polymerases to make chimeric polymerases. Combinations of these mutagenizing techniques may also be used. A polymerase catalyzes the polymerization of nucleotides to form polynucleotides. Methods are disclosed herein, and are aspects of this invention, for producing, identifying and using polymerases capable of efficiently incorporating modified nucleotides along with natural nucleotides into a polynucleotide. Polymerases may be used either to extend a primer once or repetitively or to amplify a polynucleotide by repetitive priming of two complementary strands using two primers. Methods of amplification include, without limitation, polymerase chain reaction (PCR), NASBA, SDA, 3SR, TSA and rolling circle replication. It is understood that, in any method for producing a polynucleotide containing given modified nucleotides, one or several polymerases or amplification methods may be used.
A “heat stable polymerase” or “thermostable polymerase” refers to a polymerase which retains sufficient activity to effect primer extension reactions after being subjected to elevated temperatures, such as those necessary to denature double-stranded nucleic acids.
The selection of optimal polymerization conditions depends on the application. In general, a form of primer extension may be best suited to sequencing or variance detection methods that rely on dinucleotide cleavage and mass spectrometric analysis while either primer extension or amplification (e.g., PCR) will be suitable for sequencing methods that rely on electrophoretic analysis. Genotyping methods are best suited to production of polynucleotides by amplification. Either type of polymerization may be suitable for variance detection methods of this invention.
A “restriction enzyme” refers to an endonuclease (an enzyme that cleaves phosphodiester bonds within a polynucleotide chain) that cleaves DNA in response to a recognition site on the DNA. The recognition site (restriction site) consists of a specific sequence of nucleotides typically about 4-8 nucleotides long.
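Locating recognition sites in a known sequence is a simple string search, sketched below. The example uses the well-known EcoRI recognition sequence GAATTC; the function name `find_recognition_sites` is hypothetical, and the sketch reports only where the site occurs on the given strand, not the exact position of the cut within it.

```python
def find_recognition_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of `site` in `seq`.

    Overlapping occurrences are reported; only the given strand is searched.
    """
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


ECORI_SITE = "GAATTC"  # recognition sequence of EcoRI
print(find_recognition_sites("AAGAATTCGGGAATTCTT", ECORI_SITE))
```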
As used herein, “electrophoresis” refers to that technique known in the art as gel electrophoresis; e.g., slab gel electrophoresis, capillary electrophoresis and automated versions of these, such as the use of an automated DNA sequencer or a simultaneous multi-channel automated capillary DNA sequencer or electrophoresis in an etched channel such as that which can be produced in glass or other materials.
“Mass spectrometry” refers to a technique for mass analysis known in the art which includes, but is not limited to, matrix-assisted laser desorption ionization (MALDI) and electrospray ionization (ESI) mass spectrometry optionally employing, without limitation, time-of-flight, quadrupole or Fourier transform detection techniques. While the use of mass spectrometry constitutes a preferred embodiment of this invention, it will be apparent that other instrumental techniques are, or may become, available for the determination of the mass or the comparison of masses of oligonucleotides. An aspect of the present invention is the determination and comparison of masses and any such instrumental procedure capable of such determination and comparison is deemed to be within the scope and spirit of this invention.
As used herein, “FRET” refers to fluorescence resonance energy transfer, a distance-dependent interaction between the electronic excited states of two dye molecules in which excitation is transferred from one dye (the donor) to another dye (the acceptor) without emission of a photon. A series of fluorogenic procedures have been developed to exploit FRET. In the present invention, the two dye molecules are generally located on opposite sides of a cleavable modified nucleotide such that cleavage will alter the proximity of the dyes to one another and thereby change the fluorescence output of the dyes on the polynucleotide.
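The distance dependence mentioned above is commonly described by the Förster relation, given here only as an illustration. The symbols are not part of the original text: E is taken to be the transfer efficiency, r the donor-acceptor separation and R_0 the separation at which the efficiency is 50% for a given dye pair.

```latex
E = \frac{1}{1 + \left( r / R_0 \right)^{6}}
```

Under this relation, a cleavage event that moves the dyes from well within R_0 to well beyond it takes E from near 1 to near 0, which is the kind of change in fluorescence output the definition relies on.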
As used herein, “construct a gene sequence” refers to the process of inferring partial or complete information about the DNA sequence of a subject polynucleotide by analysis of the masses of its fragments obtained by a cleavage procedure. The process of constructing a gene sequence generally entails comparison of a set of experimentally determined cleavage masses with the known or predicted masses of all possible polynucleotides that could be obtained from the subject polynucleotide given only the constraints of the modified nucleotide(s) incorporated in the polynucleotide and the chemical reaction mechanism(s) utilized, both of which impact the range of possible constituent masses. Various analytical deductions may then be employed to extract the greatest amount of sequence information from the masses of the cleavage fragments. More sequence information can generally be inferred when the subject polynucleotide is modified and cleaved, in separate reactions, by two or more modified nucleotides or sets of modified nucleotides because the range of deductions that may be made from analysis of several sets of cleavage fragments is greater.
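The comparison of experimentally determined cleavage masses with predicted masses can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the invention: the residue masses are approximate average masses of the DNA nucleotide monophosphate residues, cleavage is assumed to occur immediately 5′ of every occurrence of the chosen nucleotide, and end groups and any mass change introduced by the cleavage chemistry are ignored. The function names and the tolerance value are hypothetical.

```python
# Approximate average residue masses (Da); assumed values for illustration only.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def cleave(sequence, base):
    """Split a sequence immediately 5' of every occurrence of `base` (assumed chemistry)."""
    fragments, current = [], ""
    for nt in sequence:
        if nt == base and current:
            fragments.append(current)
            current = ""
        current += nt
    if current:
        fragments.append(current)
    return fragments


def fragment_mass(fragment):
    """Sum of residue masses; end groups and chemistry-specific shifts are ignored."""
    return round(sum(RESIDUE_MASS[nt] for nt in fragment), 2)


def unmatched_masses(observed, candidate, base, tolerance=0.5):
    """Return observed masses that no predicted fragment of `candidate` explains."""
    predicted = [fragment_mass(f) for f in cleave(candidate, base)]
    return [m for m in observed
            if not any(abs(m - p) <= tolerance for p in predicted)]


# Hypothetical example: masses "observed" for GATTACA cleaved at A.
observed = [fragment_mass(f) for f in cleave("GATTACA", "A")]
print(unmatched_masses(observed, "GATTACA", "A"))  # candidate consistent with the data
print(unmatched_masses(observed, "GATTGCA", "A"))  # candidate with one substitution
```

A candidate sequence that leaves observed masses unexplained can be ruled out; repeating the comparison with cleavage at a second nucleotide narrows the set of consistent candidates further.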
As used herein, a “sequence ladder” is a collection of overlapping polynucleotides, prepared from a single DNA or RNA template, which share a common end, usually the 5′ end, but which differ in length because they terminate at different sites at the opposite end. The sites of termination coincide with the sites of occurrence of one of the four nucleotides, A, G, C or T/U, in the template. Thus the lengths of the polynucleotides collectively specify the intervals at which one of the four nucleotides occurs in the template DNA fragment. A set of four such sequence ladders, one specific for each of the four nucleotides, specifies the intervals at which all four nucleotides occur, and therefore provides the complete sequence of the template DNA fragment. As used herein, the term “sequence ladder” also refers to the set of four sequence ladders required to determine a complete DNA sequence. The process of obtaining the four sequence ladders to determine a complete DNA sequence is referred to as “generating a sequence ladder.”
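How four ladders jointly specify a sequence can be illustrated with a short sketch. It makes simplifying assumptions that are not in the original: each fragment is represented only by its length measured from the common 5′ end, the strand being read is treated directly as the sequence of interest, and every position is assumed to appear in exactly one ladder. The function names are hypothetical.

```python
def sequence_ladders(sequence):
    """For each nucleotide, list the fragment lengths that terminate at that nucleotide."""
    ladders = {base: [] for base in "ACGT"}
    for length, base in enumerate(sequence, start=1):
        ladders[base].append(length)
    return ladders


def read_sequence(ladders):
    """Rebuild the sequence by ordering all fragment lengths from the four ladders."""
    base_at_length = {length: base
                      for base, lengths in ladders.items()
                      for length in lengths}
    return "".join(base_at_length[n] for n in range(1, len(base_at_length) + 1))


ladders = sequence_ladders("GATTACA")
print(ladders)                 # lengths at which each nucleotide terminates a fragment
print(read_sequence(ladders))  # the sequence recovered from the four ladders
```

A single ladder gives only the intervals at which one nucleotide occurs; it is the merge across all four that yields the complete sequence.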
As used herein, “cell senescence selection” refers to a process in which cells that are susceptible to being killed by a particular chemical only when the cells are actively growing (e.g., without limitation, bacteria which can be killed by antibiotics only when they are growing) are used to find a polymerase which will incorporate a modified nucleotide into a polynucleotide. The procedure requires that, when a particular polymerase which has been introduced into the cell line incorporates a modified nucleotide, that incorporation produces changes in the cells which cause them to senesce, i.e., to stop growing. When cell colonies, some members of which contain the modified nucleotide-incorporating polymerase and some members of which do not, are then exposed to the chemical, only those cells which do not contain the polymerase are killed. The cells are then placed in a medium where cell growth is reinitiated; i.e., a medium without the chemical or the modified nucleotide, and those cells which grow are separated and the polymerase isolated from them.
As used herein, a “chemical oxidant” refers to a reagent capable of increasing the oxidation state of a group on a molecule. For instance, without limitation, a hydroxyl group (-OH) can be oxidized to a keto group. For example and without limitation, potassium permanganate, t-butyl hypochlorite, m-chloroperbenzoic acid, hydrogen peroxide, sodium hypochlorite, ozone, peracetic acid, potassium persulfate, and sodium hypobromite are chemical oxidants.
As used herein, a “chemical base” refers to a chemical which, in aqueous medium, has a pK greater than 7.0. Examples of chemical bases are, without limitation, alkali (sodium, potassium, lithium) and alkaline earth (calcium, magnesium, barium) hydroxides, sodium carbonate, sodium bicarbonate, trisodium phosphate, ammonium hydroxide and nitrogen-containing organic compounds such as pyridine, aniline, quinoline, morpholine, piperidine and pyrrole. These may be used as aqueous solutions which may be mild (usually due to dilution) or strong (concentrated solutions). A chemical base also refers to a strong non-aqueous organic base; examples of such bases include, without limitation, sodium methoxide, sodium ethoxide and potassium t-butoxide.
As used herein, the term “acid” refers to a substance which dissociates on solution in water to produce one or more hydrogen ions. The acid may be inorganic or organic. The acid may be strong, which generally implies highly concentrated, or mild, which generally implies dilute. It is, of course, understood that acids inherently have different strengths; e.g., sulfuric acid is much stronger than acetic acid and this factor may also be taken into consideration when selecting the appropriate acid to use in conjunction with the methods described herein. The proper choice of acid will be apparent to those skilled in the art from the disclosures herein. Preferably, the acids used in the methods of this invention are mild. Examples of inorganic acids are, without limitation, hydrochloric acid, sulfuric acid, phosphoric acid, nitric acid and boric acid. Examples, without limitation, of organic acids are formic acid, acetic acid, benzoic acid, p-toluenesulfonic acid, trifluoroacetic acid, naphthoic acid, uric acid and phenol.
An “electron-withdrawing group” refers to a chemical group which, by virtue of its greater electronegativity, inductively draws electron density away from nearby groups and toward itself, leaving the less electronegative group with a partial positive charge. This partial positive charge, in turn, can stabilize a negative charge on an adjacent group, thus facilitating any reaction which involves a negative charge, either formal or in a transition state, on the adjacent group. Examples of electron-withdrawing groups include, without limitation, cyano (C≡N), azido (-N≡N), nitro (NO2), halo (F, Cl, Br, I), hydroxy (-OH), thiohydroxy (-SH) and ammonium (-NH3+).
An “electron withdrawing element,” as used herein, refers to an atom which is more electronegative than carbon so that, when placed in a ring, the atom draws electrons to it which, as with an electron-withdrawing group, results in nearby atoms being left with a partial positive charge. This renders the nearby atoms susceptible to nucleophilic attack. It also tends to stabilize, and therefore favor the formation of, negative charges on other atoms attached to the positively charged atom.
An “electrophile” or “electrophilic group” refers to a group which, when it reacts with a molecule, takes a pair of electrons from the molecule. Examples of some common electrophiles are, without limitation, iodine and aromatic nitrogen cations.
An “alkyl” group as used herein refers to a 1 to 20 carbon atom straight or branched, unsubstituted group. Preferably the group consists of a 1 to 10 carbon atom chain; most preferably, it is a 1 to 4 carbon atom chain. As used herein “1 to 20,” etc. carbon atoms means 1 or 2 or 3 or 4, etc. up to 20 carbon atoms in the chain.
A “mercapto” group refers to an -SH group.
An “alkylating agent” refers to a molecule which is capable of introducing an alkyl group into a molecule. Examples, without limitation, of alkylating agents include methyl iodide, dimethyl sulfate, diethyl sulfate, ethyl bromide and butyl iodide.
As used herein, the terms “selective,” “selectively,” “substantially,” “essentially,” “uniformly” and the like, mean that the indicated event occurs to a particular degree. In particular, the percent incorporation of a modified nucleotide is greater than 90%, preferably greater than 95%, most preferably greater than 99%, or the selectivity for cleavage at a modified nucleotide is greater than 10×, preferably greater than 25×, most preferably greater than 100× that of other nucleotides, natural or modified, or the percent cleavage at a modified nucleotide is greater than 90%, preferably greater than 95%, most preferably greater than 99%.
As used herein, “diagnosis” refers to determining the nature of a disease or disorder. The methods of this invention may be used in any form of diagnosis including, without limitation, clinical diagnosis (a diagnosis made from a study of the signs and symptoms of a disease or disorder, where such sign or symptom is the presence of a variance), differential diagnosis (the determination of which of two or more diseases with similar symptoms is the one from which a patient is suffering), etc.
By “prognosis,” as used herein, is meant a forecast of the probable course and/or outcome of a disease. In the context of this invention, the methods described herein may be used to follow the effect of a genetic variance or variances on disease progression or treatment response. It is to be noted that using the methods of this invention as a prognostic tool does not require knowledge of the biological impact of a variance. The detection of a variance in an individual afflicted with a particular disorder or the statistical association of the variance with the disorder is sufficient. The progression or response to treatment of patients with a particular variance can then be traced throughout the course of the disorder to guide therapy or other disorder management decisions.
By “having a genetic component” is meant that a particular disease, disorder or response to treatment is known or suspected to be related to a variance or variances in the genetic code of an individual afflicted with the disease or disorder.
As used herein, an “individual” refers to any higher life form including reptiles and mammals, in particular to human beings. However, the methods of this invention are useful for the analysis of the nucleic acids of any biological organism.
Table 1 is a description of several procedures presently in use for the detection of variance in DNA.
Table 2 shows the molecular weights of the four DNA nucleotide monophosphates and the mass difference between each pair of nucleotides.
Table 3 shows the masses of all possible 2 mers, 3 mers, 4 mers and 5 mers of the DNA nucleotides in Table 2.
Table 4 shows the masses of all possible 2 mers, 3 mers, 4 mers, 5 mers, 6 mers and 7 mers that would be produced by cleavage at one of the four nucleotides and the mass differences between neighboring oligonucleotides.
Table 5 shows the mass changes that will occur for all possible point mutations (replacement of one nucleotide by another) and the theoretical maximum size of a polynucleotide in which a point mutation should be detectable by mass spectrometry using mass spectrometers of varying resolving powers.
Table 6 shows the actual molecular weight differences observed in an oligonucleotide using the method of this invention; the difference reveals a hitherto unknown variance in the oligonucleotide.
Table 7 shows all of the masses obtained by cleavage of an exemplary 20 mer in four separate reactions, each reaction being specific for one of the DNA nucleotides; i.e., at A, C, G and T.
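The kind of calculation summarized for Tables 2 and 5 can be sketched as follows. This is an illustrative estimate only and does not reproduce the tabulated values: the residue masses are approximate average masses assumed for the example, resolving power is taken in the conventional sense of m/Δm, and the largest detectable polynucleotide is estimated by dividing the largest resolvable mass by the mean residue mass. The function names are hypothetical.

```python
from itertools import combinations

# Approximate average residue masses (Da); assumed values for illustration only.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def substitution_mass_shifts():
    """Absolute mass change for replacing one nucleotide by another, per pair."""
    return {a + "/" + b: round(abs(RESIDUE_MASS[a] - RESIDUE_MASS[b]), 2)
            for a, b in combinations("ACGT", 2)}


def max_detectable_length(mass_shift, resolving_power):
    """Rough upper bound on polynucleotide length, assuming resolving power = m / delta_m."""
    max_mass = resolving_power * mass_shift
    mean_residue = sum(RESIDUE_MASS.values()) / len(RESIDUE_MASS)
    return int(max_mass // mean_residue)


shifts = substitution_mass_shifts()
for pair, shift in sorted(shifts.items(), key=lambda item: item[1]):
    print(pair, shift, max_detectable_length(shift, resolving_power=1000))
```

Under these assumptions the A/T substitution gives the smallest mass change and the C/G substitution the largest, so an A/T change sets the most demanding requirement on the resolving power of the instrument.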