1. Field of the Invention
The field of this invention is nucleic acids and their analogs, more specifically nucleotide analogs that can form non-standard Watson-Crick nucleobase pairs that have geometry similar to that of standard Watson-Crick pairs, but are joined by non-standard hydrogen bonding schemes. More specifically, this invention relates to enzymatic processes that introduce these analogs into oligonucleotides by mismatching non-standard nucleotides against standard nucleotides, creating orthogonally capturable tags. This invention further relates to processes, and pairs of processes, that replace non-standard nucleotides by more than one kind of standard nucleotide, leading to clonable products and, in particular, to two clonable products whose sequences, when compared, allow the inference of the sites in the original oligonucleotide sequence where non-standard nucleotides were present.
2. Description of Related Art
Natural oligonucleotides bind to complementary oligonucleotides according to Watson and Crick rules of nucleobase pairing, where adenine (A) (or 2-aminoadenine) pairs with thymine (T) (or uracil, U), and guanine (G) pairs with cytosine (C), with complementary strands anti-parallel to one another. In this disclosure, “DNA” or “nucleic acid” is understood to include, as appropriate, both DNA (where the sugar is 2′-deoxyribose) and RNA (where the sugar is ribose), the 2′-O-alkyl and allyl derivatives, and these nucleic acids and their analogs in non-linear topologies, including dendrimers, comb-structures, and nanostructures, and these nucleic acids and their analogs carrying tags (e.g., fluorescent, functionalized, or binding tags) attached to the ends, sugars, or nucleobases, and/or non-nucleotidic material attached to the ends of the strand.
These pairing rules, which are largely context-free and which can be applied without undue experimentation even by high school students, allow specific hybridization of an oligonucleotide to a complementary oligonucleotide, making oligonucleotides valuable as probes in the laboratory, in diagnostics, as messages that can direct the synthesis of specific proteins, and in other applications well known in the art. Such base pairing is used, as an example and without limitation, to capture other oligonucleotides onto beads, arrays, and other solid supports, in linear and dendrimeric structures, to allow nucleic acids to fold into hairpins, beacons, and catalysts, as supports for functionality, such as fluorescence, fluorescence quenching, binding/capture tags, and catalytic functionality, as part of more complex architectures, including dendrimers and nanostructures, and as scaffolds to guide chemical reactions.
Further, nucleobase pairing is used by enzymes to catalyze the synthesis of new oligonucleotides that are complementary to template oligonucleotides. In this synthesis, building blocks (normally the triphosphates of ribo- or deoxyribonucleosides carrying A, T, U, C, or G) are directed by a template oligonucleotide to form an oligonucleotide with the complementary sequence. This serves as the basis for technologies for the enzymatic synthesis and amplification of specific nucleic acids by enzymes such as DNA and RNA polymerases, in the polymerase chain reaction (PCR), and in a variety of architectures that may involve synthesis, ligation, cleavage, immobilization, and release, used, inter alia, in technologies to detect nucleic acids.
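The template-directed rule described above can be sketched as a short Python function. This is a minimal illustration under stated assumptions: sequences are plain strings of standard letters written 5′→3′, and `synthesize_complement` is a hypothetical name. The sketch models only pairing and antiparallel orientation, not primers, polymerase fidelity, or any real enzyme.

```python
# Hypothetical helper: models only the pairing rules, not a real polymerase.
PARTNER = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def synthesize_complement(template: str, product: str = "DNA") -> str:
    """Return the strand built on `template`, both written 5'->3'.

    The product is antiparallel to the template, so the template is read
    from its 3' end; reversing it first lets us emit the product 5'->3'.
    """
    if product not in ("DNA", "RNA"):
        raise ValueError("product must be 'DNA' or 'RNA'")
    strand = "".join(PARTNER[base] for base in reversed(template.upper()))
    # An RNA product carries U, not T, opposite A.
    return strand.replace("T", "U") if product == "RNA" else strand


print(synthesize_complement("ATGGC"))         # GCCAT
print(synthesize_complement("ATGGC", "RNA"))  # GCCAU
```

Applying the function twice to a DNA sequence returns the original sequence, reflecting the symmetry of complementary pairing.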
The Watson-Crick pairing rules can be understood chemically as a consequence of the arrangement of hydrogen bonding groups on the heterocyclic nucleobases of the oligonucleotide, groups that can be either hydrogen bond donors or acceptors. In the standard Watson-Crick geometry, a large purine nucleobase pairs with a small pyrimidine nucleobase. Thus, the AT nucleobase pair is the same size as a GC nucleobase pair; the rungs of the DNA ladder, formed from either AT or GC nucleobase pairs, all have the same length. In this disclosure, to be “complementary in the Watson-Crick sense” means to have the Watson-Crick geometry, a full pairing (not wobble pairing) of a large purine and a small pyrimidine held together by three hydrogen bonds or (if context demands) two hydrogen bonds, wherein a nucleotide is said to pair “against” the nucleotide in the antiparallel complementary strand to which it is matched.
The specificity of recognition between large and small nucleobases is determined by hydrogen bonding between the nucleobases. In standard nucleobases, hydrogen bond donors are heteroatoms (nitrogen or oxygen in the natural nucleobases) bearing a hydrogen, while hydrogen bond acceptors are heteroatoms (nitrogen or oxygen in the natural nucleobases) with a lone pair of electrons. In the Watson-Crick nucleobase pairing geometry, a six-membered ring (in standard nucleobases, a pyrimidine) pairs with a fused five-six ring system (in standard nucleobases, a purine), with a middle hydrogen bond linking two ring atoms, and hydrogen bonds on either side joining functional groups appended to each of the rings, with donor groups paired with acceptor groups. The AT nucleobase pair uses this hydrogen bonding pattern only partly, forming two hydrogen bonds; the pattern is used completely, with three hydrogen bonds, in the diaminoA:T base pair.
In 1990, the instant Inventor filed the first patent application (which later issued as U.S. Pat. No. 5,432,272) disclosing compositions of matter that expanded the number of nucleobases that could pair by such simple rules. He proposed eight additional nucleobases that form four additional pairs by changing the pattern of hydrogen bond donor and acceptor groups presented by a nucleobase to the nucleobase on a complementary oligonucleotide analog [U.S. Pat. Nos. 5,432,272, 5,965,364, 6,001,983, 6,037,120, 6,140,496, 6,627,456, 6,617,106]. These disclosures showed that the geometry of the Watson-Crick nucleobase pair could accommodate as many as 12 nucleobases forming 6 mutually exclusive pairs. Of these, four nucleobases forming two pairs are “standard”, while eight nucleobases forming four pairs were termed “non-standard”. Adding the non-standard nucleobases to the standard nucleobases yielded an Artificially Expanded Genetic Information System (AEGIS). It was also noted that these nucleobase analogs might be functionalized to enable a single biopolymer capable of both genetics and catalysis.
Expanded genetic alphabets have now been explored in many laboratories, and the possibility of a fully artificial genetic system has been advanced [Swi89][Pic90] [Pic91] [Voe93] [von95][Voe96][Voe96a] [Voe96b] [Kod97][Jur98][Lut99][Jur99][Jur00], the contents of which are incorporated by reference.
To systematize the nomenclature for the hydrogen bonding patterns, the hydrogen bonding pattern implemented on the small component of a nucleobase pair is designated by the prefix “py”. Following this prefix is the order, from the major groove to the minor groove, of hydrogen bond acceptor (A) and donor (D) groups. Thus, both thymine and uracil implement the standard hydrogen bonding pattern pyADA. The standard nucleobase cytosine implements the standard hydrogen bonding pattern pyDAA. Hydrogen bonding patterns implemented on the large component of the nucleobase pair are designated by the prefix “pu”. Again following the prefix, the hydrogen bond donor and acceptor groups are designated, from the major to the minor groove, using “A” and “D”. Thus, the standard nucleobases adenine and guanine implement the standard hydrogen bonding patterns puDA- and puADD, respectively.
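The py/pu nomenclature lends itself to a simple check of which patterns can pair. The following Python sketch is illustrative only: `hbond_count` is a hypothetical helper that counts positions where a donor faces an acceptor, and it ignores donor:donor or acceptor:acceptor clashes, ring geometry, and stacking. The puDAD pattern used for diaminoA is an assumption, derived by complementing pyADA, consistent with the statement that the diaminoA:T pair uses the full three-bond pattern.

```python
def hbond_count(py: str, pu: str) -> int:
    """Count donor:acceptor contacts between a small ('py') and large ('pu') pattern.

    Both patterns list groups from the major groove to the minor groove, so
    position i of one faces position i of the other. '-' marks a position with
    no hydrogen-bonding group (as in adenine, puDA-).
    """
    if not (py.startswith("py") and pu.startswith("pu")):
        raise ValueError("expected a 'py' pattern and a 'pu' pattern")
    small, large = py[2:], pu[2:]
    if len(small) != 3 or len(large) != 3:
        raise ValueError("each pattern must list exactly three positions")
    # A contact forms only where a donor faces an acceptor.
    return sum(1 for x, y in zip(small, large) if {x, y} == {"A", "D"})


print(hbond_count("pyDAA", "puADD"))  # C:G -> 3
print(hbond_count("pyADA", "puDA-"))  # T:A -> 2
print(hbond_count("pyADA", "puDAD"))  # T:diaminoA (assumed puDAD) -> 3
```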
A teaching of this disclosure is that hydrogen-bonding patterns designated using this systematic nomenclature are distinct in concept from the organic molecules that are used to implement the hydrogen-bonding patterns. Thus, guanosine is a nucleoside that implements the puADD hydrogen-bonding pattern. So, however, do 7-deazaguanosine, 3-deazaguanosine, 3,7-dideazaguanosine, and any number of other purines and purine derivatives, including those that carry side chains to which are appended functional groups, such as fluorescent, fluorescent quencher, attachment, or metal complexing groups. Which organic molecule is chosen to implement a specific hydrogen-bonding pattern determines, in large part, the utility of the non-standard hydrogen-bonding pattern in the various applications to which it might be applied.
The additional nucleobase pairs, because of their desirable pairing properties, chemical stability, and other features known to those skilled in the art, have been useful for a variety of purposes. For example, the nucleobase pair between 2-amino-5-methyl-1-(1′-beta-D-2′-deoxyribofuranosyl)-4(1H)-pyrimidinone, also known as 2′-deoxyisocytidine, disoC, or sometimes (less correctly) isoC, and implementing the pyAAD hydrogen bonding pattern, and 6-amino-1,9-dihydro-9-(1′-beta-D-2′-deoxyribofuranosyl)-3H-purin-2-one, also known as 2′-deoxyisoguanosine, disoG, or sometimes (less correctly) isoG, and implementing the puDDA hydrogen bonding pattern, is incorporated into the branched DNA diagnostic tools marketed today by Bayer and its successor, Siemens. Here, the non-standard nucleobase pair supports orthogonal molecular recognition in aqueous solution, similar to that of nucleic acids but with a coding system that is orthogonal to the system in DNA and RNA. Thus, it allows the branched dendrimer in the assay to assemble free from inhibition by adventitious nucleic acid, and prevents adventitious nucleic acid from capturing signaling elements from the nanostructure in the absence of the target analyte nucleic acid, which would create noise. Further, adding extra letters to the genetic alphabet speeds hybridization, presumably because it decreases the number of close mismatches where DNA dwells before finding its fully matched partner. The branched DNA assay has FDA approval and is widely used to provide personalized patient care in the clinic.
One of the advantages of incorporating non-standard nucleotides into human diagnostic assays is that binding between oligonucleotides containing them can occur without interference from natural DNA, which is often present in abundance in samples taken from human tissues. Such binding is often used to concentrate samples from complex mixtures, on arrays or at the bottoms of plastic wells. Natural DNA, built from A, T, G, and C, will interfere with binding that relies on A:T and G:C interactions; this leads to large amounts of noise in DNA arrays, for example. Accordingly, in the branched DNA assays, non-standard nucleotides are incorporated by chemical synthesis into the portions of tags that are used to move the analyte to a spot where it can be detected and to assemble signaling dendrimers.
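A simplified way to see why non-standard positions resist interference from natural DNA is to count how many Watson-Crick pairs a strand built only from A, T, G, and C could possibly form with a tag. In this hypothetical Python sketch, nucleotides are represented as string tokens ("isoC" and "isoG" are labels chosen for illustration), and `max_natural_pairs` is an assumed helper that ignores wobble pairing and tautomeric mismatching.

```python
STANDARD = {"A", "C", "G", "T"}


def max_natural_pairs(tag: list[str]) -> int:
    """Upper bound on Watson-Crick pairs a natural (A/C/G/T) strand can form with `tag`.

    In this simplified model a natural nucleotide can pair only against a
    standard position, so every non-standard position is a forced mismatch.
    """
    return sum(1 for token in tag if token in STANDARD)


tag = ["G", "isoC", "A", "isoG", "T", "isoC", "C", "isoG"]
print(max_natural_pairs(tag), "of", len(tag))  # 4 of 8
```

Each non-standard position added to a tag lowers this ceiling by one, which is the intuition behind using such positions for orthogonal capture.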
Pairing between non-standard nucleotides cannot be used to directly bind natural analytes, as these analytes are themselves built from A, T, G, and C. Accordingly, when non-standard nucleotides are used to achieve orthogonality in clinical diagnostic assays [Elb04a][Elb04b], they are generally appended as tags to primary probes, which are built from A, T, G, and C. The primary probes are the ones that contact the analyte targeted by the diagnostic assay. This considerably limits the use of non-standard components to achieve orthogonality and high signal-to-noise ratios in biological systems. A process that creates replicates or complements of oligonucleotides in which standard nucleotides are replaced, in a controlled fashion, by non-standard nucleotides would therefore have utility. If this replacement is sequence-specific, pairing of the resulting replicate or complement through non-standard base pairs could, in an appropriate architecture, offer an element of selectivity for the analyte in addition to the selectivity elements based on other regions of the analyte (for example, the regions that bind PCR amplification primers).
Conversely, oligonucleotides containing non-standard nucleotides cannot today be introduced into standard cloning systems. No strain used for cloning, including E. coli strains, is known to have the cellular machinery for making the triphosphates of non-standard nucleosides and using them to replicate DNA containing non-standard nucleotides. A process that creates replicates or complements of oligonucleotides in which non-standard nucleotides are replaced, in a controlled fashion, by standard nucleotides (the converse process) would therefore have utility. Further, such a process would be most useful as a process pair, in which one process replaces the non-standard nucleotide by one standard nucleotide and the other replaces it by a different standard nucleotide. This makes it possible to compare the sequences of the two resulting replicates or complements to ascertain where in the oligonucleotide sequence the original non-standard nucleotide(s) was (were) found.
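The comparison of the two process products can be sketched in Python. This is a minimal illustration, not the invention's procedure: `replace_nonstandard` and `infer_nonstandard_sites` are hypothetical names, the products are assumed to be perfectly aligned replicates of equal length with no errors at standard positions, and the choice of T and C as the two substitutes is arbitrary.

```python
from typing import Sequence

STANDARD = {"A", "C", "G", "T"}


def replace_nonstandard(original: Sequence[str], substitute: str) -> str:
    """Model one process of the pair: each non-standard token becomes `substitute`."""
    if substitute not in STANDARD:
        raise ValueError("substitute must be a standard nucleotide")
    return "".join(tok if tok in STANDARD else substitute for tok in original)


def infer_nonstandard_sites(product_a: str, product_b: str) -> list[int]:
    """Return 0-based positions where the two aligned products disagree.

    The two processes use different standard substitutes, so every position
    that held a non-standard nucleotide differs between the products, while
    every standard position is copied identically.
    """
    if len(product_a) != len(product_b):
        raise ValueError("products must be aligned and of equal length")
    return [i for i, (a, b) in enumerate(zip(product_a, product_b)) if a != b]


original = ["G", "A", "isoG", "C", "T", "isoG", "A"]
product_t = replace_nonstandard(original, "T")  # "GATCTTA"
product_c = replace_nonstandard(original, "C")  # "GACCTCA"
print(infer_nonstandard_sites(product_t, product_c))  # [2, 5]
```

Note that position 4 holds a standard T in both products and is correctly not reported, even though one substitute is also T; the comparison is position by position, not letter by letter.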
Mismatching between non-standard and standard nucleotides, in which a standard nucleotide is incorporated opposite a non-standard nucleotide in the template, is known. For example, Sepiol et al. [Sep76] recognized that isoG, which presents a hydrogen bond donor-donor-acceptor pattern complementary to the acceptor-acceptor-donor pattern of isoC, exists in water to about 10% as an enol tautomeric form, which can present a hydrogen bond donor-acceptor-donor pattern complementary to that of T (acceptor-donor-acceptor). Work in the 1990s showed that polymerases of various types would incorporate T (or U) opposite isoG in a template, presumably by pairing between T (or U) and the minor tautomeric form of isoG [Swi93]. This caused the loss of the isoG:isoC pair in (for example) PCR reactions [Joh04], a loss that was considered throughout the art to be disadvantageous, as it appeared to deprive the PCR product of the orthogonal isoC:isoG pair.
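The tautomer argument can be expressed with the same donor/acceptor bookkeeping. In this Python sketch, the isoG patterns (donor-donor-acceptor for the major form, donor-acceptor-donor for the enol form) and the roughly 10% enol population come from the text; the 0.90 figure simply assumes the remaining population is the major form, and all names are hypothetical. The tautomer population is not itself a misincorporation rate; how often a polymerase actually places T opposite isoG depends on the enzyme and conditions.

```python
# Patterns listed major-groove -> minor-groove, as in the py/pu nomenclature.
SMALL_PARTNERS = {"T": "ADA", "C": "DAA", "isoC": "AAD"}
# isoG forms: (pattern presented, approximate fraction in water).
# Assumption: only these two forms are present, so the major form is ~0.90.
ISOG_FORMS = {"major": ("DDA", 0.90), "enol (minor)": ("DAD", 0.10)}


def fully_complementary(small: str, large: str) -> bool:
    """True when every position pairs a donor with an acceptor."""
    return len(small) == len(large) and all(
        {s, g} == {"A", "D"} for s, g in zip(small, large)
    )


def partners_by_form() -> dict[str, list[str]]:
    """For each isoG form, list the small nucleobases it can fully pair with."""
    return {
        form: [base for base, small in SMALL_PARTNERS.items()
               if fully_complementary(small, large)]
        for form, (large, _) in ISOG_FORMS.items()
    }


def fraction_presenting(partner: str) -> float:
    """Approximate fraction of isoG whose face is fully complementary to `partner`."""
    small = SMALL_PARTNERS[partner]
    return sum(frac for large, frac in ISOG_FORMS.values()
               if fully_complementary(small, large))


print(partners_by_form())        # {'major': ['isoC'], 'enol (minor)': ['T']}
print(fraction_presenting("T"))  # 0.1
```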
Struggling to suppress this mispairing between T and the minor tautomeric form of isoG, the instant Inventor and Michael Sismour exploited the discovery that the minor tautomer of isoG does not pair well with 2-thioT, and replaced T with 2-thioT in a polymerase incubation [Sis05]. As a result, products derived from a six-letter PCR incorporating A, G, C, 2-thioT, isoG, and isoC retained the isoC and isoG non-standard components after many more cycles than products from a six-letter PCR in which standard T was used instead of 2-thioT. Thus, the products retained the ability to be orthogonally bound by isoG:isoC pairing after many more cycles of PCR. To further avoid isoG:T (or U) mismatching, 7-deazaisoG was developed [Mar04].
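Why a small per-cycle loss matters in PCR can be shown with a simple compounding model. The Python sketch below assumes that each cycle independently loses the isoC:isoG pair with a fixed probability and that a lost pair is never restored; the loss rates are hypothetical values chosen only to illustrate the trend, not measurements for T or 2-thioT, and the function names are hypothetical.

```python
import math


def retained_fraction(loss_per_cycle: float, cycles: int) -> float:
    """Fraction of products still carrying the pair after `cycles` cycles.

    Simplified model: every cycle independently loses the pair with
    probability `loss_per_cycle`, and a lost pair is never restored.
    """
    if not 0.0 <= loss_per_cycle < 1.0:
        raise ValueError("loss_per_cycle must be in [0, 1)")
    return (1.0 - loss_per_cycle) ** cycles


def cycles_to_half(loss_per_cycle: float) -> int:
    """Fewest cycles after which at most half of the products retain the pair."""
    if not 0.0 < loss_per_cycle < 1.0:
        raise ValueError("loss_per_cycle must be in (0, 1)")
    return math.ceil(math.log(0.5) / math.log(1.0 - loss_per_cycle))


# Hypothetical per-cycle loss rates, chosen only to illustrate the trend.
for p in (0.02, 0.002):
    print(f"p={p}: {retained_fraction(p, 30):.3f} retained after 30 cycles, "
          f"half lost by cycle {cycles_to_half(p)}")
```

Under this model, a tenfold lower per-cycle loss extends the number of cycles before half the products lose the pair roughly tenfold, which fits the qualitative observation that 2-thioT preserved the non-standard pair for many more cycles.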
These examples from the prior art show the extent to which those skilled in the art regard mismatching between standard and non-standard nucleotides as undesirable, and thereby teach away from the instant invention, which is based on an inventive step that recognizes the utility of such mismatching.