A universal pyrimidine-like nucleobase is defined as a nucleoside analog that forms a nucleobase pair with each of the two standard purine nucleobases (adenine and guanine) with equal (or nearly equal) facility, either as measured by the stability of the duplex, or by the preference for incorporation by DNA polymerases. For the purpose of practical application, “nearly equal” means that the difference is less than the difference normally seen with context dependence. Universal nucleobases have the potential for widespread application in research environments, where they could be very important in the design of universal primers and non-specific probes. Further, two large markets are emerging that would be enabled if pyrimidine-like universal nucleobases became available that met certain specifications with respect to promiscuous binding affinity to adenine and guanine.
The literature contains a large number of reviews of the “universal nucleobase” problem, various attempts in the past to solve it, and the utility of compositions that might solve it. This literature is incorporated herein by reference [Ber95] [Ber96] [Koo98] [Loa95a] [Loa95b] [Loa01] [Mar85] [Nic94] [Oht85].
The first of these commercial applications involves high throughput and highly parallel sequencing by synthesis, where the sequencing architecture involves ligation as the synthetic method. This approach is being developed in the laboratory of George Church at Harvard University, and at Agencourt Personal Genomics.
Another of these large-scale commercial applications involves the generation of simulants for the DNA of biohazards. The simulants would be distributed as part of a biohazard test assay kit for military and civilian preparedness. Here, one or more universal nucleobase analogs would prevent the simulant from itself being able to serve as a source of the biohazard DNA.
A universal nucleobase is defined as a nucleotide analogue that will pair with each of the four standard nucleobases with equal facility. Operationally, pairing facility is defined in two contexts. First, the ability of a universal nucleobase to pair may be defined in terms of the stability, often measured by melting temperature, of a duplex that contains one or more universal:standard pairs compared to the stability of a reference duplex containing only standard:standard Watson-Crick nucleobase pairs.
Alternatively, the quality of a universal nucleobase may be defined by its ability to direct the incorporation of each of the four standard nucleotides with equal frequency in a reaction catalyzed by a DNA polymerase or reverse transcriptase. Conversely, a universal nucleobase will be added to a primer by DNA polymerases opposite standard nucleobases in a template with equal facility.
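The two operational definitions above, combined with the practical criterion that "nearly equal" means a difference smaller than that normally seen with context dependence, can be expressed as a simple comparison. The following is a minimal sketch only. The function names, the threshold value, and the melting temperatures are hypothetical placeholders chosen for illustration and are not measured data.

```python
def spread(measurements):
    """Return the largest minus the smallest value across pairing partners.

    `measurements` maps a standard nucleobase (e.g. "A", "G") to a number,
    such as a duplex melting temperature in degrees C or a polymerase
    incorporation frequency.
    """
    values = list(measurements.values())
    return max(values) - min(values)


def is_nearly_universal(measurements, context_spread):
    """True if the partner-to-partner spread is smaller than the spread
    attributed to sequence context (an assumed, user-supplied number)."""
    return spread(measurements) < context_spread


# Hypothetical melting temperatures (degrees C), for illustration only.
tm_candidate = {"A": 50.0, "G": 48.5}
print(is_nearly_universal(tm_candidate, context_spread=2.0))
```

The same comparison applies to incorporation frequencies in place of melting temperatures; only the units of the threshold change.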
In the real world, perfect universal nucleobases (those that are perfectly promiscuous among the four standard nucleotides) are not known. For example, hypoxanthine has been used for many years as an approximation of a universal nucleobase. This has led to its application in degenerate primers, in probes for hybridization and in other contexts. Despite this use, it is clear that the compound is not indiscriminate in its nucleobase pairing properties [Oht85]. A wide range of melting temperatures is observed when hypoxanthine is placed opposite each of the four standard nucleobases [Oht85]. Further, primers containing multiple substitutions by inosine often give rise to sequence data that are difficult to analyze [Mar85].
A variety of other nucleobases have been proposed to mimic purines and/or pyrimidines with greater sophistication. For example, azole carboxamides were proposed that could mimic both guanine and adenine in their hydrogen bonding patterns simply by rotating around the amide bond [Ber96]. As reviewed by Loakes [Loa01], for reasons that are not entirely clear, these compounds have been disappointing as nucleobase analogs, as one of the two conformational isomers appears to be preferred.
A third class of universal nucleobase ideas has been based on the use of species that do not attempt to mimic the alternative hydrogen bonding patterns displayed by the four standard nucleobases. Rather, these proposed nucleobases serve simply to complete the hydrophobic stacking of the duplex. Bergstrom, for example, described 3-nitropyrrole as a candidate universal nucleobase [Ber95]. This analog was designed to maximize nucleobase stacking interactions. The nitro group was presumed to enhance stacking by polarizing the aromatic system. The same concept has been used to propose nitroindole as a universal nucleobase analog. 3-Nitropyrrole and 5-nitroindole are both sold by Glen Research as their protected DNA phosphoramidites.
3-Nitropyrrole does indeed pair with the four standard nucleobases. There is a range of melting temperatures observed, however, with nearly all pairs involving 3-nitropyrrole showing decreased stability [Ber95]. For example, in 15mers, the melting temperature drops by 11 to 14° C. (compared to 57° C. in the reference DNA duplex). The destabilization is still larger if multiple substitutions are made. These observations raise questions about how well the nitropyrrole stacks. Nitroindole appears to work better in this role. However, there is no question that DNA polymerases, including the error checking mechanisms that they contain, treat this species as foreign.
Much of the use of hydrophobic stacking species lacking hydrogen bonds has been based on a general view that hydrogen bonding is not important to duplex recognition in DNA. It is difficult to understand why this view gained such currency in the modern biochemical community. Undoubtedly, the work by Eric Kool with fluorinated heterocycles [Koo98] that are accepted as substrates by some DNA polymerases has contributed to this view. It is clear, however, from the detailed studies by Geyer and Battersby in the Benner group [Gey03], studies that examined a very large number of nucleobase analogs, that hydrogen bonding and size complementarity contribute roughly equally to the ability of a nucleobase pair to stabilize a duplex. Non-complementary pyrimidine-pyrimidine pairs that can form three hydrogen bonds contribute approximately the same to duplex stability as the size-complementary nucleobase pairs that form only two interstrand hydrogen bonds (in natural DNA, the A:T base pair).
Further, it is clear that when polymerases are called upon to exercise their full discriminatory power against unnatural nucleobase analogs, they easily reject species that have a slight wobble, lack unshared pairs of electrons in the minor groove, or have other structural features that are far more subtle than those introduced by the natural nucleobases [Ben04]. It has required over a decade in the Benner laboratory before combinations of polymerases and nucleobase analogs were obtained that enabled full polymerase chain reaction amplification of DNA containing nucleobases that were not natural [Sis04] [Ben04].
Given the different sizes of purines and pyrimidines, many groups, instead of trying to develop a single nucleobase that binds equally to both purine and pyrimidine complements, seek to generate purine-like universal nucleobases and pyrimidine-like universal nucleobases as separate entities. The first are designed to bind with equal affinities to T and C, while the second are designed to bind with equal affinities to A and G.
For example, Glen Research makes commercially available two compounds for this purpose. These are called K and P (FIG. 4). These attempt to capture some of the hydrogen bonding capabilities of natural purines and pyrimidines. For them to be effective purine-like and pyrimidine-like universal nucleobases, however, the tautomeric equilibria displayed by the K and P compounds must have equilibrium constants close to unity. In fact, these equilibrium constants are not close to unity, at least not to the degree that is necessary for these compounds to meet the specifications required for sequencing by ligation and simulants.
Recent efforts in the development of high throughput sequencing methods have illustrated the need for universal nucleobases that meet higher standards. Both the Church laboratory at the Harvard Medical School and Agencourt are attempting to develop highly parallel sequencing by synthesis strategies that use ligation as the key step. In this strategy, rather than adding a single nucleotide to a growing template in the 5′ to 3′ direction, they ligate a compound from a library of short segments to the 5′ end of the DNA. One typical sequencing-by-ligation architecture ligates a 10mer. Here, the first nucleobase is A, T, G, or C, which queries the site in the sequence that is to be determined. Following the query nucleotide is a segment of nine nucleotides that forms a paired duplex with the target sequence. Some of the sites in this segment contain all four nucleotides, creating large libraries. At other sites, attempts are being made to introduce universal nucleobases so as to diminish the size of the library.
Special attention is paid to the sequence of five nucleotides immediately following the query nucleotide. Typical DNA ligases require five paired nucleotides in the duplex that occupies the binding site of the enzyme.
The simplest approach to create this outcome would be simply to make a library where each of the four standard nucleobases (ATGC) is present at each site to the extent of 25%. To do so for the five paired nucleotides in the binding site would require a library of 1024 different sequences to follow the query nucleotide. This library is already large, and leads to slow hybridization and significant mismatch.
It would be desirable to have improved universal nucleobases that would be placed in this region. For each pyrimidine-like universal nucleobase added, the degeneracy of the library could be decreased by a factor of two.
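The library arithmetic above can be checked with a short calculation. This is a sketch under stated assumptions: it takes the figures in the text at face value (four standard nucleobases at each of five sites, and a factor-of-two reduction for each site occupied by a universal nucleobase). The function and constant names are hypothetical.

```python
STANDARD_BASES = 4        # A, T, G, C
BINDING_SITE_LENGTH = 5   # paired nucleotides required by a typical ligase


def library_size(universal_sites, site_count=BINDING_SITE_LENGTH):
    """Number of sequences needed to cover `site_count` positions when
    `universal_sites` of them carry a universal nucleobase.

    Assumes, as the text states, that each universal site halves the
    degeneracy of the fully degenerate library.
    """
    if not 0 <= universal_sites <= site_count:
        raise ValueError("universal_sites must be between 0 and site_count")
    full_library = STANDARD_BASES ** site_count
    return full_library // (2 ** universal_sites)


for k in range(BINDING_SITE_LENGTH + 1):
    print(k, library_size(k))
```

With no universal sites the function returns the 1024 sequences cited above; with all five sites substituted it returns 32.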
Various groups had hoped to use inosine as a universal nucleobase in the region of the five nucleobase pairs. This would obviate the need to synthesize multiple libraries containing large numbers of compounds. This universal base has not proved to be adequate, however. The universal nucleobases to be developed in this research under the phase one grant that is sought in this application should be able to serve in these roles well enough to meet the specifications sought in sequencing-by-ligation strategies. The Church laboratory, Agencourt, and other organizations attempting to implement sequencing-by-ligation would be potential customers for Firebird Biomolecular.
Another new application for universal nucleobases could be opened given a high quality universal nucleobase. This application concerns simulants to be used in kits that test for the presence of biohazards by seeking the nucleic acid of the biohazard in the environment being assayed. Field-ready kits to detect biological hazards need to include a positive standard, a substance that responds to the assay in the kit just as the natural biohazard does, and therefore allows the user in the field to be certain that the kit is functioning correctly.
The DNA from the biohazard itself, of course, could serve as the authentic standard for a kit. Thus, it is possible to include smallpox DNA as a positive standard in a kit designed to detect smallpox. If the DNA is only a small fragment of the total smallpox genome, it is conceivable that the presence of actual smallpox DNA might present little actual risk, and could actually be distributed as part of the kit. It would, of course, present a perceived risk.
Further, if the authentic standard is a piece of DNA that represents a larger fraction of the smallpox genome, the risk associated with the dispersion of large amounts of actual pathogen DNA will become real and unacceptable.
These considerations lead to the demand for simulants. A useful simulant is a DNA molecule that will bind to any probe that is presented to detect the biohazard nucleic acid. For assays that involve the template-directed polymerization of species from the biohazard, the simulant must also serve as a template for a DNA polymerase. At the same time, it is desirable to have structures within the simulant that make it impossible to generate the native biohazard DNA by a template-directed polymerization of the simulant itself.
The introduction of a pyrimidine-like universal nucleobase at points within the simulant offers one strategy to meet these goals. Especially if the simulant contains a C-glycoside (as in Compound I, FIG. 1), it will not be copied by mammalian DNA polymerases, something that was shown by the Benner Research Group a decade ago [Hor95]. Further, the simulant containing the universal bases at strategic sites will be able to bind to any probe that is delivered, with the loss of specificity against probes that are not present being the only consequence. Further, given n universal nucleobases in the template, an attempt to transcribe would generate a defective sequence, with the likelihood of obtaining the correct biohazard sequence declining as 1/2^n. While this does not absolutely preclude the generation of biohazard DNA from the simulant in the test kit, it greatly diminishes the likelihood that it can be.
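The decline in likelihood described above can be illustrated numerically. The sketch below assumes, purely for illustration, that each pyrimidine-like universal site in the template independently directs incorporation of either of the two purines with equal probability, so that a copy is correct at all n sites with probability (1/2)^n. The function name is hypothetical.

```python
from fractions import Fraction


def correct_copy_probability(n_universal_sites):
    """Probability that a copy of the simulant reproduces the native
    sequence at every universal site, assuming an independent 50:50
    choice between the two purines at each such site."""
    if n_universal_sites < 0:
        raise ValueError("n_universal_sites must be non-negative")
    return Fraction(1, 2 ** n_universal_sites)


for n in (1, 5, 10, 20):
    p = correct_copy_probability(n)
    print(n, p, float(p))
```

Under these assumptions, ten universal sites already reduce the chance of a fully correct copy to 1 in 1024, and twenty sites to roughly one in a million.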