Over the past 15 years, scientists have sought new molecular recognition systems with binding properties that are useful in different ways. The structures of these systems have been modeled on DNA and RNA. Further, as with DNA and RNA, these molecular recognition systems have been useful because they bind to other components of the same system and/or to natural DNA and RNA following rules that can be expressed in a form that guides practitioners of ordinary skill in the art and enables them to do useful things.
DNA serves as an archetype to illustrate both molecular structure and rule-based recognition. With DNA, three rules (A pairs with T, G pairs with C, the strands are antiparallel) permit the design of two DNA molecules that bind to each other in aqueous solution. When the rules are perfectly followed, two perfectly complementary DNA strands of a substantial length (15-20 nucleotides is normally sufficient in physiological buffers at 37° C.) will bind to each other with substantial selectivity, even in complex mixtures containing many other DNA molecules.
Heuristic rules have been developed over the years to permit the prediction of general trends in DNA:DNA binding affinity. These have come from performing substantial numbers of melting temperature experiments. For example, longer DNA strands generally bind to their partners with higher melting temperatures (Tms) than shorter strands, and G:C pairs generally contribute more to duplex stability than A:T pairs. More highly parameterized models improve on these estimates of melting temperatures [All198a][All198b][Mar85][Mat98]. While it remains true that the precise stability of duplexes may not be predictable, that imprecision does not defeat the utility of DNA:DNA binding or require undue experimentation to exploit, even though the number of different DNA sequences of length n (4^n) that would fall within a patent for the DNA molecular recognition system would be enormous.
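The two qualitative trends above (longer strands and more G:C pairs give higher Tms) are captured by the Wallace rule, a widely used rough estimate for short oligonucleotides; it is not one of the parameterized models cited here. The following is a minimal Python sketch, assuming a perfectly matched duplex formed by a short oligonucleotide; the function name is hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Rough Tm estimate in deg C by the Wallace rule: 2 per A/T, 4 per G/C.

    Assumption: short, perfectly matched oligonucleotide. The rule ignores
    salt, strand concentration, and nearest-neighbor sequence context.
    """
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    at_pairs = seq.count("A") + seq.count("T")
    gc_pairs = seq.count("G") + seq.count("C")
    return 2 * at_pairs + 4 * gc_pairs


if __name__ == "__main__":
    for probe in ("ATATATATATATATAT", "GCGCGCGCGCGCGCGC", "ATATATATATATATATATAT"):
        print(probe, len(probe), wallace_tm(probe))
```

Under this rule an alternating A/T 16-mer scores 32 °C, a G/C 16-mer scores 64 °C, and lengthening the A/T strand to 20 nucleotides raises its estimate to 40 °C. Real Tms depend on conditions and sequence context, which is why the more highly parameterized models are used.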
It has been argued that this rule-based behavior arises because of the repeating charge in the backbone of nucleic acids [Ben04]. Certainly, analogs that retain a repeating charge in their backbone maintain their rule-based pairing behavior even when they become quite long. In contrast, the few useful nucleic acid analogs that lack a repeating charge in their backbone do not maintain their rule-based binding behavior in polymers built from two dozen or more monomer units (fewer if the nucleobases are predominantly guanine). The archetypal example of such an uncharged DNA analog is peptide nucleic acid (PNA) [Egh92], where rule-based molecular recognition does not survive in longer molecules.
Orthogonal Binding Systems (FIG. 1)
An archetype of a human-invented, rule-based molecular recognition system is the artificially expanded genetic information system (AEGIS) disclosed in U.S. Pat. No. 5,432,272. The design of this artificial molecular recognition system began with the observation that two principles of complementarity govern the Watson-Crick pairing of nucleic acids: size complementarity (large purines pair with small pyrimidines) and hydrogen bonding complementarity (hydrogen bond donors from one nucleobase pair with hydrogen bond acceptors from the other). These two principles give rise to the simple rules for base pairing (“A pairs with T, G pairs with C”) that underlie genetics, molecular biology, and biotechnology.
U.S. Pat. No. 5,432,272 pointed out that these principles can be met by nucleobase pairs other than adenine (A):thymine (T) and guanine (G):cytosine (C). Indeed, twelve nucleobases forming six base pairs joined by mutually exclusive hydrogen bonding patterns might be possible within the geometry of the Watson-Crick base pair. FIG. 1 shows some of the standard and non-standard nucleobase pairs, together with the nomenclature used to designate them. Those nucleobase analogs presenting non-standard hydrogen bonding patterns are part of an Artificially Expanded Genetic Information System, or AEGIS.
U.S. Pat. No. 5,432,272 and subsequent patents all taught that the hydrogen bonding pattern that makes an AEGIS component useful as a unit of molecular recognition is distinguishable from the heterocycle that implements it. This means that different heterocycles can often serve interchangeably as molecular recognition elements. This, in turn, permits the elements of an artificial molecular recognition system to be chosen based on considerations other than simple recognition. Thus, the pyADA hydrogen bonding pattern in AEGIS is implemented by thymidine, uridine, uridine derivatives carrying a 5-position linker attached to a fluorescent moiety, uridine derivatives carrying a 5-position linker attached to a biotin, and pseudouridine, for example.
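The separation between a hydrogen bonding pattern and the heterocycle that implements it can be sketched as a small lookup. This minimal Python sketch assumes idealized three-position patterns written in the same order on both partners, so that pairing follows the two complementarity principles directly (py pairs with pu, donor D pairs with acceptor A). The function names are hypothetical; the implementation names are those listed above.

```python
SIZE_SWAP = {"py": "pu", "pu": "py"}   # size complementarity: small <-> large
HB_SWAP = {"D": "A", "A": "D"}         # H-bond complementarity: donor <-> acceptor


def complement_pattern(pattern: str) -> str:
    """Return the idealized pattern that pairs with `pattern` (e.g. pyADA -> puDAD)."""
    size, groups = pattern[:2], pattern[2:]
    if size not in SIZE_SWAP or not groups or set(groups) - set(HB_SWAP):
        raise ValueError(f"not a hydrogen bonding pattern: {pattern!r}")
    return SIZE_SWAP[size] + "".join(HB_SWAP[g] for g in groups)


# Several heterocycles implement the same pattern, so they are interchangeable
# as recognition elements (implementations named in the text).
IMPLEMENTS = {
    "thymidine": "pyADA",
    "uridine": "pyADA",
    "pseudouridine": "pyADA",
    "5-(linker-fluorophore)uridine": "pyADA",
    "5-(linker-biotin)uridine": "pyADA",
}


def can_pair(pattern_a: str, pattern_b: str) -> bool:
    """Recognition depends only on the patterns, not on the heterocycles."""
    return complement_pattern(pattern_a) == pattern_b


if __name__ == "__main__":
    for name, pattern in IMPLEMENTS.items():
        print(name, pattern, "->", complement_pattern(pattern))
```

Because recognition is computed from the pattern alone, swapping thymidine for a biotin-tagged uridine changes nothing about which partner is recognized, which is the design freedom described above.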
Four features of the AEGIS system make it suited for application:
(a) AEGIS supports rule-based design. Anyone of ordinary skill in the art can design two AEGIS-containing molecules that bind to each other after learning only a few additional rules, just as they can design binding partners with standard DNA. Again, a critical mass of melting temperatures was collected to support heuristic rules that allow prediction of affinity. As with DNA, the precise Tms are not predictable even with these heuristic rules, but this imprecision does not defeat the utility of the system or create a need for undue experimentation to design AEGIS pairing partners.
(b) The rule-based molecular recognition displayed by AEGIS is orthogonal to that displayed by standard DNA. If two strands incorporating standard DNA bases are mixed with two other strands incorporating AEGIS components, the first pair will bind only to each other and the second pair will bind only to each other, without forming hybrids between the strands containing canonical and non-canonical bases. This allows two molecular recognition processes to occur independently in the same vessel.
(c) Sequences built from AEGIS components have higher information density (more different sequences per unit length), especially when they incorporate the full 12 letters that the AEGIS technology allows. As a result, complicated systems contain fewer near-matches (sequences differing from the target by only one or a few mismatches) that slow hybridization. Thus, AEGIS tags hybridize more quickly [Col97].
(d) Enzymes can be found that allow AEGIS systems to be manipulated in ways common in biotechnology with standard DNA. These enzymes include polymerases that extend primers, copy templates that contain AEGIS components, and amplify AEGIS oligonucleotides in a polymerase chain reaction (PCR).
Here, however, undue experimentation is often required to obtain enzymes that do this effectively, as many natural enzymes regard non-standard nucleotides as “foreign” and either do not accept them or do not accept them with useful affinity.
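The information density argument of feature (c) is simple arithmetic: an alphabet of k letters gives k^n distinct sequences of length n, or log2(k) bits per position. A minimal Python sketch comparing the 4-letter standard alphabet with the full 12-letter AEGIS alphabet; the function names are hypothetical.

```python
import math


def sequence_count(length: int, letters: int) -> int:
    """Number of distinct sequences of a given length over `letters` letters."""
    return letters ** length


def bits_per_position(letters: int) -> float:
    """Information carried by one position of the sequence."""
    return math.log2(letters)


def min_length_for(count: int, letters: int) -> int:
    """Shortest length whose sequence space holds at least `count` sequences."""
    length = 0
    while letters ** length < count:
        length += 1
    return length


if __name__ == "__main__":
    print(sequence_count(8, 4), sequence_count(8, 12))
    print(bits_per_position(4), round(bits_per_position(12), 3))
    # Length of a 12-letter tag with as many sequences as standard 16-mers.
    print(min_length_for(4 ** 16, 12))
```

An 8-letter-long tag has 65,536 possible sequences with 4 letters but 429,981,696 with 12, and a 9-position 12-letter tag already exceeds the diversity of standard 16-mers. How this translates into faster hybridization is an experimental matter that the arithmetic alone does not settle.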
An archetypal application of AEGIS is the branched DNA (bDNA) assay used to measure levels of HIV, hepatitis B, and hepatitis C viruses in human patients [Elb04a][Elb04b]. As this example shows, even though DNA duplexes built from AEGIS components with different sequences do not behave identically, and their behavior may not be precisely predictable, this has not prevented the AEGIS molecular recognition system from improving the health care of some 400,000 patients annually [Ben04]. This illustrates the utility of orthogonality in the analytical chemistry of nucleic acids.
The SNAP2 System (FIG. 2, FIG. 3)
The SNAP2 system, disclosed in U.S. Ser. Nos. 60/627,460, 60/62745, 11/271,366 and 11/647,609, which are incorporated herein by reference, is designed to meet yet another molecular recognition specification: to obtain oligonucleotide molecules that bind to DNA and RNA with the specificity of a 16-mer but the discriminatory power of 8-mers. These specifications are needed to solve certain problems intrinsic to the selective probing of large transcriptomes or genomes. For example, as the human genome has ca. 3×10^9 nucleotides, a probe that is 16 nucleotides long will bind, on average, to about one sequence within that genome (because of repeats, the variance on that average is much larger than would be expected if the genome sequence were unbiased). Such calculations suggest that a probe must be 16 nucleotides in length to seek a specific nucleic acid (NA) segment in a human genome. Unfortunately, for duplexes of this length under standard hybridization conditions, single mismatches depress the melting temperature only slightly. Further, the A:T and G:C nucleobase pairs have different intrinsic affinities and contribute to duplex stability differently depending on the local “sequence context”. Together, this means that a duplex formed by two 16-mers having two, three, or occasionally more mismatches can easily be more stable than another duplex formed by two perfectly matched 16-mers. This creates difficulties throughout the analytical chemistry of nucleic acids, especially when attempting to multiplex.
Of course, if the duplex is shorter, then any pair of perfectly matched sequences will form a more stable duplex than any pair of sequences that fail in complementarity by a single mismatch. For NA-NA duplexes under standard hybridization conditions, this condition is met by duplexes as short as 6 nucleobase pairs. These, however, lack specificity in the human genome (a given 6-mer occurs on average roughly a million times in the human genome).
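The specificity figures above follow from the expected number of exact matches to one k-mer in a genome of length G, roughly G/4^k per strand. A minimal Python sketch, assuming a random genome with equal base frequencies (real genomes are biased and repetitive, as noted above); the function name is hypothetical.

```python
GENOME_LENGTH = 3_000_000_000  # approximate human genome size used in the text


def expected_occurrences(k: int, genome_length: int = GENOME_LENGTH,
                         both_strands: bool = False) -> float:
    """Expected exact matches to one specific k-mer in a random genome.

    Approximates the number of k-mer positions by the genome length and
    assumes each base occurs with probability 1/4.
    """
    per_strand = genome_length / 4 ** k
    return 2 * per_strand if both_strands else per_strand


if __name__ == "__main__":
    for k in (6, 8, 14, 16):
        print(k, round(expected_occurrences(k), 2),
              round(expected_occurrences(k, both_strands=True), 2))
```

A 16-mer is expected about 0.7 times per strand (about 1.4 times counting both strands), consistent with "about one" site, while a 6-mer is expected about 732,000 times per strand, or roughly 1.5 million times counting both strands.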
In the SNAP2 architecture, a primer is assembled from two fragments with the assistance of the template on which it will prime. The fragments are short enough that they display strong discrimination against single nucleotide mismatches. In the SNAP2 patent application, these fragments are typically 6-8 nucleotides long. The 3′-fragment is chosen so that it does not prime oligonucleotide synthesis on a nucleic acid template by itself. The 3′-fragment does, however, prime oligonucleotide synthesis if it is assisted by a 5′-fragment. As the complete template complementary to both fragments must be present for priming to occur, the priming has the selectivity of (for example) a 14-mer (if the fragments are 8 and 6 nucleotides long), but the discriminatory power against single nucleotide mismatches characteristic of an 8-mer and a 6-mer, respectively.
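The priming logic can be illustrated with a deliberately simplified Python model. It is a hypothetical sketch, not the SNAP2 chemistry: it assumes the two fragments bind contiguously, treats any mismatch under either short fragment as abolishing binding, and ignores thermodynamics entirely. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a standard DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def toy_snap2_primes(template: str, frag5: str, frag3: str) -> bool:
    """Hypothetical model of fragment-assisted priming.

    Priming occurs only if the template (written 5'->3') contains a perfect,
    contiguous complement of the assembled primer 5'-frag5-frag3-3'. The
    3'-fragment alone never primes in this model, and a single mismatch
    under either fragment is assumed to prevent priming.
    """
    if not frag5 or not frag3:
        raise ValueError("both fragments are required")
    return revcomp(frag5 + frag3) in template.upper()


if __name__ == "__main__":
    frag5, frag3 = "ACCTGAGT", "CAGTTC"   # 8 + 6 nucleotides
    site = revcomp(frag5 + frag3)
    print(toy_snap2_primes("TTTT" + site + "AAAA", frag5, frag3))
    print(toy_snap2_primes("TTTT" + revcomp(frag3) + "AAAA", frag5, frag3))
```

The model makes the selectivity argument concrete: a 14-nucleotide site is required, yet each constituent fragment is short enough that, chemically, a single mismatch is strongly discriminated.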
Self Avoiding Genetic Systems
The SNAP2 system disclosed in U.S. Ser. Nos. 60/627,460, 60/62745, 11/271,366 and 11/647,609 created a need for yet another binding behavior, which we have called “self-avoiding”. The self-avoiding property can be understood by comparison with the molecular recognition behavior of the AEGIS system. AEGIS provides components that bind to other AEGIS components via simple rules but do not bind to natural DNA or RNA (orthogonality). A self-avoiding molecular recognition system (SAMRS) does exactly the opposite: its components bind to natural DNA or RNA, but not to other components of the same unnatural system.
In its general description, a SAMRS incorporates nucleobase analogs that replace T, A, G, and C; these analogs are indicated as T*, A*, G*, and C*, and are collectively called the “* analogs” of T, A, G, and C, respectively. In the simplest implementation of this concept, the * analogs are each able to form two hydrogen bonds to the complementary A, T, C, and G. This means that the T*:A, A*:T, C*:G, and G*:C nucleobase pairs contribute to duplex stability to approximately the same extent as an A:T pair. A SAMRS obtains its self-avoiding properties because the hydrogen bonding groups of the * analogs are chosen so that the T*:A* and C*:G* nucleobase pairs contribute less to duplex stability, because (in the simplest implementation) they are joined by only one hydrogen bond.
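The hydrogen bond bookkeeping of this simplest implementation can be tabulated and summed over a duplex. This minimal Python sketch uses the total hydrogen bond count as a crude proxy for duplex stability; it is not a Tm model. It assumes mismatched pairs contribute nothing, and the function names are hypothetical.

```python
import re

# Hydrogen bonds per pair in the "simplest implementation" described above.
# Assumption: any pair not listed here (a mismatch) contributes 0.
_PAIRS = {
    ("A", "T"): 2, ("G", "C"): 3,        # standard Watson-Crick pairs
    ("T*", "A"): 2, ("A*", "T"): 2,      # * analog with its standard partner
    ("C*", "G"): 2, ("G*", "C"): 2,
    ("T*", "A*"): 1, ("C*", "G*"): 1,    # * analog with * analog (self-avoiding)
}
HBONDS = {**_PAIRS, **{(b, a): n for (a, b), n in _PAIRS.items()}}


def tokens(seq: str) -> list:
    """Split a sequence such as 'AT*G' into ['A', 'T*', 'G']."""
    parts = re.findall(r"[ACGT]\*?", seq)
    if "".join(parts) != seq:
        raise ValueError(f"unexpected characters in {seq!r}")
    return parts


def hbond_total(top: str, bottom: str) -> int:
    """Total hydrogen bonds for two equal-length strands, each written 5'->3'
    and paired antiparallel (top[i] pairs with bottom[-1 - i])."""
    t, b = tokens(top), tokens(bottom)
    if len(t) != len(b):
        raise ValueError("strands must be the same length")
    return sum(HBONDS.get((x, y), 0) for x, y in zip(t, reversed(b)))


if __name__ == "__main__":
    print(hbond_total("ACGTAC", "GTACGT"))              # DNA:DNA
    print(hbond_total("ACGTAC", "G*T*A*C*G*T*"))        # DNA:SAMRS
    print(hbond_total("G*T*A*C*G*T*", "A*C*G*T*A*C*"))  # SAMRS:SAMRS
```

For this 6-mer the tally is 15 bonds for DNA:DNA, 12 for DNA:SAMRS, and only 6 for two complementary SAMRS strands, which is the asymmetry that lets SAMRS strands bind natural DNA while avoiding each other.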
As with standard DNA, standard RNA, and AEGIS molecular recognition systems, predicting the binding properties of any sequence within a SAMRS will be subject to the same imprecision as predicting the properties of an arbitrary DNA or AEGIS molecule. Thus, as a general rule, individuals of ordinary skill in the art who wish to design a SAMRS sequence that binds to a preselected standard DNA molecule with a Tm of 25° C. would write down the preselected sequence in the 5′-to-3′ direction, and then write below it the SAMRS sequence in the antiparallel direction, matching a T* against every A, an A* against every T, a C* against every G, and a G* against every C in the preselected sequence. It is an open question whether such simple instructions allow one of ordinary skill in the art to obtain useful outcomes without undue experimentation. As elaborated below, attempts to obtain such utility failed when we took instruction from the prior art. One object of the instant invention is to provide SAMRS components that deliver utility based on precisely this simple set of rules and instructions.
As disclosed in U.S. Ser. Nos. 60/627,460, 60/62745, 11/271,366 and 11/647,609, the need for self-avoiding behavior was pressing when one sought to have mixtures containing more than two oligonucleotides, and was especially pressing when making libraries of oligonucleotides (defined as having 10 or more oligonucleotide components), particularly when those oligonucleotides were to interact with enzymes such as DNA polymerases. This problem is exemplified by multiplexed PCR, where many segments of DNA are to be amplified in one pot. This is attempted by adding, in large excess, two primers flanking each segment, contacting the mixture with nucleoside triphosphates, and cycling the mixture up and down in temperature in the presence of a thermostable DNA polymerase. At low temperatures, the primers anneal to the template. At higher temperatures, the polymerase extends the primer to make a product copy of the template. At the highest temperature, the product copy falls off the template, allowing more primers to bind when the temperature is dropped. Because they are present in high concentrations, the primers compete effectively with full-length product copies for their binding sites on the template.
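The cycling described above is what makes PCR exponential: every completed cycle can turn each template copy into two. A minimal Python sketch of the idealized textbook model, assuming a constant per-cycle efficiency and ignoring the plateau that sets in as primers and triphosphates are consumed; the function name is hypothetical.

```python
def pcr_copies(initial: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR: each anneal/extend/denature cycle multiplies the
    number of copies by (1 + efficiency). efficiency = 1.0 is perfect doubling.

    Assumption: efficiency stays constant; real reactions plateau.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(n, pcr_copies(1, n), round(pcr_copies(1, n, 0.9)))
```

The same exponential growth that amplifies the intended products also amplifies any artifact that a primer pair happens to create, which is why the primer-primer interactions discussed below are so damaging.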
PCR can be multiplexed successfully up to a dozen or so amplicons, provided the primers, which are present in high concentrations, are carefully designed not to interact with each other. Eventually, however, even the most careful design does not prevent primer-primer interactions. These create undesired amplicons, primer dimers, and other artifacts that defeat the utility of the PCR. U.S. Ser. Nos. 60/627,460, 60/62745, 11/271,366 and 11/647,609 contemplated libraries of such primers in the SNAP2 architecture. Here, self-avoidance was necessary to prevent “messes” from arising. The problem is also pressing if one wishes to do simple primer extension with libraries of primers.
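One reason careful design eventually fails is combinatorial: with two primers per amplicon, the number of primer-primer pairings that must be screened grows quadratically with the number of amplicons. A minimal Python sketch, counting a primer paired with a copy of itself as a possible interaction; the function names are hypothetical.

```python
from itertools import combinations_with_replacement


def primer_pair_count(n_amplicons: int) -> int:
    """Distinct primer-primer pairings (self-pairings included) among
    2 * n_amplicons primers: n(n + 1) / 2 for n primers."""
    n = 2 * n_amplicons
    return n * (n + 1) // 2


def candidate_pairs(primers):
    """Every pairing that a primer-design check would need to screen."""
    return list(combinations_with_replacement(primers, 2))


if __name__ == "__main__":
    for amplicons in (1, 12, 100):
        print(amplicons, primer_pair_count(amplicons))
```

Going from one amplicon to a dozen raises the count from 3 to 300 pairings, and 100 amplicons give 20,100. Self-avoiding primers address this by weakening interactions among the primers themselves rather than by screening each pairing.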
U.S. Ser. Nos. 60/627,460, 60/62745, 11/271,366 and 11/647,609 disclosed two sets of nucleotides that could implement the SAMRS structures. These are shown in FIG. 4 and FIG. 5. These were in addition to a few structures that were already present in the literature that, although not used in primers or PCR, might be applied in a SAMRS priming system.
The first of these was disclosed by U.S. Pat. No. 5,912,340 a decade ago. U.S. Pat. No. 5,912,340 was not concerned with creating primers for DNA polymerases, or with multiplexed PCR. Instead, U.S. Pat. No. 5,912,340 claimed:
“a pair of oligonucleotides (ODNs), each of said ODNs comprising nucleotide moieties having naturally occurring aglycon bases and a combination of modified aglycon bases selected from the group consisting of the combinations (1) A′, (2) G′, C′, and (3) A′, T′, G′, and C′, the duplex form of said pair of ODNs having a melting temperature under physiological conditions of less than approximately 40° C., each of said pair of ODNs being substantially complementary in the Watson-Crick sense to one of the two strands of a duplexed target sequence in nucleic acid, wherein the nucleotide moieties having the modified bases have the following properties:
With complementary oligonucleotides A′ does not form a stable hydrogen bonded base pair with T′ and forms a stable hydrogen bonded base pair with T;
With complementary oligonucleotides T′ does not form a stable hydrogen bonded base pair with A′ and forms a stable hydrogen bonded base pair with A;
With complementary oligonucleotides G′ does not form a stable hydrogen bonded base pair with C′ and forms a stable hydrogen bonded base pair with C;
With complementary oligonucleotides C′ does not form a stable hydrogen bonded base pair with G′ and forms a stable hydrogen bonded base pair with G.”
The inventors of U.S. Pat. No. 5,912,340 were satisfied if “sufficient” numbers of their primed nucleotides (A′, T′, G′, and C′, analogous to the * analogs discussed here) were incorporated to prevent the two oligonucleotides in the pair from binding to each other or (in later work) to prevent the DNA or RNA molecule from folding on itself. U.S. Pat. No. 5,912,340 did not provide any melting temperatures, nor did subsequent work. Nor did it provide assurance that one of ordinary skill in the art could obtain useful predictability, without undue experimentation, from oligonucleotides built from the components that its inventors (and they and others in subsequent work) provided. As they provided no data with polymerases acting on these unnatural compounds as templates or primers, it was not certain that these compounds would be accepted by polymerases at all, and it was even less certain whether they would be accepted with sufficient efficiency to support the demands of PCR.
Nor was it necessary for U.S. Pat. No. 5,912,340 or subsequent work to do so, as their principal goal was self-avoidance. They did not intend to provide (or, it seems, even contemplate providing) primers, let alone primers suitable for PCR.
Nor did they provide these. As the instant invention was developed to meet the pressing demands mentioned above, we encountered significant difficulties, some described below, that forced the following conclusion: even though the instant applicants, as the inventors of U.S. Ser. Nos. 60/627,460, 60/62745, 11/271,366 and 11/647,609 (the predecessors of the instant application), had the benefit of their teachings, they would have been unable to obtain a functioning SAMRS for the purpose of priming and PCR based on the teachings of U.S. Pat. No. 5,912,340.
For example, U.S. Pat. No. 5,912,340 (Claim 2) suggests that the 3-position of purines can be CH rather than N (see structure (i), where X can be either N or CH). While X may be CH for the utility taught in U.S. Pat. No. 5,912,340, the instant disclosure teaches, as a result of experimentation, that X cannot be CH for the purpose of creating primers to be used in PCR with SAMRS. Likewise, R* in structure (i) is taught by U.S. Pat. No. 5,912,340 to be possibly a cross-linking function or a reporter group; in contrast, the instant disclosure teaches that R* cannot have these structures. Likewise, R* in structure (ii) is taught by U.S. Pat. No. 5,912,340 to be possibly a cross-linking function or a reporter group; the instant disclosure teaches that R* in this structure must be H, and this teaching is again supported by experimentation. Likewise, R2 in various structures in Claim 2 and Claim 3 is taught by U.S. Pat. No. 5,912,340 to be possibly alkyl, alkoxy, alkylthio, or F; the instant disclosure teaches that none of these are possible for the purposes of the instant invention.
Likewise, U.S. Pat. No. 5,912,340 taught that the replacement for C might be zebularine (Claim 5, structure (ix), R4=H, R5=H), either of the two mono-methyl analogs of zebularine (Claim 5, structure (ix), R4=H, R5=CH3 or R4=CH3, R5=H), or dimethyl zebularine (Claim 5, structure (ix), R4=CH3, R5=CH3). We tried all of these. We could obtain useful primers when a small number of these were incorporated as C*, but not when large numbers were incorporated. The preferred structure proposed by U.S. Pat. No. 5,912,340 as a replacement for cytidine seemed to be wholly unacceptable as a polymerase substrate. Only 2-thioT as a thymidine replacement and 2-aminopurine as an adenine replacement appear to be useful for the purposes of the instant invention.
This conclusion is also suggested by subsequent work examining systems evidently inspired by U.S. Pat. No. 5,912,340. For example, seeking triphosphates that polymerases would incorporate to create oligonucleotides that would not self-fold, Lahoud et al. [Lah08] were forced to set up a screen to identify them, even though certain coauthors of [Lah08] are among the inventors of U.S. Pat. No. 5,912,340. [Lah08] does not overlap U.S. Ser. Nos. 60/627,460, 60/62745, 11/271,366 and 11/647,609 (predecessors of the instant application) or the instant application, because the instant application places the SAMRS components in the primers and templates and uses standard nucleoside triphosphates, while [Lah08] uses standard nucleotides in the primers and templates and incorporates certain SAMRS triphosphates. But clearly the prior art did not anticipate the invention of [Lah08], given that the same inventors were still screening a decade after U.S. Pat. No. 5,912,340 was filed.
The instant applicants do not intend to claim that U.S. Pat. No. 5,912,340 is not enabling for the utilities that it disclosed, which were primarily to achieve self-avoidance. The instant applicants make no teaching on this question. Nor are the teachings of U.S. Pat. No. 5,912,340 and the instant application necessarily contradictory, considering their very different utilities. The goal of U.S. Pat. No. 5,912,340 was to provide just two oligonucleotides that would not bind to each other, without any reference to their ability to serve as primers, either directly or as part of PCR. One of our goals is to define libraries of oligonucleotides, defined as mixtures of 10 or more. A goal of subsequent systems based on the teachings of U.S. Pat. No. 5,912,340, under the name of “pseudocomplementarity”, was to provide triphosphates that could be incorporated into oligonucleotides to give products that would not self-fold. Another goal of subsequent systems based on the teachings of U.S. Pat. No. 5,912,340 was to incorporate the nucleobases taught into pairs of PNA molecules to allow them to invade duplex DNA without pairing to each other.
In contrast, the goal of the instant invention is to provide primers that can be extended by DNA polymerases when templated on a natural DNA, and to provide primers that can support PCR (which requires that a primer, after being extended, also be accepted as a template by a DNA polymerase). Thus, there is no reason for U.S. Pat. No. 5,912,340 or any of the subsequent academic literature based on it to enable the instant invention, as it is not clear that anyone, prior to U.S. Ser. Nos. 60/627,460, 60/62745, 11/271,366 and 11/647,609 (the predecessors of the instant application) and the instant application, considered using such molecules as primers, as PCR primers, or as components of libraries.
Further issues relate directly to the use of SAMRS components in PCR. For example, the preferred compound for a G analog was inosine (U.S. Pat. No. 5,912,340, claims 11, 12, and 13). However, inosine is a deamination product of adenosine, and many thermostable polymerases of the type used in PCR were known to pause at inosine, presumably to permit the repair of this common defect.
The instant invention provides data concerning a range of possible SAMRS components, melting temperatures for many of these, and rules to permit their use in primers. This provides a critical mass to assemble first generation heuristic rules to predict the performance of the system.