Scientists have long sought innovative molecular recognition systems whose binding properties are useful in different ways. The structures of these systems have been modeled on the structures of DNA and RNA, which, in their polymeric form, are called “oligonucleotides”. Further, as with DNA and RNA, these molecular recognition systems have been useful because their components bind to one another and/or to natural DNA and RNA following rules that can be expressed in a form that guides practitioners of ordinary skill in the art and enables them to do useful things.
DNA serves as an archetype to illustrate both molecular structure and rule-based recognition. With DNA, three rules (A pairs with T, G pairs with C, the strands are antiparallel) permit the design of two DNA molecules that bind to each other in aqueous solution. When the rules are perfectly followed, two perfectly complementary DNA strands of a substantial length (15-20 nucleotides is normally sufficient in physiological buffers at 37° C.) will bind to each other with substantial selectivity even in complex mixtures containing many other DNA molecules. Heuristic rules have been developed over the years to permit the prediction of general trends in DNA:DNA binding affinity. These have come from performing substantial numbers of melting temperature experiments. As examples of such heuristic rules, longer DNA strands generally bind to their partners with higher melting temperatures (Tms) than shorter strands, and G:C pairs generally contribute more to duplex stability than A:T pairs. More highly parameterized models improve on the estimates of melting temperatures [All98a] [All98b] [Mar85] [Mat98]. While it remains true that the precise stability of duplexes may not be predictable, that imprecision does not defeat the utility of DNA:DNA binding or require undue experimentation to exploit, even though the number of different DNA sequences of length n (= 4^n) that would fall within a patent for the DNA molecular recognition system would be enormous.
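The three pairing rules above are simple enough to state as a short program. The following Python sketch is illustrative only: the function names `design_partner` and `sequence_space` are hypothetical, and the example sequence is arbitrary. It designs the complementary strand and counts the 4^n possible sequences of length n; it makes no attempt to predict melting temperatures.

```python
# Watson-Crick pairing rules from the text: A pairs with T, G pairs with C.
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def design_partner(target: str) -> str:
    """Return the partner strand, written 5'-to-3', for a 5'-to-3' target."""
    # Antiparallel rule: read the target backwards, then pair each base.
    return "".join(WATSON_CRICK[base] for base in reversed(target.upper()))


def sequence_space(n: int, letters: int = 4) -> int:
    """Number of distinct sequences of length n over `letters` nucleobases."""
    return letters ** n


if __name__ == "__main__":
    target = "GATTACAGGCTTAACG"  # arbitrary 16-nucleotide example
    print(design_partner(target))
    print(sequence_space(len(target)))
```

Applying `design_partner` twice returns the original sequence, which is a quick check that the antiparallel rule was applied consistently.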
It has been argued that this rule-based behavior arises because of the repeating charge in the backbone of nucleic acids [Ben04]. Certainly, analogs that have a repeating charge in their backbone maintain their rule-based pairing behavior even if they become quite long. In contrast, the few examples of useful nucleic acid analogs that lack a repeating charge in their backbone do not maintain their rule-based binding behavior in polymers built from two dozen or more monomer units (fewer if the nucleobases are predominantly guanine). The archetypal example of such an uncharged DNA analog is peptide nucleic acid (PNA) [Egh92], where rule-based molecular recognition does not survive in longer molecules.
Artificially Expanded Genetic Information Systems (AEGIS)
An archetype of a human-invented rule-based molecular recognition system is the artificially expanded genetic information system (AEGIS) disclosed in U.S. Pat. No. 5,432,272. The design of this artificial molecular recognition system began with the observation that two principles of complementarity govern the Watson-Crick pairing of nucleic acids: size complementarity (large purines pair with small pyrimidines) and hydrogen bonding complementarity (hydrogen bond donors from one nucleobase pair with hydrogen bond acceptors from the other). These two principles give rise to the simple rules for base pairing (“A pairs with T, G pairs with C”) that underlie genetics, molecular biology, and biotechnology.
U.S. Pat. No. 5,432,272 pointed out that these principles can be met by nucleotides other than adenine (A) and thymine (T), and guanine (G) and cytosine (C). Rather, twelve nucleobases forming six base pairs joined by mutually exclusive hydrogen bonding patterns might be possible within the geometry of the Watson-Crick base pair. FIG. 1 shows some of the standard and non-standard nucleobase pairs, together with the nomenclature to designate them. Those nucleobase analogs presenting non-standard hydrogen bonding patterns are part of an Artificially Expanded Genetic Information System, or AEGIS.
U.S. Pat. No. 5,432,272 and subsequent patents all taught that the hydrogen bonding pattern that makes an AEGIS component useful as a unit of molecular recognition is distinguishable from the heterocycle that implements it. This means that different heterocycles can often serve interchangeably as molecular recognition elements. This, in turn, permits the elements of an artificial molecular recognition system to be chosen based on considerations other than simple recognition. Thus, the pyADA hydrogen bonding pattern in AEGIS is implemented by thymidine, uridine, uridine derivatives carrying a 5-position linker attached to a fluorescent moiety, uridine derivatives carrying a 5-position linker attached to a biotin, and pseudouridine, for example.
Four features of the AEGIS system make it suited for application:
(a) AEGIS supports rule-based design. Anyone of ordinary skill in the art can design two AEGIS-containing molecules that bind to each other, after learning only a few additional rules, just as they can design binding partners with standard DNA. Again, a critical mass of melting temperatures was collected to support heuristic rules that allow prediction of affinity. As with DNA, the precise Tms are not predictable even with these heuristic rules, but this imprecision does not defeat the utility of the system, or create a need for undue experimentation to design AEGIS pairing partners.
(b) The rule-based molecular recognition displayed by AEGIS is orthogonal to that displayed by standard DNA. If two strands incorporating standard DNA bases are mixed with two other strands incorporating AEGIS components, the first pair will bind to each other only, and the second pair will bind to each other only, without formation of hybrids between the strands containing canonical and non-canonical bases. This allows two molecular recognition processes to occur independently in the same vessel.
(c) Sequences built from AEGIS components have higher information density (more different sequences per unit length), especially when they incorporate the full 12 letters that the AEGIS technology allows. As a result, fewer near-mismatches are present in complicated systems to slow hybridization, for example. Thus, AEGIS tags hybridize more quickly [Col97].
(d) Enzymes can be found that allow AEGIS systems to be manipulated in ways common in biotechnology with standard DNA. These enzymes include polymerases that do primer extension, copy templates that contain AEGIS components, and amplify AEGIS oligonucleotides in a polymerase chain reaction (PCR). Here, undue experimentation is often required to obtain enzymes that do this effectively, as many natural enzymes regard non-standard nucleotides as “foreign”, and do not accept them or, if they do, do not accept them with useful affinity.
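The information-density point in feature (c) can be made concrete with simple counting. The Python sketch below is an illustration only: the helper names `bits_per_position` and `min_tag_length` are hypothetical, and the figure of one million tags is an arbitrary example rather than a value from any cited work. It compares a 4-letter alphabet with the full 12-letter AEGIS alphabet.

```python
import math


def bits_per_position(letters: int) -> float:
    """Information carried by one position of a sequence, in bits."""
    return math.log2(letters)


def min_tag_length(num_tags: int, letters: int) -> int:
    """Shortest length n such that letters**n >= num_tags."""
    n, capacity = 0, 1
    # Integer arithmetic avoids rounding problems with logarithms.
    while capacity < num_tags:
        capacity *= letters
        n += 1
    return n


if __name__ == "__main__":
    for letters in (4, 12):
        print(letters, "letters:",
              round(bits_per_position(letters), 3), "bits per position;",
              min_tag_length(1_000_000, letters),
              "positions for one million distinct tags")
```

Under these assumptions, a million distinct tags need 10 positions with four letters but only 6 with twelve. The count says nothing about which tags hybridize well; it only shows how much shorter a tag set can be for the same number of distinct sequences.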
An archetypal application of AEGIS is in the branched DNA (bDNA) assay used to measure levels of HIV, hepatitis B, and hepatitis C viruses in human patients [Elb04a][Elb04b]. As this example shows, even though the behaviors of DNA duplexes built from AEGIS components having different sequences are not identical and may not be precisely predictable, this has not prevented the AEGIS molecular recognition system from improving the health care of some 400,000 patients annually [Ben04]. This is an illustration of the utility of orthogonality in the analytical chemistry of nucleic acids.
Self-Avoiding Molecular Recognition Systems (SAMRS)
A self-avoiding molecular recognition system (SAMRS) has components that bind to natural DNA or RNA, but not to other components of the same unnatural system. In its general description, a SAMRS incorporates nucleobase analogs that replace T, A, G, and C by analogs that are indicated as T*, A*, G*, and C*, which are collectively called “* analogs” of T, A, G, and C respectively. In the simplest implementation of this concept, these * analogs are each able to form two hydrogen bonds to the complementary A, T, C, and G. This means that the T*:A, A*:T, C*:G, and G*:C nucleobase pairs contribute to duplex stability to approximately the same extent as an A:T pair. A SAMRS obtains its self-avoiding properties because the hydrogen bonding groups of the * analogs are chosen so that the T*:A* and C*:G* nucleobase pairs do not contribute as much to duplex stability, because (in the simplest implementation) they are joined by only one hydrogen bond.
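The hydrogen-bond bookkeeping of the simplest implementation can be tabulated directly. The Python sketch below is a rough illustration, not a stability model. It assumes a notation in which lowercase letters stand for the * analogs (a = A*, t = T*, g = G*, c = C*), uses the usual counts of two hydrogen bonds for a standard A:T pair and three for G:C, and scores any other pairing as zero. The name `duplex_hbonds` is hypothetical.

```python
# Hydrogen bonds per nucleobase pair in the simplest SAMRS implementation.
# Lowercase letters denote * analogs (an assumption of this sketch).
HBONDS = {
    frozenset("AT"): 2,  # standard A:T
    frozenset("GC"): 3,  # standard G:C
    frozenset("tA"): 2,  # T*:A
    frozenset("aT"): 2,  # A*:T
    frozenset("cG"): 2,  # C*:G
    frozenset("gC"): 2,  # G*:C
    frozenset("ta"): 1,  # T*:A* (self-avoiding pair)
    frozenset("cg"): 1,  # C*:G* (self-avoiding pair)
}


def duplex_hbonds(strand: str, partner: str) -> int:
    """Total hydrogen bonds when two 5'-to-3' strands pair antiparallel."""
    if len(strand) != len(partner):
        raise ValueError("strands must have equal length")
    # The first base of one strand faces the last base of the other.
    return sum(HBONDS.get(frozenset((x, y)), 0)
               for x, y in zip(strand, reversed(partner)))


if __name__ == "__main__":
    print(duplex_hbonds("AACG", "CGTT"))  # standard DNA with standard DNA
    print(duplex_hbonds("AACG", "cgtt"))  # standard DNA with SAMRS
    print(duplex_hbonds("aacg", "cgtt"))  # SAMRS with SAMRS
```

For this four-nucleotide example the totals are 10, 8, and 4 hydrogen bonds respectively, which mirrors the intended ordering: SAMRS binds natural DNA nearly as well as DNA does, while SAMRS:SAMRS duplexes are much weaker.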
As with standard DNA, standard RNA, and oligonucleotides that add non-standard nucleobase pairing, predicting the binding properties of any sequence within a SAMRS system will be subject to the same imprecision as predicting the properties of an arbitrary DNA or RNA molecule. Thus, as a general rule, if individuals of ordinary skill in the art wish to design a SAMRS sequence that binds to a preselected standard DNA molecule with a Tm of 25° C., they would write down the preselected sequence in the 5′-to-3′ direction, and then write below it the SAMRS sequence in an antiparallel direction, matching a T* against every A in the preselected sequence, an A* against every T in the preselected sequence, a C* against every G in the preselected sequence, and a G* against every C in the preselected sequence. It is an open question as to whether such simple instructions allow one of ordinary skill in the art to obtain useful outcomes without undue experimentation. As elaborated below, attempts to obtain such utility failed when we took instruction from the prior art. One object of the instant invention is to provide SAMRS components that provide utility based on precisely so simple a set of rules and instructions.
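The design instructions in the preceding paragraph translate almost word for word into code. The Python sketch below is a minimal illustration under stated assumptions: the name `design_samrs` is hypothetical, the target is standard DNA written 5′-to-3′, and the result is returned as a list of tokens such as "T*". It applies only the matching rule and does not estimate the Tm of the resulting duplex.

```python
from typing import List

# Matching rule from the text: T* against A, A* against T,
# C* against G, and G* against C.
SAMRS_PARTNER = {"A": "T*", "T": "A*", "G": "C*", "C": "G*"}


def design_samrs(target: str) -> List[str]:
    """Return the SAMRS partner of a 5'-to-3' DNA target, also 5'-to-3'."""
    # Writing the partner antiparallel means reading the target from its
    # 3' end, so that the partner comes out in the 5'-to-3' direction.
    return [SAMRS_PARTNER[base] for base in reversed(target.upper())]


if __name__ == "__main__":
    target = "AACG"  # arbitrary example target
    print("target 5'-" + target + "-3'")
    print("SAMRS  5'-" + "".join(design_samrs(target)) + "-3'")
```

For the target AACG this yields C*G*T*T*, which pairs position by position with the target when the two strands are aligned antiparallel.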
The need for self-avoiding behaviors has long been pressing whenever an experimentalist has sought to work with mixtures containing more than two oligonucleotides, and especially pressing when making libraries of oligonucleotides (defined as having 10 or more oligonucleotide components), especially when those oligonucleotides are to interact with enzymes such as DNA polymerases. This problem is exemplified by multiplexed PCR, where amplification of many segments of DNA is sought in one pot. This is attempted by adding in large excess two primers flanking each segment, contacting the mixture with nucleoside triphosphates, and cycling the mixture up and down in temperature in the presence of a thermostable DNA polymerase. At low temperatures, the primers anneal to the template. At higher temperatures, the polymerase extends the primer to make a product copy of the template. At the highest temperature, the product copy falls off the template, allowing more primers to bind when the temperature is dropped. The primers compete with full length product copies for their binding sites on the template by being present in high concentrations.
While PCR can be successfully multiplexed up to a dozen or so amplicons when the primers, which are present in high concentrations, are carefully designed not to interact with each other, eventually even the most careful design does not prevent primer-primer interactions. These create undesired amplicons, primer dimers, and other artifacts that defeat the utility of the PCR.
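One way to see why careful design eventually fails is to count the primer-primer combinations that would have to be kept free of interaction. The Python sketch below is a counting illustration only: the name `primer_pairings` is hypothetical, it assumes exactly two primers per amplicon as described above, and it counts each primer paired with itself as one combination. It does not predict which combinations actually interact.

```python
def primer_pairings(amplicons: int) -> int:
    """Number of primer-primer combinations, self-pairings included."""
    primers = 2 * amplicons  # two flanking primers per amplicon
    # Unordered pairs of distinct primers plus each primer with itself.
    return primers * (primers + 1) // 2


if __name__ == "__main__":
    for n in (1, 12, 100):
        print(n, "amplicons:", primer_pairings(n), "combinations")
```

Under these assumptions a single amplicon involves 3 combinations, a dozen amplicons involve 300, and a hundred involve 20,100, so the design burden grows roughly with the square of the multiplexing level.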