There is growing interest in using DNA and the methods of molecular biology to perform computation. In an article entitled "Molecular Computation of Solutions to Combinatorial Problems," Science, vol. 266, pages 1021-1024 (1994), L. M. Adleman described an approach that encodes computer science problems into DNA sequences and relies heavily on the "extraction" of sequences containing a particular subsequence by means of the complementary subsequence. Subsequently, R. J. Lipton, in an article entitled "DNA Solution of Hard Computational Problems," Science, vol. 268, pages 542-545 (1995), proposed an approach for using DNA to solve Satisfiability and other problems in the computational class NP. Lipton proposed a particular encoding of Boolean vectors and relied on similar extraction operations using complementary subsequences. E. B. Baum, in an article entitled "Building an Associative Memory Vastly Larger than the Brain," Science, vol. 268, pages 583-585 (1995), proposed using a similar encoding, and some variants, for content-addressable memories. D. Boneh et al., in papers entitled "On the Computational Power of DNA" and "Breaking DES Using a Molecular Computer," preprints of which are available at http://www.CS.Princeton.EDU/~dabo/biocomp.html, propose computer algorithms that rely on similar encodings and methods. U.S. patent application Ser. No. 08/384,995, entitled "Associative Memory using DNA," by E. B. Baum, which application is hereby incorporated herein by reference, describes DNA-based content-addressable memories. U.S. patent application Ser. No. 08/414,398, entitled "Molecular Automata Utilizing Single- or Double-Strand Oligonucleotides," by A. L. Schweitzer and W. D. Smith, now U.S. Pat. No. 5,804,373, describes the use of DNA as a Turing machine.
For each of these arrangements, a set of DNA subsequences must be chosen, and practical considerations will force these subsequences to satisfy certain requirements. Lipton and Adleman suggested using random subsequences. However, practical requirements may impose constraints that random sequences cannot meet, and it is not obvious a priori that such constraints can be satisfied at all.
When the encoding described by Lipton is used to encode the Boolean vectors $\{0,1\}^n$, where $n$ is about 60, two DNA subsequences $X_i$ and $Y_i$ are chosen for each $i = 1, \ldots, n$, corresponding respectively to a 0 or a 1 in the $i$th component. A vector in $\{0,1\}^n$ is then encoded by concatenating the appropriate subsequences, perhaps separated by a fixed spacer subsequence or by a subsequence corresponding to the component number. This evidently requires at least $2n = 120$ suitable subsequences. In some of the algorithms proposed in the Boneh et al. articles, the initial vectors are extended by appending additional subsequences in a similar fashion (e.g., a tag indicating that the encoded vector satisfies some Boolean circuit), so that the number of subsequences needed grows substantially and may reach tens of thousands or more. If enough suitable subsequences are not available, algorithmic possibilities will be constrained.
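As a rough illustration of this Lipton-style encoding, the sketch below draws random codewords (the approach Lipton and Adleman suggested) and concatenates them into a strand for a given Boolean vector. The function names `random_codewords` and `encode_vector`, the codeword length of 20 bases, the spacer "TTTT", and the random seed are all hypothetical choices for illustration; indices are 0-based, and no design constraints are checked, so the codewords drawn here are not guaranteed to be "suitable" in the sense discussed above.

```python
import random

BASES = "ACGT"

def random_codewords(n, length=20, seed=0):
    """Return hypothetical random codewords X[i], Y[i] for i = 0..n-1.

    length and seed are illustrative assumptions; no constraints are enforced.
    """
    rng = random.Random(seed)

    def word():
        return "".join(rng.choice(BASES) for _ in range(length))

    X = [word() for _ in range(n)]  # codeword for a 0 in component i
    Y = [word() for _ in range(n)]  # codeword for a 1 in component i
    return X, Y

def encode_vector(bits, X, Y, spacer=""):
    """Concatenate X[i] or Y[i] for each bit, separated by an optional fixed spacer."""
    if len(bits) != len(X):
        raise ValueError("vector length must equal the number of components")
    return spacer.join(Y[i] if b else X[i] for i, b in enumerate(bits))

n = 60
X, Y = random_codewords(n)
print(len(X) + len(Y))  # 120 subsequences for n = 60
strand = encode_vector([1, 0] * 30, X, Y, spacer="TTTT")
print(len(strand))  # 60 codewords * 20 bases + 59 spacers * 4 bases = 1436
```

Appending tag subsequences, as in the Boneh et al. algorithms, would simply extend the lists of codewords and the concatenation, which is why the number of distinct subsequences required can grow quickly.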
Let $Z$ be a sequence of DNA, and let $\bar{Z}$ denote the sequence of DNA that is Watson-Crick complementary to $Z$. The Watson-Crick complement of $Z$ is obtained by replacing A with T, T with A, C with G, and G with C, and then reversing the order of the sequence. For example, if $Z$ = AGTCC, then $\bar{Z}$ = GGACT.
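The complement operation just defined is easy to state in code. The following minimal sketch (the name `wc_complement` is hypothetical) swaps each base for its Watson-Crick partner and then reverses the result, reproducing the example above.

```python
# Watson-Crick pairing rule: A <-> T, C <-> G.
_PAIR = str.maketrans("ACGT", "TGCA")

def wc_complement(z: str) -> str:
    """Return the Watson-Crick complement: swap A/T and C/G, then reverse."""
    return z.translate(_PAIR)[::-1]

print(wc_complement("AGTCC"))  # GGACT
```

Because both the base swap and the reversal undo themselves, applying `wc_complement` twice returns the original sequence.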
With an encoding as described above, a key operation in many of the proposed algorithms is an "extract." In an extract operation, a subsequence $X_i$ or $Y_i$ is produced with a magnetic bead affixed to it. When these beads are placed in the test-tube computer, the introduced subsequences bind to any molecules already present that contain the complementary subsequences. The bound molecules can then be extracted magnetically. This process allows one to search the test tube for vectors having particular component values. In practice, some molecules may incorrectly bind or fail to bind at the proper location
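An idealized, in-silico model of the extract operation can make the logic concrete. In this sketch the tube is a list of strings, and the bead-bound `probe` captures every strand that contains the probe's Watson-Crick complement as an exact substring; the function name `extract` and the 4-base codewords are hypothetical. The model deliberately ignores the mis-binding and binding failures noted above, so it shows only the intended outcome of the chemistry, not its real behavior.

```python
_PAIR = str.maketrans("ACGT", "TGCA")

def wc_complement(z: str) -> str:
    """Watson-Crick complement: swap A/T and C/G, then reverse."""
    return z.translate(_PAIR)[::-1]

def extract(tube, probe):
    """Idealized extract: split the tube into strands the probe binds to
    (those containing the probe's complement) and the remaining strands.
    Assumes perfect, exact-match binding with no errors."""
    target = wc_complement(probe)
    bound = [s for s in tube if target in s]
    rest = [s for s in tube if target not in s]
    return bound, rest

# Hypothetical codewords for a 2-component example (0-based components).
X = ["AAAC", "GGGT"]  # codewords for a 0 in components 0 and 1
Y = ["CCCA", "TTTG"]  # codewords for a 1 in components 0 and 1
tube = [a + b for a in (X[0], Y[0]) for b in (X[1], Y[1])]

# A probe complementary to X[0] captures the vectors whose component 0 is 0.
bound, rest = extract(tube, wc_complement(X[0]))
print(bound)  # ['AAACGGGT', 'AAACTTTG']
print(rest)   # ['CCCAGGGT', 'CCCATTTG']
```

Exact substring matching also hides a practical issue: with poorly chosen codewords, a target could appear by accident across the junction of two adjacent subsequences, which is one reason the choice of subsequences matters.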