A DNA computer uses chemical reactions to perform computational tasks by taking advantage of the fact that information may be stored in DNA molecules and in the chains of nucleobases (generally referred to here as “bases”) that comprise DNA molecules. An element of data may thus be encoded into a molecular chain of bases, much like the way in which a sequence of binary digits stored in conventional computer memory might represent a numeric value or a string of text.
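The encoding idea described above can be illustrated with a minimal sketch. It assumes a hypothetical two-bits-per-base code over the four natural bases (A, C, G, T); this particular mapping and the function names `encode` and `decode` are illustrative choices, not something the text specifies.

```python
# Hypothetical 2-bit-per-base code (an assumption for illustration only).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a string of base letters, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): translate base letters back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"A")          # 0x41 is 01000001 in binary
    print(strand)                  # CAAC
    print(decode(strand))          # b'A'
```

Under this assumed code, each byte becomes exactly four bases, so a string of text maps to a chain of bases in the same way it maps to a sequence of binary digits in conventional memory.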
Computational operations may be performed on such encoded data because certain types of bases more readily pair (or chemically bind) with each other to form “base pairs” through a process known in the art as Watson-Crick bonding. It is therefore possible to organize bases into a pair of complementary sequences that combine into a double-helix DNA molecule when mixed together in an appropriate medium. The resulting DNA molecule may then represent a result of a logical or arithmetic operation performed upon data elements represented by the two sequences.
If, for example, a first base A1 binds to a second (complementary) base A2 and a third base B1 binds to its complementary, fourth base B2, the first four bases of a molecular sequence A1-B1-B1-B1-B2 will bind with the last four bases of a second molecule comprising the sequence A1-A1-A1-A2-B2-B2-B2 when the two are mixed together and allowed to react. The resulting DNA molecule, in which the two initial molecular sequences are bound together in a double-helix structure, may, through proper encoding, thus represent a relationship between the two sequences or a result of an operation performed upon the two sequences.
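The binding rule in this example can be checked with a short sketch. It uses the abstract base names A1, A2, B1, and B2 from the text and models binding as simple position-by-position complementarity, which is a simplification of real strand chemistry; the helper names `parse` and `binds` are hypothetical.

```python
# Complementary pairs as named in the text: A1 pairs with A2, B1 with B2.
COMPLEMENT = {"A1": "A2", "A2": "A1", "B1": "B2", "B2": "B1"}


def parse(sequence: str) -> list:
    """Split a dash-separated sequence such as 'A1-B1-B2' into base names."""
    return sequence.split("-")


def binds(region_a: list, region_b: list) -> bool:
    """Return True if every base in region_a pairs with the base at the
    same position in region_b (simplified, position-wise model)."""
    return len(region_a) == len(region_b) and all(
        COMPLEMENT[a] == b for a, b in zip(region_a, region_b)
    )


if __name__ == "__main__":
    first = parse("A1-B1-B1-B1-B2")
    second = parse("A1-A1-A1-A2-B2-B2-B2")
    # First four bases of the first molecule vs. last four of the second.
    print(binds(first[:4], second[-4:]))   # True
    # The first four bases of the second molecule do not pair.
    print(binds(first[:4], second[:4]))    # False
```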
In this way, a pair of complementary sequences may be used to implement data-processing functions like an arithmetic or logical operation, a pattern-matching function, a database lookup, or a sorting operation. In a syntactical-analysis application, for example, two sequences might be configured to pair only if both those sequences represent data elements that satisfy a common set of criteria, such as identifying a part of speech. Similarly, in a data-analysis application, two sequences might be configured to bind together only if the two represent information stored in a same data-storage format, such as a numeric integer format.
In an example that continues to use the above sequence-naming convention, a first molecule, comprising a first sequence A1-A1-A1-A2-B2-B2-B2, could be used to identify data values stored as integer variables if the first sequence is configured to bind only with sequences that represent integer data values. A second sequence into which has been encoded an integer number might further comprise a complementary subsequence A1-B1-B1-B1. When mixed together in an appropriate medium, the last four nucleobases of the first molecule (A2-B2-B2-B2) would bind with the second molecule's A1-B1-B1-B1 subsequence. The resulting DNA molecule could then be decoded or interpreted to specify that information represented by one strand of the resulting double-helix may be identified as an integer value by information encoded into the second strand.
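As a rough sketch of this type-identification example, the code below treats the first molecule as an "integer tag" and scans a second molecule for a subsequence complementary to the tag's last four bases. The sample target molecules and the names `find_binding_site` and `is_tagged_integer` are hypothetical, and binding is again modeled as position-wise complementarity rather than real strand chemistry.

```python
# Complementary pairs as named in the text: A1 pairs with A2, B1 with B2.
COMPLEMENT = {"A1": "A2", "A2": "A1", "B1": "B2", "B2": "B1"}

# The first molecule from the text, used here as the integer tag.
INTEGER_TAG = "A1-A1-A1-A2-B2-B2-B2".split("-")


def find_binding_site(probe_region: list, target: list) -> int:
    """Return the index in target where a window complementary to
    probe_region starts, or -1 if no such window exists."""
    wanted = [COMPLEMENT[base] for base in probe_region]
    width = len(wanted)
    for start in range(len(target) - width + 1):
        if target[start:start + width] == wanted:
            return start
    return -1


def is_tagged_integer(target: list) -> bool:
    """True if the tag's last four bases (A2-B2-B2-B2) can bind to target."""
    return find_binding_site(INTEGER_TAG[-4:], target) != -1


if __name__ == "__main__":
    # Hypothetical second molecule containing the subsequence A1-B1-B1-B1.
    integer_molecule = "A2-A2-A1-B1-B1-B1".split("-")
    # Hypothetical molecule lacking that subsequence.
    other_molecule = "A1-A1-B1-B1".split("-")
    print(is_tagged_integer(integer_molecule))   # True
    print(is_tagged_integer(other_molecule))     # False
```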
In this way, data structures and complex data-processing operations may be encoded into molecular “programs,” and such a program may be “run” by allowing carefully sequenced chains of nucleobases to bind together selectively in a chemical reaction.
One advantage of DNA computing is that it allows an enormous number of information-bearing molecules to be mixed together in just a drop of medium, where they may combine almost instantaneously, consuming a tiny fraction of the energy required by electronic computers. When used to solve certain types of highly parallel, data-intensive problems, DNA computers can produce results far more quickly and efficiently than even the most powerful conventional computer systems.
Machine learning is a field of computer science in which a computer program continuously revises itself in response to patterns it detects in its input data. Rather than following an unchanging, predetermined set of instructions, a machine-learning program automatically updates a data model as a function of its input, and bases inferences, decisions, computations, and output at least partially upon updated characteristics of that data model.
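The notion of a program that updates a data model as a function of its input can be shown with a deliberately tiny sketch. The model here is just a running mean of the values seen so far, and the class name `RunningMeanModel` and its methods are hypothetical; real machine-learning models are far richer, but the update-then-infer pattern is the same.

```python
class RunningMeanModel:
    """Toy data model: tracks the mean of all inputs seen so far."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> None:
        """Revise the model incrementally in response to one new input."""
        self.count += 1
        self.mean += (value - self.mean) / self.count

    def is_above_typical(self, value: float) -> bool:
        """Make a decision based on the current state of the model."""
        return value > self.mean


if __name__ == "__main__":
    model = RunningMeanModel()
    for observation in [2.0, 4.0, 6.0]:
        model.update(observation)
    print(model.mean)                    # 4.0
    print(model.is_above_typical(5.0))   # True
```

The same input value can yield a different decision later, because each call to `update` changes the model on which the decision is based.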
Machine-learning applications are useful in fields where a system must adapt to highly dynamic or difficult-to-quantize data, such as computational statistics, optical-character recognition, search, optimization, malware prevention, spam filtering, voice-recognition, speech-recognition, and Web analytics. In such applications, a machine-learning application trains itself by identifying and correlating characteristics or patterns of each new input it receives, using the resulting inferences to make its output more accurate or efficient.
The high degree of parallelism, recursion, or computational complexity required by many machine-learning applications can impose a great burden on a conventional electronic, scalar computer system. However, implementing machine-learning algorithms and models on a massively parallel, energy-efficient DNA-computing platform would extend the benefits of machine-learning applications into areas where they would not otherwise be cost-effective or computationally practical. No such solution exists today.