This invention provides methods for DNA analog representation of vector operations, including vector addition, determination of inner and outer products of vectors, and of the product of a matrix and a vector, using negative as well as non-negative numbers. The methods of the present invention utilize the spectrum of biochemical activities and operations which DNA molecules are capable of undergoing, including base-specific Watson-Crick hybridization, ligation, polymerase extension, site-specific strand cleavage via restriction enzymes, melting of duplex DNA, cleavage of DNA by site-specific endonucleases, and degradation of DNA by exonucleases of broad sequence specificity.
Watson-Crick hybridization of complementary DNA oligomers makes possible a DNA analog representation of highly parallel operations [1, 2]. The present invention develops this potential and provides methods whereby DNA analog representation of the operations of vector algebra is used to produce a DNA-based neural network [3] which may be used in an associative or content addressable memory [4-6] and a DNA multilayer perceptron [7, 8].
All publications and patent applications referred to herein are incorporated by reference fully as though each individual publication or patent application was specifically and individually indicated to be incorporated by reference.
Various strategies for finding solutions to mathematical problems have been devised which use sets of DNA oligonucleotides having selected length and sequence properties. For example, there are methods that use DNA oligomers of defined nucleotide sequence to solve a Hamiltonian path problem [1] and a "satisfaction" problem [2], and to perform addition [9] and matrix multiplication [10] of non-negative numbers. Baum [11] has proposed using DNA operations akin to those described by Adleman [1] and Lipton [2] to produce an associative DNA memory of enormous capacity. Prior to the development of the methods of the present invention, methods for using DNA oligomers in analog representation of matrix multiplication that include use of negative numbers as well as non-negative numbers were not disclosed or taught.
Adleman [1] first pointed out that Watson-Crick hybridization of complementary DNA strands makes possible a representation of highly parallel selective operations that could be a basis for molecular computation. In practice, small departures from the ideal selectivity of DNA hybridization can lead to undesired pairings of strands that create significant difficulties in implementing schemes using interactions of DNA oligomers to represent large scale Boolean functions. Recently, however, Deaton et al. [12] showed that it should be possible to find a large enough set of mutually non-hybridizing DNA strands to allow digital molecular computation of high complexity with tolerable error rates.
A neural network is a physical system that models a simple biological neuronal system, in that it comprises a large number of interconnected processing elements, called neurons. The activity of a given neuron is determined by the weighted sum of all of the signals that the neuron receives from the neurons to which it is connected. In most neural network models, the total activity of the ith neuron, called a "perceptron," is
$a_i = w_{i0} + \sum_j w_{ij} x_j$
where $x_j$ is the signal received from the jth neuron, weighted by an amount $w_{ij}$, and $w_{i0}$ is a bias weight that is usually negative. The ith neuron responds to incoming signals by itself sending a signal $y = F(a_i)$. The function $F(a_i)$ is a saturating function; a common choice is the non-linear logistic or sigmoid function,
$F(a) = (1 + \exp(-a))^{-1}$
which restricts the output to be between 0 and 1, and gives an approximately linear response for small levels of activity. Thus, the activity of the ith perceptron is positive when the sum of the incoming weighted signals exceeds the magnitude of the negative bias weight; and when the incoming signal is sufficiently large, the output of the ith perceptron is approximately 1 (see, for example, W. Penny et al., pages 386-387, in [8]). From the parallel operations and interactions of the neurons emerge collective properties that include production of a content-addressable memory which correctly yields an entire memory from any subpart of sufficient size [3]. Neural networks do not need the high precision associated with digital computing [3]. Because they are fault tolerant, such neural networks can be represented by DNA with the massive parallelism first envisioned by Adleman [1].
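As an illustration of the perceptron model just described, the following minimal Python sketch computes the activity and the logistic output of a single neuron. The function names and the numerical weights, bias and signals are hypothetical values chosen for the example; they do not come from the cited references.

```python
import math

def logistic(a):
    """Saturating function F(a) = 1 / (1 + exp(-a)); output lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-a))

def perceptron_output(weights, bias, signals):
    """Return (activity, output) for one neuron.

    activity = bias + sum of weight * signal over all inputs
    output   = F(activity)
    """
    activity = bias + sum(w * x for w, x in zip(weights, signals))
    return activity, logistic(activity)

# Hypothetical example values (assumptions for illustration only).
weights = [0.5, 1.5, 1.0]
bias = -1.0  # negative bias weight

# Weak input: the weighted sum does not exceed the magnitude of the bias,
# so the activity is negative and the output is below 0.5.
weak = perceptron_output(weights, bias, [0.1, 0.2, 0.1])

# Strong input: the activity is large and the output saturates near 1.
strong = perceptron_output(weights, bias, [4.0, 4.0, 4.0])

print("weak:", weak)
print("strong:", strong)
```

With these values the weak input gives a negative activity and an output below one half, while the strong input drives the output close to 1.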
The present invention provides a method for DNA-based analog representation of the operations of vector addition and vector and matrix algebra, using negative as well as non-negative numbers, wherein a subset of all single-stranded DNA n-mers is in 1:1 correspondence with the basis vectors $e_i$, i = 1, 2, ..., m, in an abstract m-dimensional vector space; an m-component vector V in a space with basis vectors $e_i$, i = 1 through m, is represented by the equation $V = \sum_i V_i e_i$, and its analog representation is a DNA sample containing strands $E_i$ or their complements $\underline{E}_i$, for each i = 1 through m, where the presence of $E_i$ or $\underline{E}_i$ is determined by the sign of the amplitude $V_i$ of the ith component of the vector, and the concentration of each $E_i$ or $\underline{E}_i$ is proportional to the magnitude of the amplitude $V_i$.
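The signed representation described above can be sketched numerically. In the following Python sketch, each vector component is stored as a pair of concentrations, one for the strand Ei and one for its complement. The function names, the `scale` parameter and the pairing model used in `mix` are assumptions made for illustration: `mix` treats vector addition as pooling two samples and letting complementary strands hybridize one-to-one, and it ignores dilution on mixing and any incomplete or non-specific hybridization.

```python
def encode(vector, scale=1.0):
    """Represent each amplitude as (conc. of E_i, conc. of complement of E_i).

    The sign of the amplitude selects the strand; the magnitude sets the
    concentration. `scale` (hypothetical) converts amplitude to concentration.
    """
    sample = []
    for v in vector:
        conc = abs(v) * scale
        sample.append((conc, 0.0) if v >= 0 else (0.0, conc))
    return sample

def decode(sample, scale=1.0):
    """Recover signed amplitudes from the pair of concentrations."""
    return [(pos - neg) / scale for pos, neg in sample]

def mix(sample_a, sample_b):
    """Assumed idealized model of vector addition by pooling two samples.

    Complementary strands pair off one-to-one; only the excess strand of
    each component remains single-stranded.
    """
    mixed = []
    for (pa, na), (pb, nb) in zip(sample_a, sample_b):
        pos, neg = pa + pb, na + nb
        paired = min(pos, neg)  # amount removed as duplex
        mixed.append((pos - paired, neg - paired))
    return mixed

a = encode([2.0, -1.5, 0.0])
b = encode([-0.5, 0.5, 3.0])
total = decode(mix(a, b))
print(total)
```

Under these assumptions the decoded result equals the componentwise sum of the two input vectors.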
The present invention further provides a method for implementing an analog neural network, wherein the data of the processing units, or neurons, are in the form of m-component vectors $V = \sum_i V_i e_i$, each of which is represented by a set of the oligomers as described above. The interconnections and the transmission of signals between the neuronal units are represented by biochemical processes and reactions involving the oligomers $E_i$ and $\underline{E}_i$; such processes and reactions include diffusion, molecular recognition, specific hybridization of complementary oligomers, and nucleotide sequence-specific reactions of nucleic acid-modifying enzymes acting on the oligomers, as occur in analog operations of vector addition and vector and matrix algebra. Application of a saturating function to a signal from one or more neuronal units to produce an output is represented by hybridization of a set of oligomers selected by said set of biochemical reactions to a complete, sub-stoichiometric set of single-stranded $E_i$ and $\underline{E}_i$ oligomers, and an output of the neural network is represented by a set of oligomers that specifically hybridize to said sub-stoichiometric set of $E_i$ and $\underline{E}_i$ oligomers.
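The saturating step described above can be read, in its simplest idealization, as a clipping operation: if each capture oligomer is present in a limited amount, no more than that amount of the matching signal strand can hybridize to it. The Python sketch below illustrates this reading only. The name `saturate_by_capture` and the single `capacity` value shared by all capture oligomers are assumptions; real hybridization would give a smoother response than the hard limit modeled here.

```python
def saturate_by_capture(sample, capacity):
    """Idealized saturation by a sub-stoichiometric capture set.

    `sample` is a list of (conc. of E_i, conc. of complement of E_i) pairs.
    Each capture oligomer is assumed to be present at `capacity`, so the
    amount of each signal strand that is retained cannot exceed it.
    """
    return [(min(pos, capacity), min(neg, capacity)) for pos, neg in sample]

# Hypothetical signal: a large positive, a small negative and a large
# negative component, as concentration pairs.
signal = [(5.0, 0.0), (0.0, 0.2), (0.0, 3.0)]
output = saturate_by_capture(signal, capacity=1.0)
print(output)
```

Small components pass through unchanged, which corresponds to the linear small-signal regime, while large components are limited to the capacity of the capture set.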
In a specific embodiment, an analog content addressable memory is produced by representing elements of memory as m-component vectors $V = \sum_i V_i e_i$; wherein items of experience, a set of vectors $V_i^a$, are stored in memory by forming the outer product over all the experience vectors for $i \neq j$:
$T_{ij} = \sum_a V_i^a V_j^a$;
wherein recall of a particular experience $V_i^b$ imperfectly represented as $U_i^b$ is effected by the algorithm:
$V_i = S\left(\sum_j T_{ij} V_j + U_i^b\right)$;
where the function S(x) is a saturating function such as
$g \cdot \tanh(x)$,
with g being the small-signal gain; and wherein the saturating function $S(X_i)$ is implemented by letting DNA strands representing the vector $X_i$ hybridize to a hybridization oligonucleotide array, and the collection of DNA strands representing the saturated $X_i$, $S(X_i)$, is obtained by selectively denaturing the duplex molecules in the array containing the $S(X_i)$ strands and collecting the desired set of DNA oligomers.
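The storage and recall algorithm of this embodiment can be checked numerically, independently of any DNA chemistry. The following Python sketch builds the memory matrix from the outer products of the stored vectors with the diagonal terms omitted, then applies the recall rule repeatedly with a tanh saturating function. The function names, the synchronous update schedule, the number of iterations, the unit gain and the two stored patterns are assumptions chosen for illustration; they are not prescribed by the text.

```python
import math

def outer_product_memory(experiences):
    """T[i][j] = sum over stored vectors of V_i * V_j, for i != j."""
    m = len(experiences[0])
    T = [[0.0] * m for _ in range(m)]
    for v in experiences:
        for i in range(m):
            for j in range(m):
                if i != j:
                    T[i][j] += v[i] * v[j]
    return T

def saturate(x, gain=1.0):
    """S(x) = g * tanh(x), with g the small-signal gain."""
    return gain * math.tanh(x)

def recall(T, cue, steps=10, gain=1.0):
    """Iterate V_i = S(sum_j T_ij V_j + U_i), starting from the cue U."""
    V = list(cue)
    m = len(V)
    for _ in range(steps):
        # Synchronous update (an assumption): all components use the old V.
        V = [saturate(sum(T[i][j] * V[j] for j in range(m)) + cue[i], gain)
             for i in range(m)]
    return V

# Two hypothetical stored experiences (mutually orthogonal +/-1 patterns).
a = [1, 1, 1, 1, -1, -1, -1, -1]
b = [1, -1, 1, -1, 1, -1, 1, -1]
T = outer_product_memory([a, b])

# Imperfect cue: pattern a with three components missing (set to zero).
cue = [1, 1, 0, 0, -1, -1, 0, -1]
recalled = recall(T, cue)
signs = [1 if v > 0 else -1 for v in recalled]
print(signs)
```

In this example the signs of the recalled vector match the stored pattern `a`, including the components that were missing from the cue.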