This invention relates to novel molecular constructs that act as various logic elements, i.e., gates and flip-flops. The constructs are useful in a wide variety of contexts including, but not limited to, computation and control systems.
The history of computational devices reveals a progression from larger and slower to smaller and faster devices. Huge stepwise advances in this progression have accompanied significant changes in the underlying technology. For example, vast increases in computational speed accompanied the transition from mechanical, hand-operated devices such as the abacus and the hand-operated cash register or calculator to electrically driven mechanical computers (e.g., the electric cash register/calculator). Similarly significant increases in speed and decreases in size accompanied the shift from mechanical devices to tube-based electronic computers, again with the shift from tube-based to transistor-based electronic computers, and yet again with the shift from discrete transistor circuits to integrated circuits and then to large scale integrated (LSI) circuits.
The continually decreasing size and increasing speed of large scale integrated electronic devices has recently provoked increased interest and concern regarding the theoretical and practical limits of this progression. Theoretical limits arise from the inherent noise in electronic systems, the need to dissipate heat across ever smaller surface areas as the feature size of circuit elements decreases, and the “anomalous” behavior of devices as their physical size decreases to the point at which quantum mechanical rather than macroscopic properties predominate. (It should be noted, however, that the emergence of quantum mechanical properties at small feature sizes may provide the basis for quantum computing devices, and this field is receiving considerable interest.) Practical limits are imposed by the cost and difficulty of predictable and reliable microfabrication.
Another approach to improving computational power and/or efficiency has been to replace “linear” computing systems, in which a single processor sequentially performs the operations of a calculation, with “parallel” computing systems, in which the components of each calculation are distributed across two or more processing elements. Parallel computing systems can achieve vast savings in computational time. For example, an algorithm running on 100 computing elements in parallel can, in principle, run about 100 times faster than the same algorithm on a single element that must process each operation sequentially. Of course, the actual gain is less than 100-fold, because some time is lost in partitioning the algorithm among the computing elements and in integrating the elements' results, and because some elements may have to wait for others to complete their calculations before the next operation can proceed. Nevertheless, massively parallel systems have been able to solve problems (e.g., identifying large prime numbers) that could not practically be solved on linear computer systems.
The combination of the two approaches, massive parallelism and small computational element size, has given birth to the field of molecular computing. This is illustrated in the seminal paper by Adleman (1994) Science 266: 1021-1024, in which molecular biological tools were used to solve an instance of the directed Hamiltonian path problem. In particular, Adleman encoded the problem (a directed graph) into nucleic acid sequences and then performed a series of ligations that ultimately produced an encoded solution, which could then be decoded. Following Schneider (1991) J. Theoret. Biol., 148: 125, Adleman suggested that such molecular systems could demonstrate remarkable energy efficiency, noting a theoretical maximum of 34×10¹⁹ operations per joule, whereas conventional supercomputers execute at most 10⁹ operations per joule.
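The 34×10¹⁹ figure can be checked arithmetically, assuming it is the thermodynamic bound of one irreversible bit operation per k_B·T·ln 2 joules at 300 K (the Landauer limit). The sketch below only tests that assumption; the function name is illustrative, and the 10⁹ operations per joule supercomputer figure is taken from the text above.

```python
import math

# Boltzmann constant in J/K (exact SI value).
K_B = 1.380649e-23


def max_irreversible_ops_per_joule(temperature_k: float) -> float:
    """Upper bound on irreversible bit operations per joule, assuming each
    operation must dissipate at least k_B * T * ln(2) joules."""
    return 1.0 / (K_B * temperature_k * math.log(2))


bound = max_irreversible_ops_per_joule(300.0)
supercomputer = 1e9  # operations per joule, as quoted above

print(f"{bound:.2e} ops/J")  # about 3.48e+20, i.e. roughly 34 x 10^19
print(f"advantage ~ {bound / supercomputer:.1e}x")
```

Under this assumption the bound exceeds the quoted supercomputer efficiency by roughly eleven orders of magnitude.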
Adleman recognized that DNA molecular computing imposes certain difficulties and limitations, particularly in the encoding of various problems, and recognized that conventional electronic computers have an advantage in the variety of operations they provide and the flexibility with which these operations can be applied. He did, however, note that for certain intrinsically complex problems, such as the directed Hamiltonian path problem, where existing electronic computers are very inefficient and where massively parallel searches can be organized to take advantage of the operations provided by molecular biology, such molecular computations may be advantageous.
As indicated by Adleman, one limitation of prior molecular computation systems has been the lack of a variety of operations and the flexibility with which they may be applied.
This invention overcomes a number of these limitations by providing molecular logic devices that operate in a manner analogous to their electronic counterparts and thus provide a wide variety of operations. Thus, in one embodiment, this invention provides molecular bistable elements (flip-flops) and a wide variety of logic elements (gates) such as the AND, OR, NAND, NOR, NOT gates and others.
The central operational element of these devices is a nucleic acid having two or more protein binding sites (e.g., a first protein binding site and a second protein binding site). The sites are arranged such that when the first protein binding site is specifically bound by a protein, the second binding site cannot be bound by a protein that otherwise specifically recognizes and binds the second binding site; and when the second binding site is specifically bound by a protein, the first binding site cannot be bound by a protein that otherwise specifically recognizes and binds the first binding site. The binding sites are thus mutually exclusive. The nucleic acid can be single- or double-stranded; however, double-stranded nucleic acids (e.g., DNA) are preferred. The first and second binding sites can have the same or different nucleotide sequences. In one preferred embodiment, the first and second binding sites are the same and have the nucleotide sequence of SEQ ID NO: 1 described herein.
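The mutual exclusion described above can be expressed as a small state model. This is a sketch under idealized assumptions: binding is all-or-none and instantaneous, and kinetics and partial occupancy are ignored. The class and method names are hypothetical and do not describe any particular protein.

```python
from typing import Optional


class MolecularFlipFlop:
    """Idealized two-site flip-flop: at most one of sites 1 and 2 can be
    occupied, because a protein on either site excludes binding at the other."""

    def __init__(self) -> None:
        self.occupied: Optional[int] = None  # None (unset), 1 or 2

    def try_bind(self, site: int) -> bool:
        """Attempt to bind the cognate protein at `site`.

        Returns False if the partner site is already occupied."""
        if site not in (1, 2):
            raise ValueError("site must be 1 or 2")
        if self.occupied in (None, site):
            self.occupied = site
            return True
        return False

    def release(self) -> None:
        """Remove the bound protein, leaving both sites free."""
        self.occupied = None

    def set_state(self, site: int) -> None:
        """Set the flip-flop: clear the current occupant, then bind `site`."""
        self.release()
        self.try_bind(site)


ff = MolecularFlipFlop()
print(ff.try_bind(1))  # True: site 1 is free
print(ff.try_bind(2))  # False: site 2 is blocked by the protein on site 1
ff.set_state(2)
print(ff.occupied)     # 2
```

Switching states requires removing the bound protein before the other site can be occupied. This is why `set_state` releases first.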
The binding sites can be chosen so that they are specifically recognized (bound) by any of the nucleic acid binding proteins described herein (e.g., Fis, modified EF-Tu, Tus, and LexA).
As indicated above, the binding sites are spaced so that they are mutually exclusive (only one can be bound at a time). The first binding site is preferably within 20 nucleotides (base pairs) of the second site, more preferably within 15 base pairs, and most preferably within 11 or fewer base pairs of the second site. Preferred binding sites have a strength of at least 2.4 bits as determined by individual information theory. The difference in strength between the two sites is at least 0 bits as determined by individual information theory.
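Individual information scores a candidate site by summing a per-position weight for the base found at each position. The sketch below is a simplified form that uses 2 + log₂ f(b, l) and omits the small-sample correction included in Schneider's full method. The frequency matrix and the site strings are hypothetical. The 2.4-bit threshold is the preferred minimum stated above.

```python
import math

# Hypothetical base frequencies f(b, l) for a 4-position site; a real model
# would be derived from an alignment of known binding sites.
FREQS = [
    {"A": 0.5,   "C": 0.25,  "G": 0.125, "T": 0.125},
    {"A": 0.125, "C": 0.125, "G": 0.5,   "T": 0.25},
    {"A": 0.25,  "C": 0.25,  "G": 0.25,  "T": 0.25},
    {"A": 0.125, "C": 0.125, "G": 0.25,  "T": 0.5},
]

THRESHOLD_BITS = 2.4  # preferred minimum site strength from the text


def riw(base: str, pos: int) -> float:
    """Per-position weight in bits: 2 + log2 f(b, l), no small-sample correction."""
    return 2.0 + math.log2(FREQS[pos][base])


def individual_information(site: str) -> float:
    """Ri of a candidate site: the sum of its per-position weights."""
    if len(site) != len(FREQS):
        raise ValueError("site length must match the model")
    return sum(riw(b, l) for l, b in enumerate(site.upper()))


for s in ("AGCT", "TAGC"):
    ri = individual_information(s)
    print(s, ri, ri >= THRESHOLD_BITS)  # AGCT 3.0 True / TAGC -3.0 False
```

The difference in strength between two sites is then the difference of their Ri values under the same model.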
The “flip-flop” may additionally include one or more selector binding sites (e.g., a third protein binding site), where the selector binding site is in proximity to the first or the second protein binding site such that specific binding at the third binding site (e.g., by a protein) precludes specific protein binding at the first or second protein binding site.
In one preferred embodiment, the flip-flop comprises the above-described nucleic acid in which the first protein binding site is a Fis binding site, the second protein binding site is a Fis binding site, and the binding sites are separated from each other by fewer than 12 base pairs. In a particularly preferred flip-flop, the nucleic acid is a deoxyribonucleic acid comprising the sequence of SEQ ID NO: 2 or SEQ ID NO: 3 described herein.
In another embodiment, this invention provides the various logic gates (NOR, OR, NOT, AND, NAND) described herein. The fundamental unit of these gates is the NOR gate. In one embodiment, the NOR gate is a composition comprising an isolated nucleic acid having a length of at least 5 base pairs and a nucleotide sequence that encodes a first protein binding site, a second protein binding site, and a third protein binding site. The protein binding sites are spaced in proximity to each other such that when either the first or the third protein binding site is specifically bound by a nucleic acid binding protein, the second binding site cannot be bound by a nucleic acid binding protein that otherwise specifically recognizes and binds the second binding site, and such that the first and third protein binding sites can simultaneously be specifically bound by nucleic acid binding proteins. The NOR gate can be in a state in which the first or third binding site is bound by a nucleic acid binding protein (e.g., Fis or any of the binding proteins described herein) and is thus set HIGH. Similarly, the second binding site can be bound by a nucleic acid binding protein, but not when either the first or the third site is bound.
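The NOR behavior follows directly from the blocking rule: the middle (output) site is available only when neither flanking (input) site is occupied. The following is a minimal boolean sketch under idealized all-or-none binding. The function name is illustrative.

```python
from itertools import product


def nor_gate(site1_bound: bool, site3_bound: bool) -> bool:
    """State of the second (middle) site in an idealized three-site NOR gate.

    True (HIGH) means the middle site is free and can still be bound by its
    cognate protein; a protein on site 1 or on site 3 blocks it (LOW)."""
    return not (site1_bound or site3_bound)


# Truth table: input on site 1, input on site 3 -> output at site 2.
for i1, i3 in product((False, True), repeat=2):
    print(int(i1), int(i3), "->", int(nor_gate(i1, i3)))
```

This prints `0 0 -> 1`, followed by `0` for every row in which at least one input is bound.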
The binding protein bound to the second binding site can be attached to an activator (e.g. a gene transactivator such as Gal4). In addition, the NOR gate can further comprise a gene or cDNA under the control of the activator. The gene or cDNA can encode virtually any structural protein. Thus, in one embodiment, the gene may be a reporter gene (e.g., FFlux, GFP, etc.) or in another embodiment the gene may encode a nucleic acid binding protein. This provides a method of coupling the output of one gate or flip-flop to the input of the same gate or flip-flop or to the input of another gate or flip-flop.
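The coupling described above can be sketched as a two-stage cascade. In this sketch, a HIGH output site on the first gate is bound by the activator-tethered protein, which switches on a gene whose product is an input to a second gate. This is an idealized boolean abstraction. The `gene_expressed` step is a hypothetical stand-in that ignores expression delay, protein degradation, and leaky expression.

```python
from itertools import product


def nor_gate(a: bool, b: bool) -> bool:
    """Idealized NOR: output site free (HIGH) only if neither input site is bound."""
    return not (a or b)


def gene_expressed(output_high: bool) -> bool:
    """Hypothetical coupling step: a HIGH output site is bound by the
    activator-tethered protein, which switches on the downstream gene."""
    return output_high


def cascade(a: bool, b: bool, c: bool) -> bool:
    """Gate 1 computes NOR(a, b); the protein it causes to be expressed
    serves as one input of gate 2, whose other input is c."""
    p = gene_expressed(nor_gate(a, b))
    return nor_gate(p, c)


for a, b, c in product((False, True), repeat=3):
    print(int(a), int(b), int(c), "->", int(cascade(a, b, c)))
```

Under these assumptions the cascade computes (a OR b) AND NOT c. The expressed protein could equally be routed back to an input of the first gate.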
As with the flip-flop described above, in one embodiment, the underlying nucleic acid can be double stranded (e.g., a DNA). The three binding sites comprising the NOR gate can all be different (in which case, no selectors are necessary although they optionally can be present). Alternatively, the first and third binding sites can have the same nucleotide sequence (i.e., bind the same protein with the same strength) in which case, the NOR gate acts like a NOT gate (when I1=I2, NOR(I1,I2)=NOT(I1)). In another embodiment, the first or third binding sites and the second binding site can have the same nucleotide sequence. The binding sites can be chosen so that they are cognate binding sites for any particular binding protein. Preferred spacings between the first and second site and between the second and third site are as described above.
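With the NOR gate as the fundamental unit, the other gates follow from standard Boolean identities, because NOR is functionally complete. NOT corresponds to the tied-input case in the text, NOR(I1, I1). The sketch shows only these identities on idealized boolean states and does not specify how molecular gates would physically be wired together. The function names are illustrative.

```python
from itertools import product


def nor(a: bool, b: bool) -> bool:
    """Idealized molecular NOR gate (output HIGH only when no input is bound)."""
    return not (a or b)


def not_(a: bool) -> bool:
    """NOT from a NOR whose first and third sites are identical, so one
    signal protein drives both inputs: NOR(a, a) = NOT a."""
    return nor(a, a)


def or_(a: bool, b: bool) -> bool:
    return not_(nor(a, b))


def and_(a: bool, b: bool) -> bool:
    return nor(not_(a), not_(b))


def nand(a: bool, b: bool) -> bool:
    return not_(and_(a, b))


# Columns: a, b, OR, AND, NAND
for a, b in product((False, True), repeat=2):
    print(int(a), int(b), int(or_(a, b)), int(and_(a, b)), int(nand(a, b)))
```
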
Preferred binding sites have a binding strength of at least 2.4 bits as determined by individual information theory. In one embodiment, the difference in strength between the first and third sites is at least 0 bits as determined by individual information theory. In a particularly preferred embodiment, the first protein binding site is a Fis binding site and the third protein binding site is a Fis binding site (e.g., the binding site of SEQ ID NO: 1).
In another embodiment this invention provides a composition for the storage of binary information. The preferred storage composition comprises any of the flip-flops described above having a nucleic acid binding protein bound to the first protein binding site or to the second protein binding site. The underlying nucleic acid can have restriction sites at one or both ends and preferably different restriction sites at each end. The restriction sites are preferably located so that when the binding site adjacent to a restriction site is occupied with a binding protein, a ligase is incapable of ligating the mating strand to that restriction site.
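One way to read a stored bit under this arrangement is to ask which end of the molecule can still be ligated. Because the two ends carry different restriction sites, ligation products at each end can be told apart. The sketch assumes a specific layout (first site next to the left end, second site next to the right end) and a specific bit convention. Neither is fixed by the text, and both are hypothetical.

```python
from typing import Dict, Optional


def ligatable_ends(bound_site: Optional[int]) -> Dict[str, bool]:
    """Ends of the storage molecule a ligase can still join.

    Assumes (hypothetically) that the first binding site sits next to the
    left restriction site and the second next to the right one; a protein on
    a site blocks ligation at the adjacent end."""
    return {"left": bound_site != 1, "right": bound_site != 2}


def read_bit(bound_site: Optional[int]) -> int:
    """Hypothetical readout convention: 0 if only the right end ligates,
    1 if only the left end ligates."""
    ends = ligatable_ends(bound_site)
    if ends["left"] == ends["right"]:
        raise ValueError("flip-flop is not set: both ends are accessible")
    return 1 if ends["left"] else 0


print(ligatable_ends(1))         # {'left': False, 'right': True}
print(read_bit(1), read_bit(2))  # 0 1
```

An unset molecule leaves both ends ligatable, so the sketch reports it as unreadable rather than guessing a value.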
The storage composition can be free in solution or attached to a solid support. The binding protein may be covalently linked to the underlying nucleic acid. The binding protein can be attached to a gene transactivator as described above. In addition, the storage composition can include one or more genes or cDNAs as described above that are preferably under control of the activator.
In still another embodiment, this invention provides a method of storing information. The method involves binding a nucleic acid binding protein to a first protein binding site on a nucleic acid comprising any of the above-described flip-flops or on a nucleic acid comprising any of the gates described herein. The method may further involve the step of determining which binding site on the nucleic acid is bound by said binding protein.
This invention also provides a method of transforming binary information. This method involves binding a nucleic acid binding protein to an input protein binding site on any one or more of the gates described herein and determining whether or not a nucleic acid binding protein can bind to an output protein binding site. The output binding site can be on the same or a different gate and it can be on the same or a different nucleic acid. In a preferred embodiment, a nucleic acid comprising a gate used for this purpose has a length of at least 3, preferably a length of at least 5, more preferably a length of at least 7 and most preferably a length of at least 22 base pairs.
The terms “polypeptide,” “peptide,” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical analogues of the corresponding naturally occurring amino acids, as well as to naturally occurring amino acid polymers.
The term “nucleic acid” refers to a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form and, unless otherwise limited, encompasses known analogs of natural nucleotides that can function in a similar manner as naturally occurring nucleotides. The term also includes nucleotides linked by peptide linkages, as in “peptide” nucleic acids.
The term “specifically binds,” as used herein in reference to the binding of a protein or polypeptide to a nucleic acid, refers to a protein-nucleic acid interaction in which the protein binds strongly to a specific nucleic acid sequence pattern (nucleotide sequence) and less strongly to other nucleic acid patterns (e.g., in a gel shift assay, specific binding produces a significant gel shift compared with the gel shift produced by the same protein with other nucleic acid sequences of the same length).
The term “nucleic acid binding protein” is used herein to refer to a protein that specifically binds to a nucleic acid at a particular nucleotide sequence. Nucleic acid binding proteins include DNA binding proteins, mRNA binding proteins, tRNA binding proteins, and proteins that specifically bind modified or otherwise non-standard nucleic acids as described above. Nucleic acid binding proteins include, but are not limited to, DNA binding proteins such as Fis, LacI, lambda cI, lambda cro, LexA, TrpR, ArgR, AraC, CRP, FNR, OxyR, IHF, GalR, MalT, LRP, SoxR, SoxS, sigma factors, chi, T4 MotA, P1 RepA, p53, and NF-kappa-B, and RNA binding proteins or protein/RNA complexes such as ribosomes, T4 regA, spliceosomes (donor and acceptor), polyA binding factor, and the like. A large number of nucleic acid binding proteins are described in the TransFac database; see also Nucleic Acids Res. 25(1): 265-268 (1997).
A “protein binding site” refers to a nucleotide sequence in a nucleic acid to which a particular nucleic acid binding protein specifically binds.
The terms “cognate protein” and “cognate binding site” refer, respectively, to the protein that specifically binds to a binding site and to the binding site that is specifically bound by a particular binding protein.
A binding site “blocker,” “selector,” or “modulator” refers to a moiety that, when bound adjacent to, in proximity to, or on a binding site, partially or completely blocks binding of that site by its cognate nucleic acid binding protein.
The term “flip-flop” refers to a bistable device that exists in one or the other of two mutually exclusive states. Thus the molecular flip-flops of this invention have two binding sites, only one of which can be bound at a time.
The term “gate” is used to refer to a device that produces a particular (predetermined) output in response to one or more inputs. Thus, for example, an AND gate produces a HIGH output only when all inputs are HIGH. An OR gate produces a HIGH output when any input is HIGH and a LOW output only when all inputs are LOW. A NOT function returns HIGH when its input is LOW and LOW when its input is HIGH. Gates and their uses are well known to those of skill in the art (see, e.g., Horowitz and Hill (1990) The Art of Electronics, Cambridge University Press, Cambridge).
The term “state” is used to refer to the signal state of a particular binding site of a flip-flop or of a logic gate of this invention. A protein binding site that is protein bound or capable of being protein bound is said to be HIGH, while a binding site that is unbound and cannot be bound by a binding protein is said to be LOW.
The term “input” is used herein to refer to a binding site to which a signal may be applied in order to elicit an output. The signal itself (e.g., a signal polypeptide) may also be referred to as an input; the intended meaning will be clear from the context. The term “input binding site” refers to a protein binding site that is used as an input.
The term “output” is used herein to refer to a binding site that is rendered capable or incapable of binding its cognate protein as a consequence of an input binding event or events. The term output can also refer to the state of the output binding site. The output can provide an input for another gate or flip-flop of this invention.
A “signal protein” is a nucleic acid binding protein that sets the (logical) state of a molecular flip-flop or of a molecular gate of this invention. As described herein, binding of a signal protein to a protein binding site on a nucleic acid sets the state of that binding site HIGH. A signal protein can also be used to read the state of the flip-flop or gate. In this latter context, where the protein is capable of binding an output binding site (i.e., the binding site is unblocked), the state of the output is said to be HIGH. Conversely, where the output binding site is blocked, the state is said to be LOW.
The term “setting the state,” when referring to a binding site, refers to selectively binding a signal protein to, or unbinding it from, a particular binding site. Where the signal protein is bound to the binding site, the state of that binding site is set HIGH. Conversely, where the signal protein is removed from the site, the state is set LOW. With respect to a flip-flop, setting the state refers to placing the flip-flop in one of its mutually exclusive stable states. Thus state one can be set by binding a signal protein to the first binding site, while state two can be set by binding a signal protein to the second binding site. Since the two states are mutually exclusive, setting the state at one site implicitly involves setting (or switching) the state at the other site.
The term “resetting the state” refers to changing the state (e.g., from HIGH to LOW or from LOW to HIGH) of an input binding site, an output binding site, or a flip-flop.
The term “GTPase-like” protein refers to a binding protein that can release a bound nucleic acid with the dissipation of energy (e.g., hydrolysis of an energy source such as GTP or ATP, or input of light). Such release may optionally be accomplished with the additional use of a co-factor. GTPase-like proteins include naturally occurring GTPase-like proteins (including GTPases) as well as modified and non-natural GTPase-like proteins.
A “recombinant expression cassette,” or simply an “expression cassette,” is a nucleic acid construct, generated recombinantly or synthetically, with nucleic acid elements that are capable of effecting expression of a structural gene or genes in hosts compatible with such sequences. Expression cassettes include at least promoters and, optionally, transcription termination signals. Typically, the recombinant expression cassette includes a nucleic acid to be transcribed (e.g., a nucleic acid encoding a desired polypeptide) and a promoter. Additional factors necessary or helpful in effecting expression may also be used as described herein. For example, an expression cassette can also include nucleotide sequences that encode a signal sequence that directs secretion of an expressed protein from the host cell.
A “logic cassette” refers to an expression cassette in which the expression of one or more genes is under the control of one or more molecular gates or flip-flops of this invention.
The phrase “expression is under the control of,” when referring to a logic element (e.g., a gate or flip-flop), indicates that changes in the state(s) (input and/or output) of the gate or flip-flop alter the expression level of the gene or genes under said control.
Similarly, a gene “operably linked to” or “under the control of” an activator refers to a gene whose expression is altered by the presence or absence of a particular activator.
A “tethered activator” refers to a gene activator (e.g., Gal4) bound directly or through a linker to a nucleic acid binding protein (e.g., LexA). The attachment can be by chemical conjugation or by recombinant expression of a fusion protein. In some instances a repressor can be used in place of the activator, and the term tethered activator is intended to encompass this possibility.
The term “binding strength” as used herein refers to binding strength as calculated using individual information theory (e.g., as described in Schneider (1997) J. Theoret. Biol., 189(4): 427-441) or as measured by binding energy (−ΔG).
The terms “isolated,” “purified,” or “biologically pure” refer to material that is substantially or essentially free from components that normally accompany it as found in its native state. In the case of a nucleic acid, an isolated nucleic acid is typically free of the nucleic acid sequences by which it is flanked in nature. An isolated nucleic acid can be reintroduced into a cell, and such “heterologous” nucleic acids are regarded herein as isolated. In addition, nucleic acids synthesized de novo or produced by cloning (e.g., recombinant DNA technology) are also regarded as “isolated.”
The term “sequence logo” refers to a graphical method for displaying the patterns in a set of aligned sequences. The characters representing the sequence are stacked on top of each other for each position in the aligned sequences. The height of each letter is proportional to its frequency, and the letters are sorted so that the most common is on top. The height of the entire stack is then adjusted to signify the information content of the sequences at that position. From these “sequence logos” one can determine not only the consensus sequence but also the relative frequency of bases and the information content (measured in bits) at every position in a site or sequence. The logo displays both significant residues and subtle sequence patterns. Sequence logos are described in detail in Schneider and Stephens (1990) Nucl. Acids Res., 18: 6097-6100 and Schneider (1996) Meth. Enzym., 274: 445-455.
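The stacking rule above can be sketched as follows. At each aligned position, the stack height is taken as R(l) = 2 − H(l) bits, where H(l) is the Shannon entropy of the base frequencies. Each letter's height is its frequency times R(l). This is a simplified sketch that omits the small-sample correction used in the published method. The aligned sequences in the example are made up for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple


def logo_column(column: str) -> List[Tuple[str, float]]:
    """Letter heights (bits) for one aligned position, ordered bottom to top,
    so the most frequent letter is last (drawn on top).

    Uses R(l) = 2 - H(l) and omits the small-sample correction."""
    counts = Counter(column.upper())
    n = len(column)
    freqs = {b: c / n for b, c in counts.items()}
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    r_seq = 2.0 - entropy  # stack height: information content at this position
    return sorted(((b, f * r_seq) for b, f in freqs.items()), key=lambda bh: bh[1])


def sequence_logo(aligned: List[str]) -> List[List[Tuple[str, float]]]:
    """Stacks for every position of a set of equal-length aligned sequences."""
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    return [logo_column("".join(s[i] for s in aligned)) for i in range(width)]


for stack in sequence_logo(["ACGT", "ACGA", "ACTT", "AGGT"]):
    print([(b, round(h, 2)) for b, h in stack])
```

A fully conserved position yields a single letter 2 bits tall. Mixed positions yield shorter stacks with the majority base on top.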
The term “sequence walker” refers to a graphical method for displaying how binding proteins and other macromolecules interact with individual bases of nucleotide sequences. Characters representing the sequence are either oriented normally and placed above a line, indicating favorable contact, or upside-down and placed below the line, indicating unfavorable contact. The positive or negative height of each letter shows the contribution of that base to the average sequence conservation of the binding site, as represented by a sequence logo. These sequence “walkers” can be stepped along raw sequence data to visually search for binding sites. Many walkers, for the same or different proteins, can be placed simultaneously next to a sequence to create a quantitative map of a complex genetic region. One can alter the sequence to quantitatively engineer binding sites. Database anomalies can be visualized by placing a walker at the recorded positions of a binding molecule and comparing this to the locations found by scanning the nearby sequences. The sequence can also be altered to predict whether a change is a polymorphism or a mutation for the recognizer being modeled. The calculation and use of “sequence walkers” are described in Schneider (1997) Nucl. Acids Res., 25: 4408-4415, and in copending application U.S. Ser. No. 08/494,115, filed on Jun. 23, 1995. The mathematics for walkers is given in Schneider (1997) J. Theor. Biol. 189(4): 427-441.
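Stepping a walker along a sequence can be sketched as a sliding-window scan. Each base in the window contributes a weight, shown above the line if positive and below it if negative. Windows whose total exceeds a threshold are reported as candidate sites. This is not Schneider's published implementation. The frequency matrix is hypothetical, the weights omit the small-sample correction, and the default threshold of 0 bits is an assumption of the sketch.

```python
import math
from typing import List, Tuple

# Hypothetical base frequencies f(b, l) for a 3-position site model.
FREQS = [
    {"A": 0.5,   "C": 0.125, "G": 0.25,  "T": 0.125},
    {"A": 0.125, "C": 0.125, "G": 0.25,  "T": 0.5},
    {"A": 0.25,  "C": 0.5,   "G": 0.125, "T": 0.125},
]


def walker(window: str) -> List[Tuple[str, float, str]]:
    """Per-base contributions 2 + log2 f(b, l) (small-sample correction omitted),
    with where a walker display would draw each letter."""
    out = []
    for pos, base in enumerate(window.upper()):
        w = 2.0 + math.log2(FREQS[pos][base])
        place = "above" if w > 0 else "below" if w < 0 else "on line"
        out.append((base, w, place))
    return out


def scan(sequence: str, threshold: float = 0.0) -> List[Tuple[int, float]]:
    """Step the walker along `sequence`; report offsets whose total exceeds `threshold`."""
    width = len(FREQS)
    hits = []
    for start in range(len(sequence) - width + 1):
        ri = sum(w for _, w, _ in walker(sequence[start:start + width]))
        if ri > threshold:
            hits.append((start, ri))
    return hits


print(scan("GATCAATC"))  # [(1, 3.0), (5, 3.0)]
print(walker("AAT"))     # one base above the line, two below
```

Editing a base in the sequence and rescanning shows how each substitution raises or lowers the site's total. This mirrors the engineering use described above.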