The present invention relates in general to the field of cloning vector plasmids, and in particular to methods for rapidly assembling DNA constructs or transgenes with cloning vector plasmids.
The foundation of molecular biology is recombinant DNA technology, which can here be summarized as the modification and propagation of nucleic acids for the purpose of studying the structure and function of the nucleic acids and their protein products.
Individual genes, gene regulatory regions, subsets of genes, and indeed the entire chromosomes in which they are contained, are all composed of double-stranded, anti-parallel sequences of the nucleotides adenine, thymine, guanine and cytosine, identified conventionally by the initials A, T, G, and C, respectively. These DNA sequences, as well as cDNA sequences, which are double-stranded DNA copies derived from mRNA (messenger RNA) molecules, can be cleaved into distinct fragments, isolated, and inserted into a vector such as a bacterial plasmid to study the gene products. A plasmid is an extra-chromosomal piece of DNA that was originally derived from bacteria and can be manipulated and reintroduced into a host bacterium for the purpose of study or production of a gene product. Plasmid DNA is similar to chromosomal DNA in that it is composed of the same A, T, G, and C nucleotides encoding genes and gene regulatory regions. However, a plasmid is a relatively small molecule of fewer than approximately 30,000 base pairs, or 30 kilobases (kb). In addition, the nucleotide base pairs of a double-stranded plasmid form a continuous circular molecule, which also distinguishes plasmid DNA from chromosomal DNA.
Plasmids enhance the rapid exchange of genetic material between bacterial organisms and allow rapid adaptation to changes in environment, such as temperature, food supply, or other challenges. Any plasmid acquired must express a gene or genes that contribute to the survival of the host or else it will be destroyed or discarded by the organism, since the maintenance of unnecessary plasmids would be a wasteful use of resources. A clonal population of cells contains identical genetic material, including any plasmids it might harbor. Use of a cloning vector plasmid with a DNA insert in such a clonal population of host cells will amplify the amount of the DNA of interest available. The DNA so cloned may then be isolated and recovered for subsequent manipulation in the steps required for building a DNA construct. Thus, it can be appreciated that cloning vector plasmids are useful tools in the study of gene function, providing the ability to rapidly produce large amounts of the DNA insert of interest.
While some elements found in plasmids are naturally occurring, others have been engineered to enhance the usefulness of plasmids as DNA vectors. These include antibiotic- or chemical-resistance genes and a multiple cloning site (MCS), among others. Each of these elements has a role in the present invention, as well as in the prior art. Description of the role each element plays will highlight the limitations of the prior art and demonstrate the utility of the present invention.
A particularly useful plasmid-borne gene that can be acquired by a host is one that would confer antibiotic resistance. In the daily practice of recombinant DNA technology, antibiotic resistance genes are exploited as positive or negative selection elements to preferentially enhance the culture and amplification of the desired plasmid over that of other plasmids.
In order to be maintained by a host bacterium, a plasmid must also contain a segment of sequences that direct the host to duplicate the plasmid. Sequences known as the origin of replication (ORI) element direct the host to use its cellular enzymes to make copies of the plasmid. When such a bacterium divides, the daughter cells will each retain a copy or copies of any such plasmid. Certain strains of E. coli bacteria have been derived to maximize this duplication, producing upwards of 300 copies per bacterium. In this manner, the cultivation of a desired plasmid can be enhanced.
Another essential element in any cloning vector is a location for insertion of the genetic material of interest. This is a synthetic element that has been engineered into “wild type” plasmids, thus conferring utility as a cloning vector. Any typical commercially available cloning vector plasmid contains at least one such region, known as a multiple cloning site (MCS). An MCS typically comprises nucleotide sequences that are cleaved by a single endonuclease enzyme, or by a series of endonuclease enzymes, each of which has a distinct recognition sequence and cleavage pattern. The so-called recognition sequences of a restriction endonuclease (RE) site encoded in the DNA molecule comprise double-stranded palindromic sequences. For some RE enzymes, as few as 4-6 nucleotides are sufficient to provide a recognition site, while other RE enzymes require a sequence of 8 or more nucleotides. The RE enzyme EcoR1, for example, recognizes the double-stranded hexanucleotide sequence 5′ G-A-A-T-T-C 3′, wherein 5′ indicates the end of the molecule known by convention as the “upstream” end, and 3′ likewise indicates the “downstream” end. The complementary strand of the recognition sequence is its anti-parallel partner, 3′ C-T-T-A-A-G 5′, which, read in the conventional 5′ to 3′ direction, is again G-A-A-T-T-C; this is what makes the site palindromic. Since every endonuclease site is a double-stranded sequence of nucleotides, a recognition site of 6 nucleotides is, in fact, 6 base pairs (bp). Thus the double-stranded recognition site can be represented within the larger double-stranded molecule in which it occurs as:
5′ . . . . . . G-A-A-T-T-C . . . . . . 3′
3′ . . . . . . C-T-T-A-A-G . . . . . . 5′
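The palindrome property described above can be checked mechanically: a site is palindromic when its partner strand, read 5′ to 3′, is the same sequence. The following is a minimal Python sketch, not part of the described invention; the helper names `reverse_complement` and `is_palindromic_site` are illustrative, and sequences are assumed to be plain uppercase A/C/G/T strings.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the partner strand of `seq`, written 5' -> 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def is_palindromic_site(site: str) -> bool:
    """True if both strands of a recognition site read the same 5' -> 3'."""
    return site.upper() == reverse_complement(site)

top = "GAATTC"                               # EcoR1 site, top strand 5' -> 3'
bottom_3_to_5 = top.translate(COMPLEMENT)    # partner strand aligned beneath it, read 3' -> 5'
print(bottom_3_to_5)                         # CTTAAG
print(reverse_complement(top))               # GAATTC
print(is_palindromic_site(top))              # True
```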
Like many other RE enzymes, EcoR1 does not cleave exactly at the axis of dyad symmetry, but at positions four nucleotides apart in the two DNA strands between the nucleotides indicated by a “/”:
5′ . . . . . . G/A-A-T-T-C . . . . . . 3′
3′ . . . . . . C-T-T-A-A/G . . . . . . 5′
such that the double-stranded DNA molecule is cleaved, leaving the following configuration of nucleotides at the newly formed “ends”:
5′ . . . . . . G 3′          5′ A-A-T-T-C . . . . . . 3′
3′ . . . . . . C-T-T-A-A 5′          3′ G . . . . . . 5′
This staggered cleavage yields fragments of DNA with protruding 5′ termini. Because complementary bases (A with T, G with C) pair spontaneously when brought into proximity, protruding ends such as these are called cohesive or sticky ends. Any one of these termini can form hydrogen bonds with any complementary terminus generated by the same restriction enzyme. Since any DNA that contains a specific recognition sequence will be cut in the same manner as any other DNA containing the same sequence, those cleaved ends will be complementary. Therefore, the ends of any DNA molecules cut with the same RE enzyme “match” each other in the way adjacent pieces of a jigsaw puzzle “match”, and can be enzymatically linked together. It is this property that permits the formation of recombinant DNA molecules, and allows the introduction of foreign DNA fragments into bacterial plasmids, or into any other DNA molecule.
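To make the staggered cut concrete, here is a minimal Python sketch that models a linear duplex by its top strand and cuts at every copy of a palindromic site. The function `digest` and its `top_cut`/`bottom_cut` parameters are hypothetical; the model assumes copies of the site do not overlap and ignores circular molecules, methylation, and enzyme kinetics.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Partner strand of `seq`, written 5' -> 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def digest(seq: str, site: str, top_cut: int, bottom_cut: int) -> list[tuple[str, str]]:
    """Cut every (non-overlapping) copy of `site` in a linear duplex given by its top strand.

    top_cut / bottom_cut are the offsets inside the site where the top and
    bottom strands are broken, both counted along the top strand. For EcoR1
    (G/AATTC) top_cut=1 and bottom_cut=5, so the two breaks are 4 nt apart.
    Returns one (top_strand, bottom_strand) pair per fragment, each written 5' -> 3'.
    """
    starts = [m.start() for m in re.finditer(site, seq)]
    top_breaks = [0] + [s + top_cut for s in starts] + [len(seq)]
    bottom_breaks = [0] + [s + bottom_cut for s in starts] + [len(seq)]
    fragments = []
    for i in range(len(starts) + 1):
        top = seq[top_breaks[i]:top_breaks[i + 1]]
        # The bottom strand spans a window shifted by the stagger between the two cuts.
        bottom = reverse_complement(seq[bottom_breaks[i]:bottom_breaks[i + 1]])
        fragments.append((top, bottom))
    return fragments

print(digest("CCGAATTCGG", "GAATTC", top_cut=1, bottom_cut=5))
# [('CCG', 'AATTCGG'), ('AATTCGG', 'CCG')]
```

In the output, the left fragment's bottom strand and the right fragment's top strand both begin with the single-stranded 5′ A-A-T-T tail, which is why any two EcoR1 ends can anneal. A molecule with two EcoR1 sites is cut at both, giving three fragments.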
A further general principle to consider when building recombinant DNA molecules is that a particular RE enzyme will cut at every one of its recognition sites occurring within a molecule, not just at the site of interest. The larger a DNA molecule, the more likely it is that any given endonuclease site will recur. Assuming that endonuclease sites are distributed randomly along a DNA molecule, a tetranucleotide site will occur, on average, once every 4⁴ (i.e., 256) nucleotides or bp, whereas a hexanucleotide site will occur once every 4⁶ (i.e., 4,096) nucleotides or bp, and an octanucleotide site will occur once every 4⁸ (i.e., 65,536) nucleotides or bp. Thus, it can be readily appreciated that shorter recognition sequences will occur frequently, while longer ones will occur rarely. When planning the construction of a transgene or other recombinant DNA molecule, this is a vital issue, since such a project frequently requires the assembly of several pieces of DNA of varying sizes. The larger these pieces are, the more likely it is that the sites one wishes to use occur in several of the DNA components, making manipulation difficult at best.
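The frequencies above follow from 4ⁿ, under the same assumption of random sequence with equal proportions of A, C, G, and T. A short Python sketch of that arithmetic follows; the function names are illustrative, and the 30 kb length is simply the approximate upper plasmid size mentioned earlier.

```python
def expected_spacing(site_length: int) -> int:
    """Average distance (bp) between occurrences of one specific recognition
    site in random sequence with equal A, C, G and T frequencies."""
    return 4 ** site_length

def expected_count(molecule_length: int, site_length: int) -> float:
    """Expected number of occurrences of one specific site in a molecule."""
    return molecule_length / expected_spacing(site_length)

for n in (4, 6, 8):
    print(f"{n} bp site: once every {expected_spacing(n):,} bp")
# 4 bp site: once every 256 bp
# 6 bp site: once every 4,096 bp
# 8 bp site: once every 65,536 bp

# Expected copies of one specific site in a 30 kb molecule.
for n in (4, 6, 8):
    print(n, round(expected_count(30_000, n), 2))
# 4 117.19
# 6 7.32
# 8 0.46
```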
Frequently occurring endonuclease enzyme sites are herein referred to as common sites, and the endonucleases that cleave these sites are referred to as common endonuclease enzymes. Restriction enzymes with cognate restriction sites longer than 6 bp are referred to as rare restriction enzymes, and their cognate restriction sites as rare restriction sites. However, some endonuclease sites of 6 bp occur less frequently than would be statistically predicted, and these sites and the endonucleases that cleave them are also referred to as rare. Thus, the designations “rare” and “common” do not refer to the relative abundance or availability of any particular restriction enzyme, but rather to the frequency of occurrence of the sequence of nucleotides that make up its cognate recognition site within any DNA molecule or isolated fragment of a DNA molecule, or any gene or its DNA sequence.
A second class of endonuclease enzymes, called homing endonuclease (HE) enzymes, has recently been isolated. HE enzymes have large, non-palindromic, asymmetric recognition sites (12-40 base pairs). HE recognition sites are extremely rare. For example, the HE known as I-SceI has an 18 bp recognition site (5′ . . . TAGGGATAACAGGGTAAT . . . 3′ [SEQ ID NO:3]), predicted to occur only once in every 7×10¹⁰ bp of random sequence. This rate of occurrence is equivalent to only one site in 20 mammalian-sized genomes. The rare nature of HE recognition sites greatly increases the likelihood that a genetic engineer can cut a final transgene product without disrupting the integrity of the transgene, provided that HE recognition sites are included in appropriate locations in a cloning vector plasmid.
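The I-SceI figure can be reproduced from 4¹⁸. Because HE sites are asymmetric, a search must also check the opposite strand, i.e., the reverse complement of the site; counting both strands roughly halves the expected spacing relative to the single-strand estimate. A minimal Python sketch, assuming a haploid mammalian genome of 3×10⁹ bp and using illustrative helper names:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
ISCEI_SITE = "TAGGGATAACAGGGTAAT"  # I-SceI recognition site, SEQ ID NO:3

def reverse_complement(seq: str) -> str:
    """Partner strand of `seq`, written 5' -> 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def find_site_both_strands(seq: str, site: str) -> list[tuple[int, str]]:
    """(start, strand) for each match of an asymmetric site: '+' is the site
    as written, '-' is the site lying on the opposite strand."""
    rc = reverse_complement(site)
    hits = []
    for i in range(len(seq) - len(site) + 1):
        window = seq[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        if window == rc:
            hits.append((i, "-"))
    return hits

spacing = 4 ** len(ISCEI_SITE)    # single-strand estimate, as in the text
genome_bp = 3_000_000_000         # assumed haploid mammalian genome size
print(f"{spacing:,}")             # 68,719,476,736 (about 7 x 10^10)
print(round(spacing / genome_bp, 1))                 # 22.9 genomes per expected site
print(reverse_complement(ISCEI_SITE) == ISCEI_SITE)  # False: the site is not palindromic
```

Under that genome-size assumption the estimate is about 23 genomes per site, consistent with the rough figure of 20 given above.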
Since a DNA molecule from any source organism will be cut in identical fashion by an endonuclease enzyme, foreign pieces of DNA from any species can be cut with an endonuclease enzyme, inserted into a bacterial plasmid vector that was cleaved with the same endonuclease enzyme, and amplified in a suitable host cell. For example, if a human gene can be cut at two places with the RE enzyme known as EcoR1, the desired fragment with EcoR1 ends can be isolated and mixed with a plasmid that was also cut with EcoR1, in what is commonly known as a ligation mixture. Under the appropriate conditions in the ligation mixture, some of the isolated human gene fragments will match up with the ends of the plasmid molecules. These newly joined ends can be enzymatically linked together (ligated) to recircularize the plasmid, which now contains its new DNA insert. The ligation mixture is then introduced into E. coli or another suitable host, and the newly engineered plasmids will be amplified as the bacteria divide. In this manner, a relatively large number of copies of the human gene may be obtained and harvested from the bacteria. These gene copies can then be further manipulated for the purpose of research, analysis, or production of the protein encoded by the gene.
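The single-enzyme cloning step just described can be sketched at the level of top-strand sequence. This is a simplified, hypothetical model: the vector string stands for a circular plasmid opened at an arbitrary point, ligation is assumed to succeed, and the helper names are illustrative. Because both ends of the excised fragment carry the same EcoR1 overhang, the insert can ligate in either orientation, so the sketch can produce both.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
ECOR1 = "GAATTC"  # cut G/AATTC on each strand, leaving 5'-AATT overhangs

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def ecor1_sites(seq: str) -> list[int]:
    """Start positions of every EcoR1 site (the site cannot overlap itself)."""
    return [m.start() for m in re.finditer(ECOR1, seq)]

def excise_insert(gene: str, forward: bool = True) -> str:
    """Top strand of the fragment between the first two EcoR1 sites in `gene`.

    forward=False returns the same fragment flipped end-for-end (its other strand)."""
    sites = ecor1_sites(gene)
    if len(sites) < 2:
        raise ValueError("need at least two EcoR1 sites to excise a fragment")
    s1, s2 = sites[0], sites[1]
    if forward:
        return gene[s1 + 1:s2 + 1]                    # AATTC ... G
    return reverse_complement(gene[s1 + 5:s2 + 5])    # also AATTC ... G

def ligate_into_vector(vector: str, insert: str) -> str:
    """Open a vector at its single EcoR1 site and join the insert into it."""
    sites = ecor1_sites(vector)
    if len(sites) != 1:
        raise ValueError("vector must contain exactly one EcoR1 site")
    cut = sites[0] + 1                                # top-strand break after the G
    return vector[:cut] + insert + vector[cut:]

vector = "TTTGAATTCTTT"
gene = "CCCGAATTCAGGCTGAATTCCCC"
print(ligate_into_vector(vector, excise_insert(gene)))         # TTTGAATTCAGGCTGAATTCTTT
print(ligate_into_vector(vector, excise_insert(gene, False)))  # TTTGAATTCAGCCTGAATTCTTT
```

Both junctions of either product regenerate G-A-A-T-T-C, so the insert can later be released again with EcoR1.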
Recombinant DNA technology is frequently embodied in the generation of so-called “transgenes”. Transgenes frequently comprise a variety of genetic materials that are derived from one or more donor organisms and introduced into a host organism. Typically, a transgene is constructed using a cloning vector as the starting point or “backbone” of the project, and a series of complex cloning steps is planned to assemble the final product within that vector. Elements of a transgene, comprising nucleotide sequences, include, but are not limited to, 1) regulatory promoter and/or enhancer elements, 2) a gene that will be expressed as an mRNA molecule, 3) DNA elements that provide mRNA message stabilization, 4) nucleotide sequences mimicking mammalian intronic gene regions, and 5) signals for mRNA processing, such as the polyadenylation signal that directs addition of the poly-A tail to the end of naturally occurring mRNAs. In some cases, an experimental design may require the addition of a localization signal to provide for transport of the gene product to a particular subcellular location.
Each of the elements of a transgene can be derived as a fragment of a larger DNA molecule that is cut from a donor genome, or, in some cases, synthesized in a laboratory. While the present invention employs endonucleases for the methods claimed herein, it is known that each of the smaller elements comprising, for example, the inserts or modules which are used in the methods herein, can be created by de novo synthesis, recombineering, and/or PCR terminator overhang cloning. One such method of synthesis of the component elements of a transgene includes the method disclosed by Jarrell et al. in U.S. Pat. No. 6,358,712, which is incorporated herein by reference in its entirety. While Jarrell discloses a method for “welding” elements of a transgene together, only the methods of the present invention disclose a way to “unweld” and re-assemble the elements once they have been assembled. According to one aspect of the invention, each piece is assembled with the others in a precise order and 5′-3′ orientation into a cloning vector plasmid.
The promoter of any gene may be isolated as a DNA fragment and placed within a synthetic molecule, such as a plasmid, to direct the expression of a desired gene, assuming that the necessary conditions for stimulation of the promoter of interest can be provided. For example, the promoter sequences of the insulin gene may be isolated, placed in a cloning vector plasmid along with a reporter gene, and used to study the conditions required for expression of the insulin gene in an appropriate cell type. Alternatively, the insulin gene promoter may be joined with the protein coding-sequence of any gene of interest in a cloning vector plasmid, and used to drive expression of the gene of interest in insulin-expressing cells, assuming that all necessary elements are present within the DNA transgene so constructed.
A reporter gene is a particularly useful component of some types of transgenes. A reporter gene comprises nucleotide sequences encoding a protein that will be expressed under the direction of a particular promoter of interest to which it is linked in a transgene, providing a measurable biochemical readout of that promoter's activity. A reporter gene is typically easy to detect or measure against the background of endogenous cellular proteins. Commonly used reporter genes include, but are not limited to, LacZ, green fluorescent protein, and luciferase; many others are well known to those skilled in the art.
Introns, which are non-coding regions within mammalian genes, are not found in bacterial genomes, but are required for proper formation of mRNA molecules in mammalian cells. Therefore, any DNA construct for use in mammalian systems must have at least one intron. Introns may be isolated from any mammalian gene and inserted into a DNA construct, along with the appropriate splicing signals that allow mammalian cells to excise the intron and splice the remaining mRNA ends together.
An mRNA stabilization element is a sequence of DNA that is recognized by binding proteins that protect some mRNAs from degradation. Inclusion of an mRNA stabilization element will frequently enhance the level of gene expression from that mRNA in some mammalian cell types, and so can be useful in some DNA constructs or transgenes. An mRNA stabilization element can be isolated from naturally occurring DNA or RNA, or synthetically produced for inclusion in a DNA construct.
A localization signal is a sequence of DNA that encodes a protein signal for subcellular routing of a protein of interest. For example, a nuclear localization signal will direct a protein to the nucleus; a plasma membrane localization signal will direct it to the plasma membrane, etc. Thus, a localization signal may be incorporated into a DNA construct to promote the translocation of its protein product to the desired subcellular location.
A tag sequence may be encoded in a DNA construct so that the protein product will have a unique region attached. This unique region serves as a protein tag that can distinguish it from its endogenous counterpart. Alternatively, it can serve as an identifier that may be detected by a wide variety of techniques well known in the art, including, but not limited to, RT-PCR, immunohistochemistry, or in situ hybridization.
With a complex transgene, or with one that includes particularly large regions of DNA, there is an increased likelihood that there will be multiple endonuclease recognition sites in these pieces of DNA. Recall that any one hexanucleotide recognition sequence occurs, on average, once every 4,096 (4⁶) bp. If a 3000 bp promoter sequence and a 1500 bp gene of interest are to be assembled into a 3000 bp cloning vector, it is statistically very likely that many sites of 6 or fewer nucleotides will not be useful, since a usable site must occur only in the two pieces being joined. Furthermore, the sites must occur in the appropriate areas of the appropriate molecules that are to be assembled. In addition, most cloning projects will need to have additional DNA elements added, thereby increasing the complexity of the growing molecule and the likelihood of inopportune repetition of any particular restriction site. Since any restriction enzyme will cut at all of its sites in a molecule, if an endonuclease restriction site recurs, all the inopportune sites will be cut along with the desired sites, disrupting the integrity of the molecule. Thus, each cloning step must be carefully planned so as not to disrupt the growing molecule by cutting it with an endonuclease enzyme that has already been used to incorporate a preceding element. Finally, when a researcher wishes to introduce a completed transgene into a mammalian organism, the fully assembled transgene construct frequently must be linearized at a unique recognition site at one or both ends of the transgene, thus requiring yet another unique recognition site found nowhere else in the construct. Since most DNA constructs are designed for a single purpose, little thought is given to any future modifications that might need to be made, further increasing the difficulty of future experimental changes.
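The statistical argument above, and the practical check a designer would run before choosing enzymes, can both be sketched in Python. The enzyme table is a small illustrative subset, sequences are plain uppercase strings, the probability uses a Poisson approximation for random sequence, and the function names are hypothetical; none of this is taken from the described invention. The pieces are joined before scanning because a site can also be created across a junction.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Recognition sites (5' -> 3') of a few well-known enzymes; illustrative subset only.
ENZYMES = {
    "EcoR1": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "NotI": "GCGGCCGC",
    "I-SceI": "TAGGGATAACAGGGTAAT",
}

def count_sites(seq: str, site: str) -> int:
    """Occurrences on both strands; a palindromic site is counted once."""
    rc = site.translate(COMPLEMENT)[::-1]
    n = sum(seq.startswith(site, i) for i in range(len(seq)))
    if rc != site:
        n += sum(seq.startswith(rc, i) for i in range(len(seq)))
    return n

def unique_cutters(pieces: list[str]) -> list[str]:
    """Enzymes that cut the assembled construct exactly once."""
    construct = "".join(pieces)  # scan the joined molecule, including junctions
    return [name for name, site in ENZYMES.items() if count_sites(construct, site) == 1]

def p_absent(length: int, site_length: int) -> float:
    """Approximate chance that one specific site is absent from `length` bp
    of random sequence (Poisson approximation)."""
    return math.exp(-length / 4 ** site_length)

pieces = ["GGATCCAAAAGAATTC", "TTTTGAATTCAAAA", "GCGGCCGCTTTT"]
print(unique_cutters(pieces))                     # ['BamHI', 'NotI']
print(round(p_absent(3000 + 1500 + 3000, 6), 2))  # 0.16
print(round(p_absent(3000 + 1500 + 3000, 8), 2))  # 0.89
```

For the 7,500 bp example in the text, a given hexanucleotide site has roughly an 84% chance of appearing at least once somewhere in the three pieces, while a given octanucleotide site has only about an 11% chance.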
Traditionally, transgene design and construction consumes significant amounts of time and energy for several reasons, including the following:
1. There is a wide variety of endonuclease enzymes available that will generate an array of termini; however, most of these are not compatible with each other. Many endonuclease enzymes, such as EcoR1, generate DNA fragments with protruding 5′ cohesive termini or “tails”; others (e.g., Pst1) generate fragments with 3′ protruding tails, whereas still others (e.g., Bal1) cleave at the axis of symmetry to produce blunt-ended fragments. Some of these will be compatible with the termini formed by cleavage with other endonuclease enzymes, but the majority of useful ones will not. The termini that can be generated during each DNA fragment isolation must therefore be carefully considered in designing a DNA construct.
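End compatibility can be worked out from each enzyme's recognition site and cut position. The following minimal Python sketch handles palindromic sites only; the cut offsets are the commonly listed cleavage positions, and the table and function names are illustrative rather than drawn from the described invention.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# (recognition site 5' -> 3', top-strand cut offset) for palindromic sites.
# By symmetry the bottom strand is cut at len(site) - offset.
ENZYMES = {
    "EcoR1": ("GAATTC", 1),  # G/AATTC
    "BamHI": ("GGATCC", 1),  # G/GATCC
    "BglII": ("AGATCT", 1),  # A/GATCT
    "Pst1": ("CTGCAG", 5),   # CTGCA/G
    "Bal1": ("TGGCCA", 3),   # TGG/CCA
}

def end_type(enzyme: str) -> tuple[str, str]:
    """Return ('5prime' | '3prime' | 'blunt', single-stranded overhang 5' -> 3')."""
    site, top = ENZYMES[enzyme]
    bottom = len(site) - top
    if top < bottom:
        return "5prime", site[top:bottom]
    if top > bottom:
        return "3prime", site[bottom:top]
    return "blunt", ""

def compatible(a: str, b: str) -> bool:
    """Ends can be joined if they have the same overhang type and the two
    overhangs base-pair, i.e. one is the reverse complement of the other."""
    kind_a, oh_a = end_type(a)
    kind_b, oh_b = end_type(b)
    return kind_a == kind_b and oh_a == oh_b.translate(COMPLEMENT)[::-1]

print(end_type("EcoR1"), end_type("Pst1"), end_type("Bal1"))
# ('5prime', 'AATT') ('3prime', 'TGCA') ('blunt', '')
print(compatible("BamHI", "BglII"), compatible("EcoR1", "BamHI"))  # True False
```

BamHI and BglII ends are compatible because both carry a 5′ G-A-T-C overhang, but the joined sequence (A-G-A-T-C-C or G-G-A-T-C-T) is no longer recognized by either enzyme, which is one more reason the termini must be planned in advance.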
2. DNA fragments needed for assembly of a DNA construct or transgene must first be isolated from their source genomes, placed into plasmid cloning vectors, and amplified to obtain useful quantities. This step can be performed using any number of commercially available or individually altered cloning vectors. Each of the different commercially available cloning vector plasmids was, for the most part, developed independently, and thus contains different sequences and endonuclease sites for the DNA fragments of genes or genetic elements of interest. Genes must therefore be individually tailored to adapt to each of these vectors as needed for any given set of experiments. The same DNA fragments frequently will need to be altered further for subsequent experiments or for cloning into other combinations for new DNA constructs or transgenes. Since each DNA construct or transgene is custom made for a particular application with no thought or knowledge of how it will be used next, it frequently must be “retro-fitted” for subsequent applications.
3. In addition, the DNA sequence of any given gene or genetic element varies and can contain internal endonuclease sites that make it incompatible with currently available vectors, thereby complicating manipulation. This is especially true when assembling several DNA fragments into a single DNA construct or transgene.
Thus, there remains a need for a system that would allow the user to rapidly assemble a number of DNA fragments into one molecule, despite redundancy of endonuclease sites found at the ends and within the DNA fragments. Such a system might also provide a simple means for rapidly altering the ends of the fragments so that other endonuclease sequences are added to them. Inclusion of single or opposing pairs of HE sites would enhance the likelihood of having unique sites for cloning. A system that would also allow easy substitutions or removal of one or more of the fragments would add a level of versatility not currently available to users. Therefore, a “modular” system, i.e. a system allowing one to insert or remove DNA fragments or “inserts” into or out of “cassette” regions flanked by rare endonuclease sites within the cloning vector, would be especially useful and welcome to the field of recombinant DNA technology.
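The kind of modular system called for above can be pictured as a simple data structure. The Python sketch below is hypothetical and is not the method of the present invention: `Cassette` and `ModularVector` are illustrative names, the vector backbone and overhang chemistry are omitted, and the flanking sites in the example (the 8 bp recognition sites of NotI, AscI, PacI, and FseI) are chosen only for illustration.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def contains_site(seq: str, site: str) -> bool:
    """True if `site` occurs on either strand of `seq`."""
    rc = site.translate(COMPLEMENT)[::-1]
    return site in seq or rc in seq

@dataclass
class Cassette:
    left_site: str    # rare site flanking the cassette on the 5' side
    right_site: str   # rare site flanking the cassette on the 3' side
    insert: str = ""  # DNA currently held in the cassette

@dataclass
class ModularVector:
    cassettes: list[Cassette]

    def flanking_sites(self) -> set[str]:
        return {s for c in self.cassettes for s in (c.left_site, c.right_site)}

    def assemble(self) -> str:
        """Top-strand sequence of all cassettes in order (backbone omitted)."""
        return "".join(c.left_site + c.insert + c.right_site for c in self.cassettes)

    def swap(self, index: int, new_insert: str) -> None:
        """Replace one cassette's insert, refusing any insert that carries a
        flanking rare site, since it would then be cut internally."""
        for site in self.flanking_sites():
            if contains_site(new_insert, site):
                raise ValueError(f"insert contains flanking site {site}")
        self.cassettes[index].insert = new_insert

NOTI, ASCI, PACI, FSEI = "GCGGCCGC", "GGCGCGCC", "TTAATTAA", "GGCCGGCC"
vec = ModularVector([Cassette(NOTI, ASCI, "ATGAAA"), Cassette(PACI, FSEI, "TGTAG")])
vec.swap(1, "TGTTAG")  # exchange the second module without touching the first
print(vec.assemble())
```

In a real vector the same uniqueness check would also have to cover the backbone and any sites created at the junctions.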