The present invention relates to systems, methods, and compositions for cloning and sequencing insert nucleic acid sequences. In particular, the present invention provides vectors and vector components configured for multiplex cloning, multiplex sequencing, and fixed orientation cloning. The present invention also provides vectors and vector components that allow insert sequences that are deleterious to a host cell to be successfully cloned.
Prior to the 1990s, DNA sequencing was a time consuming, labor intensive, manual protocol by which individual researchers read hundreds of bases per day from a single DNA template. It has since evolved into an automated, robotic process by which major genome sequencing centers read tens of millions of bases from tens of thousands of DNA templates per day. This vast increase in sequencing capacity has broadened the scope of DNA sequencing to entire genomes rather than individual genes. It has likewise created a need to increase the rate of throughput in all stages of the sequencing process.
The most prominent example of large scale sequencing to date is the Human Genome Initiative, an effort to sequence all 3.3 billion bases of the human genome. Begun in 1990, the Human Genome Initiative was declared “finished” on Jun. 26, 2000, by the major genome centers involved. The public draft genome released by the National Institutes of Health consortia was 85% assembled, with 97% of the genome covered by clones whose location is known. This project required reading some 25 million DNA sequences. In a completely independent effort, Celera Corporation claimed to have 99% of the genome sequence assembled at a 3× redundancy level, which required 27 million DNA sequencing reads.
The public effort for “complete and accurate” sequencing, typically defined as 5× coverage and an accuracy of not more than 1 mistake every 10,000 bases, will require sequencing millions of additional plasmid clones over several more years to obtain high quality data on the entire genome. Because so much of the human genome is not characterized, a more complete understanding of it will be facilitated by sequencing the genomes of other organisms for comparison, such as the mouse, rat, dog, and chimpanzee. In fact, Celera claims to have sequenced three mouse genomes during the year 2000, while the NIH consortia of university and international genome centers have begun work on the mouse and rat genome. The NIH has also initiated funding of pilot sequencing projects for the chicken, puffer fish, and zebra fish.
At the 12th International Genome Sequencing and Analysis Conference in Miami, Fla. (Sep. 12-15, 2000), Celera presented data showing that over 200,000 plasmid template purifications a day are required to sustain their ongoing sequencing efforts. The NIH consortia purify a similar number of templates on a daily basis. Genome sequencing facilities at other large corporations, overseas national genome projects, and smaller academic labs sequence an additional 500,000 plasmid templates per day. Thus, the worldwide rate of sequencing is rapidly approaching 1,000,000 templates per day.
The generation of clone banks, or libraries, of DNA is an important intermediate step in sequence analysis of whole genomes. In a process called shotgun cloning and sequencing, large molecules of DNA, often more than 100,000 bases (100 kb) in length, are fragmented and reduced to libraries of numerous sub-clones of approximately 1-4 kb for propagation and sequence analysis. Most large-scale DNA sequencing strategies depend on a multi-step process to randomly fragment the target molecule into these smaller pieces, which are then enzymatically joined (ligated) into a cloning vector in a reaction that inserts one or more DNA fragments into a single site in each vector molecule (Fitzgerald et al., Nucleic Acids Res. 14:3753 [1992]). This ligation mixture is introduced into specific strains of Escherichia coli (E. coli), with each bacterial cell propagating one vector along with any DNA fragments it carries. The vector DNA, which may or may not contain an insert, is purified from each cell line and used as a template in an enzymatic sequencing reaction (Sanger et al., Proc Natl Acad Sci USA 74:5463 [1977]; Prober et al., Science 238:336 [1987]; Tabor and Richardson, Proc Natl Acad Sci USA 92:6339 [1995], all of which are hereby incorporated by reference). The reaction product is analyzed by automated sequencing instruments to determine the linear sequence of the sub-cloned DNA fragments (Smith et al., Nature 321:674 [1986], hereby incorporated by reference). Computer algorithms are used to assemble the data from the library of sub-fragments, typically producing sequence information for 80-95% of the original DNA molecule. “Gap filling” techniques are used to determine the remaining 5-20% of the target DNA.
Although most DNA sequencing methods utilize one template or primer per sequencing reaction, there are exceptions to this pattern. In early examples, Church et al. (Science 240: 185 [1988]) and Creasey et al. (BioTechniques 11: 102 [1991]) performed multiple Sanger dideoxy sequencing reactions in a single set of four tubes, using vectors containing unique sequence tags. The reactions from each set of tubes were run on a sequencing gel and transferred to a nylon membrane. Each sequence reaction was then detected by sequentially probing the membrane with an oligonucleotide specific for the tag on each vector. Other variations on this theme have also been developed (Cherry et al., Genomics 20: 68 [1994]).
Subsequently, Wiemann et al. (Anal. Biochem. 224: 117 [1995]; Anal. Biochem. 234: 166 [1996]) showed that fluorescently labeled sequencing primers could be used to simultaneously sequence both strands of a dsDNA template. Recent examples have demonstrated multiplex co-sequencing using the four-color dye terminator reaction chemistry pioneered by Prober et al. (Science 238: 336 [1987]). At the 10th International Genome Sequencing and Analysis Conference, (Sep. 17-20, 1998, Miami Beach, Fla.), Uhlen (Royal Institute of Technology) and Chiesa (PE Biosystems) independently showed that biotinylated oligomers could be used to specifically capture an individual sequencing reaction from a pool of multiple reactions in a single tube.
Numerous vectors are available for cloning DNA into E. coli. Conventional plasmid vectors are normally double stranded circular DNA molecules containing restriction enzyme recognition sites suitable for inserting exogenous DNA sequences, an antibiotic selectable gene, an origin of replication for autonomous propagation in the host cell, and a gene for the discrimination or selection of clones that contain recombinant insert DNA.
One of the first recombinant DNA cloning systems used a dual antibiotic resistant plasmid such as pBR322 (Bolivar et al., Gene 2:95 [1977]). One of the resistance genes served to select for those cells taking up plasmid DNA. This gene was typically the beta-lactamase gene (Amp or ampR), which confers resistance to ampicillin (amp). The other resistance gene, Tet or tetR, encoding resistance to tetracycline (tet), was used indirectly as the indicator for recombinant clones. The foreign DNA fragment was inserted into any of a number of restriction sites within the Tet gene, resulting in inactivation of the Tet gene and sensitivity of the transformed cell to killing by tetracycline.
Thus, to find those clones that might have contained foreign insert DNA, the transformed cells were first spread onto ampicillin-containing plates. Those colonies that grew were replica plated onto tetracycline-containing plates. The colonies growing on the ampicillin but not on the tetracycline plates were likely candidates for further analysis. This screening method required additional labor and time compared to newer methods and is rarely used now.
The predominant cloning system in use for the last two decades is the “blue screen” method. Blue screen vectors contain a selectable marker such as the ampicillin resistance gene described above. However, the tetracycline screen is replaced by a color discrimination technique based on insertional inactivation of a genetically engineered gene that encodes beta galactosidase (βGal). The bacteriophage M13mp series and plasmid pUC series of cloning vehicles are ubiquitous examples of this screening method. These vectors encode the N-terminal 60 amino acids of the βGal gene, the so-called lacZα peptide, which is inactive as such. Another inactive, truncated portion of lacZ (the lacZΔM15 allele) is carried on an F′ episome of the host bacteria, which can complement the lacZα peptide to restore βGal activity. Cells containing non-recombinant vectors therefore produce functional βGal, which can hydrolyze the indicator chemical XGAL (5-bromo-4-chloro-3-indolyl-beta-galactoside) to produce a blue colored product.
The lacZα fragment in the vector also contains a series of cloning sites, termed the multiple cloning site, situated such that insertion of foreign DNA into any one site disrupts the lacZα peptide. An insertion into a site generally, but not always, inactivates the lacZα fragment. Thus, cells containing an insert in the vector generally do not produce active βGal. These recombinant clones therefore remain white.
The advantage of the blue screen is that it is a visual assay to discriminate recombinant clones from non-recombinants. However, there are a number of disadvantages to this cloning strategy. One disadvantage is that the substrate XGAL is expensive, unstable, and awkward to use. Another chemical compound, IPTG (isopropyl-β-D-thiogalactoside), a gratuitous inducer of the lac promoter that drives lacZα in these vectors, is also often required for this cloning system. Another disadvantage is that the high percentage of non-recombinant (blue) colonies competes for nutrients and space with the desired recombinant colonies. A need exists for cloning systems that eliminate the requirement of exogenous chemical additives for screening.
A more significant problem with blue screen cloning technology is the issue of false negative and false positive results, as well as results that cannot be easily classified (Slilaty et al., Gene 213:83 [1998]). False positive results are colonies or plaques that appear white or uncolored, yet do not contain a foreign DNA insert in the lacZα cloning vectors. Among the external factors responsible for generating false positives are: (1) contamination of the restriction or modifying enzymes used to process the vector (e.g., exonucleases that remove bases from the termini of the lacZα fragment, creating frame-shifts that inactivate the fragment), (2) spontaneous mutations in the lacZα fragment or in the lacZΔM15 allele, and (3) loss of the F′ episome carrying the lacZΔM15 allele. False positive results are carried forward and analyzed as real positive clones, eventually being detected as empty, deleted, or otherwise mutated vector DNA when further analyzed.
False negative results are blue colonies or plaques that actually do contain foreign DNA inserted in the lacZα based vector. There are two principal causes of false negative results using blue screen vectors: (1) in-frame insertion of DNA fragments containing one or more open reading frames, and (2) reinitiation of translation within the mRNA transcribed from the inserted DNA fragment. Either event results in the synthesis of the lacZ α-peptide fused to a foreign peptide, which often does not impair its activity. Because the fusion peptide restores βGal activity, these clones produce the blue color and are erroneously discarded as non-recombinants.
Another problem is the hypersensitivity of the XGAL assay system. Because very little beta-galactosidase activity is required to produce a color reaction, inserts in blue screen vectors often result in “light blue” and “dark white” colony phenotypes that complicate the interpretation of cloning results. These blue false negatives are rarely carried forward for analysis and can lead to erroneous conclusions that the DNA fragments they carry are “non-clonable.” This bias against certain sequences may lead to excessive gaps in shotgun DNA sequencing results as well. Thus, a need exists for cloning systems that do not rely on the blue screen technology.
A cloning procedure that selectively eliminates the background of parental non-recombinant vector would be advantageous in any DNA library construction or sub-cloning experiment. It would also eliminate the screening process, as well as the need to buy, weigh, and mix the required screening chemicals. Various cloning vectors permitting direct selection of recombinant clones have been described in the scientific literature.
Most positive selection vectors (or “suicide” vectors) are based on the insertional inactivation of a lethal gene product (Henrich and Plapp, Gene 42, 345 [1986]). Insertion of a foreign DNA fragment disrupts the lethal gene, allowing recombinant cells to grow. Bacterial clones that carry a parental vector do not survive, resulting in selection for clones that carry foreign DNA fragments. The use of suicide vectors for positive selection is an efficient strategy to suppress an undesired background of non-recombinant clones that do not carry the desired DNA insert.
Other examples of positive selection are based on abolition of a particular sensitivity towards metabolites, selection by means of DNA-degrading or RNA-degrading enzymes, or selection by means of unstable long palindromic DNA sequences. Several problems can arise when using the available direct selection cloning vectors. One problem is a high number of false positive clones, i.e., viable clones without an insert. False positives may arise from mutations in the selection genes or their controlling genetic elements (so called revertants), or by inadequate expression of the toxic gene using an inducible genetic system (Bernhard et al., Gene, 148: 71 [1994]). False positive clones are typically carried forward as real positives and are only detected as false positives after analysis of their sequence. Thus, a need exists for a positive selection cloning system that minimizes the number of false positive clones.
Another problem with available direct selection vectors is a high number of false negative clones, i.e., clones with inserts that do not grow or grow very slowly. Similar to the situation described above for the blue screen method, certain DNA fragments may not completely inactivate the function of the toxic gene product, which can result in a functionally diminished but nevertheless toxic protein. In other cases, insertion of a particular DNA fragment may not in any way adversely affect the lethal properties of the selection gene. Thus, no clones with the desired insert are obtained. This may occur in particular with small DNA fragments and/or those fragments whose nucleotide sequence is in frame with the selection gene. False negative clones are rarely detected, because they cannot grow on the plating media. Thus, a need exists for a direct selection cloning system that minimizes the number of false negative clones.
Yet another disadvantage of direct selection vectors is that, as in the blue screen vectors, the vector contains a promoter that actively transcribes the region into which the insert DNA is to be cloned. Therefore, insert DNA that encodes toxic or deleterious peptides or proteins will be harmful to the bacterial host cell in which it is carried. Thus, a need exists for a low-background vector that does not transcribe the inserted DNA fragment.
A further disadvantage in some positive selection schemes is the need to make up complex nutrient media to utilize the selection mechanism. Thus, a need exists for direct selection cloning systems that do not require the use of exogenous chemical compounds.
Despite the rapid evolution of sequencing, it is nonetheless still constrained by the significant effort needed to generate libraries of DNA templates, identify recombinant clones, and purify the DNA from those clones. The process of constructing a random clone library is technically challenging, inefficient, and involves numerous steps. The present paradigm for shotgun cloning requires one cloning reaction to generate a library of several thousand templates, each template containing 1 or 2 primer extension sites, which are anchor sequences for the enzymatic method of dideoxy sequencing typically used today. Once a library is made, a vast number of DNA templates must be grown, purified, and sequenced to deduce the sequence of a large genome. For the human genome project, two approaches were used to determine this genetic blueprint. One method was the whole genome shotgun cloning approach used by Celera Corporation. A few shotgun libraries were constructed, but tens of millions of random clones were sequenced using this approach. The other approach, used by the NIH consortia, was to create an ordered array of cosmid, BAC and P1 clone libraries, with average clone sizes of 40-100 kb. An arrayed library covering the entire genome requires approximately 100,000 cosmid clones or 40,000 BAC or P1 clones, assuming a 20% clone overlap. Thus, a minimum of 40,000 to 100,000 shotgun libraries are required to sequence the human genome with this approach. Assuming 400 templates are needed to sequence a 40 kb cosmid clone, or 1000 templates per 100 kb BAC or P1 clone, approximately 40 million templates will be grown, purified, and sequenced. An alternative strategy using large insert BAC clones (150 kb average inserts) and minimal overlap predicts that 20,000 BAC clones will be sufficient to sequence the genome. If 1500 templates are needed to sequence each of these large insert BAC clones, then a minimum of 30 million templates will be grown, purified, and sequenced. 
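By way of illustration only, the clone and template estimates in the preceding paragraph can be reproduced with simple arithmetic. The sketch below assumes the 3.3 billion base genome size stated earlier and treats "20% clone overlap" as meaning each clone contributes 80% of its length as new sequence; the function names are hypothetical and are not part of the described methods.

```python
# Rough reproduction of the library and template estimates given in the text.
# ASSUMPTION: "20% overlap" means each clone adds 80% of its length as new sequence.

GENOME_BP = 3_300_000_000  # human genome size stated in the text


def clones_needed(genome_bp, clone_bp, overlap_fraction):
    """Approximate number of clones needed to tile the genome once."""
    unique_bp_per_clone = clone_bp * (1.0 - overlap_fraction)
    return genome_bp / unique_bp_per_clone


def templates_needed(n_clones, templates_per_clone):
    """Total templates to grow, purify, and sequence."""
    return n_clones * templates_per_clone


# Clone counts derived from the genome size (compare with the rounded
# figures of 100,000, 40,000, and 20,000 used in the text).
print(round(clones_needed(GENOME_BP, 40_000, 0.20)))   # 40 kb cosmids
print(round(clones_needed(GENOME_BP, 100_000, 0.20)))  # 100 kb BAC or P1 clones
print(round(clones_needed(GENOME_BP, 150_000, 0.0)))   # 150 kb BACs, minimal overlap

# Template counts using the rounded clone counts from the text.
print(templates_needed(100_000, 400))   # cosmid route
print(templates_needed(40_000, 1000))   # BAC or P1 route
print(templates_needed(20_000, 1500))   # large insert BAC route
```

The derived clone counts land slightly above the rounded values quoted in the text, which is consistent with those values being approximations.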
Additional genome projects and failed reactions can be expected to double or triple the number of libraries, as well as templates, required for this undertaking. Such high-throughput demands of large-scale sequencing necessitate improvements that will minimize rate-limiting steps. The growth, purification, and sequencing of tens of millions of templates are significant rate-limiting steps in the sequencing of any large genome. What is needed are methods, compositions and systems for cloning and sequencing insert DNA sequences that are faster, more economical, produce very low levels of non-recombinant vector background, and exhibit less discrimination against fragments containing promoter-like sequences or open reading frames.
The present invention relates to systems, methods, and compositions for cloning and sequencing insert nucleic acid sequences. In particular, the present invention provides vectors and vector components configured for multiplex cloning and multiplex sequencing. The present invention also provides vectors and vector components configured to reduce or minimize transcription into and out of insert sequences.
In some embodiments, a circular vector (e.g. recombinant plasmid) is formed from at least two vector components containing selectable marker sequences. In particular embodiments, this vector (e.g. recombinant plasmid) is formed from at least two vector components containing selectable marker sequences and at least two insert DNA sequences. The formation of a vector (e.g. recombinant plasmid) may occur, for example, in a single ligation reaction (e.g. the two vector components and insert sequences, all separate, are joined together in a single ligation reaction). In some embodiments, the compositions of the present invention permit multiplex sequencing (e.g. from a single vector constructed from at least two vector components and at least two insert sequences). In preferred embodiments, the source nucleic acid used to form the vectors of the present invention comprises at least two separate source nucleic acid molecules (e.g. neither of which has all of the selectable markers contained in the final vector that is formed).
In some embodiments, the present invention provides systems, kits, and compositions for cloning nucleic acid comprising at least two separate source nucleic acid molecules capable of supplying X+1 vector components, wherein the vector components are configured for combining in the presence of X+1 insert sequences to form a closed circular recombinant vector (e.g. recombinant plasmid). In certain embodiments, the present invention provides systems, kits, and compositions for cloning nucleic acid comprising at least two different source nucleic acid molecules capable of supplying X+1 vector components, wherein the vector components are configured for combining in the presence of X−1 insert sequences to form a closed circular recombinant vector (e.g. recombinant plasmid). In particular embodiments, the present invention provides systems, kits, and compositions for cloning nucleic acid comprising at least two separate source nucleic acid molecules capable of supplying X+1 vector components, wherein the X+1 vector components are configured for combining in the presence of X+1 insert sequences to form a circular vector (e.g. recombinant plasmid).
In some embodiments, the present invention provides systems, kits, and compositions for cloning nucleic acid comprising at least two separate source nucleic acid molecules capable of supplying X+1 vector components, wherein the X+1 vector components are configured for combining in the presence of X−1 insert sequences to form a circular vector (e.g. recombinant plasmid), and wherein the vector components are non-contiguous within the circular vector. In some embodiments, X is a positive integer (e.g. 1-50). In particular embodiments, X is selected from 1, 2, 3, 4, 5, and 6.
In other embodiments, the present invention provides systems, kits, and compositions for cloning nucleic acid comprising at least two separate source nucleic acid molecules capable of supplying two vector components, wherein the vector components are configured for combining in the presence of two insert sequences to form a circular vector (e.g. recombinant plasmid), and wherein the vector components are non-contiguous within the circular vector. In some embodiments, the present invention provides systems, kits, and compositions for cloning nucleic acid comprising at least two separate source nucleic acid molecules capable of supplying three vector components, wherein the vector components are configured for combining in the presence of three insert sequences to form a circular vector (e.g. recombinant plasmid), and wherein the vector components are non-contiguous within the circular vector.
In some embodiments, the present invention provides systems, compositions, and kits, comprising at least two separate source nucleic acid molecules configured for supplying X+1 vector components, wherein the X+1 vector components are configured for combining in the presence of X+1 insert sequences to form a circular vector such that the X+1 vector components are non-contiguous within the circular vector. In certain embodiments, the systems, compositions, and kits further comprise the X+1 insert sequences.
In particular embodiments, the present invention provides systems, compositions, and kits comprising X+1 vector components, wherein each of the X+1 vector components is configured for combining in the presence of X+1 insert sequences to form a circular vector such that the X+1 vector components are non-contiguous within the circular vector. In certain embodiments, the systems, compositions, and kits further comprise the X+1 insert sequences.
In certain embodiments, the present invention provides compositions, kits, and systems for fixed orientation cloning. In certain embodiments, vector components with selectable marker sequences (e.g. all the same selectable marker sequences, or different selectable marker sequences) are utilized for fixed orientation cloning. In other embodiments, vector components without selectable marker sequences are utilized for fixed orientation cloning. In further embodiments, some vector components with selectable marker sequences and some vector components without selectable marker sequences are utilized for fixed orientation cloning. In some embodiments, the present invention provides kits, systems, and compositions for fixed orientation cloning comprising X+1 vector components, wherein each of the X+1 vector components comprises two different sticky free ends and is configured for combining in the presence of X+1 insert sequences to form a circular recombinant vector, wherein each of the X+1 insert sequences comprises two identical sticky free ends that are unique among the X+1 insert sequences. In preferred embodiments, each of the two different sticky free ends (of the vector components) binds one of the X+1 insert sequences. In other preferred embodiments, the X+1 vector components are non-contiguous within the circular recombinant vector.
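By way of illustration only, the end-pairing rule described above can be expressed as a small toy model. The sketch assumes that each sticky end is reduced to an abstract label, that an insert ligates only to vector component ends carrying its own label, and that each label is present on exactly two vector component ends. The labels, names, and the function `assemble_circle` are hypothetical and are not taken from the described embodiments.

```python
from collections import defaultdict


def assemble_circle(components, inserts):
    """Toy model of fixed orientation assembly.

    components: dict of name -> (end1, end2), two DIFFERENT end labels.
    inserts:    dict of name -> end label carried on BOTH insert ends.
    Returns the circular order as an alternating list of component and
    insert names, or None if the parts cannot close into a single circle.
    """
    for name, (end1, end2) in components.items():
        if end1 == end2:
            raise ValueError(f"{name}: the two free ends must differ")
    labels = list(inserts.values())
    if len(set(labels)) != len(labels):
        raise ValueError("insert end labels must be unique among the inserts")

    insert_by_label = {label: name for name, label in inserts.items()}
    carriers = defaultdict(list)  # end label -> components carrying that end
    for name, ends in components.items():
        for end in ends:
            carriers[end].append(name)

    # ASSUMPTION: every insert bridges exactly two component ends, and every
    # component end has a matching insert.
    if any(len(carriers[label]) != 2 for label in insert_by_label):
        return None
    if any(label not in insert_by_label for label in carriers):
        return None

    start = next(iter(components))
    current, entry = start, components[start][0]
    order, visited = [], set()
    while current not in visited:
        visited.add(current)
        end1, end2 = components[current]
        exit_label = end2 if entry == end1 else end1  # leave by the other end
        order += [current, insert_by_label[exit_label]]
        first, second = carriers[exit_label]
        current = second if first == current else first
        entry = exit_label
    if current != start or len(visited) != len(components):
        return None
    return order


parts = {"V1": ("a", "b"), "V2": ("b", "c"), "V3": ("c", "a")}
frags = {"I_a": "a", "I_b": "b", "I_c": "c"}
print(assemble_circle(parts, frags))
```

In this model no vector component is adjacent to another, which corresponds to the non-contiguous arrangement described above. The model does not capture ligation chemistry or the orientation of the inserts themselves.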
In certain embodiments, each of the X+1 vector components comprises: i) first and second free ends, and ii) a selectable marker region comprising at least one selectable marker sequence unique among the X+1 vector components. In particular embodiments, each of the X+1 vector components further comprises: iii) a first transcriptional terminator between the first free end and the selectable marker region, and iv) a second transcriptional terminator between the second free end and the selectable marker region. In some embodiments, the first transcriptional terminator is configured to terminate RNA transcripts entering the selectable marker region from the first free end. In other embodiments, the second transcriptional terminator is configured to terminate RNA transcripts entering the selectable marker region from the second free end.
In some embodiments, each of the X+1 vector components comprises a non-promoter sequence between the first free end and the selectable marker region, wherein the non-promoter sequence is unable to serve as an operable promoter in a bacterial host cell. In preferred embodiments, the bacterial host cell is Escherichia coli. In other embodiments, each of the X+1 vector components comprises a non-promoter sequence between the second free end and the selectable marker region, wherein the non-promoter sequence is unable to serve as an operable promoter in a bacterial host cell. In preferred embodiments, the bacterial host cell is Escherichia coli. In certain embodiments, there is a selectable marker after the selectable marker region.
In certain embodiments, one of the X+1 vector components comprises SEQ ID NO:85 or a sequence that is at least 90% identical to SEQ ID NO:85 (e.g. at least 95% or at least 98% identical to SEQ ID NO:85). In some embodiments, one of the X+1 vector components comprises SEQ ID NO:86 or a sequence that is at least 90% identical to SEQ ID NO:86 (e.g. at least 95% or at least 98% identical to SEQ ID NO:86). In preferred embodiments, at least one of the X+1 insert sequences is a lethal or toxic sequence (e.g. will not allow the host cell to form a colony if the insert sequence is transcribed).
In some embodiments, the first and second free ends are configured such that they will not bind to each other. In certain embodiments, the first and second free ends comprise 5′ ends lacking terminal phosphate groups. In other embodiments, the first and second free ends are blunt free ends or sticky free ends. In particular embodiments, at least one of the X+1 insert sequences is of unknown sequence. In preferred embodiments, each of the X+1 vector components comprises two primer binding sites (e.g. such that the circular vector formed has a pair of primer binding sites for sequencing each of the X+1 insert sequences). In certain embodiments, the circular vector is a low copy number circular vector (e.g. contains a gene causing a low copy number or an origin of replication causing a low copy number). In other embodiments, the low copy number circular vector is configured such that no more than 200 copies are produced in a host cell (e.g. no more than 100 or no more than 20 copies per host cell).
In some embodiments, the present invention provides fixed orientation cloning. In particular embodiments, each of the X+1 insert sequences comprises two identical sticky free ends that are unique among the X+1 insert sequences, wherein each of the X+1 vector components comprises two different sticky free ends, and wherein each of the two different sticky free ends binds one of the X+1 insert sequences.
In other embodiments, at least one of the X+1 vector components comprises an ampicillin resistance gene and an origin of replication. In some embodiments, the ampicillin resistance sequence is a mutated ampicillin resistance sequence configured to reduce feeder colonies. In some embodiments, the mutated ampicillin resistance gene (e.g. derived from pUC19) comprises at least one mutation selected from: T to A at position 174; T to C at position 333; A to G at position 412; C to T at position 648; T to C at position 668; T to C at position 764; and combinations thereof. In preferred embodiments, the circular vector is a recombinant plasmid. In other embodiments, the promoter of the ampicillin resistance gene is replaced by a less active promoter (e.g. CamR promoter).
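By way of illustration only, the listed substitutions can be recorded as data and applied to a reference sequence with a check that the expected base is present at each position. This section does not state the numbering convention or the reference sequence, so the sketch assumes 1-based positions and demonstrates the helper on a short toy sequence; `apply_substitutions` is a hypothetical name.

```python
# Substitutions listed in the text, as (reference base, position, new base).
# ASSUMPTION: positions are 1-based relative to a reference sequence that is
# not reproduced in this section.
AMP_R_SUBSTITUTIONS = [
    ("T", 174, "A"),
    ("T", 333, "C"),
    ("A", 412, "G"),
    ("C", 648, "T"),
    ("T", 668, "C"),
    ("T", 764, "C"),
]


def apply_substitutions(seq, substitutions):
    """Return seq with each point substitution applied.

    Raises ValueError if a position is out of range or the base found at
    that position is not the expected reference base.
    """
    bases = list(seq.upper())
    for ref, pos, alt in substitutions:
        idx = pos - 1  # convert the assumed 1-based position to a 0-based index
        if not 0 <= idx < len(bases):
            raise ValueError(f"position {pos} is outside the sequence")
        if bases[idx] != ref:
            raise ValueError(f"expected {ref} at position {pos}, found {bases[idx]}")
        bases[idx] = alt
    return "".join(bases)


# Toy demonstration on a made-up six-base sequence (not the real gene).
print(apply_substitutions("ATGTCA", [("T", 2, "C"), ("A", 6, "G")]))
```

Checking the reference base before substituting guards against applying the list to a sequence numbered differently from the one intended.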
In certain embodiments, each of the source nucleic acid molecules is configured to supply no more than X of the X+1 vector components. In some embodiments, at least one of the source nucleic acid molecules comprises at least one of the X+1 vector components. In particular embodiments, at least one of the source nucleic acid molecules comprises a template for generating at least one of the X+1 vector components.
In some embodiments, the present invention provides kits comprising at least two separate source nucleic acid molecules configured for supplying X+1 vector components, and one other component (e.g., buffer, product insert, sequencing primers, ligase, etc.). In other embodiments, the present invention provides kits comprising X+1 vector components, wherein the X+1 vector components are configured for combining in the presence of X+1 insert sequences to form a circular vector such that the X+1 vector components are non-contiguous within the circular vector, and one other component (e.g., buffer, product insert, sequencing primers, ligase, etc.). In additional embodiments, the kits further comprise an insert DNA end repair kit (e.g. comprising a polymerase and kinase). In certain embodiments, the kits of the present invention further comprise a written insert component (e.g. comprising written instructions for using the kit).
In certain embodiments, the present invention provides compositions comprising a vector component, wherein the vector component comprises: i) first and second free ends; ii) a selectable marker region, iii) a first transcriptional terminator between the first free end and the selectable marker region, and iv) a second transcriptional terminator between the second free end and the selectable marker region, and wherein the vector component is configured to form a circular vector when combined with an insert sequence. In preferred embodiments, the insert sequence is a lethal or toxic insert sequence (e.g. will not allow the host cell to form a colony if the insert sequence is transcribed). In certain embodiments, the insert sequence has at least 65% A/T content (e.g. at least 65%, 75%, 80%, or 85% A/T content).
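By way of illustration only, the A/T content threshold mentioned above can be checked with a short helper. The sketch assumes an uppercase or lowercase DNA string and ignores any character other than A, C, G, or T (an assumption, since this section does not say how ambiguous bases are treated); the function names are hypothetical.

```python
def at_fraction(seq):
    """Fraction of A and T among the unambiguous bases (A, C, G, T) of seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return sum(base in "AT" for base in counted) / len(counted)


def meets_at_threshold(seq, threshold=0.65):
    """True if the A/T fraction of seq is at least the threshold."""
    return at_fraction(seq) >= threshold


print(at_fraction("AATTAATTGC"))        # 8 of 10 bases are A or T
print(meets_at_threshold("AATTAATTGC"))
print(meets_at_threshold("ATGC"))
```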
In some embodiments, the vector component comprises a non-promoter sequence between the first free end and the selectable marker region, wherein the non-promoter sequence is unable to serve as an operable promoter in a bacterial host cell. In preferred embodiments, the bacterial host cell is Escherichia coli. 
In certain embodiments, the vector component comprises a non-promoter sequence between the second free end and the selectable marker region, wherein the non-promoter sequence is unable to serve as an operable promoter in a bacterial host cell. In preferred embodiments, the bacterial host cell is Escherichia coli. In some embodiments, the first and second free ends comprise 5′ ends lacking terminal phosphate groups. In other embodiments, the first and second free ends are blunt free ends. In certain embodiments, the selectable marker region comprises first and second selectable marker sequences. In some embodiments, the selectable marker region further comprises a transcriptional terminator. In particular embodiments, the transcriptional terminator is between the first and second selectable marker sequences. In other embodiments, the first selectable marker sequence is an Origin of Replication. In certain embodiments, the second selectable marker sequence is an antibiotic resistance gene comprising a promoter sequence and a protein encoding sequence. In preferred embodiments, the promoter sequence is closer to the first or second free ends than the protein encoding sequence (e.g. transcription of the selectable marker sequence proceeds “away” from the free ends).
In certain embodiments, the present invention provides compositions comprising a circular vector, wherein said circular vector comprises: i) a cloning site comprising at least one unique restriction site for insertion of exogenous DNA; ii) a selectable marker region; iii) a transcriptional terminator following the selectable marker region, oriented so as to terminate any RNA transcript initiated from the selectable marker region; iv) a “5′-end” transcriptional terminator between the cloning site and the 5′ end of the selectable marker region, oriented so as to terminate RNA transcripts entering the 5′ end of said selectable marker region from the cloning site; and v) a “3′-end” transcriptional terminator between the cloning site and the 3′ end of said selectable marker region, oriented so as to terminate RNA transcripts entering the 3′ end of the selectable marker region from the cloning site. In other embodiments, the circular vector is configured such that it may be cleaved to generate a linear fragment. In some embodiments, the circular vector further comprises i) a gene that is toxic when expressed in a host cell, and ii) restriction sites that allow excision of the toxic gene, and wherein the circular vector is configured (e.g. by excision of said toxic gene or by PCR amplification) to generate a linear fragment. In some embodiments, the present invention provides circular vectors comprising i) a gene that is toxic when expressed in a host cell, and ii) one or more unique restriction sites within the toxic gene, and wherein insertion of exogenous DNA into any of the one or more unique restriction sites is likely to result in disruption of expression of the toxic gene, allowing maintenance of the resulting recombinant vector in host cells.
In some embodiments, the present invention provides compositions comprising a circular vector, wherein the circular vector comprises: i) a toxic gene sequence, and ii) a nucleic acid sequence, wherein the nucleic acid sequence comprises: a) first and second ends, b) a selectable marker region, c) a first transcriptional terminator between the first end and the selectable marker region, and d) a second transcriptional terminator between the second end and the selectable marker region. In certain embodiments, the circular vector is configured to generate a vector component having first and second free ends upon removal of the toxic gene sequence from the circular vector. In other embodiments, the first transcriptional terminator is configured to terminate RNA transcripts entering the selectable marker region from the first end. In particular embodiments, the second transcriptional terminator is configured to terminate RNA transcripts entering the selectable marker region from the second end.
In some embodiments, the selectable marker region comprises a transcriptional terminator configured to terminate RNA transcripts encoded by at least one selectable marker sequence in the selectable marker region. In other embodiments, the nucleic acid sequence comprises a first non-promoter sequence between the first end and the selectable marker region, and a second non-promoter sequence between the second end and the selectable marker region, wherein each of the first and second non-promoter sequences is unable to serve as an operable promoter in a host cell. In preferred embodiments, the host cell is Escherichia coli.
In certain embodiments, the selectable marker region comprises first and second selectable marker sequences. In other embodiments, the selectable marker region further comprises a transcriptional terminator configured to terminate transcription of at least one of the first and second selectable marker sequences. In further embodiments, the nucleic acid sequence further comprises two primer binding sites. In some embodiments, expression of the toxic gene sequence prevents growth of a host cell. In particular embodiments, the circular vector further comprises a cloning site positioned such that introduction of an insert sequence into the cloning site diminishes or prevents expression of the toxic gene sequence. In other embodiments, the nucleic acid sequence comprises a promoter sequence between the first or second end and the selectable marker region.
In some embodiments, the selectable marker region comprises an ampicillin resistance sequence. In preferred embodiments, the ampicillin resistance sequence is a mutated ampicillin resistance sequence configured to reduce feeder colonies. In some embodiments, the mutated ampicillin resistance gene (e.g. derived from pUC19) comprises at least one mutation selected from: T to A at position 174; T to C at position 333; A to G at position 412; C to T at position 648; T to C at position 668; T to C at position 764; and combinations thereof. In certain embodiments, the natural promoter of the ampicillin resistance gene is replaced with a weaker promoter.
In certain embodiments, the circular vector is a recombinant plasmid. In preferred embodiments, the circular vector is a low copy number vector (e.g. produces less than 300, or less than 200, or less than 100, or less than 50, or less than 20 copies per cell). In some embodiments, the vector component further comprises two primer binding sites. In preferred embodiments, the vector component comprises SEQ ID NO:85 or a sequence that is at least 90% identical to SEQ ID NO:85 (e.g. at least 95% or at least 98% identical to SEQ ID NO:85).
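As an illustration of a percent identity threshold such as the 90% figure above, the following minimal sketch computes identity between two sequences. It assumes the sequences are already aligned and of equal length, with no gaps; the text does not specify an alignment method, so this simplification is an assumption. SEQ ID NO:85 is not reproduced here, and the sequences in the example are toy placeholders.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions that match between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    reference = "ACGTACGTAC"  # toy placeholder, not SEQ ID NO:85
    candidate = "ACGTACGTAA"  # differs from the reference at one position
    print(percent_identity(reference, candidate) >= 90.0)
```

In practice an alignment tool would be run first so that insertions and deletions are handled; this sketch covers only the final counting step.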
In some embodiments, the present invention provides kits comprising; a) a vector component, wherein the vector component comprises: i) first and second free ends; ii) a selectable marker region, iii) a first transcriptional terminator between the first free end and the selectable marker region, and iv) a second transcriptional terminator between the second free end and the selectable marker region, and wherein the vector component is configured to form a circular vector when combined with an insert sequence; and b) one other component (e.g., buffer, product insert, sequencing primers, ligase, etc.). In certain embodiments, there is a transcriptional terminator after the selectable marker region. In additional embodiments, the kits further comprise an insert DNA end repair component (e.g. comprising a polymerase and kinase). In certain embodiments, the kits of the present invention further comprise a written insert component (e.g. comprising written instructions). In certain embodiments, the selectable marker region comprises at least one selectable marker sequence.
In certain embodiments, the vector components of the present invention comprise at least one selectable marker sequence selected from an ampicillin selectable marker, a chloramphenicol selectable marker, a kanamycin selectable marker, a gentamycin selectable marker, and a plasmid origin of replication (e.g. serving as a selectable marker). In certain embodiments, the vector components comprise at least one transcriptional terminator. In some embodiments, the vector components comprise at least two, or at least three, transcriptional terminators (e.g. flanking a selectable marker). In certain embodiments, each selectable marker, including Ori as a selectable marker, is flanked by transcriptional terminators (e.g. strong transcriptional terminators). In particular embodiments, each of the X+1 vector components comprises at least one transcriptional terminator that is downstream of the selectable marker sequence (i.e. the transcriptional terminator is 3′ of the stop codon in the selectable marker sequence, see Amp selectable marker sequence in FIG. 12B). In other embodiments, at least one of the X+1 vector components comprises first and second transcriptional terminators, wherein the first transcriptional terminator is downstream of a selectable marker sequence, and wherein the second transcriptional terminator is upstream of a selectable marker sequence (i.e. 5′ of the start codon of the selectable marker sequence, oriented to terminate transcripts entering the selectable marker sequence).
In particular embodiments of the present invention, at least one of the vector components comprises at least a portion of one of the at least two separate source nucleic acid molecules. In other embodiments, at least one of the vector components is amplified (e.g. using PCR) from at least a portion of one of the at least two separate source nucleic acid molecules (e.g. one of the separate source nucleic acid molecules is exposed to primers that amplify at least a portion of the sequence of the source nucleic acid molecule). In preferred embodiments, the vector components are linear (e.g. the vector components have ends that are not connected to each other). In other preferred embodiments, each of the vector components comprises at least two primer binding sites (e.g. to allow insert DNA adjacent to the vector components to be sequenced).
In some embodiments, the present invention provides systems, kits, and compositions for cloning nucleic acid comprising at least two separate source nucleic acid molecules capable of supplying X+1 vector components, wherein each of the source nucleic acid molecules is configured to supply no more than X of the vector components, and wherein the vector components are configured for combining in the presence of X+1 insert sequences to form a circular vector (e.g. recombinant plasmid) such that the X+1 vector components are non-contiguous within the circular vector. In particular embodiments, at least one of the at least two separate source nucleic acid molecules is a replicable vector (e.g. a vector that has an origin of replication and is therefore capable of being copied by a host cell). In some embodiments, the replicable vector is selected from a plasmid, a BAC, a cosmid, or a viral vector (e.g. bacteriophage).
In some embodiments, at least one of the at least two separate source nucleic acid molecules is a direct selection vector (e.g. a vector with a lethal gene that has a cloning site in it). In other embodiments, at least one of the at least two separate source nucleic acid molecules is a conditional replication vector. In particular embodiments, at least one of the source nucleic acid molecules comprises at least one of the vector components. In certain embodiments, at least one of the source nucleic acid molecules is a vector component. In other embodiments, all of the source nucleic acid molecules are vector components. In certain embodiments, at least one of the source nucleic acid molecules comprises a template for generating at least one of the vector components (e.g., by amplification of the template by PCR).
In certain embodiments, the vector components are linear with free 5′ and 3′ ends (e.g. in a double stranded vector component, both 5′ ends and both 3′ ends are not linked to other nucleic acid sequences). In some embodiments, each of the vector components comprises free ends not compatible with the free ends of the other vector components (e.g. the 5′ ends of the vector components are not able to bind to either 3′ end of another vector component, or to their own 3′ end). In preferred embodiments, the free 5′ ends of the vector components lack terminal phosphate groups. In some embodiments, the ends of the vector components comprise blunt free ends.
In some embodiments, at least one of the insert sequences is of unknown sequence. In particular embodiments, each of the insert sequences is of unknown sequence. In preferred embodiments, at least one of the X+1 insert sequences is a lethal or toxic insert sequence (e.g. will not allow the host cell to form a colony if the insert sequence is transcribed, which may be determined by also cloning the insert sequence in a conventional vector, such as pUC19, to see if the insert sequence when transcribed is toxic or lethal). In certain embodiments, the circular vector is capable of being maintained by a host cell when the insert sequence has at least 65% A/T content (e.g. at least 65%, 75%, 80%, or 85% A/T content). In particular embodiments, the sequence of at least one of the insert sequences is known. In particular embodiments, the sequence of at least two insert sequences is known. In certain embodiments, at least a portion of the sequence of at least one of the insert sequences is known (e.g. 5, 10, 15, 20, 25 bases are known). In other embodiments, the sequence of at least one of the insert sequences is unknown. In particular embodiments, the sequence of at least two of the X+1 insert sequences is the same (e.g. the circular vector formed has at least two insert sequences that have the same sequence). In some embodiments, each of the insert sequences is at least 20 base pairs in length. In other embodiments, each of the insert sequences is at least 100 base pairs in length. In yet other embodiments, each of the insert sequences is at least 50, or at least 200, or at least 500, or at least 750, or at least 1000 base pairs in length. In other embodiments, the insert sequences are from a shotgun cloning library. In other embodiments, the insert sequences are greater than 1000 base pairs in length (e.g. between 1001 and 7000). In some embodiments, the insert sequences are between 2000 and 6000 base pairs in length.
In further embodiments, the insert sequences are greater than 7000 base pairs in length. In particular embodiments, the insert sequences are identical (e.g. all of the X+1 insert sequences have the same sequence).
In certain embodiments, each of the insert sequences is linear (e.g. its ends are not ligated to each other to form a closed loop). In particular embodiments, each of the insert sequences is double stranded. In some embodiments, each of the insert sequences is configured to bind two of the vector components. In certain embodiments, each of the insert sequences is capable of binding to: i) one of the vector components and, ii) one other of the insert sequences. In particular embodiments, at least one of the at least X+1 insert sequences comprises a DNA library. In particular embodiments, none of the at least X+1 insert sequences comprises a DNA library. In other embodiments, the insert sequences comprise DNA. In particular embodiments, the insert sequences comprise RNA.
In some embodiments, the termini of the vector components are configured to provide fixed orientation multiplex cloning vectors, in which the vector components can assemble only in a fixed orientation relative to each other upon ligation to insert DNA fragments. For example, in some embodiments, each of the X+1 insert sequences i) is configured to bind only two of the X+1 vector components, but not to itself or to any other insert sequence, and ii) is combined with X+1 vector components, each of the vector components being configured to bind only two of the X+1 insert sequences, but not to itself or to any other vector component (e.g. the 5′ end of the vector component is not able to bind to the 3′ end of another vector component or to its own 3′ end; see FIG. 16). As such, the vector components can be assembled by ligation to the insert DNAs only in a fixed orientation relative to each other. This arrangement allows for “paired-end” sequencing, in which the ends of a given insert fragment are adjacent to a defined pair of sequencing primers. The vector components may be configured such that specific desired ends are generated by restriction digestion, by PCR amplification, or by ligation of oligonucleotide linkers. Specific desired ends of the insert DNAs may be generated by ligating oligonucleotide linkers onto each of X+1 pools of insert DNAs. In addition to providing fixed orientation of the vector fragments, this method of multiplex cloning eliminates the possibility of cloning multiple insert fragments into a single cloning site.
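The end-compatibility rules described above can be illustrated with a toy model. The following sketch assumes X = 1 (two vector components and two insert pools) and represents each fragment end by a hypothetical label; the labels, the compatibility table, and the function names are illustrative assumptions and do not correspond to any specific restriction site or linker in the disclosure.

```python
# Hypothetical end labels for two vector components and two insert pools.
VECTORS = {"V1": ("v1L", "v1R"), "V2": ("v2L", "v2R")}
INSERTS = {"I1": ("i1L", "i1R"), "I2": ("i2L", "i2R")}

# Assumed pairs of ends that can be ligated to each other (toy model).
COMPATIBLE = {
    frozenset(pair)
    for pair in [("v1R", "i1L"), ("i1R", "v2L"), ("v2R", "i2L"), ("i2R", "v1L")]
}


def can_ligate(a, b, compatible):
    """True when ends a and b are listed as a compatible pair."""
    return frozenset((a, b)) in compatible


def partners(end, fragments, compatible):
    """All (fragment name, end) pairs in `fragments` that `end` can ligate to."""
    return [
        (name, e)
        for name, ends in fragments.items()
        for e in ends
        if can_ligate(end, e, compatible)
    ]


def is_fixed_orientation(vectors, inserts, compatible):
    """Check that every end binds exactly one end of the other kind and none of its own kind."""
    for own, other in ((vectors, inserts), (inserts, vectors)):
        for ends in own.values():
            for end in ends:
                if partners(end, own, compatible):
                    return False  # vector-vector or insert-insert joining is possible
                if len(partners(end, other, compatible)) != 1:
                    return False  # end has no partner, or more than one
    return True


def ring_order(vectors, inserts, compatible, start="V1"):
    """Walk the ring from the second end of `start`; call only if is_fixed_orientation is True."""
    fragments = {**vectors, **inserts}
    order, out_end = [start], vectors[start][1]
    while True:
        ((name, in_end),) = partners(out_end, fragments, compatible)
        if name == start:
            return order
        order.append(name)
        first, second = fragments[name]
        out_end = second if in_end == first else first


if __name__ == "__main__":
    if is_fixed_orientation(VECTORS, INSERTS, COMPATIBLE):
        print(ring_order(VECTORS, INSERTS, COMPATIBLE))
```

In this model, vector components and inserts alternate around the ring, so each insert sits between a defined pair of vector ends. If every end were made compatible with every other end (as with untreated blunt ends), `is_fixed_orientation` would return False, because vector-to-vector and insert-to-insert joins would then be possible.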
In some embodiments, the present invention provides kits for cloning nucleic acid comprising at least two separate source nucleic acid molecules capable of supplying X+1 vector components, wherein the vector components are configured for combining in the presence of X+1 insert sequences to form a closed vector (e.g. recombinant vector). In particular embodiments, the present invention provides kits for cloning nucleic acid comprising at least two separate source nucleic acid molecules capable of supplying X+1 vector components, wherein the X+1 vector components are configured for combining in the presence of X+1 insert sequences to form a circular vector (e.g. recombinant plasmid).
In some embodiments, the present invention provides kits for cloning nucleic acid comprising at least two separate source nucleic acid molecules capable of supplying X+1 vector components, wherein the vector components are configured for combining in the presence of X+1 insert sequences to form a circular vector, and wherein the vector components are non-contiguous within the circular vector. In some embodiments, X is a positive integer (e.g. 1-100). In particular embodiments, X is selected from 1, 2, 3, 4, 5, and 6. In other embodiments, the present invention provides kits for cloning nucleic acid comprising at least two separate source nucleic acid molecules capable of supplying at least two vector components, wherein the two vector components are configured for combining in the presence of two insert sequences to form a circular vector, and wherein the two vector components are non-contiguous within the circular vector. In some embodiments, the present invention provides kits for cloning nucleic acid comprising at least two separate source nucleic acid molecules capable of supplying at least three vector components, wherein the three vector components are configured for combining in the presence of at least three insert sequences to form a circular vector, and wherein the vector components are non-contiguous within the circular vector.
In some embodiments, the present invention provides compositions for cloning nucleic acid comprising at least two separate source nucleic acid molecules capable of supplying X+1 vector components, wherein the X+1 vector components are configured for combining in the presence of X+1 insert sequences to form a closed vector (e.g. recombinant plasmid). In particular embodiments, the present invention provides compositions for cloning nucleic acid comprising at least two separate source nucleic acid molecules capable of supplying X+1 vector components, wherein the vector components are configured for combining in the presence of X+1 insert sequences to form a circular vector (e.g. recombinant plasmid).
In some embodiments, the present invention provides compositions for cloning nucleic acid comprising at least two separate source nucleic acid molecules capable of supplying X+1 vector components, wherein the vector components are configured for combining in the presence of X+1 insert sequences to form a circular vector, and wherein the vector components are non-contiguous within the circular vector. In some embodiments, X is a positive integer. In particular embodiments, X is selected from 1, 2, 3, 4, 5, and 6. In other embodiments, the present invention provides compositions for cloning nucleic acid comprising at least two separate source nucleic acid molecules capable of supplying at least two vector components, wherein the vector components are configured for combining in the presence of at least two insert sequences to form a circular recombinant plasmid, and wherein the vector components are non-contiguous within the circular recombinant plasmid. In some embodiments, the present invention provides compositions for cloning nucleic acid comprising at least two separate source nucleic acid molecules capable of supplying at least three vector components, wherein the vector components are configured for combining in the presence of at least three insert sequences to form a circular recombinant plasmid, and wherein the vector components are non-contiguous within the circular recombinant plasmid.
In some embodiments, the present invention provides compositions comprising a vector, wherein the vector comprises: i) X+1 vector components, and ii) X+1 insert sequences; and wherein the vector components are non-contiguous within the vector. In particular embodiments, the vector is a circular vector. In other embodiments, the vector is a linear vector. In certain embodiments, the vector components are derived from at least two separate source nucleic acid molecules. In certain embodiments, the vector components of the present invention comprise at least one selectable marker sequence. In other embodiments, the vector components comprise at least two selectable marker sequences. In preferred embodiments, the vector components comprise at least one unique selectable marker sequence (e.g. each vector component has at least one selectable marker sequence not found on the other vector components that make up the circular vector). In certain embodiments, the vector components comprise at least one selectable marker sequence selected from an ampicillin selectable marker, a chloramphenicol selectable marker, a kanamycin selectable marker, a gentamycin selectable marker, and a plasmid origin of replication (e.g. serving as a selectable marker).
In particular embodiments of the compositions of the present invention, at least one of the vector components comprises at least a portion of one of the at least two separate source nucleic acid molecules. In other embodiments, at least one of the vector components is amplified (e.g. by PCR) from at least a portion of one of the at least two separate source nucleic acid molecules (e.g. one of the separate source nucleic acid molecules is exposed to primers that amplify at least a portion of the sequence of the source nucleic acid molecule). In preferred embodiments, the vector components are linear (e.g. they have ends that are not connected to each other). In other preferred embodiments, the vector components comprise at least two primer binding sites (e.g. to allow insert DNA adjacent to the vector components to be sequenced).
In some embodiments, the present invention provides compositions for cloning nucleic acid comprising at least two separate source nucleic acid molecules capable of supplying X+1 vector components, wherein each of the source nucleic acid molecules is configured to supply no more than X of the vector components, and wherein the vector components are configured for combining in the presence of X+1 insert sequences to form a circular vector (e.g. recombinant plasmid), and wherein the vector components are non-contiguous within the circular vector. In particular embodiments, at least one of the at least two separate source nucleic acid molecules is a replicable vector (e.g. a vector that has an origin of replication, and is capable of being copied by a host cell). In some embodiments, the replicable vector is selected from a plasmid, a BAC, a cosmid, or a viral vector (e.g. bacteriophage).
In some embodiments, at least one of the at least two separate source nucleic acid molecules is a direct selection vector (e.g. a vector with a lethal gene that has a cloning site in it). In other embodiments, at least one of the at least two separate source nucleic acid molecules is a conditional replication vector. In particular embodiments, at least one of the source nucleic acid molecules comprises at least one of the vector components. In certain embodiments, at least one of the source nucleic acid molecules comprises a template for generating at least one of the vector components by amplification.
In certain embodiments, the vector components are linear with free 5′ and 3′ ends (e.g. in a double stranded vector component, both 5′ ends and both 3′ ends are not linked to other nucleic acid sequences). In some embodiments, each of the vector components comprises free ends not compatible with the free ends of the other vector components (e.g. the 5′ end of the vector components is not able to bind to either end of another vector component, or to its own 3′ end). In preferred embodiments, the free ends of the vector components lack terminal 5′ phosphate groups.
In some embodiments, at least one of the insert sequences is of unknown sequence. In particular embodiments, each of the insert sequences is of unknown sequence. In particular embodiments, the sequence of at least one of the insert sequences is known. In particular embodiments, the sequence of at least two of the insert sequences is known. In certain embodiments, at least a portion of the sequence of at least one of the insert sequences is known (e.g. 5, 10, 15, 20, 25 bases are known). In other embodiments, the sequence of at least one of the insert sequences is unknown. In some embodiments, each of the insert sequences is at least 20 base pairs in length. In other embodiments, each of the insert sequences is at least 100 base pairs in length. In yet other embodiments, each of the insert sequences is at least 50, or at least 200, or at least 500, or at least 750, or at least 1000 base pairs in length. In other embodiments, the insert sequences are from a shotgun cloning library. In other embodiments, the insert sequences are between 1000 and 7000 base pairs in length. In some embodiments, the insert sequences are between 7000 and 12000 base pairs in length. In particular embodiments, the insert sequences are identical (e.g. all of the X+1 insert sequences have the same sequence).
In certain embodiments, each of the insert sequences is linear (e.g. its ends are not ligated to each other to form a closed loop). In particular embodiments, each of the insert sequences is double stranded. In some embodiments, each of the insert sequences is configured to bind two of the vector components. In certain embodiments, each of the insert sequences is capable of binding to: i) one of the vector components, and ii) one other of the insert sequences. In particular embodiments, at least one of the X+1 insert sequences comprises a DNA library. In other embodiments, the insert sequences comprise DNA. In particular embodiments, the insert sequences comprise RNA. In some embodiments, the insert sequences comprise ends that are phosphorylated.
In some embodiments, each of the X+1 insert sequences i) is configured to bind two of the vector components, but not to itself or to any other insert sequence, and ii) is combined with X+1 vector components, each of the vector components comprising one free end compatible with one of the insert ends and one free end compatible with another insert end, but not compatible with the free ends of the other vector components (e.g. the 5′ end of the vector components is not able to bind to either 3′ end of another vector component, or to its own 3′ end) (see FIG. 16).
In some embodiments, the present invention provides compositions comprising a circular vector, wherein the circular vector comprises a plurality of cloning sites, each separated by at least one selectable marker sequence. In certain embodiments, the circular vector is a direct selection vector. In other embodiments, the circular vector is a conditional replication vector. In particular embodiments, the plurality of cloning sites comprises at least three cloning sites. In additional embodiments, the plurality of cloning sites comprises at least four (or five, or six, or seven) cloning sites. In some embodiments, the at least one selectable marker sequence comprises two selectable marker sequences. In other embodiments, the selectable marker sequences comprise at least two primer binding sites. In particular embodiments, at least one of the selectable marker sequences is selected from ampicillin, chloramphenicol, kanamycin, gentamycin, and a plasmid origin of replication. In some embodiments, the circular vector is a plasmid.
In some embodiments, the present invention provides compositions comprising a circular vector, wherein the circular vector comprises at least two selectable marker sequences, wherein each of the selectable marker sequences is flanked by cloning sites.
In other embodiments, the present invention provides compositions comprising a circular vector, wherein the circular vector comprises at least two vector components, wherein each of the vector components comprises at least one selectable marker sequence, and wherein each of the vector components is flanked by cloning sites.
In certain embodiments, the present invention provides methods for cloning nucleic acid comprising: a) providing: i) at least two separate source nucleic acid molecules, and ii) at least X+1 insert sequences; b) treating the at least two separate source nucleic acid molecules under conditions such that at least X+1 vector components are generated; and c) combining the at least X+1 insert sequences with the at least X+1 vector components under conditions such that a circular recombinant vector is generated, wherein the vector components are non-contiguous within the circular vector. In some embodiments, the method further comprises providing iii) host cells, and step d) transfecting the host cells with the circular vector (e.g., recombinant plasmid), thereby generating transfected cells. In other embodiments, the method further comprises providing iv) selective growth media, and step e) treating the transfected cells with the selective media to select cells containing X+1 insert sequences.
In particular embodiments, step c) generates a plurality of circular vectors (e.g. recombinant plasmids), and the method further comprises step f) identifying the cells containing X+1 insert sequences, wherein the identifying is at least 95% accurate (e.g. no more than 5% of the identified cells are false positives). In preferred embodiments, the identifying is at least 98% accurate. In particularly preferred embodiments, the identifying is at least 99% accurate. In most preferred embodiments, the identifying is approximately 100% accurate (e.g. 99.5% or greater). In certain embodiments, the selective growth media comprises at least X+1 selective agents. In different embodiments, the selective growth media comprises X selective agents (e.g. an origin of replication being employed as a selective marker). In some embodiments, the selective agents are selected from ampicillin, chloramphenicol, kanamycin, and gentamycin.
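The selection logic described above can be sketched as a simple set-membership check. This is a toy model under stated assumptions: each vector component carries a hypothetical set of marker names ("ori", "amp", "kan" are illustrative abbreviations), a cell survives only if its plasmid carries an origin of replication and a resistance marker for every selective agent in the media, and the function names are not part of the disclosure.

```python
def markers_of(assembly, component_markers):
    """Union of the markers carried by every vector component in an assembled plasmid."""
    carried = set()
    for name in assembly:
        carried |= component_markers[name]
    return carried


def survives(assembly, component_markers, agents):
    """Toy rule: the plasmid must replicate (carry 'ori') and resist every selective agent."""
    carried = markers_of(assembly, component_markers)
    return "ori" in carried and set(agents) <= carried


if __name__ == "__main__":
    # X = 1: two vector components, each with a marker not found on the other.
    components = {"A": {"ori"}, "B": {"kan"}}
    # X selective agents (here one), with the origin of replication acting as a marker.
    for assembly in (("A", "B"), ("A", "A"), ("B", "B")):
        print(assembly, survives(assembly, components, {"kan"}))
```

Because each component carries a marker the others lack, only assemblies containing every distinct component pass selection, which in this model corresponds to plasmids carrying the full set of X+1 cloning junctions.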
In some embodiments, the method further comprises providing multiplex sequencing reagents, and step d) mixing the multiplex sequencing reagents with the circular vector (e.g. recombinant plasmid) under conditions such that at least a portion of each of the X+1 insert sequences is sequenced (e.g. at least 5, 10, 15, 20, 25, or 100 bases are determined from each of the insert sequences). In preferred embodiments, at least 400 or at least 500 bases are determined from each of the insert sequences. In particularly preferred embodiments, at least 500 or at least 700 bases are determined from each of the insert sequences. In some embodiments, the multiplex sequencing reagents comprise: i) at least two primers for each of the X+1 insert sequences, ii) a nucleic acid polymerizing agent, and iii) nucleotides, wherein a portion of the nucleotides are di-deoxy nucleotides.
In certain embodiments, the present invention provides methods for cloning nucleic acid comprising: a) providing; i) at least two separate source nucleic acid molecules, and ii) at least X+1 insert sequences, and b) treating the at least two separate source nucleic acid molecules under conditions such that at least X+1 vector components are generated; and c) combining the at least X+1 insert sequences with the at least X+1 vector components under conditions such that a circular vector (e.g. recombinant plasmid) is generated. In certain embodiments, the treating comprises exposing the at least two separate source nucleic acid molecules to restriction enzymes and/or alkaline phosphatase. In other embodiments, the treating comprises employing at least a portion of one of the at least two separate source nucleic acid molecules as a template for PCR.
In certain embodiments, the X+1 vector components of the present invention comprise at least one selectable marker sequence. In some embodiments, the vector components comprise: i) first and second free ends, and ii) a selectable marker region comprising at least one selectable marker sequence unique among the X+1 vector components. In further embodiments, the X+1 vector components further comprise a first transcriptional terminator between the first free end and the selectable marker region. In other embodiments, the X+1 vector components comprise a second transcriptional terminator between the second free end and the selectable marker region. In other embodiments, the vector components comprise at least two selectable marker sequences. In preferred embodiments, the vector components comprise at least one unique selectable marker sequence (e.g. each vector component has at least one selectable marker sequence not found on the other vector components that make up the circular vector). In certain embodiments, the vector components comprise at least one selectable marker sequence selected from an ampicillin selectable marker, a chloramphenicol selectable marker, a kanamycin selectable marker, a gentamycin selectable marker, a tetracycline selectable marker, and a plasmid origin of replication (e.g. serving as a selectable marker). In some embodiments, the selectable marker sequences are antibiotic resistance genes. In certain embodiments, there is a transcriptional terminator after the selectable marker sequence.
In particular embodiments of the methods of the present invention, at least one of the vector components comprises at least a portion of one of the at least two separate source nucleic acid molecules. In other embodiments, at least one of the vector components is PCR generated from at least a portion of one of the at least two separate source nucleic acid molecules (e.g. one of the separate source nucleic acid molecules is exposed to primers that amplify at least a portion of the sequence of the source nucleic acid molecule). In preferred embodiments, the vector components are linear (e.g. they have ends that are not connected to each other). In other preferred embodiments, the vector components comprise at least two primer binding sites (e.g. to allow insert DNA adjacent to the vector components to be sequenced).
In particular embodiments, at least one of the at least two separate source nucleic acid molecules is a replicable vector (e.g. a vector that has an origin of replication, and is capable of being copied by a host cell). In some embodiments, the replicable vector is selected from a plasmid, a BAC, a cosmid, and a viral vector (e.g. bacteriophage).
In some embodiments, at least one of the at least two separate source nucleic acid molecules is a direct selection vector (e.g. a vector with a lethal gene that has a cloning site in it). In other embodiments, at least one of the at least two separate source nucleic acid molecules is a conditional replication vector. In particular embodiments, at least one of the source nucleic acid molecules comprises at least one of the vector components. In certain embodiments, at least one of the source nucleic acid molecules comprises a template for generating at least one of the vector components (e.g. by amplification).
In certain embodiments, the vector components are linear with free 5′ and 3′ ends (e.g. in a double stranded vector component, both 5′ ends and both 3′ ends are not linked to other nucleic acid sequences). In some embodiments, each of the vector components comprises free ends not compatible with the free ends of the other vector components (e.g. the 5′ end of a vector component is not able to bind to either end of another vector component, or to its own 3′ end). In preferred embodiments, the free ends of the vector components lack terminal phosphate groups.
In some embodiments, at least one of the insert sequences is of unknown sequence. In particular embodiments, each of the insert sequences is of unknown sequence. In particular embodiments, the sequence of at least one of the insert sequences is known. In particular embodiments, the sequence of both of the insert sequences is known. In certain embodiments, at least a portion of the sequence of at least one of the insert sequences is known (e.g. 5, 10, 15, 20, 25 bases are known). In other embodiments, the sequence of at least one of the insert sequences is unknown. In some embodiments, each of the insert sequences is at least 20 base pairs in length. In other embodiments, each of the insert sequences is at least 100 base pairs in length. In yet other embodiments, each of the insert sequences is at least 50, or at least 200, or at least 500, or at least 750, or at least 1000 base pairs in length. In other embodiments, the insert sequences are from a shotgun cloning library. In other embodiments, the insert sequences are between 1000 and 7000 base pairs in length. In some embodiments, the insert sequences are between 7000 and 12000 base pairs in length. In particular embodiments, the insert sequences are identical (e.g. all of the X+1 insert sequences have the same sequence).
In certain embodiments, each of the insert sequences is linear (e.g. its ends are not ligated to each other to form a closed loop). In particular embodiments, each of the insert sequences is double stranded. In some embodiments, each of the insert sequences is configured to bind two of the vector components. In certain embodiments, each of the insert sequences is capable of binding to: i) one of the vector components and, ii) one other of the insert sequences. In particular embodiments, at least one of the at least X+1 insert sequences comprises a DNA library. In other embodiments, the insert sequences comprise DNA. In particular embodiments, the insert sequences comprise RNA.
In some embodiments, each of the X+1 insert sequences i) is configured to bind two of the vector components, but not to itself or to any other insert sequence, and ii) is combined with X+1 vector components, each of the vector components comprising one free end compatible with one of the insert ends and one free end compatible with another insert end, but not compatible with the free ends of the other vector components (e.g. the 5′ end of a vector component is not able to bind to the 3′ end of another vector component, or to its own 3′ end) (see, e.g., FIG. 16).
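The end-compatibility arrangement described above can be sketched in a few lines of Python. This is an illustrative model only, not part of the disclosed methods: the `Fragment` type, the primed-label notation for compatible ends ("a" joins only "a'"), and the function names are all assumptions introduced here, and each fragment is treated as having a single orientation. The sketch checks that X+1 vector components and X+1 insert sequences close into one circle in which no two vector components are adjacent.

```python
from typing import NamedTuple, Optional


class Fragment(NamedTuple):
    name: str
    kind: str   # "vector" or "insert"
    left: str   # label of the left free end (hypothetical notation)
    right: str  # label of the right free end


def partner(label: str) -> str:
    """An end joins only its primed partner: "a" pairs with "a'"."""
    return label[:-1] if label.endswith("'") else label + "'"


def assemble_circle(fragments: list[Fragment]) -> Optional[list[Fragment]]:
    """Follow right-end to left-end joins, starting from the first fragment.

    Returns the fragments in circular order if every fragment is used
    exactly once and the last one joins back to the first, else None.
    """
    if not fragments:
        return None
    by_left = {f.left: f for f in fragments}
    if len(by_left) != len(fragments):
        return None  # two fragments compete for the same partner end
    order = [fragments[0]]
    while True:
        nxt = by_left.get(partner(order[-1].right))
        if nxt is None or nxt in order[1:]:
            return None  # dead end, or a loop that skips the start
        if nxt is order[0]:
            break  # circle closed
        order.append(nxt)
    return order if len(order) == len(fragments) else None


def vectors_non_contiguous(order: list[Fragment]) -> bool:
    """True if no two vector components are neighbours around the circle."""
    n = len(order)
    return all(
        not (order[i].kind == "vector" and order[(i + 1) % n].kind == "vector")
        for i in range(n)
    )


# X = 1: two vector components and two insert sequences.
parts = [
    Fragment("V1", "vector", "a", "b"),
    Fragment("I1", "insert", "b'", "c'"),
    Fragment("V2", "vector", "c", "d"),
    Fragment("I2", "insert", "d'", "a'"),
]
circle = assemble_circle(parts)
print([f.name for f in circle], vectors_non_contiguous(circle))
```

In this model, leaving out either insert sequence leaves a vector end without a partner, so no circle forms; that is the property that makes every closed circle carry all X+1 inserts.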
In certain embodiments, the present invention provides methods for cloning nucleic acid comprising: a) providing; i) at least X+1 vector components, and ii) at least X+1 insert sequences; and b) combining the at least X+1 insert sequences with the at least X+1 vector components under conditions such that a circular recombinant plasmid is generated, wherein the vector components are non-contiguous within the circular recombinant plasmid.
In other embodiments, the present invention provides methods for sequencing nucleic acid comprising: a) providing; i) a circular vector comprising; A) X+1 vector components, and B) X+1 insert sequences; and wherein the vector components are non-contiguous within the circular recombinant plasmid, and ii) multiplex sequencing reagents; and b) mixing the multiplex sequencing reagents with the circular vector under conditions such that at least a portion of each of the X+1 insert sequences are sequenced. In some embodiments, the multiplex sequencing reagents comprise: i) at least two primers for each of the X+1 insert sequences, ii) a nucleic acid polymerizing agent, and iii) nucleotides, wherein a portion of the nucleotides are di-deoxy nucleotides.
In certain embodiments, the present invention provides methods comprising combining a plurality of vector components and a plurality of insert sequences under conditions such that a circular recombinant plasmid containing two or more of the insert sequences is formed (in some embodiments the vector components are non-contiguous). In some embodiments, the circular recombinant plasmid contains three or more of the insert sequences. In particular embodiments, the circular recombinant plasmid contains four or more of the insert sequences.
In some embodiments, the present invention provides compositions comprising a direct selection vector, wherein the direct selection vector comprises; i) a plasmid origin of replication, and ii) a bacteriophage T7 1.2 gene sequence (or a sequence encoding a protein identical to the T7 1.2 gene product, or a sequence encoding a protein that has the same biological activity as the T7 1.2 gene, e.g. the amino acid sequence for T7 1.2 with minor deletions, substitutions, or additions, that do not alter the biological activity of the peptide). In particular embodiments, the direct selection vector further comprises at least one selectable marker sequence. In other embodiments, the direct selection vector further comprises a multiple cloning site. In certain embodiments, the multiple cloning site is derived from pUC19. In yet other embodiments, the multiple cloning site is located between the first and second codon of the bacteriophage T7 1.2 gene sequence. In yet other embodiments, the multiple cloning site is located between two other adjacent codons of the bacteriophage T7 1.2 gene sequence. In particular embodiments, the multiple cloning site comprises SEQ ID NO:29. In additional embodiments, the multiple cloning site comprises SEQ ID NO:30. In preferred embodiments, the direct selection vector is pT71.2. In other embodiments, the direct selection vector is pTM2. In some embodiments, the vector generated by the above method is pCTA1. In other embodiments, the vector generated by the above method is pCTAB4.3. In still other embodiments, the vector generated by the above method is pCTH1.4. In other embodiments, the vector generated by the above method is pATH. In other embodiments, the vector generated by the above method is pATBAG. In still other embodiments, the vector generated by the above method is pATR-G. In certain embodiments, the vector generated by the above method is pAT6-6. In other embodiments, the vector generated by the above method is pARG. 
In certain embodiments, the bacteriophage T7 1.2 gene is lethal in F′ E. coli cells.
In certain embodiments, the present invention provides methods for generating a vector comprising: a) providing; i) a direct selection vector comprising; A) a plasmid origin of replication, and B) a bacteriophage T7 1.2 gene sequence; ii) a composition comprising at least one type of restriction enzyme; and iii) in certain embodiments a composition comprising a phosphatase (e.g. calf intestinal phosphatase); and b) exposing the direct selection vector to the composition under conditions such that the bacteriophage T7 1.2 gene is removed from the direct selection vector. In some embodiments, the exposing step generates a cloning vector, or vector component, lacking the bacteriophage T7 1.2 gene sequence. In further embodiments, the present invention provides compositions comprising the vector lacking the bacteriophage T7 1.2 gene, generated by the above method.
In some embodiments, the present invention provides methods for generating a vector component comprising; a) providing; i) a circular vector comprising; A) a selectable marker region, B) a direct selection sequence (e.g. T7 1.2 gene or Barnase), C) a first transcriptional terminator upstream of the direct selection sequence, wherein the first transcriptional terminator is between the selectable marker region and the direct selection sequence, and D) a second transcriptional terminator downstream of the direct selection sequence, wherein the second transcriptional terminator is between the selectable marker region and the direct selection sequence; and ii) a composition comprising at least one type of restriction enzyme; and iii) in certain embodiments a composition comprising a phosphatase (e.g. calf intestinal phosphatase); and b) exposing the circular vector to the composition under conditions such that the direct selection sequence is removed from the circular vector, thereby generating a vector component with first and second free ends (e.g. blunt free ends). In certain embodiments, the method further comprises step c) exposing the vector component to a phosphatase (e.g. calf intestinal phosphatase), such that the free ends are dephosphorylated. In certain embodiments, the selectable marker region comprises at least one selectable marker followed by a transcriptional terminator.
In certain embodiments, the present invention provides methods comprising, a) providing; i) X+1 vector components, and ii) X+1 insert sequences; and b) combining the X+1 vector components and the X+1 insert sequences under conditions such that a circular vector is formed, wherein the X+1 vector components are non-contiguous within the circular vector. In some embodiments, each of the X+1 vector components comprises; i) first and second free ends, and ii) a selectable marker region comprising at least one selectable marker sequence unique among the X+1 vector components. In other embodiments, each of the X+1 vector components further comprises; iii) a first transcriptional terminator between the first free end and the selectable marker region, and iv) a second transcriptional terminator between the second free end and the selectable marker region. In particular embodiments, each of the X+1 vector components comprises a non-promoter sequence between the first free end and the selectable marker region, wherein the non-promoter sequence is unable to serve as an operable promoter in a bacterial host cell (e.g., Escherichia coli). In other embodiments, each of the X+1 vector components comprises a non-promoter sequence between the second free end and the selectable marker region, wherein the non-promoter sequence is unable to serve as an operable promoter in a bacterial host cell. In certain embodiments, the selectable marker region comprises at least one selectable marker followed by a transcriptional terminator.
In some embodiments, the method further comprises; providing iii) host cells, and step c) transfecting the host cells with the circular vector (e.g., recombinant plasmid) generating transfected cells. In other embodiments, the method further comprises; providing iv) selective growth media, and step d) treating the transfected cells with the selective media to select cells containing X+1 insert sequences.
In particular embodiments, step b) generates a plurality of circular vectors (e.g. recombinant plasmids), and the method further comprises step e) identifying the cells containing X+1 insert sequences, wherein the identifying is at least 95% accurate (e.g. there is only 5% or less that are false positives). In preferred embodiments, the identifying is at least 98% accurate. In particularly preferred embodiments, the identifying is at least 99% accurate. In most preferred embodiments, the identifying is approximately 100% accurate (e.g. 99.5% or greater).
In some embodiments, the present invention provides methods comprising, a) providing; i) a vector component, wherein the vector component comprises: A) first and second free ends; B) a selectable marker region, C) a first transcriptional terminator between the first free end and the selectable marker region, and D) a second transcriptional terminator between the second free end and the selectable marker region, and ii) an insert sequence, and b) combining the vector component and the insert sequence under conditions such that a circular vector is formed. In certain embodiments, the vector component further comprises a non-promoter sequence between the first free end and the selectable marker region, wherein the non-promoter sequence is unable to serve as an operable promoter in a bacterial host cell (e.g. Escherichia coli). In particular embodiments, the vector component comprises a non-promoter sequence between the second free end and the selectable marker region, wherein the non-promoter sequence is unable to serve as an operable promoter in a bacterial host cell (e.g. Escherichia coli). In some embodiments, the vector component comprises a third transcriptional terminator (e.g. after at least one selectable marker sequence).
In some embodiments, the method further comprises; further providing iii) host cells, and step c) transfecting the host cells with the circular vector (e.g., recombinant plasmid) generating transfected cells. In other embodiments, the method further comprises; providing iv) selective growth media, and step d) treating the transfected cells with the selective media to select cells containing X+1 insert sequences.
In particular embodiments, step b) generates a plurality of circular vectors (e.g. recombinant plasmids), and the method further comprises step e) identifying the cells containing X+1 insert sequences, wherein the identifying is at least 95% accurate (e.g. there is only 5% or less that are false positives). In preferred embodiments, the identifying is at least 98% accurate. In particularly preferred embodiments, the identifying is at least 99% accurate. In most preferred embodiments, the identifying is approximately 100% accurate (e.g. 99.5% or greater).
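The accuracy thresholds recited above reduce to simple arithmetic, sketched here in Python. The sketch assumes that "accuracy" means the fraction of identified cells that truly contain X+1 insert sequences, which matches the false-positive examples given in the text; the function names and the tier labels are hypothetical and introduced only for illustration.

```python
from fractions import Fraction

# Thresholds taken from the text, checked from most to least stringent.
TIERS = [
    (Fraction(995, 1000), "approximately 100% accurate"),
    (Fraction(99, 100), "at least 99% accurate"),
    (Fraction(98, 100), "at least 98% accurate"),
    (Fraction(95, 100), "at least 95% accurate"),
]


def identification_accuracy(true_recombinants: int, false_positives: int) -> Fraction:
    """Fraction of identified cells that really contain X+1 insert sequences."""
    total = true_recombinants + false_positives
    if total <= 0:
        raise ValueError("at least one identified cell is required")
    return Fraction(true_recombinants, total)


def accuracy_tier(accuracy: Fraction) -> str:
    """Return the most stringent tier from the text that the accuracy meets."""
    for threshold, label in TIERS:
        if accuracy >= threshold:
            return label
    return "below 95% accurate"


# 200 identified colonies, 10 of which turn out to be false positives.
acc = identification_accuracy(true_recombinants=190, false_positives=10)
print(float(acc), accuracy_tier(acc))
```

Exact fractions are used instead of floating point so that a result sitting exactly on a threshold is classified consistently.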
In certain embodiments, the present invention provides methods for fixed orientation cloning comprising; a) providing; i) X+1 vector components, wherein each of the X+1 vector components comprises two different sticky free ends, and ii) X+1 insert sequence pools, wherein each of the X+1 insert sequence pools comprises a plurality of insert sequences, and b) treating each of the X+1 insert sequence pools under conditions such that the plurality of insert sequences in each of the X+1 insert sequence pools comprise two identical sticky free ends that are unique among the X+1 insert sequence pools, and c) combining the X+1 vector components and the X+1 insert sequence pools under conditions such that each of the two different sticky free ends, of each of the X+1 vector components, binds one of the plurality of insert sequences from one of the X+1 insert sequence pools. In some embodiments, the treating step comprises exposing the plurality of insert sequences in each of the X+1 insert sequence pools to a plurality of one type of linker (e.g. CCCC linkers and ligase are added to one of the pools, and TTTT linkers and ligase are added to a different pool). The present invention is not limited to the length or sequence of the linkers employed. Indeed, any type of linker oligonucleotide may be used. In preferred embodiments, each of the X+1 pools is exposed to a different type of linker. In certain embodiments, the treating step comprises exposing the plurality of insert sequences in each of the X+1 insert sequence pools to a plurality of one type of restriction enzyme (e.g. to generate sticky ends).
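The linker scheme above can be illustrated with a short Python sketch. It assumes that each pool's sticky end is fully described by a single overhang sequence and that a vector end anneals to an insert end when the two overhangs are reverse complements. The function `design_vector_ends` and the rule that vector component i bridges pool i and pool i+1 are illustrative assumptions, not a layout stated in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_vector_ends(pool_overhangs: list[str]) -> list[tuple[str, str]]:
    """Pick the two sticky ends of each vector component.

    Vector component i receives one end that anneals to pool i and one end
    that anneals to pool i+1 (wrapping around), so each component carries
    two different ends and the pools are joined in a fixed order.
    """
    n = len(pool_overhangs)
    if n < 2:
        raise ValueError("need at least two pools for two different ends")
    if len(set(pool_overhangs)) != n:
        raise ValueError("each pool needs a unique overhang")
    for overhang in pool_overhangs:
        if reverse_complement(overhang) in pool_overhangs:
            # Inserts would anneal to themselves or to another pool's inserts.
            raise ValueError(f"overhang {overhang} can pair with an insert end")
    return [
        (reverse_complement(pool_overhangs[i]),
         reverse_complement(pool_overhangs[(i + 1) % n]))
        for i in range(n)
    ]


# The CCCC / TTTT linker example from the text (X = 1, two pools).
print(design_vector_ends(["CCCC", "TTTT"]))
```

The self-pairing check is why, in this model, a CCCC pool could not be combined with a GGGG pool: inserts from the two pools would anneal to each other instead of to the vector components.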
In particular embodiments, the present invention provides methods comprising; a) providing; i) X+1 vectors (e.g. circular or linearized), wherein each of the vectors comprises; A) an identical origin of replication (i.e. each of the X+1 vector components comprises the same origin of replication), and B) at least one selectable marker sequence unique among the X+1 vectors, ii) a plurality of insert sequences, and iii) host cells; and b) combining the X+1 vectors and the plurality of insert sequences under conditions such that X+1 recombinant vectors are generated; and c) transforming the host cells with the X+1 recombinant vectors (e.g. transforming the host cells with each of the X+1 vectors at approximately the same time) to generate transformed host cells. In further embodiments, the methods further comprise; providing iv) selective growth media, and step d) treating the transformed host cells with the selective media to select cells containing X+1 recombinant vectors.
In certain embodiments, the selective growth media comprises at least X+1 selective agents. In different embodiments, the selective growth media comprises X selective agents (e.g. an origin of replication being employed as a selective marker). In some embodiments, the selective agents are selected from ampicillin, chloramphenicol, kanamycin, and gentamycin.
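The selection logic described above can be sketched as a set comparison in Python. The sketch assumes, for illustration only, that each resistance marker is named after the agent it counters and that a plasmid without an origin of replication is not maintained; the function names and the `"ori"` label are hypothetical.

```python
def plasmid_markers(components: list[set[str]]) -> set[str]:
    """Union of the markers carried by the vector components of one plasmid."""
    markers: set[str] = set()
    for component in components:
        markers |= component
    return markers


def grows_on(markers: set[str], agents: set[str], origin: str = "ori") -> bool:
    """A cell grows only if its plasmid replicates and resists every agent."""
    return origin in markers and agents <= markers


# X = 2: three vector components. One carries the origin of replication,
# which acts as a selective marker, so only X = 2 agents are in the media.
components = [{"ori"}, {"kanamycin"}, {"chloramphenicol"}]
media = {"kanamycin", "chloramphenicol"}

print(grows_on(plasmid_markers(components), media))      # all three present
print(grows_on(plasmid_markers(components[:2]), media))  # one component missing
```

Because every vector component contributes a marker found on no other component, a plasmid lacking any one component fails at least one test, which is why growth on the media reports the presence of all X+1 components.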
In some embodiments, the present invention provides methods comprising; a) providing; i) X+1 vectors (e.g. circular or linearized), wherein each of the vectors comprises; A) an identical origin of replication (i.e. each of the X+1 vector components comprises the same origin of replication), and B) at least one selectable marker sequence unique among the X+1 vectors, and ii) X+1 insert sequence pools; and b) combining each of the insert sequence pools with one of the X+1 vectors such that X+1 recombinant vector pools comprising recombinant vectors are generated, and c) contacting the host cells with the X+1 recombinant vector pools (e.g. transforming the host cells with each of the X+1 vector pools at approximately the same time) to generate transformed host cells. In further embodiments, the methods further comprise; providing iv) selective growth media, and step d) treating the transformed host cells with the selective media to select cells containing X+1 recombinant vectors.
In certain embodiments, the present invention provides compositions, systems, and kits comprising a circular vector (e.g. plasmid), wherein the circular vector comprises a barnase encoding nucleic acid sequence, and wherein the circular vector does not contain an operable barstar encoding nucleic acid sequence. In some embodiments, the present invention provides cells comprising a circular vector (e.g. plasmid), wherein the circular vector comprises a barnase encoding nucleic acid sequence, and wherein the circular vector does not contain an operable barstar encoding nucleic acid sequence. In other embodiments, the present invention provides cells comprising i) a first circular vector (e.g. plasmid), wherein the first circular vector comprises a barnase encoding nucleic acid sequence, and wherein the first circular vector does not contain an operable barstar encoding nucleic acid sequence, and ii) a second circular vector comprising a barstar encoding nucleic acid sequence.
In certain embodiments, the present invention provides methods comprising; a) providing; i) a plurality of circular vectors (e.g. plasmids), wherein the circular vectors comprise a barnase encoding nucleic acid sequence, and wherein the circular vectors do not contain an operable barstar encoding nucleic acid sequence, ii) host cells that do not contain a nucleic acid sequence encoding barnase, and iii) a plurality of insert sequences; b) combining the plurality of circular vectors and the plurality of insert sequences such that a plurality of recombinant vectors are generated, c) transforming the host cells with the plurality of recombinant vectors to generate a plurality of transformed cells, and d) plating the plurality of transformed cells on selective media such that transformed cells containing recombinant circular vectors with disrupted barnase encoding nucleic acid sequences are identified.
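The direct selection principle in the method above can be sketched as a truth table in Python. This is an idealized model under stated assumptions: barnase is treated as lethal whenever its coding sequence is intact and no barstar is present, and events such as spontaneous mutations that inactivate barnase are ignored. The `Clone` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    name: str
    insert_disrupts_barnase: bool  # True for a recombinant vector


def forms_colony(clone: Clone, host_has_barstar: bool = False) -> bool:
    """A transformed cell survives if barnase is disrupted or barstar is present."""
    barnase_active = not clone.insert_disrupts_barnase
    return (not barnase_active) or host_has_barstar


def select(clones: list[Clone], host_has_barstar: bool = False) -> list[str]:
    """Names of the clones expected to form colonies after plating."""
    return [c.name for c in clones if forms_colony(c, host_has_barstar)]


clones = [
    Clone("empty vector", insert_disrupts_barnase=False),
    Clone("recombinant 1", insert_disrupts_barnase=True),
    Clone("recombinant 2", insert_disrupts_barnase=True),
]
print(select(clones))                         # barstar-free host
print(select(clones, host_has_barstar=True))  # host carrying a barstar vector
```

The second call corresponds to the cells described earlier that carry a second vector encoding barstar, in which an intact barnase vector can be propagated.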
In certain embodiments, the present invention provides compositions comprising X+1 vector components configured for cloning X+1 insert sequences with a false positive background of less than 5%, or less than 2% or less than 1% (e.g. 0.5% false positives). In certain embodiments, the present invention provides compositions comprising a plurality of circular vectors configured to yield at least 98% recombinant clones when grown on selective media (e.g., approximately 99% or 99.5% or greater recombinant clones), wherein at least a portion of the circular vectors comprise at least two insert sequences. In some embodiments, the present invention provides compositions comprising a vector configured to clone at least one insert (e.g. one insert) without transcription of the insert sequence when transformed into a host cell. In other embodiments, the present invention provides compositions comprising a vector configured to clone at least two insert sequences without transcription of the insert sequences when transformed into a host cell.