1. Field of the Invention
The invention relates to techniques for manipulating nucleic acids linked to a vector. The methods described can be used, for example, in screening nucleic acid libraries. The methods can also be adapted to provide kits for screening nucleic acid libraries.
2. Introduction
The invention comprises methods and corresponding kits for enriching the presence of one or more desired nucleic acids from a collection of many nucleic acids. By employing the appropriate vectors and primers, the desired nucleic acids are produced in a replication-competent form that can be introduced into a host cell or organism. Thus, for example, the methods can be used with an appropriate bacteriophage library in order to directly produce, from the DNA of the library, replication-competent plasmids containing a desired nucleic acid. The resulting plasmids can immediately be transformed into appropriate hosts. The methods advance the ability of one skilled in the art to rapidly identify and isolate a desired nucleic acid in a vector. By directly employing the nucleic acids of a library rather than the hosts bearing those nucleic acids, the methods circumvent the steps of plating, growing, and transferring millions of clones in order to screen for a desired nucleic acid sequence. Unlike other methods directly employing the nucleic acids of a library, no physical separation or binding procedures are required. Thus, practice of this invention simplifies the screening of genomic, cDNA, or other nucleic acid libraries compared to currently used methods.
3. Description of Related Art
Nucleic acid libraries consist of a collection of different nucleic acids from a particular source, which possess differing nucleic acid sequences. Each of the nucleic acids, called “inserts,” is operably linked to a vector with a particular nucleic acid sequence. The vector allows, inter alia, the nucleic acid inserts to be replicated in an appropriate host.
The nucleic acid molecules that make up a library are typically in circular or linear form. Plasmids are circular nucleic acid molecules that replicate in host organisms using an origin of replication and usually possess a gene that gives the host cell a selective advantage over other cells. Linear vectors, such as the bacteriophage lambda-derived vectors, may contain corresponding elements for replication and selection as well as elements encoding bacteriophage proteins necessary for propagation in bacteria.
Libraries of plasmid vectors typically contain the nucleic acid inserts at a defined location in the plasmid (“cloning site” or “multiple cloning site”). Linear vectors, such as lambda bacteriophage, will also typically contain the inserts at some fixed cloning site or multiple cloning site regions in the vector. The inserts are flanked by vector sequences of a particular design. For example, those skilled in the art have designed flanking regions to make “inserting” nucleic acids more convenient or easier. A number of vector designs have been developed and are well known in the art.
The libraries are generally constructed in order to facilitate the identification and isolation (cloning) of particular nucleic acid sequences, such as novel genes. Thus, it is often desirable to isolate one or more particular nucleic acids from a library for further study or use.
There are several ways to isolate a desired nucleic acid sequence from a library. Originally, in situ filter hybridization methods were used for this identification. (See, e.g., Sambrook, J., et al., Molecular Cloning, A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, N.Y. (1989).) However, filter hybridization methods require intensive labor and a significant amount of materials. For example, the libraries are typically grown or plated on an appropriate surface as individual clones and then transferred to a filter membrane. Identification and separation of a desired clone, containing the desired nucleic acid sequence, requires physically locating a positively hybridizing bacterial colony or phage-producing plaque. Each library needs a certain minimum surface area so that an individual colony or plaque can be differentiated from others. As larger surface areas are needed, the number of filters for hybridization also increases. A hypothetical library of 1 million clones typically requires numerous 100 mm diameter filter membranes in order to screen one copy of the entire library. In addition, the filter hybridization screening methods require multiple rounds of colony plating or phage infection, filter preparation, and hybridization steps. Generally, one skilled in the art can screen up to one million clones effectively, but it may take weeks or months to yield the desired clone.
Other procedures have eliminated the time-consuming aspects of filter hybridization. Instead, these procedures combine conventional hybridization with chromatographic or magnetic physical separation techniques. There are numerous examples. One method for isolating a particular plasmid from a mixture of plasmids relies upon hybridization of circular double-stranded plasmid DNA to a RecA protein-coated biotinylated probe. A resulting triple-stranded complex may then bind to an agarose-streptavidin column and be physically separated from other plasmids present. (Rigas, B. et al., P.N.A.S. 83: 9591 (1986)) Another modification employs biotinylated homopyrimidine oligonucleotide probes to form complexes that bind to streptavidin-coated magnetic beads. (Ito, T. et al., Nucleic Acids Res. 20: 3524 (1992); Ito, T. et al., P.N.A.S. 89: 495 (1992)) Takabatake et al. describe a variation of this technique that employs a biotinylated purine-rich oligonucleotide probe to bind the desired nucleic acid molecule. (Takabatake, T. et al., Nucleic Acids Res. 20: 5853-5854 (1992)) One drawback of using only homopyrimidine and purine-rich probes is that they limit the nucleic acid sequences for which screening can be done.
Many other methods also employ a physical binding and separation step. Methods for screening libraries using biotinylated probes and magnetic beads are discussed in U.S. Pat. No. 5,500,356. Another method of screening for nucleic acid sequences is described by Kwok, P. Y. et al. This method, which employs PCR-based screening procedures, uses an ELISA-based oligonucleotide-ligation assay (OLA) to detect the PCR products containing the desired sequence. (Kwok, P. Y., et al., Genomics 13: 935-941 (1992)) OLA employs a “reporter probe” and a phosphorylated/biotinylated “anchor” probe. Streptavidin binding to the biotinylated probe can then separate the desired nucleic acids. (Landegren, U., et al., Science 241:1077-1080 (1988)) Biotin-streptavidin systems also rely on physical binding efficiency and may have the added problems of incomplete biotinylation of the probes used, which results in non-biotinylated probes hybridizing and failing to be separated by the physical technique used, as well as limited accessibility of the biotin on the probe.
In library screening methods, the ability to increase the abundance of a particular nucleic acid relative to all other nucleic acids present in a library is limited by the effectiveness of the physical separation technique used. A general drawback to all of these techniques is their reliance on physical binding and separation steps, which are inefficient and complicated.
A method that employs PCR amplification from cDNA libraries for obtaining additional nucleotide sequence of a desired gene when only a partial sequence is known is discussed in PCT 213 publication WO 96/38591. In that method, a PCR reaction extends primers directed to the cDNA insert sequences on circular plasmids. The extended, double-stranded DNAs from the PCR can be purified and re-ligated to generate the same plasmid with the same insert. Thus, plasmids containing target inserts may be amplified from a cDNA library. However, this method requires circular plasmids. Furthermore, the method also involves time-consuming steps after the PCR amplification. The publication discusses how the PCR reaction products, the extended, double-stranded DNAs, are purified by gel electrophoresis and then re-ligated to generate the plasmid with cDNA insert.
Another method that employs PCR in generating site-specific mutants is discussed in Jones et al. (Methods: A Companion to Methods in Enzymology, vol. 2, no. 1, February, 1991, pp. 2-10; see also U.S. Pat. No. 5,286,632). This method also employs circular plasmids as a starting material. Two PCR primers are used to extend a DNA from a circular plasmid. The 5′ end of the primers used contains a region that is complementary to a region of two separate primers, which are used to extend a second DNA from a separate circular plasmid. These 5′ ends remain single-stranded to provide cohesive ends to the PCR products. The two extended PCR products can be combined in vitro or in vivo to form a circular plasmid. The mutations can be introduced by using nucleotide changes within the primer sequence or by incorporating a sequence from the complementary region of the primers into the final plasmid. Cloning methods have also been discussed that employ PCR for inserting target DNA into vectors (for example, U.S. Pat. No. 5,525,493, to Homes et al.). However, in the Homes et al. method, the vector sequences of the cloned DNA all derive from a single-stranded linear vector that is hybridized to the amplified target. Thus, the method of Homes et al. cannot produce a replication-competent vector directly from the DNA of a library since it requires additional vector sequences that are not amplified directly from the library.
In summary, current methods for isolating particular, desired nucleic acid molecules are restricted by time-consuming, material-intensive steps and/or by the limitations of physical separation techniques, and/or by the rarity of certain nucleic acid molecules in a library. Accordingly, a method that expedites the isolation of desired nucleic acids is highly desired in the art, particularly if the method simultaneously and conveniently purifies the desired nucleic acids.
The present invention provides a method and related kits for producing, purifying, isolating, or enriching desired nucleic acids from a library. The desired nucleic acids can be produced in a form that can be immediately replicated in a host cell. The invention is practiced in solution phase, thus eliminating the need for solid phase filter hybridizations, column hybridizations, or gel electrophoresis purification. In fact, no physical separation methods are required. Instead, the invention takes advantage of polymerization reactions, directed from appropriate primers, to enrich for desired nucleic acid sequences.
In a particular embodiment, the invention allows desired nucleic acid sequences to be enriched or isolated from a collection of nucleic acid sequences attached to replication-competent vector sequences, such as a library. For example, one embodiment employs two or more primer extension products, which are single-stranded replicas of regions of the insert/vector nucleic acid sequences, to make a complex that is capable of replication in host cells.
Specifically, in one aspect, the invention provides a method to enrich a desired nucleic acid from a sample containing a mixture of nucleic acids. The method involves producing a vector having the desired nucleic acid insert from a sample comprising a plurality of different vectors. Initially, first and second extension products are generated from first and second primers, wherein the first and second primers anneal to complementary strands of the vector having the desired nucleic acid insert. One or both of the first or second primers contain a nucleic acid sequence found in the desired nucleic acid insert or its complement. The first and second extension products are capable of annealing to each other and together comprise a sequence for replication in a host and the desired nucleic acid insert. The step of generating first and second extension products may optionally be repeated. By combining and annealing the first and second extension products, a partially double-stranded, replication-competent vector is formed.
Typically, the mixture of nucleic acids is a nucleic acid library where certain nucleic acids, such as cDNA, have been inserted into at least one vector, preferably at least one linear vector. The vector may have substantially repeated sequences flanking the nucleic acid insert (especially where linear vectors are used). Many different types of vectors may be used, including plasmid, circular, linear, and bacteriophage lambda-derived vectors. One skilled in the art is familiar with numerous vectors that contain functional bacteriophage lambda sequences and thus are bacteriophage lambda-derived vectors. A particular advantage of the method is that the resulting replication-competent vectors can be directly transferred to and replicated by an appropriate host. More specifically, the method may initially involve denaturing the nucleic acid mixture so that primers can be annealed to particular sequences, such as the desired nucleic acid sequence or a sequence of the vector. The primers are extended to form extension products that can be annealed to generate a replication-competent vector comprising the nucleic acid insert. When there are two primers used, first and second extension products are generated. However, the method is not limited to the use of only two primers. The extension procedure may, optionally, be repeated until the desired nucleic acid sequence is sufficiently enriched with respect to other nucleic acid sequences in the mixture. In this way, generating the extension products alone can result in enrichment of the desired vector. In addition, one may optionally select against any nucleic acid sequences that have not been synthesized by an extension reaction. For example, by cleaving all library nucleic acid with an enzyme that will not cleave a newly synthesized extension product with incorporated modified nucleotides, only newly synthesized products will remain full-length.
The procedures for using modified nucleotides in this way are described in copending U.S. application Ser. No. 08/442,993, filed Jan. 3, 1997, Ser. No. 08/779,355, filed Jan. 6, 1997, Ser. No. 08/592,938, filed Jan. 29, 1996, and Ser. No. 08/713,404, filed Sep. 13, 1996, specifically incorporated herein by reference. By cleaving all or substantially all but the extension products, the nucleic acids can be annealed to form a replication vector containing the desired nucleic acid. The other, remaining nucleic acids will not form replication-competent vectors. Thus, the enrichment of desired nucleic acids does not depend solely on the number of extension reactions performed.
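The combined effect of repeated extension and of cleaving unextended library nucleic acid can be illustrated with a deliberately simplified calculation. The Python sketch below is not taken from the disclosure. It assumes that each extension cycle yields one new copy per desired template, that non-target vectors are not extended at all, and that a fraction `parental_survival` of the original library molecules escapes cleavage. The function name and its parameters are hypothetical.

```python
def enriched_fraction(initial_fraction: float, cycles: int,
                      parental_survival: float = 1.0) -> float:
    """Toy estimate of the fraction of molecules carrying the desired insert.

    Assumptions (illustrative only, not from the source):
      * each cycle adds one new copy per desired template (linear extension);
      * non-target vectors are never extended;
      * parental_survival is the fraction of original library molecules
        (desired or not) that remains after the optional cleavage step.
    """
    if not 0.0 <= initial_fraction <= 1.0:
        raise ValueError("initial_fraction must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= parental_survival <= 1.0:
        raise ValueError("parental_survival must be between 0 and 1")

    # Newly synthesized extension products, all of which carry the desired insert.
    new_desired = initial_fraction * cycles
    # Original library molecules that survive the optional cleavage step.
    parental_desired = initial_fraction * parental_survival
    parental_other = (1.0 - initial_fraction) * parental_survival

    total = new_desired + parental_desired + parental_other
    if total == 0.0:
        return 0.0
    return (new_desired + parental_desired) / total


if __name__ == "__main__":
    # One desired clone per million, 30 cycles, with and without cleavage.
    print(enriched_fraction(1e-6, 30))
    print(enriched_fraction(1e-6, 30, parental_survival=0.001))
```

Under these assumptions, lowering `parental_survival` raises the enriched fraction for a fixed number of cycles, which is the sense in which enrichment need not depend solely on the number of extension reactions.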
In other embodiments, the extension products may be generated separately. The order in which the extension products are generated is not critical. Even when generated separately, the extension products may be made in the same reaction vessel.
The invention also includes embodiments where additional nucleic acid, generally having an additional vector sequence, is added to the annealing reaction of the nucleic acids of the first and second extension products. Together with the additional vector sequence, the first and second extension products comprise a sequence for replication in a host and comprise a desired nucleic acid insert. The first and second extension products are capable of annealing to the additional vector sequence, and the additional vector sequence is designed with that annealing in mind. After annealing, a partially double-stranded, replication-competent vector is formed.
The invention may also be incorporated into a kit, for example, a kit for performing one of the embodiments of the methods described. The kits of the invention may comprise specific primers and primer extension reaction or amplification reaction reagents.
Combining extension products, such as the first and second extension products, involves appropriate denaturing and annealing conditions and forms a partially double-stranded vector with the desired nucleic acid insert. This vector may be transformed directly into an appropriate host in order to purify or enrich for the desired nucleic acid.
The vector and primers used can be designed so that the host cell is capable of replicating the desired nucleic acid sequences and so that the transformed host cell containing the desired nucleic acid can be selected from other cells. Thus, the primers should be designed so that regions of the vector containing the replication and selectable marker elements of the vector, as well as the desired nucleic acid insert, are represented in the extension products. Many variations are possible. However, it is not crucial that any particular extension product contain any certain element or desired nucleic acid sequence. As long as the vector elements and the desired nucleic acid insert are represented in the combination of the first and second extension products, a replication-competent vector containing the desired nucleic acid insert will be generated. When the combination of the extension products results in a partially double-stranded vector, an optional repair step may be included to form a substantially double-stranded vector with desired nucleic acid.
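The requirement that the vector elements and the insert be represented in the combination of extension products, rather than in any one product, can be checked on a simple coordinate map. The Python sketch below is illustrative only. It assumes a circular map of `vector_length` positions, with each extension product and each element described by a hypothetical `(start, length)` pair, and it treats an element as represented when every one of its positions is covered by at least one product.

```python
def arc_positions(start: int, length: int, vector_length: int) -> set:
    """Positions covered by an arc that may wrap around a circular map."""
    return {(start + i) % vector_length for i in range(length)}


def elements_represented(vector_length: int, products: list, elements: dict) -> dict:
    """Report, per element, whether the combined products cover it completely.

    products: list of (start, length) arcs, one per extension product.
    elements: mapping of element name to its (start, length) arc.
    All coordinates are hypothetical and chosen for illustration.
    """
    covered = set()
    for start, length in products:
        covered |= arc_positions(start, length, vector_length)
    return {
        name: arc_positions(start, length, vector_length) <= covered
        for name, (start, length) in elements.items()
    }


if __name__ == "__main__":
    toy_elements = {"marker": (0, 10), "origin": (10, 8), "insert": (20, 12)}
    # Two products that overlap at both ends and together span the whole map.
    print(elements_represented(40, [(0, 26), (20, 26)], toy_elements))
```

In this toy layout neither product alone contains all three elements, yet the combination does, which mirrors the point that no particular extension product needs to carry any particular element.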
Other aspects of the invention include PCR or other amplification reactions to extend the primers used. In a preferred embodiment, primer pairs may be used. For example, the primers can be selected and extended so that a first primer pair is capable of amplifying, in an amplification reaction such as PCR, an extension product comprising a region of the desired nucleic acid sequence (here, the insert) plus a portion of the vector sequence. A second primer pair is capable of amplifying, in an amplification reaction such as PCR, a second extension product, which may also comprise a region of the desired nucleotide sequence.
In this embodiment, the first extension product comprises a region of homology with the second extension product at its 3′ and 5′ termini. For example, the first and second extension products both comprise a sequence that is homologous to the sequence of the desired nucleic acid insert and a sequence homologous to a region of the vector. The homologous sequences in both of the extension products allow them to anneal to one another, thus forming a partially double-stranded vector. The primer pairs are selected so that the vector elements for replication in an appropriate host and a selectable marker are represented when the extension products are combined. Additional vector sequences may also be represented in the combination of first and second extension products. However, where the primer pairs anneal to the vector containing the desired nucleic acid insert is not crucial to the practice of the invention. Thus, numerous variations of primers or primer pairs can be made and used.
By combining the first and second extension products under appropriate denaturing and annealing conditions to permit regions of the first product to anneal to complementary regions of the second product, an annealed complex is formed comprising the desired nucleic acid sequence and vector replication and selection sequences. Optionally, one adds an enzyme having repair activity, such as a DNA polymerase, and nucleotide triphosphates under conditions that permit the enzyme to create double-stranded DNA. This repairs any single-stranded regions in the annealed complex. Preferably, this repair occurs under conditions where the enzyme having repair activity has little or substantially no strand displacement activity.
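The annealing of two extension products that share homology at both termini can be pictured with a toy string model. The Python sketch below is a simplification under stated assumptions and does not reproduce any protocol from the disclosure. Both products are written in the coordinates of a single strand, so strand complementarity is ignored, the 40-nucleotide sequence is invented, and all function names are hypothetical. The overlapping regions stand in for the double-stranded portions of the annealed complex, and the remainder stands in for the single-stranded regions that an optional repair step would fill in.

```python
def circular_segment(circle: str, start: int, length: int) -> str:
    """Read `length` characters from a circular sequence, wrapping if needed."""
    n = len(circle)
    if not (0 <= start < n and 0 < length <= n):
        raise ValueError("segment must start inside the circle and fit within it")
    return (circle + circle)[start:start + length]


def longest_overlap(left: str, right: str, min_overlap: int) -> int:
    """Longest suffix of `left` equal to a prefix of `right`, or 0 if too short."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def anneal_products(first: str, second: str, min_overlap: int = 4):
    """Join two products that overlap at both termini into one circular sequence.

    Returns (circle, duplex_fraction), where duplex_fraction is the share of the
    circle lying in the overlaps, or None if either terminus fails to overlap.
    """
    head = longest_overlap(first, second, min_overlap)  # end of first / start of second
    tail = longest_overlap(second, first, min_overlap)  # end of second / start of first
    if head == 0 or tail == 0:
        return None
    if head + tail > min(len(first), len(second)):
        return None  # the overlaps would collide; outside the scope of this toy model
    circle = first + second[head:len(second) - tail]
    return circle, (head + tail) / len(circle)


# Invented 40-nt circular sequence used only for illustration.
TOY_VECTOR = "GATTACAGGCTTGACATCGACCCGGGATATACGTTGCAAG"

if __name__ == "__main__":
    product_a = circular_segment(TOY_VECTOR, 0, 26)
    product_b = circular_segment(TOY_VECTOR, 20, 26)
    print(anneal_products(product_a, product_b))
```

Two products that merely abut, with no shared sequence at their ends, return `None` in this model, which corresponds to the absence of an annealed complex.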