The present invention is in the fields of molecular biology and genetics. The invention relates generally to methods for producing normalized nucleic acid libraries, such that the variation in the abundance of the individual nucleic acid molecules in the library is substantially reduced (e.g., to no greater than about two orders of magnitude). The invention also relates to normalized libraries produced by these methods, to nucleic acid molecules isolated from these libraries, to genetic constructs (e.g., vectors) comprising these nucleic acid molecules, and to host cells comprising such normalized libraries.
The elucidation of the mechanisms that dictate the normal functioning of living cells requires a detailed understanding of the information encoded in all of the genes (also referred to here synonymously as the genome). To map and sequence the genes contained in the genomes of different organisms, messenger RNA (mRNA) sequences, which are representative of the genes of the genome, are typically used to evaluate the genetic makeup of the particular cell or organism of interest. However, the mRNAs (estimated to number 100,000 in humans) are produced at different levels within different cell types at different points in development (e.g., there is less than one copy per cell of some mRNAs and there are millions of copies per cell of others). These mRNAs, their developmental and cell-type specific regulated expression, and their translation into protein are what produce the unique character of a particular cell type. For example, adult muscle cells produce high levels of myoglobin mRNA whereas mature red blood cells contain high levels of hemoglobin. In the fetus, hemoglobin is produced by the liver; however, following birth, the type of hemoglobin produced and the tissue source both change, due to changes in gene expression.
An understanding of the molecular details of normal functioning of cells is essential in order to understand and treat inherited diseases where the regulation and expression of one or more genes may have changed. Integral to this goal is the production of libraries of cloned nucleic acids from which all or substantially all of the members of the libraries can be isolated with approximately equal probability.
A normalized library with a lower range of its members' relative concentrations, for example as low as about 2-4 fold, would have the advantage of making essentially all of the mRNAs available for isolation and subsequent analysis. This type of library would further the understanding of the normal function of individual genes and the genome in general. However, none of the methods reported heretofore have resulted in the production of normalized nucleic acid libraries where essentially all of the nucleic acid molecules or genes expressed in a particular cell or tissue type are represented and can be isolated with high probability. Although some investigators have attempted to normalize (i.e., reduce the variation in the relative abundance of the components of the population of nucleic acid molecules), none have been successful at bringing the relative abundance of the total population to within a range of two orders of magnitude (Bonaldo, M., Lennon, G., Soares, M. B., Genome Res. 6:791-866 (1996); Ko, M. S. H., Nucl. Acids Res. 18:5705-5711 (1990); Pantanjali, S. R., et al., Proc. Natl. Acad. Sci. USA 88:1943-1947 (1991); Soares, M. B., Proc. Natl. Acad. Sci. USA 91:9228-9232 (1994)). The resulting “normalized” libraries have failed to provide the quantity of novel information needed to understand the expression of most genes. Thus, there exists a current need for methods of producing normalized nucleic acid libraries, and for normalized nucleic acid libraries produced by such methods.
The present invention meets this need by providing methods for producing normalized nucleic acid libraries (i.e., libraries of cloned nucleic acid molecules from which each member nucleic acid molecule can be isolated with approximately equivalent probability). In particular, the invention relates to methods for normalization of a nucleic acid library, which may be a single-stranded or double-stranded cDNA library, comprising:
(a) synthesizing one or more nucleic acid molecules complementary to all or a portion of the nucleic acid molecules of the library, wherein the synthesized nucleic acid molecules comprise at least one hapten, thereby producing haptenylated nucleic acid molecules (which may be RNA molecules or DNA molecules);
(b) incubating a nucleic acid library to be normalized with the haptenylated nucleic acid molecules (also referred to as driver) under conditions favoring the hybridization of the more highly abundant molecules of the library with the haptenylated nucleic acid molecules; and
(c) removing the hybridized molecules, thereby producing a normalized library.
In a preferred aspect of the invention, the relative concentrations of all members of the normalized library are within one to two orders of magnitude. In another preferred aspect, the invention allows removal or elimination of contaminating nucleic acid molecules from the normalized library. Such contamination may include vectors within the library which do not contain inserts (e.g. background). In this manner, all or a substantial portion of the normalized library will comprise vectors containing inserted nucleic acid molecules of the library.
The invention also relates to such methods wherein the conditions favoring hybridization of the more highly abundant molecules of the library with the haptenylated molecules are selected from the group consisting of: (a) a COT equal to or greater than 25; (b) a COT equal to or greater than 50; (c) a COT equal to or greater than 100; (d) a COT equal to or greater than 1,000; (e) a COT equal to or greater than 2,000; (f) a COT equal to or greater than 5,000; (g) a COT from about 10 to 10,000; (h) a COT from about 25 to 10,000; (i) a COT from about 50 to 10,000; (j) a COT from about 1,000 to 10,000; (k) a COT from about 5,000 to 10,000; (l) a COT from about 500 to 5,000; (m) a COT from about 100 to 1,000; and (n) a COT of less than 10,000.
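By way of illustration only, the COT ranges recited above can be related to a concentration and an incubation time. The following sketch assumes the conventional definition of COT as the initial nucleotide concentration (moles of nucleotide per liter) multiplied by the hybridization time (seconds). That definition, the assumed average nucleotide mass of about 330 g/mol, the example concentration, and the function names are assumptions introduced for this example and are not stated in the text above; which nucleic acid population the concentration refers to is likewise left open.

```python
# Assumed average mass of one nucleotide residue (g/mol); illustrative only.
AVG_NUCLEOTIDE_MASS = 330.0


def cot_value(conc_ug_per_ml, hours, nucleotide_mass=AVG_NUCLEOTIDE_MASS):
    """Return COT in (mol nucleotide / L) x seconds (assumed definition)."""
    grams_per_liter = conc_ug_per_ml * 1e-3      # 1 ug/mL equals 1e-3 g/L
    c0 = grams_per_liter / nucleotide_mass       # mol nucleotide per liter
    return c0 * hours * 3600.0                   # hours converted to seconds


def hours_for_cot(target_cot, conc_ug_per_ml, nucleotide_mass=AVG_NUCLEOTIDE_MASS):
    """Return the incubation time (hours) needed to reach target_cot."""
    c0 = conc_ug_per_ml * 1e-3 / nucleotide_mass
    return target_cot / c0 / 3600.0


if __name__ == "__main__":
    # Hypothetical concentration of 1000 ug/mL incubated for 24 hours.
    print(cot_value(1000.0, 24.0))
    # Time needed at the same hypothetical concentration to reach a COT of 500.
    print(hours_for_cot(500.0, 1000.0))
```

Under these assumptions, a higher nucleic acid concentration reaches a given COT in proportionally less time, so a target COT may be met by adjusting either quantity.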
In a preferred aspect of the invention, a population of mRNA is incubated under conditions sufficient to produce a population of cDNA molecules complementary to all or a portion of said mRNA molecules. Preferably, such a population of cDNA molecules (e.g. single stranded cDNA) is produced by mixing the population of mRNA molecules (template molecules) with one or more polypeptides having reverse transcriptase activity and incubating said mixture under conditions sufficient to produce a population of single stranded cDNA molecules complementary to all or a portion of said mRNA molecules. The single stranded cDNA molecules may then be used as template molecules to make double stranded cDNA molecules by incubating the mixture under appropriate conditions in the presence of one or more DNA polymerases. The resulting population of double-stranded or single-stranded cDNA libraries may be normalized in accordance with the invention. Preferably, such cDNA libraries are inserted into one or more vectors prior to normalization. Alternatively, the cDNA libraries may be normalized prior to insertion within one or more vectors, and after normalization may be cloned into one or more vectors.
In a particularly preferred aspect of the invention, the library to be normalized is contained in (inserted in) one or more vectors, which may be a plasmid, a cosmid, a phagemid and the like. Such vectors preferably comprise one or more promoters which allow the synthesis of at least one RNA molecule from all or a portion of the nucleic acid molecules (preferably cDNA molecules) inserted in the vector. Thus, by use of the promoters, haptenylated RNA molecules complementary to all or a portion of the nucleic acid molecules of the library may be made and used to normalize the library in accordance with the invention. Such synthesized RNA molecules (which have been haptenylated) will be complementary to all or a portion of the vector inserts of the library. More highly abundant molecules in the library may then be preferentially removed by hybridizing the haptenylated RNA molecules to the library, thereby producing the normalized library of the invention. Without being limited, the synthesized RNA molecules are thought to be representative of the library; that is, more highly abundant species in the library result in more highly abundant haptenylated RNA using the above method. The relative abundance of the molecules within the library, and therefore within the haptenylated RNA, determines the rate of removal of particular species of the library; if a particular species' abundance is high, such highly abundant species will be removed more readily while low abundant species will be removed less readily from the population. Normalization by this process thus allows one to substantially equalize the level of each species within the library.
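The abundance-dependent removal described above may be illustrated with a simplified numerical sketch. The sketch assumes an idealized pseudo-first-order model in which the driver is in excess and is representative of the library, so that the fraction of each species left unhybridized is exp(-extent x share), where share is the species' fraction of the library and extent is a dimensionless quantity that increases with COT. This kinetic model, the three example abundances, the chosen extent, and the function names are assumptions made for illustration and are not taken from the text above.

```python
import math


def normalize(abundances, extent):
    """Deplete each species in proportion to its share of the driver.

    Assumed model: fraction remaining = exp(-extent * share), where
    share is the species' fraction of the starting library.
    """
    total = sum(abundances)
    return [a * math.exp(-extent * a / total) for a in abundances]


def orders_of_magnitude(levels):
    """Return log10 of the ratio between the highest and lowest level."""
    return math.log10(max(levels) / min(levels))


if __name__ == "__main__":
    # Hypothetical library: one abundant, one intermediate, one rare species.
    library = [10000.0, 100.0, 1.0]
    normalized = normalize(library, extent=5.0)
    print(orders_of_magnitude(library), orders_of_magnitude(normalized))
```

In this hypothetical example the spread in abundance falls from four orders of magnitude to less than two, with the most abundant species losing the largest fraction of its molecules. In the same simplified model, an excessive extent of hybridization would deplete the abundant species below the rare ones, which is consistent with selecting hybridization conditions within a bounded range.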
In another preferred aspect of the invention, the library to be normalized need not be inserted in one or more vectors prior to normalization. In such aspect of the invention, the nucleic acid molecules of the library may be used to synthesize haptenylated nucleic acid molecules using well known techniques. For example, haptenylated nucleic acid molecules may be synthesized in the presence of one or more DNA polymerases, one or more appropriate primers or probes and one or more nucleotides (the nucleotides and/or primers or probes may be haptenylated). In this manner, haptenylated DNA molecules will be produced and may be used to normalize the library in accordance with the invention. Alternatively, one or more promoters may be added to (or ligated to) the library molecules, thereby allowing synthesis of haptenylated RNA molecules for use to normalize the library in accordance with the invention. For example, adapters containing one or more promoters are added to (ligated to) one or more ends of double stranded library molecules (e.g. cDNA library prepared from a population of mRNA molecules). Such promoters may then be used to prepare haptenylated RNA molecules complementary to all or a portion of the nucleic acid molecules of the library. In accordance with the invention, the library may then be normalized and, if desired, inserted into one or more vectors.
While haptenylated RNA is preferably used to normalize libraries, other haptenylated nucleic acid molecules may be used in accordance with the invention. For example, haptenylated DNA may be synthesized from the library and used in accordance with the invention.
Haptens suitable for use in the methods of the invention include, but are not limited to, avidin, streptavidin, protein A, protein G, a cell-surface Fc receptor, an antibody-specific antigen, an enzyme-specific substrate, polymyxin B, endotoxin-neutralizing protein (ENP), Fe+++, a transferrin receptor, an insulin receptor, a cytokine receptor, CD4, spectrin, fodrin, ICAM-1, ICAM-2, C3bi, fibrinogen, Factor X, ankyrin, an integrin, vitronectin, fibronectin, collagen, laminin, glycophorin, Mac-1, LFA-1, β-actin, gp120, a cytokine, insulin, ferrotransferrin, apotransferrin, lipopolysaccharide, an enzyme, an antibody, biotin and combinations thereof. A particularly preferred hapten is biotin.
In accordance with the invention, hybridized molecules produced by the above-described methods may be isolated, for example by extraction or by hapten-ligand interactions. Preferably, extraction methods (e.g. using organic solvents) are used. Isolation by hapten-ligand interactions may be accomplished by incubation of the haptenylated molecules with a solid support comprising at least one ligand that binds the hapten. Preferred ligands for use in such isolation methods correspond to the particular hapten used, and include, but are not limited to, biotin, an antibody, an enzyme, lipopolysaccharide, apotransferrin, ferrotransferrin, insulin, a cytokine, gp120, β-actin, LFA-1, Mac-1, glycophorin, laminin, collagen, fibronectin, vitronectin, an integrin, ankyrin, C3bi, fibrinogen, Factor X, ICAM-1, ICAM-2, spectrin, fodrin, CD4, a cytokine receptor, an insulin receptor, a transferrin receptor, Fe+++, polymyxin B, endotoxin-neutralizing protein (ENP), an enzyme-specific substrate, protein A, protein G, a cell-surface Fc receptor, an antibody-specific antigen, avidin, streptavidin or combinations thereof. The solid support used in these isolation methods may be nitrocellulose, diazocellulose, glass, polystyrene, polyvinylchloride, polypropylene, polyethylene, dextran, Sepharose, agar, starch, nylon, a latex bead, a magnetic bead, a paramagnetic bead, a superparamagnetic bead or a microtitre plate. Preferred solid supports are magnetic beads, paramagnetic beads and superparamagnetic beads, and particularly preferred are such beads comprising one or more streptavidin or avidin molecules.
In another aspect of the invention, normalized libraries are subjected to further isolation or selection steps which allow removal of unwanted contamination or background. Such contamination or background may include undesirable nucleic acids. For example, when a library to be normalized is constructed in one or more vectors, a low percentage of vector (without insert) may be present in the library. Upon normalization, such low abundance molecules (e.g. vector background) may become a more significant constituent as a result of the normalization process. That is, the relative level of such low abundance background may be increased as part of the normalization process.
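The enrichment of vector background described above may be illustrated with simple arithmetic. The sketch below assumes, for illustration only, that empty vector is not removed at all during normalization while a stated fraction of the insert-containing molecules is removed. The 1% starting background, the 90% removal figure, and the function name are hypothetical values chosen for this example.

```python
def background_fraction_after(background_fraction, insert_fraction_removed):
    """Return the share of the library that is empty vector after normalization.

    Assumes empty vector is untouched and that insert_fraction_removed of
    the insert-containing molecules is removed.
    """
    background = background_fraction
    inserts_left = (1.0 - background_fraction) * (1.0 - insert_fraction_removed)
    return background / (background + inserts_left)


if __name__ == "__main__":
    # Hypothetical library with 1% empty vector; 90% of insert clones removed.
    print(background_fraction_after(0.01, 0.9))
```

Under these assumptions, a background that starts at 1% of the library rises to roughly 9% after normalization, which illustrates why a further selection step may be desirable.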
Removal of such contaminating nucleic acids may be accomplished by incubating a normalized library with one or more haptenylated probes which are specific for the nucleic acid molecules of the library (e.g. target specific probes). In principle, removal of contaminating sequences can be accomplished by selecting those nucleic acids having the sequence of interest or by eliminating those molecules that do not contain sequences of interest. In accordance with the invention, removal of contaminating nucleic acid molecules may be performed on any normalized library (whether or not the library is constructed in a vector). Thus, the probes will be designed such that they will not recognize or hybridize to contaminating nucleic acids (as in the preferred embodiment using the oligodA-NotI 3′ biotin probe). Upon hybridization of the haptenylated probe with nucleic acid molecules of the library, the haptenylated probes will bind to and select desired sequences within the normalized library and leave behind contaminating nucleic acid molecules, resulting in a selected normalized library. The selected normalized library may then be isolated. In a preferred aspect, such isolated selected normalized libraries are single-stranded, and may be made double stranded following selection by incubating the single-stranded library under conditions sufficient to render the nucleic acid molecules double-stranded. The double stranded molecules may then be transformed into one or more host cells. Alternatively, the normalized library may be made double stranded using the haptenylated probe or primer (preferably target specific) and then selected by extraction or ligand-hapten interactions. Such selected double stranded molecules may then be transformed into one or more host cells.
In another aspect of the invention, contaminating nucleic acids may be reduced or eliminated by incubating the normalized library in the presence of one or more primers specific for library sequences (specific for insert-containing clones, e.g. oligodA-NotI). This aspect of the invention may comprise incubating the single stranded normalized library with one or more nucleotides (preferably nucleotides which confer nuclease resistance to the synthesized nucleic acid molecules), and one or more polypeptides having polymerase activity, under conditions sufficient to render the nucleic acid molecules double-stranded. The resulting double stranded molecules may then be transformed into one or more host cells. Alternatively, resulting double stranded molecules containing nucleotides which confer nuclease resistance may be digested with such a nuclease and transformed into one or more host cells.
In yet another aspect, the elimination or removal of contaminating nucleic acid may be accomplished prior to normalization of the library, thereby resulting in a selected normalized library of the invention. In such a method, the library to be normalized may be subjected to any of the methods described herein to remove unwanted nucleic acid molecules, and the library may then be normalized by the process of the invention to provide the selected normalized libraries of the invention.
In accordance with the invention, double stranded nucleic acid molecules are preferably made single stranded before hybridization. Thus, the methods of the invention may further comprise treating the above-described double-stranded nucleic acid molecules of the library under conditions sufficient to render the nucleic acid molecules single-stranded. Such conditions may comprise degradation of one strand of the double-stranded nucleic acid molecules (preferably using gene II protein and Exonuclease III), or denaturing the double-stranded nucleic acid molecules using heat, alkali and the like.
The invention also relates to normalized nucleic acid libraries, selected normalized nucleic acid libraries and transformed host cells produced by the above-described methods.
Other preferred embodiments of the present invention will be apparent to one of ordinary skill in light of the following drawings and description of the invention, and of the claims.