Significance of cDNA library normalization
A typical somatic cell contains approximately 0.6 pg of mRNA. Assuming an average mRNA size of 2 kb (about $1.1 \times 10^{-6}$ pg per molecule), there are therefore about 500,000 mRNA molecules per cell. These mRNAs fall into three frequency classes (reviewed by Davidson and Britten, 1979):
______________________________________________________________________
Class                  % mass       No. of mRNA    Copies per    Total
                                    species        species       mRNAs
______________________________________________________________________
Superprevalent         10 (10-20)   10             5,000         50,000
Moderately prevalent   45 (40-45)   1,000          225           225,000
Complex                45 (40-45)   15,000         15            225,000
______________________________________________________________________
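As a rough check on these figures, the short Python sketch below recomputes the mass of one 2 kb mRNA, the number of mRNA molecules per cell, and the class totals in the table. The average residue mass of about 330 g/mol per nucleotide is an assumption introduced here, not a value from the text; the variable names are illustrative.

```python
# Back-of-the-envelope check of the mRNA numbers above.
# ASSUMPTION (not from the source): ~330 g/mol per ribonucleotide residue.
AVOGADRO = 6.022e23          # molecules per mole
NT_MASS_G_PER_MOL = 330.0    # assumed average residue mass
MRNA_LENGTH_NT = 2_000       # 2 kb average mRNA (from the text)
MRNA_PER_CELL_PG = 0.6       # pg of mRNA per somatic cell (from the text)

# Mass of one 2 kb mRNA molecule in picograms (1 g = 1e12 pg).
pg_per_molecule = MRNA_LENGTH_NT * NT_MASS_G_PER_MOL / AVOGADRO * 1e12
molecules_per_cell = MRNA_PER_CELL_PG / pg_per_molecule

# Frequency classes from the table: (number of species, copies per species).
classes = {
    "superprevalent": (10, 5_000),
    "moderately prevalent": (1_000, 225),
    "complex": (15_000, 15),
}
totals = {name: s * c for name, (s, c) in classes.items()}
grand_total = sum(totals.values())
# Percent of all molecules; equals percent of mass if all mRNAs are 2 kb.
percent_mass = {name: 100 * t / grand_total for name, t in totals.items()}

print(f"{pg_per_molecule:.2e} pg per molecule")
print(f"{molecules_per_cell:,.0f} molecules per cell")
print(grand_total, percent_mass)
```

With these assumptions the sketch gives roughly $1.1 \times 10^{-6}$ pg per molecule and a little under 550,000 molecules per cell, consistent with the rounded figure of about 500,000; the class totals sum to exactly 500,000.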
Accordingly, the rarest mRNA (1 copy per cell) will be present at a frequency of 1/500,000. Its representation in a cDNA library depends on the number of independent recombinants. The probability that a given mRNA is represented is $P = 1 - (1 - f)^n$, where $f$ is the frequency of that mRNA (here 1/500,000) and $n$ is the number of recombinant clones. The probability that the rarest mRNA is not represented in a library of $10^7$ recombinants is therefore $(1 - f)^n \approx e^{-fn} = e^{-20} \approx 2 \times 10^{-9}$.
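The equation can be evaluated directly. The minimal Python sketch below (the function names are hypothetical) computes the probability that a species of frequency f is missing from n independent clones, using `math.log1p` so that (1 - f) is not lost to rounding when f is tiny, and inverts the same equation to estimate how many clones are needed to capture the rarest mRNA with 99% probability.

```python
import math

def prob_missing(f: float, n: int) -> float:
    """Probability that a species of frequency f is absent from n
    independent clones, (1 - f)**n, computed via log1p for accuracy."""
    return math.exp(n * math.log1p(-f))

def clones_needed(f: float, p: float) -> int:
    """Smallest n with 1 - (1 - f)**n >= p (same equation solved for n)."""
    return math.ceil(math.log1p(-p) / math.log1p(-f))

f_rarest = 1 / 500_000
print(prob_missing(f_rarest, 10**7))   # roughly 2e-9, i.e. about e**-20
print(clones_needed(f_rarest, 0.99))   # clones for a 99% chance of including it
```

For $f = 1/500{,}000$ and $n = 10^7$ the first call returns about $2 \times 10^{-9}$; the second shows that roughly $2.3 \times 10^6$ independent clones already give a 99% chance of including a 1-in-500,000 mRNA.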
Although even the rarest mRNA will thus be represented in the library, identifying it is very difficult, because only about 1 clone in 500,000 carries it. In a normalized cDNA library, by contrast, the frequencies of all clones fall within the same narrow range, which is set by the complexity of the library.
Assuming that there are 50,000 to 100,000 genes in the human genome (Bishop et al., 1974), an ideal normalized cDNA library made from a wide variety of tissues, containing a 1-2 kb cloned insert for every expressed human gene, would have a complexity of 50,000 to 200,000 kb. Every clone would be represented at a frequency of 1/50,000 to 1/100,000, which is still 5-10 times higher than the frequency of the rarest mRNA in a single somatic cell (1/500,000).
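The arithmetic behind these figures fits in a few lines of Python. The helper `ideal_library` is hypothetical and assumes, as the paragraph does, exactly one insert per expressed gene.

```python
# Hypothetical helper reproducing the arithmetic of the ideal library.
RAREST_MRNA_FREQ = 1 / 500_000  # 1 copy among ~500,000 mRNAs per cell

def ideal_library(n_genes: int, insert_kb: float) -> tuple[float, float, float]:
    """Return (complexity in kb, per-clone frequency, fold enrichment of each
    clone over the rarest mRNA) for one insert per expressed gene."""
    complexity_kb = n_genes * insert_kb
    clone_freq = 1 / n_genes
    return complexity_kb, clone_freq, clone_freq / RAREST_MRNA_FREQ

for genes, kb in [(50_000, 1), (100_000, 2)]:
    print(genes, kb, ideal_library(genes, kb))
```

The two cases give complexities of 50,000 and 200,000 kb, with each clone 10-fold and 5-fold more frequent than the rarest mRNA, respectively.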
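According to the considerations described above, the relative frequency of a member of each class of sequences (superprevalent, moderately prevalent and complex) in a representative cDNA library of a typical cell is I:II:III = 5,000:225:15, or about 333:15:1. Because the Cot1/2 of a sequence is inversely proportional to its abundance, the corresponding Cot1/2 values are approximately I = 0.075, II = 1.7 and III = 25. Under second-order kinetics, the fraction of each component that remains single-stranded at a given Cot is $1/(1 + \mathrm{Cot}/\mathrm{Cot}_{1/2})$. At Cot = 250 (10 times the Cot1/2 of class III), the single-stranded leftover of each component, expressed as a percentage of its initial amount, will therefore be I = 0.03%, II = 0.6% and III = 9%, while the relative average frequency of a member of each class will be about 1:1:1; that is, the library will be normalized.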
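The numbers in this paragraph can be reproduced from the copy numbers in the table and the ideal second-order expression for the single-stranded fraction, $1/(1 + \mathrm{Cot}/\mathrm{Cot}_{1/2})$. The Python sketch below assumes that Cot1/2 is inversely proportional to abundance and anchors it on the class III value of 25; it does not model experimental deviations from ideal kinetics.

```python
# Ideal second-order reassociation: the fraction of a species still
# single-stranded at a given Cot is 1 / (1 + Cot / Cot_half).
COPIES = {"I": 5_000, "II": 225, "III": 15}  # copies per species (table)
COT_HALF_III = 25.0                          # Cot1/2 of class III (text)
COT = 250.0                                  # 10 x Cot1/2 of class III

# ASSUMPTION: Cot1/2 scales inversely with abundance, anchored on class III.
cot_half = {k: COT_HALF_III * COPIES["III"] / c for k, c in COPIES.items()}
remaining = {k: 1 / (1 + COT / ch) for k, ch in cot_half.items()}
# Single-stranded amount of one member = initial abundance x fraction left.
per_member = {k: COPIES[k] * remaining[k] for k in COPIES}
relative = {k: v / per_member["III"] for k, v in per_member.items()}

for k in COPIES:
    print(k, round(cot_half[k], 3), f"{100 * remaining[k]:.2f}%", round(relative[k], 2))
```

The sketch yields Cot1/2 values of about 0.075, 1.7 and 25, leftover fractions of about 0.03%, 0.66% and 9.1%, and per-member single-stranded amounts within about 10% of one another, which is the near 1:1:1 normalization described above.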
Methods to normalize cDNA libraries
Thus far, two approaches have been proposed for obtaining normalized cDNA libraries (Weissman, 1987). The first is based on hybridization to genomic DNA: the frequency of each hybridized cDNA in the resulting normalized library would be proportional to that of the corresponding gene in the genomic DNA. The second is a kinetic approach. If cDNA reannealing follows second-order kinetics, rarer species anneal more slowly, so the remaining single-stranded fraction of cDNA becomes progressively more normalized as hybridization proceeds (Galau et al., 1977). No species of cDNA, whatever its abundance, is specifically lost at any Cot value.
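To illustrate the kinetic argument, the Python sketch below follows a few hypothetical species, present at fractions of the total cDNA spanning 1,000-fold, as reassociation proceeds under ideal second-order kinetics (rate constant set to 1 in arbitrary units). The single-stranded amount of a species that makes up a fraction a of the cDNA is a / (1 + k·a·Cot), which tends to 1/(k·Cot) for every species as Cot grows, so the spread shrinks while no species falls to zero.

```python
# HYPOTHETICAL fractions of total cDNA for four species (1,000-fold range).
fractions = [0.5, 0.05, 0.005, 0.0005]
K = 1.0  # hypothetical rate constant, arbitrary units

def ss_amount(a: float, cot: float) -> float:
    """Single-stranded amount of a species that is fraction a of the total,
    after reassociation to the given Cot: a / (1 + K * a * Cot)."""
    return a / (1 + K * a * cot)

fold_ranges = []
for cot in [0, 10, 100, 1_000, 10_000]:
    amounts = [ss_amount(a, cot) for a in fractions]
    assert all(x > 0 for x in amounts)  # no species is ever lost entirely
    fold_ranges.append(max(amounts) / min(amounts))
    print(cot, round(fold_ranges[-1], 1))
```

In these arbitrary units, the fold range between the most and least abundant species drops from 1,000 at Cot = 0 to about 1.2 at Cot = 10,000.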
Two groups have independently pursued the construction of normalized cDNA libraries based on the kinetic approach (Ko, 1990; Patanjali et al., 1991).
Ko (1990) reported the construction of a normalized mouse cDNA library by a complex scheme involving: a) ligation of cDNAs to a linker-primer adapter; b) three rounds of PCR amplification, denaturation-reassociation, and purification of single-stranded cDNAs by hydroxyapatite (HAP) column chromatography; and c) digestion of the end product at a site present in the linker-primer sequence and cloning of the resulting fragments (3' non-coding cDNA fragments only) into a plasmid vector.
Colony hybridization with eight probes of differing abundance showed that the variation in abundance was reduced from at least 20,000-fold in the original library to 40-fold in the library obtained after three cycles of normalization.
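Ko's scheme repeats the kinetic step. As an illustration only, the Python sketch below is a toy model with hypothetical parameters: each round reassociates to a fixed depth D = k·Cot, keeps the single-stranded material, and treats PCR re-amplification as an unbiased rescaling to the same total. It is not fitted to Ko's data and is not meant to reproduce the 40-fold figure; it only shows that each round compresses the abundance range further.

```python
# TOY MODEL with hypothetical parameters: repeated normalization rounds.
D = 100.0  # hypothetical reassociation depth (k * Cot) per round

def normalize_round(fracs: list[float], depth: float = D) -> list[float]:
    """One round: keep the second-order single-stranded leftover, then
    rescale to a total of 1 (idealized, unbiased PCR re-amplification)."""
    ss = [a / (1 + depth * a) for a in fracs]
    total = sum(ss)
    return [x / total for x in ss]

def fold_range(fracs: list[float]) -> float:
    return max(fracs) / min(fracs)

# HYPOTHETICAL starting library: five species spanning 20,000-fold.
raw = [20_000.0, 2_000.0, 200.0, 20.0, 1.0]
fracs = [a / sum(raw) for a in raw]
history = [fold_range(fracs)]
for _ in range(3):
    fracs = normalize_round(fracs)
    history.append(fold_range(fracs))
print([round(h, 1) for h in history])
```

With D = 100 the fold range falls to roughly 220, 6 and 1.2 after the first, second and third rounds. The final value depends strongly on the assumed D and on the idealized PCR step.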
In Ko's method, both coding and non-coding fragments are present during reassociation. However, after the final digestion and directional cloning steps, only the 3' non-coding fragments remain in the normalized library. Ko's rationale for constructing a normalized library consisting exclusively of 3' non-coding sequences was as follows. The 3' non-coding terminal exon of an mRNA is almost always unique to that transcript, so during reassociation each 3' non-coding sequence is expected to reanneal only to its own complementary strand. In contrast, coding exons may be conserved among members of a gene family, some of which may be less abundant than others in a given tissue. During reassociation, the most abundant of these coding sequences might then cross-hybridize to a related but divergent complementary strand from a less prevalent family member, which could eliminate the rarer family member from the normalized library.
Patanjali et al. (1991) obtained a normalized library by a similar method which involved: a) cloning of short cDNAs produced by random priming into λgt10; b) PCR amplification of cloned DNAs; c) denaturation and reassociation to moderate Cot; d) separation of single strands by HAP chromatography; e) PCR amplification of HAP-flow-through single-stranded cDNAs; and f) cloning into λgt10.
Patanjali's normalized library consisted of cDNA clones containing both coding and non-coding information. However, the cDNAs had to be relatively short and homogeneous in length to ensure equal amplification efficiency during the polymerase chain reactions. The potential problem noted above, loss of rare gene family members from the normalized library, was not addressed in Patanjali's approach.
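The length requirement follows from the compounding of PCR. In the idealized exponential phase a template grows as $N_0(1+E)^n$ after $n$ cycles with per-cycle efficiency $E$, so a small difference in $E$, for example between a shorter and a longer insert, becomes a large difference in representation. The efficiencies in the Python sketch below are hypothetical values chosen for illustration, and plateau effects are ignored.

```python
# Idealized exponential-phase PCR: N = N0 * (1 + E)**cycles.
def amplified(n0: float, efficiency: float, cycles: int) -> float:
    return n0 * (1 + efficiency) ** cycles

cycles = 30
short_insert = amplified(1.0, 0.90, cycles)  # HYPOTHETICAL efficiency
long_insert = amplified(1.0, 0.80, cycles)   # HYPOTHETICAL efficiency
bias = short_insert / long_insert
print(f"{bias:.1f}-fold skew after {cycles} cycles from equal starting amounts")
```

With efficiencies of 0.90 and 0.80, two templates that start in equal amounts end up about 5-fold apart after 30 cycles.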