1. Field of the Invention
This invention is in the field of molecular and cellular biology. In general, the invention is related to a method for the identification and isolation of specific genetic sequences or genetic markers from the genomic DNA or cDNA of an organism. In particular, the invention is related to a method whereby a DNA fragment from a first sample of genomic DNA or cDNA, not found in a second sample of genomic DNA or cDNA, may be identified and isolated via a series of digestion, amplification, purification and sequencing steps. This invention has utility in the identification and isolation of genomic DNA or cDNA sequences that may serve as genetic markers for use in a variety of medical, forensic, industrial and plant breeding procedures.
2. Related Art
Genomic DNA
In examining the structure and physiology of an organism, tissue or cell, it is often desirable to determine its genetic content. The genetic framework (i.e., the genome) of an organism is encoded in the double-stranded sequence of nucleotide bases in the deoxyribonucleic acid (DNA) which is contained in the somatic and germ cells of the organism. The genetic content of a particular segment of DNA, or gene, is only manifested upon production of the protein which the gene ultimately encodes. In order to produce a protein, an RNA copy of one strand of the DNA double helix (the “sense” strand) is synthesized by RNA polymerase enzymes, which use the complementary strand as a template, resulting in a specific sequence of messenger ribonucleic acid (mRNA). This mRNA is then translated by the protein synthesis machinery of the cell, resulting in the production of the particular protein encoded by the gene. There are additional sequences in the genome that do not encode a protein (i.e., “noncoding” regions) which may serve a structural, regulatory, or unknown function. Thus, the genome of an organism or cell is the complete collection of protein-encoding genes together with intervening noncoding DNA sequences. Importantly, each somatic cell of a multicellular organism contains the full complement of genomic DNA of the organism, except in cases of focal infections or cancers, where one or more xenogeneic DNA sequences may be inserted into the genomic DNA of specific cells and not into other, non-infected, cells in the organism. As noted below, however, the expression of the genes making up the genomic DNA may vary between individual cells.
cDNA and cDNA Libraries
Within a given cell, tissue or organism, there exist myriad mRNA species, each encoding a separate and specific protein. This fact provides a powerful tool to investigators interested in studying genetic expression in a tissue or cell: mRNA molecules may be isolated and further manipulated by various molecular biological techniques, thereby allowing the elucidation of the full functional genetic content of a cell, tissue or organism.
One common approach to the study of gene expression is the production of complementary DNA (cDNA) clones. In this technique, the mRNA molecules from an organism are isolated from an extract of the cells or tissues of the organism. This isolation often employs solid chromatography matrices, such as cellulose or hydroxyapatite, to which oligomers of deoxythymidine (dT) have been coupled. Since the 3′ termini of most eukaryotic mRNA molecules carry a string of adenosine (A) residues (the poly(A) tail), and since A pairs with dT, the mRNA molecules can be rapidly purified from other molecules and substances in the tissue or cell extract. From these purified mRNA molecules, cDNA copies may be made using the enzyme reverse transcriptase, which results in the production of single-stranded cDNA molecules. The single-stranded cDNAs may then be converted into a complete double-stranded DNA copy of the original mRNA (and thus of the corresponding double-stranded DNA sequence contained in the genome of the organism, minus any introns removed during mRNA processing) by the action of a DNA polymerase. The protein-specific double-stranded cDNAs can then be inserted into a plasmid, which is then introduced into a host bacterial cell. The bacterial cells are then grown in culture media, resulting in a population of bacterial cells containing (or in many cases, expressing) the gene of interest.
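The following minimal Python sketch illustrates the base-pairing logic behind first- and second-strand cDNA synthesis. It treats sequences as strings written 5′ to 3′ and ignores enzyme kinetics, priming efficiency and incomplete extension. The function names and the example mRNA are hypothetical.

```python
# Complement tables for copying RNA into DNA and DNA into DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Reverse transcription: the first-strand cDNA is the reverse complement
    of the mRNA (T opposite A, A opposite U), written 5' to 3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))


def second_strand(first_strand: str) -> str:
    """DNA polymerase copies the first strand; the product has the mRNA
    sequence with T in place of U."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first_strand))


mrna = "AUGGCUUAAAAAAAA"            # hypothetical mRNA ending in a short poly(A) tail
first = first_strand_cdna(mrna)     # "TTTTTTTTAAGCCAT": starts with the oligo(dT)-primed stretch
second = second_strand(first)       # "ATGGCTTAAAAAAAA"
print(first, second)
```

The first strand begins with a run of T residues because synthesis is primed on the poly(A) tail.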
This entire process, from isolation of mRNA to insertion of the cDNA into a plasmid to growth of bacterial populations containing the isolated gene, is termed “cDNA cloning.” If cDNAs are prepared from a number of different mRNAs, the resulting set of cDNAs is called a “cDNA library,” representing the different functional (i.e., expressed) genes present in the source cell, tissue or organism. Genotypic analysis of these cDNA libraries can yield much information on the structure and function of the organisms from which they were derived.
DNA Fingerprinting
To determine the genotype of an organism, tissue or cell, a variety of molecular biological techniques are employed. These techniques allow researchers, clinicians, forensic scientists and others to probe for the presence of specific genes in the samples which are being studied. The results of such analyses may be useful to researchers in examining the phylogenetic relationship between two organisms, to clinicians in determining whether an individual is infected with a particular pathogen or is a carrier of a disease-related gene, and to forensic scientists in analyzing crime scene evidence such as blood or other tissues.
A technique often used in such genotypic analysis is known as DNA fingerprinting. This technique relies on the digestion of the DNA of an organism, tissue or cell with a restriction endonuclease enzyme which cleaves the DNA sample into fragments of discrete length. Due to the specificity with which different restriction endonucleases cleave their DNA substrates, a given set of enzymes will always produce the same results, in terms of fragment number and size (the term “size” as used herein is defined as the length and/or molecular weight of a given restriction fragment), from a given DNA sample. The restriction fragments may then be resolved by a variety of techniques such as size exclusion chromatography, gel electrophoresis, or attachment to a variety of solid matrices. Most commonly, gel electrophoresis is performed, and the restriction fragments are resolved into a series of bands on the gel via their differential mobilities within the gel (which are inversely related to fragment size). The pattern of these bands within the gel is specific for a given DNA sample, and is often referred to as the “fingerprint” of that sample.
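A small in silico digest shows why a given enzyme always yields the same fragment sizes from a given sample. This sketch makes several simplifying assumptions. It treats the DNA as a linear molecule digested by a single enzyme. It also assumes cleavage at a fixed offset within the recognition site on the top strand, using EcoRI (G^AATTC) as the example. Sticky ends, methylation and partial digestion are ignored, and the example sequence is hypothetical.

```python
import re


def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths, in order along a linear DNA molecule, after
    cutting `cut_offset` bases into every occurrence of `site`."""
    seq = sequence.upper()
    # A lookahead finds every occurrence of the site, including overlapping ones.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


ECORI_SITE, ECORI_CUT = "GAATTC", 1        # G^AATTC
sample = "AAAGAATTCTTTTTGAATTCGG"           # hypothetical 22-bp sequence
sizes = digest(sample, ECORI_SITE, ECORI_CUT)
print(sizes)                                # [4, 11, 7]
print(sorted(sizes, reverse=True))          # band order on a gel, largest (slowest) first
```

Running the same digest on the same sequence always returns the same list, which is the property DNA fingerprinting relies on.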
When the DNA fingerprints of closely related organisms, tissues or even cells are compared, these fingerprints are often quite similar. However, subtle differences between the fingerprints may be observed. These differences, termed “DNA polymorphisms,” tend to increase in number (i.e., the fingerprints become more dissimilar) as DNA samples from more distantly related or unrelated organisms are compared. This technique of examining such Restriction Fragment Length Polymorphisms, or “RFLPs,” has been used for a number of years in genotypic analysis of eukaryotes such as plants (Tanksley, S. D. et al., Bio/Technology 7:257-264 (1989)) and animals, including humans (Botstein, D. et al., Am. J. Hum. Genet. 32:314-331 (1980)). In fact, RFLP analysis is being used in combination with other techniques in molecular biology to determine the complete structure (i.e., the “map”) of the human genome (See, e.g., Donis-Keller, H. et al., Cell 51:319-337 (1987)). In this way, RFLP analysis can be used to determine the relationship, or lack thereof, between specific organisms, tissues or cells by a simple comparison of differences in their DNA fingerprints.
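Comparing two fingerprints amounts to matching bands by size. In the sketch below, bands whose sizes agree within a fractional tolerance are treated as the same band. The 2% default is an assumption standing in for gel resolution. The function then reports the bands that differ, i.e., the candidate polymorphisms. The band sizes are hypothetical.

```python
def polymorphic_bands(a: list[int], b: list[int], tolerance: float = 0.02) -> tuple[list[int], list[int]]:
    """Return (bands only in a, bands only in b), with band sizes in bp."""

    def has_match(size: int, others: list[int]) -> bool:
        # Same band if sizes differ by at most `tolerance` of the larger one.
        return any(abs(size - o) <= tolerance * max(size, o) for o in others)

    only_a = [s for s in a if not has_match(s, b)]
    only_b = [s for s in b if not has_match(s, a)]
    return only_a, only_b


fingerprint_1 = [5000, 3200, 1500, 800]     # hypothetical band sizes (bp)
fingerprint_2 = [5000, 3190, 2300, 800]
print(polymorphic_bands(fingerprint_1, fingerprint_2))   # ([1500], [2300])
```

This simple matching cannot detect two different fragments that happen to co-migrate as a single band.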
DNA Amplification
One early drawback to the use of RFLP analysis, however, was its requirement for larger amounts of DNA than are typically available in the samples to be analyzed. In addition, complex genomic samples are often difficult to analyze by RFLP, as a multitude of different DNA molecules are simultaneously fragmented and resolved. As a means of overcoming these difficulties, investigators have increasingly turned to methods that increase the copy number of, or “amplify,” specific sequences of DNA in a sample.
A commonly used amplification technique is the Polymerase Chain Reaction (“PCR”) method invented by Mullis and colleagues (U.S. Pat. Nos. 4,683,195; 4,683,202; and 4,800,159). This method uses “primer” sequences which are complementary to opposing regions on the DNA sequence to be amplified. These primers are added to the DNA target sample, along with a molar excess of deoxynucleoside triphosphates and a DNA polymerase (e.g., Taq polymerase), and the primers bind to their target via base-specific pairing interactions (i.e., adenine pairs with thymine, cytosine with guanine). By repeatedly passing the reaction mixture through cycles of increasing and decreasing temperatures (to allow dissociation of the two DNA strands of the target sequence, annealing of the primers to the separated strands, and synthesis of complementary copies of each strand by the polymerase), the copy number of a particular sequence of DNA may be rapidly increased.
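The phrase “rapidly increased” can be made concrete with a simple model. If each cycle copies a fraction E of the templates present (E = 1 for perfect doubling), then n cycles turn N0 starting copies into N0(1 + E)^n. This model is idealized: it ignores the plateau a real reaction reaches as primers and nucleotides run low. The sketch below simply evaluates it.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the target by (1 + efficiency)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


print(pcr_copies(10, 30))         # 10737418240.0, i.e. 10 x 2**30
print(pcr_copies(10, 30, 0.9))    # roughly 2.3e9 at 90% efficiency
```

Even a modest drop in per-cycle efficiency changes the final yield several-fold over 30 cycles.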
Other techniques for amplification of target nucleic acid sequences have also been developed. For example, Walker et al. (U.S. Pat. No. 5,455,166; EP 0 684 315) described a method called Strand Displacement Amplification (SDA), which differs from PCR in that it operates at a single temperature and uses a polymerase/endonuclease combination of enzymes to generate single-stranded fragments of the target DNA sequence, which then serve as templates for the synthesis of complementary DNA strands. An alternative amplification procedure, termed Nucleic Acid Sequence-Based Amplification (NASBA), was disclosed by Davey et al. (U.S. Pat. No. 5,409,818; EP 0 329 822). Similar to SDA, NASBA employs an isothermal reaction, but it is directed to RNA targets, using the combined action of reverse transcriptase, RNase H and T7 RNA polymerase to generate RNA copies of the target rather than the DNA products of PCR or SDA.
PCR-based DNA Fingerprinting
Despite the availability of a variety of amplification techniques, most DNA fingerprinting methods rely on PCR for amplification, taking advantage of the well-characterized protocols and automation available for this technique. Examples of these PCR-based fingerprinting techniques include Random Amplified Polymorphic DNA (RAPD) analysis (Williams, J. G. K. et al., Nucl. Acids Res. 18(22):6531-6535 (1990)), Arbitrarily Primed PCR (AP-PCR; Welsh, J., and McClelland, M., Nucl. Acids Res. 18(24):7213-7218 (1990)), DNA Amplification Fingerprinting (DAF; Caetano-Anolles et al., Bio/Technology 9:553-557 (1991)), and microsatellite PCR or Directed Amplification of Minisatellite-region DNA (DAMD; Heath, D. D. et al., Nucl. Acids Res. 21(24):5782-5785 (1993)). All of these methods are based on the amplification of random DNA fragments by PCR, using arbitrarily chosen primers. The utility of these techniques is limited, however, by their extreme sensitivity to the quality of the target DNA, which may be poor in some genomic or cDNA library samples. Use of poor-quality (e.g., fragmented, degraded or otherwise non-intact) DNA in these techniques can lead, for example, to spurious results due to incomplete amplification of desired target DNA sequences.
More recently, a technique named Amplified Fragment Length Polymorphism (AFLP) analysis was developed by Vos and colleagues (EP 0 534 858; Vos, P. et al., Nucl. Acids Res. 23(21):4407-4414 (1995)). This technique, which is also PCR-based, uses specific combinations of restriction endonucleases and adapters of discrete sequences, as well as primers that contain the common sequences of the adapters. In this way, a sequence or fragment of DNA in a complex sample may be specifically amplified and used for further analysis. The value of AFLP in genomic analyses of certain plant and bacterial strains has been demonstrated (Lin, J.-J., and Kuo, J., Focus 17(2):66-70 (1995); Lin, J.-J., et al., Plant Molec. Biol. Rep. 14(2):156-169 (1996)), while others have used AFLP for HLA-DR genotyping in humans (Yunis, I. et al., Tissue Antigens 38:78-88 (1991)).
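AFLP gets its selectivity from the restriction-site remnant plus one or more “selective” bases at the 3′ end of each primer. The in silico sketch below uses the EcoRI/MseI enzyme pair commonly used in AFLP. It makes several simplifying assumptions:

- It considers only fragments with an EcoRI end on the left of the top strand and an MseI end on the right.
- It applies selective bases on the EcoRI side only. Real AFLP also selects on the MseI side and amplifies both orientations.
- The genome sequence is hypothetical.

```python
import re

# (recognition site, cut offset on the top strand)
ECORI = ("GAATTC", 1)   # G^AATTC
MSEI = ("TTAA", 1)      # T^TAA


def cut_positions(seq: str, enzyme: tuple[str, int], label: str) -> list[tuple[int, str]]:
    site, offset = enzyme
    return [(m.start() + offset, label) for m in re.finditer(f"(?={re.escape(site)})", seq)]


def aflp_fragments(seq: str, selective: str = "") -> list[str]:
    """EcoRI-MseI fragments (top strand) whose bases immediately after the
    EcoRI remnant AATTC match `selective`, mimicking an EcoRI+N primer."""
    seq = seq.upper()
    cuts = sorted(cut_positions(seq, ECORI, "E") + cut_positions(seq, MSEI, "M"))
    selected = []
    for (left, left_enzyme), (right, right_enzyme) in zip(cuts, cuts[1:]):
        if left_enzyme == "E" and right_enzyme == "M":
            fragment = seq[left:right]          # starts with the AATTC remnant
            if fragment[5:5 + len(selective)] == selective:
                selected.append(fragment)
    return selected


genome = "CCGAATTCAGGTTAACCCGAATTCTCATTAAGG"   # hypothetical sequence
print(aflp_fragments(genome))         # ['AATTCAGGT', 'AATTCTCAT']
print(aflp_fragments(genome, "A"))    # ['AATTCAGGT']
```

Assuming a random base composition, each additional selective base reduces the number of amplified fragments roughly four-fold. This is how AFLP reduces a complex digest to a resolvable set of bands.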
Identification of Tissue-Specific cDNAs and Genomic Genetic Markers
Despite the success of genetic mapping using the foregoing techniques, however, these methods are limited in their abilities to identify source-specific DNA sequences. This limitation is particularly true for those sequences derived from genomic DNA samples from different cells, tissues or organisms, and for those derived from tissue cDNA libraries which comprise only those DNA molecules that are actively expressed (i.e., used to make proteins) in the particular tissue and which are thus a subset of genomic DNA. For cDNA libraries, however, methods have been developed that overcome these limitations to some extent.
One such method, termed differential hybridization, relies on the knowledge that specific genes are expressed differentially in certain cells or tissues as opposed to other cells or tissues. To identify these cell- or tissue-specific genes, one can simply prepare cDNAs from two different cell or tissue types and separately hybridize the cDNA samples to oligonucleotide probes prepared from each of the samples. The resultant hybridization patterns can then be compared, and any differences observed indicate the cell- or tissue-specific expression of one or more genes (and thus the presence, in a cDNA library prepared from that cell or tissue, of a specific cDNA). This technique was used to identify growth factor-regulated genes that are specifically expressed in cells stimulated to grow by treatment with serum but that are not expressed in quiescent cells (Lau, H. F., and Nathans, D., EMBO J 4:3145-3151 (1985)).
A second, somewhat more sensitive, technique for identifying tissue-specific DNAs is the use of subtractive libraries (See Hedrick, S. M. et al., Nature 308:149-153 (1984); Lin, J.-J., et al., FOCUS 14(3):98-101 (1993)). In this method, cDNAs prepared from one tissue or cell type are mixed with the mRNAs from another, closely related, tissue or cell type. The cDNAs corresponding to genes expressed in both cells or tissues then form DNA-RNA hybridization complexes, since they are complementary to each other, while the cDNAs expressed selectively in one cell/tissue but not the other will not form such a complex. The DNA-RNA complexes, representing cDNAs that are not tissue-specific, can then be removed from the mixtures (i.e., “subtracted”) by passing the mixture through a poly-dT or hydroxyapatite column, to which the unhybridized cDNAs will not bind. This procedure thus results in a purified sample that is enriched in tissue- or cell-specific cDNAs.
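Subtraction can be expressed as a set operation. The sketch below deliberately simplifies hybridization: a tester cDNA is assumed to hybridize only to a driver mRNA of which it is the exact full-length complement. Real hybridization tolerates partial overlap and is never complete. The sequences and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def first_strand(mrna: str) -> str:
    """cDNA copy of an mRNA: reverse complement after converting U to T."""
    return reverse_complement(mrna.upper().replace("U", "T"))


def subtract(tester_cdnas: list[str], driver_mrnas: list[str]) -> list[str]:
    """Keep tester cDNAs that find no complementary driver mRNA, i.e. those
    that would stay single-stranded and pass through the column."""
    hybridizing = {first_strand(m) for m in driver_mrnas}
    return [c for c in tester_cdnas if c not in hybridizing]


tissue_a_mrnas = ["AUGGCC", "AUGUUU"]          # hypothetical transcripts, tissue A
tissue_b_mrnas = ["AUGGCC", "AUGAAA"]          # hypothetical transcripts, tissue B
tester = [first_strand(m) for m in tissue_a_mrnas]
print(subtract(tester, tissue_b_mrnas))        # ['AAACAT'], the cDNA of AUGUUU
```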
Amplification-Based Cloning
While differential hybridization and the use of subtractive libraries may be suitable for the identification of DNA sequences that are expressed at relatively high levels in the source cells or tissues, they are not particularly useful when the starting samples contain only low levels of genomic DNA (or mRNA used to make cDNAs). This problem is particularly important when the tissue or cell samples are themselves present in low quantities (as in many medical or forensic applications), or when the specific DNA sequence is expressed at low levels in the cell/tissue samples.
PCR-based cloning of tissue-specific cDNAs has been used in the attempt to overcome the lack of sensitivity of earlier approaches (see, e.g., Lee, C. C., et al., Science 239:1288-1291 (1988)). However, this approach still suffers from the major shortcoming of PCR itself: the requirement for prior knowledge of the nucleotide sequence of the DNA to be amplified, to allow construction of complementary PCR primers. Without knowing the nucleotide sequence of the target DNA, PCR cannot be performed in order to amplify this sequence in the sample. Since the target sequences are not known in many medical or forensic samples, PCR-based cloning is not useful for the identification or isolation of tissue-specific cDNAs from these samples. For the same reasons, these techniques are not suitable for the identification of previously unknown or uncharacterized genes from cDNA libraries or genomic samples. Furthermore, as noted above, the complexity of genomic DNA limits the utility of these techniques in the identification and isolation of genetic markers from the genome of a cell or organism.
Thus, there remains an unmet need for a rapid, reproducible and reliable technique for identifying fragments of DNA, or genes, that are unique to the genomes of specific organisms, tissues or cells, or that are unique to cDNA libraries prepared from these specific sources, without prior knowledge of the nucleotide sequence of the unique DNA fragments. Particularly desirable are methods that would rapidly identify, and allow the isolation of, specific DNA sequences found in one source cDNA library or genome but not in another library or genome. Such a technique would find utility in a variety of applications, particularly in clinical, forensic and plant breeding applications.
The present invention is directed to AFLP-based methods that address these unmet needs. In particular, the invention relates to such methods that allow the identification and isolation of tissue-specific cDNAs from cDNA libraries, or the identification and isolation of specific genetic markers from samples of genomic DNA.
In one embodiment, the invention is directed to a method for identifying a cDNA fragment from a first cDNA library which is not present in a second cDNA library, comprising the steps of (a) digesting a first and second cDNA library with at least one restriction enzyme to give a collection of restriction fragments, and (b) identifying one or more unique fragments from the first cDNA library by comparing the fragments from the first cDNA library to fragments from the second cDNA library.
In another embodiment, the invention is directed to a method for identifying a genetic marker, comprising a DNA fragment from a first sample of genomic DNA, which is not present in a second sample of genomic DNA. This method comprises the steps of (a) digesting the first and second samples of genomic DNA with at least one restriction enzyme to give a collection of restriction fragments, and (b) identifying one or more unique DNA fragments in the first or second samples of genomic DNA by comparing the fragments obtained from one sample of genomic DNA to those obtained from the other sample.
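The comparison step (b) of both embodiments can be sketched as follows: digest each sample in silico and keep the fragments of one sample that never occur in the other. For clarity, this illustration compares fragments by exact sequence. That choice is an assumption; the preferred approach compares fragment sizes. The single-enzyme EcoRI digest and the example sequences are hypothetical.

```python
import re


def digest_fragments(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Fragment sequences of a linear molecule cut within every `site`."""
    seq = seq.upper()
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def unique_fragments(first: list[str], second: list[str]) -> list[str]:
    """Fragments generated from `first` (a list of cDNA or genomic sequences)
    that are absent from the fragments generated from `second`."""
    frags_first = {f for s in first for f in digest_fragments(s)}
    frags_second = {f for s in second for f in digest_fragments(s)}
    return sorted(frags_first - frags_second)


sample_1 = ["AAGAATTCCCGAATTCTT"]     # hypothetical first sample
sample_2 = ["AAGAATTCGGGGAATTCTT"]    # hypothetical second sample
print(unique_fragments(sample_1, sample_2))   # ['AATTCCCG']
print(unique_fragments(sample_2, sample_1))   # ['AATTCGGGG']
```

Running the comparison in both directions finds fragments unique to either sample, matching the “first or second samples” language of the genomic embodiment.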
According to the invention, the identifying step in the above methods is preferably accomplished by separating the restriction fragments according to size, which as used herein is defined as the length and/or molecular weight of the restriction fragments. This aspect of the invention may further comprise sequencing the unique cDNA or genomic DNA fragments, and may entail amplification of the restriction fragments prior to the identifying step (b). In another aspect of the invention, the restriction fragments are detectably labeled. The present invention also encompasses the above method which further comprises the steps of (c) isolating at least one unique fragment, and (d) inserting the fragment into a vector, which may be an expression vector, for use in transfecting or transforming a prokaryotic or eukaryotic host cell; the fragment may be amplified prior to insertion into the vector. In another aspect of this embodiment, the unique fragment may be sequenced according to routine nucleotide sequencing methods.
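Separation by size on a gel is usually calibrated against a ladder of fragments of known size. Over a gel's resolving range, migration distance is roughly linear in the logarithm of fragment size. The minimal least-squares sketch below fits that relation and uses it to estimate an unknown band's size. The ladder values are hypothetical.

```python
import math


def fit_standard_curve(ladder: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit of log10(size) = intercept + slope * distance,
    given (migration_distance, size_bp) pairs for the ladder bands."""
    xs = [d for d, _ in ladder]
    ys = [math.log10(size) for _, size in ladder]
    n = len(ladder)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_size(distance: float, curve: tuple[float, float]) -> float:
    intercept, slope = curve
    return 10 ** (intercept + slope * distance)


ladder = [(10.0, 10000), (30.0, 1000), (50.0, 100)]   # hypothetical (mm, bp) pairs
curve = fit_standard_curve(ladder)
print(round(estimate_size(40.0, curve)))              # 316
```

The fitted slope is negative, reflecting the inverse relation between mobility and fragment size.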
In another embodiment, the present invention provides a method for isolating a cDNA from a first cDNA library, comprising the steps of (a) mixing one or more of the unique fragments identified as summarized above, or one or more oligonucleotide probes which are complementary to the fragments, with a first cDNA library under conditions stringent for hybridization of the unique fragments or oligonucleotide probes to the first cDNA library; and (b) isolating a cDNA which is complementary to the unique fragments or to the oligonucleotide probes. Analogously, the invention also provides a method for isolating a genetic marker, comprising a DNA fragment, from a sample of genomic DNA. This method comprises the steps of (a) mixing one or more of the unique fragments identified as summarized above, or one or more oligonucleotide probes which are complementary to the fragments, with a sample of DNA under conditions stringent for hybridization of the unique fragments or oligonucleotide probes to the sample of DNA; and (b) isolating a DNA fragment which is complementary to the unique fragments or to the oligonucleotide probes.
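Step (b) of these isolation methods amounts to finding the library members to which a unique fragment or probe would anneal. The sketch below screens hypothetical clone sequences on both strands. It uses a maximum mismatch count as a crude stand-in for hybridization stringency, with 0 mismatches approximating highly stringent conditions. It illustrates the screening logic only and does not model hybridization thermodynamics.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def hybridizes(probe: str, clone: str, max_mismatches: int = 0) -> bool:
    """True if `probe` matches some window of either strand of `clone` with at
    most `max_mismatches` differences, i.e. the probe could pair with the
    opposite strand at that position."""
    k = len(probe)
    for strand in (clone, reverse_complement(clone)):
        for i in range(len(strand) - k + 1):
            if sum(p != t for p, t in zip(probe, strand[i:i + k])) <= max_mismatches:
                return True
    return False


def screen_library(probe: str, library: dict[str, str], max_mismatches: int = 0) -> list[str]:
    return [name for name, seq in library.items() if hybridizes(probe, seq, max_mismatches)]


library = {                               # hypothetical clones
    "clone1": "TTAAGAATTCCCGATC",
    "clone2": "CGGGAATTCTTAA",            # carries the probe sequence on its other strand
    "clone3": "GGGGGGGG",
}
print(screen_library("AATTCCCG", library))   # ['clone1', 'clone2']
```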
According to the present invention, the isolation steps (b) of the above-described methods may be accomplished by gel electrophoresis, density gradient centrifugation, sizing chromatography, affinity chromatography, immunoadsorption, or immunoaffinity chromatography. In this embodiment, the isolated cDNA or DNA fragments may also be sequenced, amplified, or inserted into a vector (which may be an expression vector). DNA fragments isolated by this embodiment of the present invention will be useful in, for example, the preparation of DNA or RNA probes, and to aid in a variety of medical, forensic, industrial and plant breeding applications.
The invention also encompasses the methods described above, wherein the amplification of the unique cDNA or genomic DNA fragments is accomplished by a method comprising the steps of (a) ligating one or more adapter oligonucleotides to a unique cDNA fragment or genomic DNA fragment to form a DNA-adapter complex; (b) hybridizing the DNA-adapter complex, under stringent conditions, with one or more oligonucleotide primers which are complementary to the adapter portion of the DNA-adapter complex to form a hybridization complex; and (c) amplifying the DNA-adapter complex. In this aspect of the invention, the adapter oligonucleotide may contain one or more restriction sites which may be used to insert the DNA-adapter complex into a vector.
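Steps (a) to (c) of this adapter-based amplification can be traced on sequences. The adapter and primer sequences below are hypothetical. The forward primer is identical to the left adapter, and the reverse primer is complementary to the right adapter. Together they bracket the ligated fragment, so the fragment is amplified without any knowledge of its internal sequence.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def ligate_adapters(fragment: str, left_adapter: str, right_adapter: str) -> str:
    """Top strand of the DNA-adapter complex."""
    return left_adapter + fragment + right_adapter


def pcr_product(template: str, fwd_primer: str, rev_primer: str) -> Optional[str]:
    """Region bracketed by the primers: the forward primer matches the top
    strand, and the reverse primer anneals to it (its reverse complement
    appears in the top strand). Returns None if either site is missing."""
    start = template.find(fwd_primer)
    rev_site = template.rfind(reverse_complement(rev_primer))
    if start == -1 or rev_site == -1 or rev_site < start:
        return None
    return template[start:rev_site + len(rev_primer)]


LEFT, RIGHT = "GACTGCGTAC", "TTACTCAGGA"        # hypothetical adapters
fragment = "AATTCAGGT"                           # any unique fragment
complex_ = ligate_adapters(fragment, LEFT, RIGHT)
product = pcr_product(complex_, LEFT, reverse_complement(RIGHT))
print(product == complex_)                       # True
```

Without adapters, the same primers find no binding sites. If the adapters also carry restriction sites, the product can be cut at those sites for insertion into a vector.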
According to the present invention, the first and second cDNA libraries or samples of genomic DNA used in the above-described methods may be derived from an individual cell (which may be prokaryotic or eukaryotic), a tissue (which may be a plant or an animal tissue, most preferably a human tissue including a human embryonic or fetal tissue), an organ, or a whole organism. The genetic marker identified according to this embodiment of the invention may be a cancer marker, an infectious disease marker, a genetic disease marker, a marker of embryonic development, a tissue-specific marker or an enzyme marker. In one such aspect of the invention, one cDNA library or sample of genomic DNA may be derived from an animal suffering from an infectious disease (e.g., a disease of bacterial, fungal, viral or parasitic origin) and the other cDNA library or sample of genomic DNA may be from an animal not suffering from an infectious disease. In another aspect, one cDNA library or sample of genomic DNA may be derived from an animal suffering from cancer and the other may be derived from an animal not suffering from cancer. In another aspect, one cDNA library or sample of genomic DNA may be obtained from a cancerous animal tissue and the other from a noncancerous animal tissue, which tissues may both be obtained from the same animal. In another aspect, one cDNA library or sample of genomic DNA may be from an animal suffering from a genetic disease and the other may be from an animal not suffering from a genetic disease. In another aspect, one cDNA library or sample of genomic DNA may be derived from a pathogenic microorganism and the other from a non-pathogenic organism. In another aspect, one cDNA library or sample of genomic DNA may be derived from an organism expressing an enzyme, and the other sample may be derived from an organism not expressing an enzyme. 
In another aspect, one cDNA library or sample of genomic DNA may be derived from an organism expressing an industrially useful protein, and the second may be derived from an organism not expressing an industrially useful protein. In another aspect, one cDNA library or sample of genomic DNA may be derived from a diseased plant and the other sample may be derived from a non-diseased plant. In another aspect, one cDNA library or sample of genomic DNA may be from a plant resistant to an environmental stress, which may be drought, excess temperature, diminished temperature, chemical toxicity by herbicides, pollution, excess light or diminished light, and the other sample may be from a plant not resistant to an environmental stress.
In another embodiment, the present invention provides a method of determining the relationship between a first individual and a second individual comprising the steps of (a) digesting a cDNA library or a sample of genomic DNA obtained from the first and second individuals with at least one restriction enzyme to give a collection of restriction fragments; (b) separating the restriction fragments from the first and second individuals according to size; and (c) determining the similarities and dissimilarities of the sizes or concentrations of the restriction fragments separated in step (b). In a preferred aspect of this embodiment, this comparison is accomplished by computer analysis.
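One way to carry out the computer comparison of step (c) is to score each pair of individuals with a band-sharing coefficient. The sketch below uses the Dice (Nei and Li) coefficient, 2 × shared bands / (bands in A + bands in B). This is a common choice for fingerprint data but an assumption here. The sketch also assumes band sizes have already been binned, so that equal values denote the same band. Fragment concentrations are not considered, and the profiles are hypothetical.

```python
from itertools import combinations


def dice_similarity(bands_a: list[int], bands_b: list[int]) -> float:
    """2 * shared bands / (bands in a + bands in b); 1.0 means identical."""
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def similarity_matrix(profiles: dict[str, list[int]]) -> dict[tuple[str, str], float]:
    return {(x, y): dice_similarity(profiles[x], profiles[y])
            for x, y in combinations(sorted(profiles), 2)}


profiles = {                                # hypothetical binned band sizes (bp)
    "individual_1": [500, 800, 1200, 2000],
    "individual_2": [500, 800, 1500, 2000],
    "individual_3": [300, 650, 1500, 2600],
}
for pair, score in similarity_matrix(profiles).items():
    print(pair, round(score, 2))
```

Higher scores indicate more closely related individuals. The pairwise matrix could then feed a clustering method such as UPGMA.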
Other preferred embodiments of the present invention will be apparent to one of ordinary skill in light of the following drawings and description of the invention, and of the claims.