This invention is directed toward a method of identifying segments of nucleic acid characteristic of a particular genome in an organism by generating a set of discrete DNA amplification products characteristic of the genome. This set of discrete DNA products forms a fingerprint that can be used to identify the genome. The method can also be used to fingerprint a cell type based on differential gene expression in the cell.
For many purposes, it is important to be able to identify the genus, species or other taxonomic classification to which an organism belongs, or to be able to identify a tissue type, rapidly and accurately. Such taxonomic identification must be rapid for pathogenic organisms such as viruses, bacteria, protozoa, and multicellular parasites, and assists in diagnosis and treatment of human and animal disease, as well as studies in epidemiology and ecology. In particular, because of the rapid growth of bacteria and the necessity for immediate and accurate treatment of diseases caused by them, it is especially important to have a fast method of identification.
Traditionally, identification and classification of bacterial species have been performed by study of morphology, determination of nutritional requirements or fermentation patterns, determination of antibiotic resistance, comparison of isoenzyme patterns, or determination of sensitivity to bacteriophage strains. These methods are time-consuming, typically requiring at least 48 to 72 hours, often much more. Other more recent methods include the determination of RNA sequences (Woese, in "Evolution in Procaryotes", Schleifer and Stackebrandt, Eds., Academic Press, London, 1986), the use of strain-specific fluorescent oligonucleotides (DeLong et al., Science 243, 1360-1363, 1989; Amann et al., J. Bact. 172, 762-770, 1990), and the polymerase chain reaction (PCR) technique (U.S. Pat. Nos. 4,683,195 and 4,683,202 to Mullis et al.; Mullis and Faloona, Methods Enzymol., 154:335-350, 1987).
DNA markers genetically linked to a selected trait have been commonly used for diagnostic procedures to identify an organism based on the genotype. The DNA markers commonly used are restriction fragment length polymorphisms (RFLPs). Polymorphisms useful in genetic mapping are those polymorphisms that segregate in populations. Traditionally, RFLPs have been detected by hybridization methodology (e.g. Southern blot), but such techniques are time-consuming and inefficient. Alternative methods include assays for polymorphisms using PCR.
The PCR method allows amplification of a selected region of DNA by providing two DNA primers, each of which is complementary to a portion of one strand within the selected region of DNA. These primers are used to hybridize to the separated strands within the region of DNA sought to be amplified, forming DNA molecules that are partially single-stranded and partially double-stranded. The double-stranded regions are then extended by the action of DNA polymerase, forming completely double-stranded molecules. These double-stranded molecules are then denatured and the denatured single strands are rehybridized to the primers. Repetition of this process through a number of cycles results in the generation of DNA strands that correspond in sequence to the region between the originally used primers. Specific PCR primer pairs can be used to identify genes characteristic of a particular species or even strain. PCR also obviates the need for cloning in order to compare the sequences of genes from related organisms, allowing the very rapid construction of phylogenies based on DNA sequence. For epidemiological purposes, specific primers to informative pathogenic features can be used in conjunction with PCR to identify pathogenic organisms.
Although PCR is a very powerful method for amplifying DNA, conventional PCR procedures require the use of at least two separate primers complementary to specific regions of the genome to be amplified. This requirement means that primers cannot be prepared unless the target DNA sequence information is available, and the primers must be "custom built" for each location within the genome of each species or strain whose DNA is to be amplified.
Although the newer methods have advantages over previous methods for genome identification, there is still a need for a rapid, simple method that can be applied to any species for which DNA can be prepared and that does not require reagents that are specific for each species or knowledge of the molecular biology, biochemistry, or DNA sequence of that species. It is also desirable that such a method be capable of identifying a species from a relatively small quantity of biological material. Additionally, it is highly desirable that such a method is also capable of generating polymorphisms useful in genetic mapping, especially of eukaryotes.
In addition to identification of related plant, animal and bacterial species, DNA segments or "markers" may be used to construct human genetic maps for genome analysis. Goals for the present human genome project include the production of a genetic map and an ordered array of clones along the genome. Using a genetic map, inherited phenotypes such as those that cause genetic diseases can be localized on the map and ultimately cloned. The neurofibromatosis gene is a recent example of this strategy (Xu et al., Cell, 62:599-608, 1990). The genetic map is a useful framework upon which to assemble partially completed arrays of clones. In the short term, it is likely that arrays of human genomic clones such as cosmids or yeast artificial chromosomes (YACs, Burke et al., Science 236:806-812, 1987) will form disconnected contigs that can be oriented relative to each other with probes that are on the genetic map or the in situ map (Lichter et al., Science, 24:64-69, 1990), or both. The usefulness of the contig map will depend on its relation to interesting genes, the locations of which may only be known genetically. Similarly, the restriction maps of the human genome generated by pulsed field electrophoresis (PFE) of large DNA fragments are unlikely to be completed without the aid of closely spaced markers to orient partially completed maps. Thus, a restriction map and an array of clones covering an entire mammalian genome, for example the mouse genome, are desirable.
Recently, RFLPs that have variable number tandem repeats (VNTRs) have become a method of choice for human mapping because such VNTRs tend to have multiple alleles and are genetically informative because polymorphisms are more likely to be segregating within a family. The production of fingerprints by Southern blotting with VNTRs (Jeffreys et al., Nature, 316:76-79, 1985) has proven useful in forensics. There are two classes of VNTRs; one having repeat units of 9 to 40 base pairs, and the other consisting of minisatellite DNA with repeats of two or three base pairs. The longer VNTRs have tended to be in the proterminal regions of autosomes. VNTR consensus sequences may be used to display a fingerprint. VNTR fingerprints have been used to assign polymorphisms in the mouse (Julier et al., Proc. Natl. Acad. Sci. USA, 87:4585-4589, 1990), but these polymorphisms must be cloned to be of use in application to restriction mapping or contig assembly. VNTR probes are useful in the mouse because a large number of crosses are likely to be informative at a particular position.
The mouse offers the opportunity to map in interspecific crosses which have a high level of polymorphism relative to most other inbred lines. A dense genetic map of DNA markers would facilitate cloning genes that have been mapped genetically in the mouse. Cloning such genes would be aided by the identification of very closely linked DNA polymorphisms. About 3000 mapped DNA polymorphisms are needed to provide a good probability of one polymorphism being within 500 kb of the gene. To place so many DNA markers on the map it is desirable to have a fast and cost-effective genetic mapping strategy.
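The marker-density figure above can be checked with a rough calculation. The following is a minimal sketch, assuming a genome of about 3 x 10^9 base pairs (a value not stated in this text) and assuming that markers fall independently and uniformly along the genome in physical distance. Under those assumptions, the chance that at least one of N markers lies within 500 kb on either side of a gene is 1 - (1 - w/G)^N, where w is the 1 Mb window and G is the genome size. The function name and constants are illustrative only.

```python
def prob_marker_within(n_markers: int, window_bp: int, genome_bp: int) -> float:
    """Probability that at least one of n_markers lands inside a window
    of window_bp, assuming independent, uniform marker placement."""
    return 1.0 - (1.0 - window_bp / genome_bp) ** n_markers


GENOME_BP = 3_000_000_000   # ASSUMPTION: approximate mammalian genome size
WINDOW_BP = 2 * 500_000     # 500 kb on either side of the gene

p_3000 = prob_marker_within(3000, WINDOW_BP, GENOME_BP)
print(f"3000 markers: {p_3000:.2f}")  # prints "3000 markers: 0.63"
```

Real markers are not uniformly distributed and recombination rates vary along chromosomes, so this figure is only an order-of-magnitude guide.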
Accordingly, the present invention, referred to herein as arbitrarily primed polymerase chain reaction or "AP-PCR", provides a distinctive variation of the PCR technique by employing arbitrary primers. We have unexpectedly found that the use of a single primer used at low stringency hybridization conditions reproducibly generates specific discrete products that can be resolved into a manageable number of individual bands providing a species "fingerprint". We have also found that the method can be extended to provide a fingerprint characteristic of a genotype at the DNA or RNA level.
The AP-PCR method is suitable for the rapid identification and classification of organisms, for the generation of polymorphisms suitable for genetic mapping of eukaryotes, for the identification of tissue and cell types, and for monitoring changes in the state of gene expression of a cell or tissue. Only a small sample of biological material is needed, and knowledge of the molecular biology, biochemistry, or the target DNA sequence to be identified is not required. In addition, reagents specific for a given species are not required.
In general, AP-PCR is a method for generating a set of discrete DNA products ("amplification products") characteristic of a genome by priming target nucleic acid obtained from a genome or from a cellular RNA preparation with at least one single-stranded primer to form primed nucleic acid such that a substantial degree of mismatching, preferably internal mismatching, occurs between the primer and the target nucleic acid. The primed nucleic acid is then amplified by performing at least one cycle of polymerase chain reaction (PCR) amplification to generate DNA amplification products from mismatched primed sites in the genome. A second step of amplification by PCR is then performed using at least one more cycle, and preferably at least 10 cycles, of PCR amplification to generate a set of discrete DNA amplification products characteristic of the genome.
The single-stranded DNA primer is from about 10 to about 50 nucleotide bases in length, more preferably from about 17 to about 40 nucleotide bases in length. It can be of any sequence. The primer can have sequence redundancies reducing the occurrence of mismatches.
Among the possible primers, the following preferred primers can be used:
G-G-A-A-A-C-A-G-C-T-A-T-G-A-C-C-A-T-G-A (SEQ ID NO:2);
G-T-A-A-T-A-C-G-A-C-T-C-A-C-T-A-T-A-G (SEQ ID NO:3);
G-C-A-A-T-T-A-A-C-C-C-T-C-A-C-T-A-A-A-G (SEQ ID NO:4);
C-C-A-G-C-T-C-G-A-C-A-T-G-G-C-A-C-R-T-G-T-A-T-A-C-A-T-A-Y-G-T-A-A-C (SEQ ID NO:5);
G-G-G-G-A-C-T-A-G-T-A-A-A-A-C-G-A-C-G-G-C-C-A-G-T (SEQ ID NO:6);
G-A-G-A-G-G-A-G-A-A-G-G-A-G-A-G-A-G-A-A-R-R-R-R-R (SEQ ID NO:7); or
C-C-G-G-C-A-T-C-G-A-T-R-R-R-R-R-R-C-G-A-C-G-G-C-C-A-G (SEQ ID NO:8),
wherein R is either A or G, and wherein Y is either C or T.
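A primer containing the redundant positions R or Y stands for a mixture of distinct sequences. The following is a minimal sketch of how such a primer could be expanded into its individual sequences, using only the two redundancy codes defined above. The function name `expand_primer` is hypothetical and is not part of the described method.

```python
from itertools import product

# Redundancy codes as defined in the text: R is A or G, Y is C or T.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}


def expand_primer(primer: str) -> list[str]:
    """Return every concrete sequence represented by a hyphen-separated
    primer that may contain the redundant positions R and Y."""
    bases = primer.replace("-", "")
    return ["".join(combo) for combo in product(*(CODES[b] for b in bases))]


seq_id_5 = "C-C-A-G-C-T-C-G-A-C-A-T-G-G-C-A-C-R-T-G-T-A-T-A-C-A-T-A-Y-G-T-A-A-C"
seq_id_7 = "G-A-G-A-G-G-A-G-A-A-G-G-A-G-A-G-A-G-A-A-R-R-R-R-R"

print(len(expand_primer(seq_id_5)))  # one R and one Y: 4 sequences
print(len(expand_primer(seq_id_7)))  # five R positions: 32 sequences
```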
Alternatively, a preferred arbitrarily chosen single-stranded primer can have a sequence of T-G-T-G-T-G at its 3′-terminus, or be about 20 bases in length with a sequence of A-C-G-C-G-C-A-C at its 3′-terminus.
The single-stranded primer can also be a mixture of at least two different or heterogeneous primer sequences. The different sequences can be of the same or different lengths.
In one embodiment of the method, the first cycle of amplification is performed under conditions in which each cycle of polymerase chain reaction amplification includes a step of incubation at a low stringency annealing temperature. The remaining cycles of polymerase chain reaction amplification, preferably at least 10 cycles, are performed under conditions in which each cycle of polymerase chain reaction amplification includes a step of incubation at a high stringency annealing temperature greater than the low stringency annealing temperature.
In an alternative embodiment, to produce a different pattern and raise the resolving power of the method, a second arbitrary primer is included in the same reaction so that amplification of the nucleic acid primed with each of the primers occurs simultaneously.
The annealing temperature in the first cycle is preferably from about 35° C. to about 55° C. The annealing temperature in the remaining cycles is about the melting temperature of the double-stranded DNA formed by annealing, about 35° C. to 65° C. for primers over 15 bases in length. Preferably this temperature is greater than about 55° C., more preferably about 60° C.
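The two-stage temperature profile described above can be written out as a simple cycle schedule. The following is a minimal sketch, not a validated protocol. The annealing ranges and the count of 10 high stringency cycles come from the text; the denaturation temperature (94° C.), the extension temperature (72° C.), the default low stringency annealing temperature (40° C.) and all names are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cycle:
    denature_c: float
    anneal_c: float
    extend_c: float


def ap_pcr_schedule(low_cycles: int = 1, high_cycles: int = 10,
                    low_anneal_c: float = 40.0, high_anneal_c: float = 60.0,
                    denature_c: float = 94.0,   # ASSUMPTION: typical value
                    extend_c: float = 72.0      # ASSUMPTION: typical value
                    ) -> list[Cycle]:
    """Build low stringency cycles followed by high stringency cycles."""
    if low_cycles < 1 or high_cycles < 1:
        raise ValueError("each stage needs at least one cycle")
    if not 35.0 <= low_anneal_c <= 55.0:
        raise ValueError("low stringency annealing should be about 35 to 55 C")
    if high_anneal_c <= low_anneal_c:
        raise ValueError("high stringency annealing must exceed low stringency")
    low = [Cycle(denature_c, low_anneal_c, extend_c)] * low_cycles
    high = [Cycle(denature_c, high_anneal_c, extend_c)] * high_cycles
    return low + high


schedule = ap_pcr_schedule()
for number, cycle in enumerate(schedule, start=1):
    print(number, cycle.anneal_c)
```

Step durations are omitted on purpose because this passage does not specify them.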
The genome to which the AP-PCR method is applied can be a viral genome; a bacterial genome, including Staphylococcus and Streptococcus; a plant genome, including rice, maize, or soybean; or an animal genome, including a human genome. It can also be a genome of a cultured cell line. The cultured cell line can be a chimeric cell line with at least one human chromosome in a non-human background.
The AP-PCR method can be used to identify an organism as a species of a genus of bacteria, for example, Staphylococcus, from a number of different species. Similarly, the method can be used to determine the strain to which an isolate of the genus Streptococcus belongs, by comparing the DNA amplification products produced by AP-PCR for the isolate to the patterns produced from known strains with the same primer.
The AP-PCR method can also be used to verify the assignment of a bacterial isolate to a species by comparing the AP-PCR fingerprint from the isolate with the AP-PCR fingerprints produced by known bacterial species with the same primer. For this application, the primer is chosen to maximize interspecific difference of the discrete DNA amplification products.
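Comparing an isolate's fingerprint with reference fingerprints can be expressed numerically once each fingerprint is reduced to a list of band sizes. The following is a minimal sketch under several assumptions that are not taken from the text: bands are represented by estimated sizes in base pairs, two bands are treated as the same if their sizes agree within a relative tolerance, and similarity is scored with the Dice coefficient (twice the number of shared bands divided by the total number of bands). The band sizes and strain names are invented for illustration.

```python
def shared_bands(a: list[float], b: list[float], tol: float = 0.03) -> int:
    """Count bands in a that pair with an unused band in b whose size
    agrees within the relative tolerance tol."""
    remaining = sorted(b)
    shared = 0
    for size in sorted(a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tol * max(size, other):
                shared += 1
                del remaining[i]   # each reference band is used once
                break
    return shared


def dice_similarity(a: list[float], b: list[float], tol: float = 0.03) -> float:
    """Dice coefficient of two band lists: 2 * shared / (len(a) + len(b))."""
    if not a and not b:
        return 1.0
    return 2 * shared_bands(a, b, tol) / (len(a) + len(b))


def best_match(isolate: list[float], references: dict[str, list[float]],
               tol: float = 0.03) -> str:
    """Name of the reference fingerprint most similar to the isolate."""
    return max(references,
               key=lambda name: dice_similarity(isolate, references[name], tol))


# HYPOTHETICAL band sizes (bp), all produced with the same primer.
references = {
    "strain_A": [250, 410, 620, 900, 1300],
    "strain_B": [250, 480, 700, 1100],
}
isolate = [255, 405, 630, 910, 1290]

print(best_match(isolate, references))  # prints "strain_A"
```

In practice band intensity and gel-to-gel variation also matter, so a score of this kind supplements rather than replaces visual comparison of the separated bands.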
The target nucleic acid of the genome can be DNA, RNA or polynucleotide molecules. If the AP-PCR method is used to characterize RNA, the method also preferably includes the step of extending the primed RNA with an enzyme having reverse transcriptase activity to produce a hybrid DNA-RNA molecule, and priming the DNA of the hybrid with an arbitrary single-stranded primer. In this application, the enzyme with reverse transcriptase activity can be avian myeloblastosis virus reverse transcriptase or Moloney leukemia virus reverse transcriptase.
The discrete DNA amplification products produced by the AP-PCR method can be manipulated in a number of ways. For example, they can be separated in a medium capable of separating DNA fragments by size, such as a polyacrylamide or agarose gel, in order to produce a fingerprint of the amplification products as separated bands. Additionally, at least one separated band can be isolated from the fingerprint and reamplified by conventional PCR. The isolated separated band can also be cleaved with a restriction endonuclease. The reamplified fragments can then be isolated and cloned in a bacterial host. These methods are particularly useful in the detection and isolation of DNA sequences that represent polymorphisms differing from individual to individual of a species.
The ability of the AP-PCR method to generate polymorphisms makes it useful, as well, in the mapping and characterization of eukaryotic genomes, including plant genomes, animal genomes, and the human genome. These polymorphisms are particularly useful in the generation of linkage maps and can be correlated with RFLPs and other markers.
The AP-PCR method is suitable for the identification of bacterial species and strains, including Staphylococcus and Streptococcus species, mammals and plants. The method of the present invention can identify species, cell types or tissues rapidly, using only a small amount of biological material, and does not require knowledge of the nucleotide sequence or other molecular biology of the nucleic acids of the organisms to be identified. The method can also be used to generate detectable polymorphisms for use in genetic mapping of animals and humans, and be used to detect differential gene expression within tissues.
The present invention provides a method with several advantages for identification of bacteria and other biological materials. The method is simple to perform and rapid; results can be obtained in as little as 36 hours when the template nucleic acids are isolated by boiling. Only small samples of material, e.g., nanogram amounts, are needed. The method yields information that allows the differentiation of even closely related species and can be extended to differentiate between subspecies, strains, or even tissues of the same species. The method requires no prior knowledge of any biochemical characteristics, including the nucleotide sequence of the target nucleic acids, of the organism to be identified. Hence, the primers are termed "arbitrary".
Initially, the method requires the use of no species-specific or sequence-specific reagents, because the primer used is completely arbitrarily chosen. Mismatching between the primer DNA and the target nucleic acids is characteristic of the method and is associated with the use of low stringency hybridization conditions during its initial amplification steps. It is advantageous to be able to initiate amplification in the presence of a substantial degree of mismatching because this widens the variety of primers able to initiate amplification on a particular genome.
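The effect of tolerating mismatches on the number of available priming sites can be illustrated with a toy model. The following is a minimal sketch that treats priming as simple sequence comparison: a site counts if the primer differs from the site at no more than a set number of positions while the last few bases at the 3′ end match exactly, so that the permitted mismatches are internal or 5′. This is a deliberate simplification and an assumption of this example; real annealing depends on temperature, salt and sequence context. For brevity the search is run on the strand with the same sense as the primer. The template and primer shown are invented.

```python
def mismatched_sites(template: str, primer: str,
                     max_mismatches: int = 3,
                     exact_3prime: int = 2) -> list[tuple[int, int]]:
    """Return (start, mismatch_count) for each window of the template that
    matches the primer with at most max_mismatches differences and an
    exact match over the last exact_3prime bases."""
    n = len(primer)
    sites = []
    for start in range(len(template) - n + 1):
        window = template[start:start + n]
        if window[n - exact_3prime:] != primer[n - exact_3prime:]:
            continue  # require a perfectly matched 3' end
        mismatches = sum(1 for x, y in zip(window, primer) if x != y)
        if mismatches <= max_mismatches:
            sites.append((start, mismatches))
    return sites


template = "AAGACCATGACCGTCCATGATT"   # HYPOTHETICAL toy sequence
primer = "GACCATGA"                   # HYPOTHETICAL toy primer

strict = mismatched_sites(template, primer, max_mismatches=0)
relaxed = mismatched_sites(template, primer, max_mismatches=3)
print(strict)   # prints [(2, 0)]
print(relaxed)  # prints [(2, 0), (12, 1)]
```

Allowing mismatches turns one candidate site into two in this toy case, which is the sense in which low stringency conditions widen the set of sites, and therefore primers, able to initiate amplification.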
Additionally, the method possesses the advantage of requiring only one primer sequence for amplification. This reduces the number of reagents required and alleviates the possibility of false results caused by primer artifacts resulting from the hybridization of two separate primers.
The AP-PCR method of the invention can be used to provide identification of other types of organisms, including viruses, fungi, mammals and plants. The method also provides an efficient way of identifying polymorphisms for use in genetic mapping, especially of eukaryotes, including animals, particularly mice and humans. This method has many applications in mammalian population genetics, pathology, epidemiology and forensics.
In addition to genus and species typing, the methods of the invention provide for the identification of tissue, as in tissue typing, and the identification of strain polymorphisms. For example, one could identify the site or tissue of origin of a metastatic tumor, or the stage of the tumor based on diagnostic differential gene expression. In addition, one can supplement histological identification by the ability to identify the tissue being evaluated using the tissue typing methods described herein.
Insofar as cells or tissues respond at the level of gene expression, one can use the present methods to detect changes in the cell or tissue. For example, because particular genes respond to a particular agent or treatment, the method will indicate a response to the treatment at the level of differential gene expression. Thus, cells treated with a transforming agent, a growth factor, a cytokine, a mutagen, a viral pathogen or a like agent that alters the cell's gene expression to produce a differential in the expressed RNA can be detected by the present invention.