Nucleic acid polymorphism provides a means to identify species, serotypes, strains, varieties, breeds or individuals based on differences in their genetic makeup. Nucleic acid polymorphism can be caused by nucleotide substitution, insertion or deletion. The ability to determine genetic polymorphism has widespread application in areas such as genome mapping, genetic linkage studies, medical diagnosis, epidemiological studies, forensics and agriculture. Several methods have been developed to compare homologous segments of DNA to determine if polymorphism exists. All of these methods, however, suffer from shortcomings which limit their practical application.
The most direct method for detection of polymorphism is comparison of the sequences of different genomes. Although the development of automated sequencing methods has greatly lessened the time and effort required to sequence DNA, the genome of even a simple prokaryotic organism is so large that direct sequencing is impractical for routine detection of polymorphisms.
One method for determination of genetic polymorphism in genomic DNA is pulsed field gel electrophoresis. Pulsed field gel electrophoresis, or PFGE, is used to detect differences in large fragments of genomic DNA. In PFGE, pulsed, alternating, orthogonal electric fields are applied to the gel. Large molecules of DNA migrating through the gel become trapped in their reptation tubes each time the direction of the electric field is altered. The progress of the DNA through the gel is then halted until the DNA has had sufficient time to reorient itself along the new axis of the electric field. The degree of resolution of PFGE depends on several factors, including the uniformity of the electric fields, the absolute lengths of the electric pulses, the ratio of the pulse lengths used to generate the electric fields, the angles of the two electric fields to the gel, and the relative strengths of the electric fields. Several variations of PFGE have been developed to increase the resolution; however, the technique remains useful only for the detection of relatively large differences, generally 5-10 kb, between DNA molecules.
Another method for the detection of genetic differences is restriction fragment length polymorphism, or RFLP (Botstein et al., Am. J. Hum. Genet. 32:314, 1980). RFLP is based on the specificity with which restriction enzymes cleave DNA. A restriction enzyme will cleave DNA only where a particular recognition sequence is present. Any alteration in that recognition sequence will result in the enzyme being unable to cleave the DNA at that site, which in turn results in a change in the size of the fragment produced. Changes elsewhere in the sequence may introduce new recognition sites, again altering the size distribution of the restriction fragments. Changes in the DNA sequence caused by insertions, deletions and inversions will also result in changes in the size of the fragments produced and so can also be detected by RFLP. Essentially any restriction enzyme can be used in RFLP, and combinations of enzymes can also be used.
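By way of illustration only, the effect of a lost restriction site on fragment sizes can be sketched in a few lines of Python. The function name `digest`, the two toy alleles, and the choice of EcoRI (recognition sequence GAATTC, cut after the first G) are assumptions introduced for this example and are not taken from the cited reference. The sketch treats the DNA as a linear string and ignores the complementary strand.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every
    occurrence of `site`. `cut_offset` is the position inside the site
    at which the enzyme cuts (1 for EcoRI, G^AATTC)."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(boundaries, boundaries[1:])]


# Hypothetical toy alleles: allele_b carries a single A-to-G substitution
# in the second EcoRI site, so the enzyme no longer recognizes it.
allele_a = "ATATAT" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TTTT"
allele_b = "ATATAT" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TTTT"

print(digest(allele_a, "GAATTC", 1))  # [7, 14, 9]
print(digest(allele_b, "GAATTC", 1))  # [7, 23]
```

In this toy case the substitution merges the 14-base and 9-base fragments into a single 23-base fragment, which is the kind of size shift that RFLP analysis is designed to reveal.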
Differences in the size of restriction enzyme fragments are typically determined by Southern blots (Southern, J. Mol. Biol. 98:503, 1975). After digestion by the restriction enzyme of choice, the resulting fragments are size separated by gel electrophoresis, transferred to a membrane, and hybridized with a labeled probe corresponding to a particular area of the genome. RFLP has proved to be a powerful tool in genetic analysis, but it often requires a great deal of optimization to obtain background to signal ratios low enough to allow detection of polymorphic markers. Also, the method is limited to sequences that can be detected by the DNA probes used. In addition, point mutations can usually be detected only if they occur in a restriction site.
Another method for determining genetic polymorphism uses primers of an arbitrary sequence to amplify DNA by the polymerase chain reaction (PCR) (Williams et al., Nucl. Acids Res. 18:6531, 1990; U.S. Pat. No. 5,126,239). Because the primers are not designed to amplify a specific sequence, the technique is called randomly amplified polymorphic DNA, or RAPD. The primers used are at least seven nucleotides in length and typically are between nine and thirteen bases long, with a G+C content of between 50% and 80% and no palindromic sequences. Under the proper conditions, differences as small as a single nucleotide can affect the binding of the primer to the template DNA, thus resulting in differences between genomes in the distribution of amplification products. The RAPD method of identifying polymorphisms is faster and more readily adaptable to automation than is RFLP. A limitation on the use of RAPD is the random nature of the primers used, so that maximal coverage of the genome is not assured. To the contrary, the random nature of the amplification raises the possibility that large portions of the genome will not be amplified, so that useful areas of polymorphism may go undetected.
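The primer criteria recited above (length, G+C content, absence of palindromes) lend themselves to a simple screening check, sketched here in Python. All function names are hypothetical, and the minimum palindrome length of six bases is an assumption made for the example, since the text above does not state a threshold.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(primer):
    """Fraction of bases in the primer that are G or C."""
    return (primer.count("G") + primer.count("C")) / len(primer)


def has_palindrome(primer, min_len=6):
    """True if any window of min_len bases equals its own reverse complement.
    min_len=6 is an assumed threshold, not a value from the text."""
    return any(
        primer[i:i + min_len] == reverse_complement(primer[i:i + min_len])
        for i in range(len(primer) - min_len + 1)
    )


def is_candidate_rapd_primer(primer):
    """Apply the typical criteria: 9 to 13 bases, 50-80% G+C, no palindrome."""
    return (
        9 <= len(primer) <= 13
        and 0.5 <= gc_fraction(primer) <= 0.8
        and not has_palindrome(primer)
    )


# Hypothetical example primers
print(is_candidate_rapd_primer("GGTCACTGCA"))  # True  (10 bases, 60% G+C)
print(is_candidate_rapd_primer("GGAATTCCGG"))  # False (contains GAATTC)
print(is_candidate_rapd_primer("ATTAGTCATA"))  # False (20% G+C)
```

Because any longer even-length palindrome contains a six-base palindrome at its center, checking six-base windows is sufficient under the assumed threshold.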
Yet another method for identifying and mapping genetic polymorphisms has been termed amplified fragment length polymorphism, or AFLP (U.S. Pat. No. 5,874,215). AFLP combines the use of restriction enzymes as in RFLP with the use of PCR as in RAPD. Briefly, restriction fragments are produced by the digestion of genomic DNA with a single restriction enzyme or a pair of restriction enzymes. If a pair of enzymes is used, the enzymes are paired based on differences in the frequency of their restriction sites in the genome, such that one of the restriction enzymes is a "frequent cutter" while the other is a "rare cutter." The use of two enzymes results in the production of single and double digestion fragments.
Next, double stranded synthetic oligonucleotide adaptors of 10-30 bases are ligated onto the fragments generated. Primers are then designed based on the sequence of the adaptors and the restriction site. When pairs of restriction enzymes are used, nucleotides extending into the restriction sites are added to the 3' end of the primers such that only fragments generated by the action of both enzymes (double cut fragments) are amplified. Using this method, any polymorphism present at or near the restriction site will affect the binding of the primer and thus the distribution of the amplification products. In addition, any differences in the nucleotide sequence in the area flanked by the primers will also be detected. AFLP allows for the simultaneous co-amplification of multiple fragments. AFLP, however, is a multiple step process, and detection of polymorphism is limited to areas at or between restriction sites.
Another method to detect polymorphism is based on the high degree of length variation of certain tandemly repeating nucleotide sequences found in most, if not all, eukaryotes. These sequences are variously called simple sequence repeats (SSR), simple sequence length polymorphisms (SSLP), dinucleotide, trinucleotide, tetranucleotide, or pentanucleotide repeats, and microsatellites. Various workers have shown that microsatellites can be useful in detecting polymorphisms (see Morgante et al., EP 0804618 B1, and references cited therein). In this method, primers are designed based on the sequence of the microsatellites. In one variation, the primers contain 2 to 4 additional nucleotides that flank the microsatellite sequence in order to anchor the primers to a particular site at each microsatellite locus. Although microsatellite-directed primers are highly effective at detecting polymorphism, they cannot be used in prokaryotes, which lack microsatellite DNA.
Jensen et al., U.S. Pat. No. 5,753,467, disclose a method for the identification of microorganisms by amplification of variable regions within ribosomal DNA sequences. Briefly, primers are used that bind to highly conserved regions flanking variable regions within rDNA sequences. Size differences in the amplification products generated are then used to identify the microorganism. Although useful for identification purposes, the method provides no information about the rest of the genome and so is of limited use in genome mapping and genetic linkage studies.
Recently, the elucidation of the complete genome sequence of Escherichia coli K-12 (Blattner et al., Science 277:1453, 1997) has revealed the presence of a series of overrepresented (500-900 occurrences) oligonucleotide sequences within the genome. By overrepresented it is meant that the oligonucleotide sequences are present at a greater frequency than would be expected statistically. These oligonucleotides have been found to be biased (skewed) to the leading strand of DNA. The most frequent oligonucleotides in the leading strand are octamers, most containing the trimer CTG, often within the pentamer GCTGG. Although there is no direct proof, the complements of these oligonucleotides have been suggested to serve as priming sites for DNA replication based on their spacing and the presence of the possible DnaG primase binding site CAG.
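The idea of overrepresentation can be illustrated with a short Python sketch that counts every oligonucleotide of a given length and compares each count with a simple expectation. The function names, the toy sequence, and in particular the assumption that all four bases are equally likely and independent are simplifications introduced here; they are not the statistical model used in the cited genome analysis.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping oligonucleotide of length k in one strand."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def overrepresented(sequence, k, min_ratio):
    """Return {oligonucleotide: observed/expected} for oligonucleotides whose
    count is at least min_ratio times the expectation under a uniform,
    independent base model (an assumed simplification)."""
    expected = (len(sequence) - k + 1) / 4 ** k
    counts = kmer_counts(sequence, k)
    return {
        kmer: count / expected
        for kmer, count in counts.items()
        if count / expected >= min_ratio
    }


# Hypothetical toy sequence rich in the trimer CTG
toy = "CTGCTGCTGAAT"
print(kmer_counts(toy, 3)["CTG"])          # 3
print(list(overrepresented(toy, 3, 15)))   # ['CTG']
```

Under the same uniform assumption, a genome of roughly 4.6 million bases would be expected to contain any given octamer about 70 times per strand (4.6e6 / 4^8), which gives a sense of scale for the 500-900 occurrences noted above.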
The Applicant has used these overrepresented oligonucleotide sequences to invent a novel method to detect polymorphisms in the genome of an organism. The method makes use of these overrepresented oligonucleotides as priming sites for amplification of DNA by the polymerase chain reaction. The method of the present invention has several advantages over the prior art. The areas amplified are spread throughout the entire genome; thus, detection of polymorphism is not restricted to a limited area. Surprisingly, it has been found that a single reaction can result in coverage of approximately 5% of the total E. coli genome. Thus, it is possible to cover the entire genome of an organism such as E. coli with as few as 20 primer pairs. The primers used are rationally designed based on the overrepresented oligomers, thus limiting non-specific amplification. In addition, the method does not require the use of restriction enzymes or adaptors, making it technically less cumbersome.