Central to the field of microbiology is the ability to positively identify microorganisms at the level of genus, species, or serotype. Correct identification is not only an essential tool in the laboratory, but also plays a significant role in the control of microbial contamination in the processing of foodstuffs, the production of agricultural products, and the monitoring of environmental media, such as groundwater. Of greatest concern is the detection and control of pathogenic microorganisms. Typically, pathogen identification has relied on methods for distinguishing phenotypic aspects, such as growth or motility characteristics, and on immunological and serological characteristics. Selective growth procedures and immunological methods are the traditional methods of choice for bacterial identification, and these can be effective for the presumptive detection of a large number of species within a particular genus. However, these methods are time-consuming and subject to error. Selective growth methods require culturing and subculturing in selective media, followed by subjective analysis by an experienced investigator. Immunological detection (e.g., ELISA) is more rapid and specific; however, it still requires growth of a significant population of organisms and isolation of the relevant antigens. For these reasons, interest has turned to detection of bacterial pathogens based on nucleic acid sequence.
Nucleic acid polymorphism provides a means to identify species, serotypes, strains, varieties, breeds, or individuals based on differences in their genetic makeup. Nucleic acid polymorphism can be caused by nucleotide substitution, insertion, or deletion. The ability to determine genetic polymorphism has widespread application in areas such as genome mapping, genetic linkage studies, medical diagnosis, epidemiological studies, forensics, and agriculture. Several methods have been developed to compare homologous segments of DNA to determine whether polymorphism exists.
One method for determining genetic polymorphism uses primers of an arbitrary sequence to amplify DNA by the polymerase chain reaction (PCR) (Williams et al., Nucleic Acids Res. 18:6531-35 (1990); U.S. Pat. No. 5,126,239, incorporated herein by reference). Because the primers are not designed to amplify a specific sequence, the technique is called random amplification of polymorphic DNA (RAPD) or arbitrarily primed PCR (APPCR). The primers used are at least seven nucleotides in length. Under the proper conditions, differences as small as a single nucleotide can affect the binding of the primer to the template DNA, resulting in different distributions of amplification products from different genomes.
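The effect of a single nucleotide difference on an arbitrarily primed amplification can be illustrated with a minimal in silico sketch. The sketch assumes exact primer matching, a single primer, and a maximum product length; the function names, the seven-nucleotide primer, and the toy templates are hypothetical and are not taken from the cited references. Actual RAPD reactions tolerate some mismatches and depend on annealing conditions.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def find_all(template, motif):
    """Return every start position of motif in template (overlaps allowed)."""
    hits = []
    start = template.find(motif)
    while start != -1:
        hits.append(start)
        start = template.find(motif, start + 1)
    return hits


def rapd_products(template, primer, max_len=3000):
    """Predict product sizes for one arbitrary primer.

    Assumption: a product forms only when the primer sequence occurs on the
    top strand and its reverse complement occurs downstream, within max_len.
    """
    forward_sites = find_all(template, primer)
    reverse_sites = find_all(template, revcomp(primer))
    sizes = []
    for i in forward_sites:
        for j in reverse_sites:
            size = j + len(primer) - i
            if j > i and size <= max_len:
                sizes.append(size)
    return sorted(sizes)


primer = "GATTCAG"  # hypothetical 7-nucleotide arbitrary primer
genome_a = "AAAA" + primer + "T" * 20 + revcomp(primer) + "AAAA"
# genome_b differs from genome_a by one substitution in the second site
genome_b = "AAAA" + primer + "T" * 20 + "CTGTATC" + "AAAA"

print(rapd_products(genome_a, primer))  # [34]
print(rapd_products(genome_b, primer))  # []
```

In this toy case the substitution removes the second binding site, so the 34-nucleotide product present for the first template is absent for the second.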
Another method for identifying and mapping genetic polymorphisms has been termed amplified fragment length polymorphism (AFLP; U.S. Pat. No. 5,874,215, incorporated herein by reference). AFLP combines the use of restriction enzymes with the use of PCR. Briefly, restriction fragments are produced by the digestion of genomic DNA with a single restriction enzyme or a pair of restriction enzymes. If a pair of enzymes is used, the enzymes are paired based on differences in the frequency of their restriction sites in the genome, such that one of the restriction enzymes is a “frequent cutter” while the other is a “rare cutter.” The use of two enzymes results in the production of single and double digestion fragments. Next, double-stranded synthetic oligonucleotide adapters of 10-30 bases are ligated onto the fragments generated. Primers are then designed based on the sequence of the adapters and the restriction site. When pairs of restriction enzymes are used, nucleotides extending into the restriction sites are added to the 3′ end of the primers such that only fragments generated by the action of both enzymes (double cut fragments) are amplified. Using this method, any polymorphism present at or near the restriction site will affect the binding of the primer and thus the distribution of the amplification products. In addition, any differences in the nucleotide sequence in the area flanked by the primers will also be detected. AFLP allows for the simultaneous co-amplification of multiple fragments.
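The selection of double cut fragments described above can be sketched as a simple in silico digestion. The sketch assumes EcoRI (G^AATTC) as the rare cutter and MseI (T^TAA) as the frequent cutter, a pairing chosen here only for illustration; it models cutting on the top strand of a linear sequence and ignores adapter ligation and the amplification itself. The function names and the toy sequence are hypothetical.

```python
import re

# Recognition site and top-strand cut offset for each enzyme.
# Assumption: EcoRI/MseI is used only as an illustrative enzyme pair.
ENZYMES = {"EcoRI": ("GAATTC", 1), "MseI": ("TTAA", 1)}


def cut_positions(seq, enzymes=ENZYMES):
    """Return sorted (position, enzyme) pairs for every cut in seq."""
    cuts = []
    for name, (site, offset) in enzymes.items():
        # Lookahead finds overlapping occurrences of the recognition site.
        for match in re.finditer(f"(?={site})", seq):
            cuts.append((match.start() + offset, name))
    return sorted(cuts)


def double_cut_fragments(seq, enzymes=ENZYMES):
    """Return fragments whose two ends were produced by different enzymes."""
    cuts = cut_positions(seq, enzymes)
    fragments = []
    for (pos1, enz1), (pos2, enz2) in zip(cuts, cuts[1:]):
        if enz1 != enz2:
            fragments.append(seq[pos1:pos2])
    return fragments


seq = "CCCC" + "GAATTC" + "ACGTACGTAC" + "TTAA" + "GGGGGG" + "TTAA" + "CCCC"
# One substitution inside the EcoRI site abolishes the rare-cutter cut.
mutant = seq.replace("GAATTC", "GACTTC")

print([len(f) for f in double_cut_fragments(seq)])     # [16]
print([len(f) for f in double_cut_fragments(mutant)])  # []
```

The fragment lying between the two MseI sites is excluded because both of its ends come from the same enzyme, and the substitution in the EcoRI site removes the only double cut fragment from the profile.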
A further method is Direct Linear Analysis (DLA), which analyzes individual DNA molecules bound with sequence-specific tags (see Chan et al., Genome Res. 14:1137-46 (2004); U.S. Pat. No. 6,263,286, incorporated herein by reference). The method is intended to identify repetitive information in DNA molecules that are moved past at least one station, where labelled units of the DNA interact with the station to produce a DNA-dependent impulse. Because the extended objects (here, the DNA molecules) are similar, or preferably identical, and comprise a similar, or preferably identical, pattern of labelled units, a characteristic signature of interactions is repeated as each extended object moves past a station or a plurality of stations. This repetitive information is extracted from the overall raw data by means of an autocorrelation function and is then used to determine structural information about the DNA.
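The use of an autocorrelation function to recover a repeated signature from raw data can be illustrated with a minimal sketch. The sketch assumes a noise-free, evenly sampled toy trace in which the same eight-sample signature recurs as successive identical molecules pass a station; the function names and the trace values are hypothetical, and the sketch is not the implementation described in the cited references.

```python
def autocorrelation(signal):
    """Normalized autocorrelation of a 1-D signal for lags 0..n-1."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    variance = sum(c * c for c in centered)
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / variance
        for lag in range(n)
    ]


def dominant_period(signal, min_lag=1):
    """Return the lag (below half the trace length) with the highest autocorrelation."""
    ac = autocorrelation(signal)
    half = len(signal) // 2
    return max(range(min_lag, half), key=lambda lag: ac[lag])


# Hypothetical signature: two labelled units of different intensity.
signature = [0, 5, 0, 0, 3, 0, 0, 0]
trace = signature * 6  # six identical molecules passing the station

print(dominant_period(trace))  # 8
```

The recovered lag corresponds to the spacing at which the signature repeats. Real traces would contain noise and variable molecule velocity, which this sketch does not model.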
Another method is amplification of repetitive elements (REP-PCR). This technique is based on families of repetitive DNA sequences present throughout the genome of diverse bacterial species (reviewed by Versalovic et al., Methods Mol. Cell. Biol. 5:25-40 (1994)). Repetitive extragenic palindromic (REP) sequences are thought to play an important role in the organization of the bacterial genome. Genomic organization is believed to be shaped by selection, and the differential dispersion of these elements within the genomes of closely related bacterial strains can be used to discriminate between strains (see, e.g., Louws et al., Appl. Environ. Microbiol. 60:2286-95 (1994)). REP-PCR utilizes oligonucleotide primers complementary to these repetitive sequences to amplify the variably sized DNA fragments lying between them. The resulting products are separated by electrophoresis to establish the DNA “fingerprint” for each strain.
The output of these fingerprinting systems is generally measured by assigning band sizes, though these assignments are somewhat imprecise and depend on the sizing ladder used for the comparison. In addition, the output data can be difficult to compare between laboratories, and handling the data often relies on expensive proprietary software programs (such as BioNumerics, Applied Maths, Austin, Tex.).
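The sensitivity of band-based comparison to sizing imprecision can be illustrated with a minimal sketch. The sketch assumes that two fingerprints are compared by matching band sizes within a relative tolerance and scoring the result with the Dice coefficient, a commonly used similarity measure that is introduced here for illustration and is not drawn from the text above. The function names, the tolerance values, and the band sizes are hypothetical.

```python
def match_bands(bands_a, bands_b, tolerance=0.05):
    """Greedy one-to-one matching of band sizes within a relative tolerance."""
    remaining = sorted(bands_b)
    matched = 0
    for size in sorted(bands_a):
        for k, other in enumerate(remaining):
            if abs(size - other) <= tolerance * max(size, other):
                matched += 1
                del remaining[k]  # each band may be matched only once
                break
    return matched


def dice_similarity(bands_a, bands_b, tolerance=0.05):
    """Dice coefficient: 2 * shared bands / total bands in both profiles."""
    if not bands_a and not bands_b:
        return 1.0
    shared = match_bands(bands_a, bands_b, tolerance)
    return 2 * shared / (len(bands_a) + len(bands_b))


# Hypothetical band sizes (base pairs) assigned to one strain by two laboratories.
lab_1 = [500, 1000, 1500]
lab_2 = [510, 980, 2000]

print(round(dice_similarity(lab_1, lab_2, tolerance=0.05), 3))  # 0.667
print(dice_similarity(lab_1, lab_2, tolerance=0.01))            # 0.0
```

With these toy values the same pair of profiles scores as largely similar or as entirely dissimilar depending only on the matching tolerance, which illustrates why band assignments made against different sizing ladders are difficult to compare.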