Escherichia coli is a common enteric bacterial species that has both laboratory and human health importance. One particular strain of E. coli, designated O157:H7, is a human enteric pathogen that causes acute hemorrhagic colitis. Young children and the elderly are particularly susceptible to disease caused by this bacterium, which is usually contracted by eating contaminated food such as undercooked meat. In the most vulnerable patients, the colitis frequently develops into hemolytic uremic syndrome (HUS), a condition that is often fatal. The disease progresses very rapidly and is consequently very difficult to treat. Often, patients are severely ill by the time the disease is diagnosed. Once a diagnosis has been made, appropriate antibiotics may be administered to kill the infective bacteria. Sometimes, however, by the time a diagnosis has been rendered, toxic proteins secreted by the bacteria have damaged mucosal cells and entered the bloodstream. Recently, clinical isolates of O157:H7 have been found to exhibit resistance to an increasing spectrum of antibiotics, which will further complicate treatment.
In several recent cases of disease caused by this organism, the source of the bacteria was traced to hamburgers purchased in fast food restaurants. The bacteria are extremely proficient at establishing an infection; ingestion of as few as 10 live bacteria is sufficient. The highly infective nature of O157:H7 and the devastating sequelae associated with infection by this bacterium, together with the extensive public attention given to outbreaks of hemorrhagic colitis, have generated a great deal of interest among medical professionals and the general public in developing the means for early diagnosis and treatment of the disease. Farmers desire an effective treatment of infections in cattle and pigs, which are the main reservoirs for E. coli enteric pathogens. The ability to diagnose and treat livestock infected by this organism will prevent the loss of livestock and the transmission of the organism from animals to humans. Meat suppliers and those in the food industry are very much interested in a means for detecting the organism in tainted meat. Because the infective dose of O157:H7 is extremely low, a highly sensitive test is needed to identify contaminating organisms in food.
Modern geneticists have been working to determine the genome sequences of many organisms. Efforts to sequence the human genome are ongoing. The effort to sequence the genomes of whole organisms began with an effort to sequence the genome of E. coli. For the original effort to sequence the E. coli genome, a useful and common laboratory strain, designated K12, was chosen. The entire genome of that strain was sequenced and published (Science 277:1453-1462 (1997)). Since the genes which are responsible for the pathogenicity of E. coli O157:H7 are missing from strain K12, the sequence of the K12 genome is of limited help in developing tools to detect, hinder or destroy E. coli O157:H7.
Some efforts have been directed toward the sequencing of specific genes from O157:H7. U.S. Pat. No. 5,798,260 describes the sequence of one specific gene, named adhesin, from that genome. The development of additional sequence information from E. coli O157:H7 would be needed for comprehensive efforts at detection, diagnosis, prophylaxis and therapeutic approaches to infections caused by the organism.
It is an object of the invention to provide essentially the entire sequence of E. coli O157:H7 to enable detection, diagnosis, prophylaxis and therapeutic tools to combat bacterial infections.
It is another object of the present invention to provide a means to detect low numbers of Escherichia coli O157:H7 in a contaminated food source.
It is yet another object of this invention to provide a means for the early diagnosis of humans and livestock infected with O157:H7.
Another object of the present invention is to provide a means of treating humans and livestock infected with O157:H7.
It is a further object of the present invention to provide a means for the prevention of infection by O157:H7.
The present invention includes many DNA sequences that are unique to E. coli O157:H7.
One aspect of the present invention is a DNA sequence comprising an open reading frame (ORF), designated o3169, that encodes a putative cytotoxin 3169 amino acids in length that resembles the clostridial cytotoxins ToxA and ToxB of C. difficile and cytotoxin L of C. sordellii.
Another aspect of the present invention is a DNA sequence that constitutes a urease gene cluster.
A third aspect of the present invention is a chromosomal gene that encodes a toxin related to the RTX family of cytotoxins and associated transport proteins.
Another aspect of the present invention is a set of genes found in the Locus of Enterocyte Effacement (LEE), a 45-kb cluster of genes that are involved in the attachment of pathogens to intestinal epithelial cells and in other related functions necessary to establish infection.
Another aspect of the present invention is a hypothetical serine/threonine kinase (stk) encoded by phage 933W, a lysogenic bacteriophage found in O157:H7.
Another aspect of the present invention is a putative tail fiber gene, which is found on phage 933W.
Another aspect of the present invention is a method for detecting E. coli O157:H7 and distinguishing the strain from other strains of E. coli by genetic analysis and testing.
It is a feature of the invention disclosed here that virtually the entire genome of E. coli O157:H7 is set forth in the data contained here, combined with the information already published in the field.
The investigators here have sequenced virtually the entire genome of E. coli O157:H7. Presented in this specification is essentially all the DNA sequence which is contained in strain O157:H7 and not found in the previously sequenced E. coli strain K12. The genome sequence is essentially complete, lacking only occasional, presumably small, sequence linkages between established long sequences. The availability of the sequence data presented here will enable intelligent design of diagnostic detection, prophylaxis and therapeutic tools for disease and infections caused by this organism.
The sequencing of E. coli O157:H7 was, in brief, performed by shotgun cloning in the M13 Janus vector (Burland et al., Nucl. Acids Res. 21:3385-3390 (1993)). Genomic DNA was prepared for library construction by nebulization, end-repair and size fractionation as described in Mahillon et al., Gene 223:47-54 (1998). Recovered DNA fragments were ligated into the M13 Janus vector. Library subclones were picked as plaques, from which template DNAs were prepared and then sequenced by Prism-terminator Cycle Sequencing chemistry and analyzed on ABI377 automated sequencers. Sequences were assembled by the Seqman II program (DNASTAR), and finishing employed a combination of PCR and primer walking techniques. Open reading frames were identified and analyzed as described in Blattner et al., Science 277:1453-1462 (1997). All the sequences presented in this specification are unique to strain O157:H7 as compared to strain K12. This sequence data, when combined with the sequence of K12, resolves all of the genetic sequence of O157:H7. The information on the K12 sequence, contained in Science 277:1453-1462 (1997), is hereby incorporated by reference as if set forth in full herein.
An important analysis which has been begun on this sequence data is the identification of genetic sequences associated with the pathogenesis of infection, which sequences provide information essential to the diagnosis, treatment, and prevention of infection by that organism. In order to facilitate the identification of genes involved in the pathogenesis of infection by enterohemorrhagic E. coli (EHEC) for use in detection of the pathogen, and in the diagnosis, treatment, and prevention of enterohemorrhagic infections, the entire genomic DNA sequence of E. coli O157:H7 serovar EDL933 (ATCC 43895) was determined and compared with that of E. coli K-12, a nonpathogenic laboratory strain, as described in detail in the examples below.
Surprisingly, the genome of O157:H7 was found to be more than one million base pairs larger than that of K-12 and to have up to 1000 genes not found on K-12. These additional gene sequences are distributed throughout more than 250 sites in islands, with each island containing from zero to sixty genes. An unexpected finding is that many of the new genes resemble virulence determinants from a wide variety of pathogens, ranging from Helicobacter pylori to Clostridium difficile. Numerous sequences of interest were identified in the genome of E. coli O157:H7, including chromosomal, plasmid, and phage sequences.
Attached to this patent application is a sequence listing containing essentially all of the DNA sequence of the regions in the O157 genome that are not present in the K12 genome. This sequence is present in the sequence listing as SEQ ID NO:1 through SEQ ID NO:255. These sequences correspond to the 255 islands of O157 DNA that are not found in K12 DNA. Each of those islands has been assigned an identification (an OZ identification, meaning in the O157 zone of the genome, as opposed to a K12 region). The description of each sequence includes a base pair listing of where that particular sequence is found in the underlying backbone of the entire O157 sequence.
Also included in this patent application are two tables intended to make available some of the genetic analysis which has been done on these sequences. Table 1 itemizes the O157 islands by OZ number and lists some of the presently known noteworthy features in the sequenced islands. Table 2 is a listing of the open reading frames (ORFs) identified in each of the OZ sequences. Where an open reading frame has been matched to a putative function, that function is indicated alongside the open reading frame in Table 2. The protein encoded by each such ORF can be determined by appropriate conversion of the open reading frame DNA sequence to protein amino acid sequence using the genetic code.
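The conversion of an ORF to its amino acid sequence can be sketched as follows. This is a minimal illustration that assumes the standard genetic code and an input sequence already in frame, beginning at the start codon; the function name translate_orf and the example sequence are hypothetical and do not come from the sequence listing.

```python
# Minimal sketch: translate an in-frame ORF with the standard genetic code.
# Assumption: the input starts at the first base of the start codon.
# The name translate_orf is hypothetical, chosen for illustration only.

BASES = "TCAG"
# Amino acids for all 64 codons in conventional TCAG order; "*" marks a stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(dna: str) -> str:
    """Translate codon by codon, stopping at the first in-frame stop codon.

    Codons containing characters other than A, C, G, T become "X".
    """
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Made-up 12-base example, not taken from the sequence listing.
    print(translate_orf("ATGGCCAAATGA"))
```

Bacterial ORFs sometimes begin with GTG or TTG, which are read as methionine at the initiation position; this sketch does not handle that case and would report valine or leucine there.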
By definition, the genetic material in the OZ sequences described here is sufficient for pathogenicity in humans, since strain O157 is highly pathogenic while K12 is not. In addition, analysis of the open reading frames and computer comparisons to sequences from other pathogens have allowed identification of several of the open reading frames which code for proteins specifically associated with pathogenicity.
A gene encoding an unusually large ORF, designated o3169, which putatively encodes a 3169 amino acid protein, was identified on plasmid pO157, a 92 kb plasmid resident in E. coli O157:H7 in an autonomously replicating form. Database searches revealed that the deduced amino acid sequence of this putative ORF is similar to the large clostridial cytotoxins, ToxA and ToxB of C. difficile (Dove et al., Infect. Immun. 58:480-488 (1990); von Eichel-Streiber et al., Mol. Gen. Genet. 233:260-268 (1992)) and cytotoxin L of C. sordellii (Green et al., Gene 161:57-61 (1995)). ToxA, ToxB, and cytotoxin L are 2710, 2366, and 2338 amino acid residues in length, respectively. The sequences of o3169 and its putative translation product are shown in SEQ ID NO:256 and SEQ ID NO:257, respectively.
The clostridial cytotoxins are homologous proteins with three domains (von Eichel-Streiber et al., Trends in Microbiology 4: 375-382 (1996)). The N-terminal region contains a catalytic domain, a glucosyltransferase that acts on small GTP-binding proteins to interfere with their function in the organization of cytoskeletal actin filaments (Just et al., Nature 375: 500-503 (1995)). The central region contains a translocation domain that directs the secretion of the toxin, and the C-terminal region contains a target binding site.
The amino acid identity between the translation product of o3169 and the known cytotoxins is relatively weak (20% identity to ToxB over 444 amino acids). However, the alignment of these sequences is striking. The region having the highest level of amino acid identity is the first (N-terminal) 700 amino acids, which correspond to the catalytic site of the clostridial toxins.
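An identity figure of this kind is a ratio of matching positions to compared positions in a pairwise alignment; 20% over 444 amino acids corresponds to roughly 89 identical residues. The sketch below shows one common way to compute it from two already-aligned sequences. The function name percent_identity, the choice to skip gapped columns, and the short example strings are assumptions for illustration; alignment programs differ in which denominator they report.

```python
# Minimal sketch: percent identity between two pre-aligned sequences.
# Assumption: "-" marks a gap, and gapped columns are excluded from the
# denominator. Other conventions exist; this is only one of them.

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return the percentage of identical residues over ungapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Made-up toy alignment, not taken from o3169 or ToxB.
    print(round(percent_identity("MKT-LLA", "MRTALLG"), 1))
```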
ToxA damages intestinal mucosal cells, and when ToxA is present, ToxB gains access to the cells underlying the mucosa, causing further damage. By analogy, the putative cytotoxin encoded by o3169 may contribute to the damage to mucosal cells observed in enterohemorrhagic E. coli infections by acting alone or in concert with some other factor to destroy submucosal tissue, thereby causing or exacerbating the acute symptoms of infection. Therefore, the putative toxin is a promising target for treatment of persons infected by enterohemorrhagic E. coli. The administration of an antibody raised against the newly discovered toxin, or a portion of the toxin, could provide an effective treatment of severe symptoms of infection. The administration of the antitoxin could be used in conjunction with antibiotic therapy. Treatment with antibiotics is effective in controlling the infection itself. However, antibiotic therapy alone is ineffective in preventing or alleviating symptoms of the disease if the antibiotic is not administered in time to prevent the production of the toxin.
A cluster of seven genes very similar to the urease genes of numerous other bacterial pathogens has been identified on the chromosome of E. coli O157:H7 and its sequence determined (SEQ ID NO:258). Urease, or urea amidohydrolase, catalyzes the hydrolysis of urea to yield ammonia and carbamate. Expression of the urease genes in urogenital and gastroenteric bacteria is important in pathogenesis. For example, formation of ammonia by cell-surface bound urease in Helicobacter pylori is thought to cause a localized increase in pH allowing survival of the bacteria in the harshly acidic environment of the host's gastric system. In the gastrointestinal pathogens Yersinia enterocolitica and Morganella morganii, urease was found to be activated by low-pH conditions (Young et al., J. Bacteriology 178: 6487-6495 (1996)). The urease isolated from Vibrio parahaemolyticus, which causes gastroenteritis and traveler's diarrhea, has been found to cause intestinal fluid accumulation in suckling mice (Cai and Ni, J. Clin. Lab. Analysis 10(2): 70-73 (1996)). The presence of urea has been found to enhance intracellular survival of urease-positive Bordetella bronchiseptica, a mammalian respiratory pathogen (McMillan et al., Microbial Pathogenesis 21(5): 379-394 (1996)).
Gene order is conserved among most known bacterial urease clusters (Neyrolles et al., J. Bacteriol. 178(9): 2725 (1996)), including the urease gene cluster of E. coli O157:H7. The urease gene cluster begins with ureD, an accessory gene involved with regulation, followed by three structural genes, ureA, ureB, and ureC, and three accessory genes, ureE, ureF, and ureG. The latter three genes are believed to be involved in nickel metallocenter biosynthesis (Moncrief and Hausinger, J. Bacteriol. 178(18): 5417-5421 (1996)). In-frame stop codons prematurely terminate ureD (24 amino acids) and ureE (four amino acids) relative to the C-termini shared by most protein database entries. All seven ORFs have 70-96% identity to genes from Klebsiella aerogenes (Mulrooney and Hausinger, J. Bacteriol. 172(10): 5837-5843 (1990)). It is of interest that the sequence of strain EDL933 is more similar to the urease cluster of Klebsiella than to that of E. coli strain 1440, the only other E. coli urease sequence presently available (D'Orazio and Collins, J. Bacteriol. 175(6): 1860-1864 (1993)). For example, ureD of EDL933 is 71% identical to the corresponding gene from Klebsiella aerogenes, but only 47% identical to the plasmid-borne ureD gene of E. coli 1440.
Some strains of E. coli, particularly the uropathogens, test positive for urease activity, but EDL933 does not. A urease-positive mutant strain of O157:H7 observed among U.S. clinical isolates (Hayes et al., J. Clin. Microbiol. 33(12): 3347-3348 (1995)) may reflect the activation of a cryptic operon. Alternatively, the urease-positive mutant strain of O157:H7 may reflect a regulatory change for an already functional operon. Analogously, urease-negative Y. pestis exhibits a urease gene complex very similar to those of urease-positive members of the Yersinia genus (de Koning-Ward and Robins-Browne, Gene 182(1-2): 225-228 (1996)).
The urease gene cluster has potential utility in vaccine development. For example, the whole gene cluster could be genetically engineered into an attenuated strain to be used as a vaccine. The urease gene cluster would enhance the ability of the vaccine strain to survive the acid environment of the stomach.
An O157:H7 chromosomal gene cluster related to the RTX family of cytotoxins was identified as described in the examples below. The RTX cytotoxins are a group of exotoxins produced by Gram-negative bacteria that share the properties of secretion by a leader-independent pathway and a tandemly repeated sequence nine amino acids in length that is responsible for calcium binding (Welch et al., FEMS Microbiol. Immunol. 5: 29-36 (1992)). RTX toxins recognize a beta2 integrin on the surface of host cells (Lally et al., J. Biol. Chem. 272: 30463-30469 (1997)). Known members of the family include apxIA, apxIIA, and apxIIIA from Actinobacillus pleuropneumoniae, cyaA from Bordetella pertussis, frpA from Neisseria meningitidis, prtc from Erwinia chrysanthemi, hlyA and elyA from Escherichia coli, aaltA from Actinobacillus actinomycetemcomitans, and lktA from Pasteurella haemolytica. Hybridization studies using probes designed from sequences of these known toxins identified potential RTX toxin genes in several pathogenic bacterial species for which no RTX toxins were previously known, indicating that RTX or RTX-like toxins are widely distributed among pathogenic gram-negative bacteria (Kuhnert et al., Appl. Environ. Microbiol. 63: 2258-2265, (1997)). The novel O157:H7 toxin locus (SEQ ID NO:259 and SEQ ID NO:260) comprises three genes, including the putative toxin and two proteins involved in transport. The gene order and sizes are consistent with other RTX loci. The putative toxin gene shows only marginal similarity to other RTX toxins, and the match is limited to the glycine-rich calcium binding repeat region. The more highly conserved transport genes are approximately 40% identical to related genes found in other RTX clusters.
The Locus of Enterocyte Effacement (LEE) is a 45 kb cluster of genes involved in intimate adherence of pathogens to intestinal epithelial cells, initiation of host signal transduction pathways, and formation of attaching and effacing lesions (McDaniel et al., Proc. Natl. Acad. Sci. 92: 1664-1668 (1995); McDaniel and Kaper, Molecular Microbiology 23: 399-407 (1997)). Colony hybridization studies indicate that sequences homologous to the entire element are found in numerous enteropathogenic E. coli (EPEC), enterohemorrhagic E. coli (EHEC), and other related bacteria (McDaniel et al., Proc. Natl. Acad. Sci. (1995)). The O157:H7 LEE sequence is shown in SEQ ID NO:261. Sequence data is currently available for the entire LEE from an EPEC strain, E2348/69. Comparisons of the O157:H7 LEE to that of E2348/69 revealed that although many genes are nearly identical between the two strains, other genes are markedly variable. This variability is nonrandom with respect to gene function, in that all proteins known to be exported to the extracellular environment are variable, whereas those proteins known to constitute the secretion machinery are invariant. A similar observation has been made based on comparisons of the inv-spa complex of Salmonella enterica (Boyd et al., J. Bacteriol. 179: 1985-1991 (1997)).
Four contiguous LEE genes, L0027, L0028, L0029, and L0030, have been selected for their diagnostic potential. A comparison of these genes from EDL933 and the corresponding genes in E2348/69 revealed significant differences between the strains: L0027 (33.52% difference); L0028 (17.48%); L0029 (21.94%); and L0030 (25.30%). In E2348/69, the homolog of L0027 is known as tir (B. Kenny et al., Cell 91: 511-520 (1997)). The tir gene encodes a product that is translocated from the bacterium to the host cell, where it serves as the receptor for intimin, another LEE-encoded gene product. Little is known about the function and role in pathogenesis of the other three hypervariable virulence LEE genes. The L0028 gene product shows 27% identity with a hypothetical protein encoded in a Shigella virulence-associated cluster (Elliot et al., Mol. Micro., in press). The deduced amino acid sequence of the L0027 gene shows slight similarity to a secreted protein in the plant pathogen Erwinia amylovora.
The entire toxin-converting phage 933W was sequenced as described in the examples. Two novel gene sequences with potential diagnostic and therapeutic value were identified. The first is an ORF (SEQ ID NO:262) that encodes a protein (SEQ ID NO:263) that resembles members of the eukaryotic family of serine/threonine kinases (stk). The amino acid sequence similarities span the conserved regions in the catalytic domain of the eukaryotic protein kinases, including both the ATP binding and active site patterns as described in the PROSITE database. BLASTP, FASTA, and DeCypher II searches with the stk sequences all yield much higher scoring matches to the eukaryotic serine/threonine protein kinases than do searches with the YpkA protein of Yersinia pseudotuberculosis and Y. enterocolitica. There is some suggestion that the Yersinia protein kinase is involved in virulence, by interfering with the signal transduction pathway of the mammalian host, and bacteriophage 933W may interfere with the host systems in the same manner.
Shiga-like toxins, which are encoded by lysogenic bacteriophages, are considered to be one of the major pathogenic features of enterohemorrhagic E. coli strains. Toxin genes have been previously sequenced in 933W. Based on the arrangement of 933W and our knowledge of phage organization, we postulated that shiga-like toxin genes are "late genes", the expression of which is controlled by a homologue of the Q gene of bacteriophage lambda. If in fact the shiga-like toxin genes are late genes, the toxins would be expressed only during a lytic infection. Bacterial cells already carrying the prophage would be immune to super-infection by the phage released during a lytic phase of growth. However, non-lysogens in the vicinity could be infected and produce additional phage and toxin in what can be envisioned as an amplification by recruitment. Thus any late gene product could be an indicator of a condition under which toxin production would be increased. We have identified a putative tail fiber gene in the bacteriophage sequence. The coding region and the deduced amino acid sequence of the gene are shown in SEQ ID NO:264 and SEQ ID NO:265, respectively. The phage tail fibers, expressed by late genes, are required for infection of new bacterial hosts during the lytic phase of growth. Therefore, antibodies to this protein could serve as a diagnostic, as well as a therapeutic to prevent the proposed recruitment and infection of other bacterial cells.
Several other putative pathogenicity genes have been identified as well. The following is a list of additional pathogenicity genes by reference to the segment (OZ number) and to the open reading frame (F0 number) in which the sequence for that gene is presented below.
Toxins
OZID-175 (F1037) homolog of Shigella SHeT
OZID-175 (F1041, F1042) putative Clostridium difficile ToxAB-like cytotoxin
OZID-11 (F0027) Legionella pneumophila IcmF-like homolog
Fimbriae
Attachment to the host:
OZID-197 (F1133-F1139)
OZID-215 (F1231-F1237)
Both encode six proteins most similar to those of the lpfABCDE locus of Salmonella.
Iron Utilization
Iron is complexed in hemoglobin in the host so that infecting bacteria need efficient systems to actively acquire iron.
OZID-196 (F1124-F1132) chu locus homologous to the shu locus of Shigella dysenteriae
OZID-29 (F1132-F1134) homologous to the Actinobacillus pleuropneumoniae afuABC locus
OZID-176 (F1054-F1059) a putative siderophore receptor and associated proteins plus one hypothetical protein, similar to FecD, FecC (both permeases) and FecE (ATP-binding protein) of Synechocystis
OZID-78 (F0527, F0528) tonB-dependent outer membrane receptor and ABC transporter
OZID-59 (F0294) putative exogenous ferric siderophore receptor similar to R4
OZID-62 (F0360) Salmonella IroE-like protein
Phage Encoded
OZID-98 (F0630) and OZID-139 (F0825) putative superoxide dismutases, potentially giving protection from oxidative stress
Metabolic Capabilities
These may contribute to the ability to grow in the intestinal environment.
OZID-26 (F0125-F0131) high affinity ribose transport system
OZID-29 (F0135-F0137) hexose-phosphate transport
OZID-49 (F0167-F0179) includes glutamate fermentation, fumarase, ATP-transferase
OZID-59 (F0256-F0262) urease
OZID-62 (F0394-F0409) fatty acid/polyketide biosynthesis, interspersed unknowns
OZID-151 (F0921-F0925) sucrose utilization, D-serine permease, part of 12 kb island
OZID-156 (F0932-F0934) DMSO reductase-anaerobic
OZID-193 (F1097-F1113) fatty acid biosynthesis
OZID-194 (F1115-F1123) PTS and sugar modification enzymes, substrate unknown
OZID-232 (F1266-F1272) PTS sorbose
OZID-233 (F1274-F1282) sugar (ribose) transport, modification
One wishing to practice the present invention using one of the disclosed sequences could do so by isolating the sequence from ATCC 43895 using knowledge of the nucleotide sequence and standard methods known to one of skill in the art.
It is expected that minor sequence variations in E. coli O157:H7-specific nucleotide sequences associated with nucleotide additions, deletions, and mutations, whether naturally occurring or introduced in vitro, would not interfere with the usefulness of these sequences in the detection of enterohemorrhagic E. coli, in methods for preventing EHEC infection, and in methods for treating EHEC infection. Therefore, the scope of the present invention is intended to encompass minor variations in the claimed sequences.
By an E. coli O157:H7-specific nucleotide probe it is meant a sequence that is able to hybridize to E. coli O157:H7 target DNA present in a sample containing E. coli O157:H7 under suitable hybridization conditions and which does not hybridize with DNA from other E. coli strains or from other bacterial species. It is well within the ability of one skilled in the art to determine suitable hybridization conditions based on probe length, G+C content, and the degree of stringency required for a particular application.
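The dependence of hybridization conditions on probe length and G+C content can be illustrated with a rough calculation. The sketch below uses a widely quoted GC-based approximation for oligonucleotide melting temperature, Tm = 64.9 + 41 x (G + C - 16.4) / N, which ignores salt and probe concentration; it is offered only as an assumption-laden starting point and is not a method stated in this specification. The function names and the example probe are hypothetical.

```python
# Minimal sketch: G+C content and a rough melting temperature for a probe.
# Assumption: the basic GC-based approximation below, which ignores salt
# concentration, probe concentration, and mismatches. Function names are
# hypothetical.

def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def approx_tm_celsius(seq: str) -> float:
    """Approximate Tm in degrees C: 64.9 + 41 * (G + C - 16.4) / N."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    gc_count = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc_count - 16.4) / len(seq)


if __name__ == "__main__":
    # Made-up 25-mer, not taken from the sequence listing.
    probe = "ATGC" * 6 + "A"
    print(len(probe), gc_fraction(probe), approx_tm_celsius(probe))
```

In practice, stringency is then adjusted empirically (temperature, salt, formamide) around such an estimate, since the approximation says nothing about cross-hybridization to related sequences.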
The probe may be RNA or DNA. Depending on the detection means employed, the probe may be unlabeled, radiolabeled, or labeled with a dye. The probe may be hybridized with a sample that has been immobilized on a solid support such as nitrocellulose or a nylon membrane, or the probe may be immobilized on a solid support, such as a silicon chip.
The sample to be tested may include blood, urine, feces, or other materials from a human or a livestock animal. Alternatively, the sample may include food intended for human consumption. The sample may be tested directly, or may be treated in some manner prior to testing. For example, the sample may be subjected to PCR amplification using appropriate oligonucleotide primers.
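Whether a given primer pair would be expected to amplify a product from a target sequence can be checked in silico before testing. The sketch below is a simplified illustration that assumes exact primer matches on a single linear template; real PCR tolerates some mismatches and depends on reaction conditions. The function names and the toy sequences are hypothetical.

```python
# Minimal sketch: predict the amplicon for a primer pair on a template.
# Assumptions: exact matches only, linear template, forward primer given
# 5'->3' on the template strand, reverse primer given 5'->3' on the
# opposite strand. Function names are hypothetical.

from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def predict_amplicon(template: str, fwd_primer: str, rev_primer: str) -> Optional[str]:
    """Return the predicted product, or None if either primer has no exact site."""
    template = template.upper()
    fwd = fwd_primer.upper()
    rev_site = reverse_complement(rev_primer)  # reverse primer site as read on the template strand
    start = template.find(fwd)
    if start == -1:
        return None
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


if __name__ == "__main__":
    # Made-up toy template and primers.
    template = "TTTT" + "ATGGCATCGA" + "CCCCCCCC" + "GGATTCAGCT" + "AAAA"
    product = predict_amplicon(template, "ATGGCATCGA", "AGCTGAATCC")
    print(product, len(product) if product else None)
```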
Any means of detecting DNA-RNA or DNA-DNA hybridization known to the art may be used in the present invention.
Also presented in this specification is a series of sequence listings constituting the entire sequence of all portions of the genome of E. coli O157:H7 that do not appear in strain K12. These sequences are presented as SEQ ID NO:1 to SEQ ID NO:255 below. Since all of these sequences are diagnostic of O157:H7, as compared to K12, sequence information from any of these sequences can be used to design diagnostic probes useful to distinguish strain O157:H7 from strain K12 using molecular techniques. To have reasonable assurance of success under conditions of variable stringency, it is preferred that such diagnostic probes use sequences which are at least 25 nucleotides in length. Any 25-mer selected from amongst any of the sequences in any of SEQ ID NO:1 through SEQ ID NO:255 may be used for such a probe.
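Selecting 25-mers from an island sequence can be sketched as a sliding-window enumeration, optionally screened against a background sequence such as K12. The example below is a minimal illustration under stated assumptions: the screen is an exact-match check on both strands only, which is a weaker condition than absence of cross-hybridization, and the function names and toy sequences are hypothetical rather than drawn from the sequence listing.

```python
# Minimal sketch: enumerate k-mers (default 25) from an island sequence that
# do not occur exactly in a background sequence on either strand.
# Assumption: exact-match screening only; it does not model hybridization
# with mismatches. Function names are hypothetical.

from typing import List, Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-length substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_probes(island_seq: str, background_seq: str, k: int = 25) -> List[Tuple[int, str]]:
    """Return (0-based offset, k-mer) pairs from island_seq absent from background_seq."""
    background = kmers(background_seq, k) | kmers(reverse_complement(background_seq), k)
    island = island_seq.upper()
    probes = []
    for i in range(len(island) - k + 1):
        kmer = island[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if set(kmer) <= set("ACGT") and kmer not in background:
            probes.append((i, kmer))
    return probes


if __name__ == "__main__":
    # Toy sequences with k=5 so the result is easy to inspect by hand.
    for offset, probe in candidate_probes("ACGTACGTTTGACCA", "ACGTACGT", k=5):
        print(offset, probe)
```

For real use, the island and background sequences would be read from the sequence listing and the published K12 genome, and candidates would still need empirical testing against other E. coli strains and other species.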