1. Field of the Invention
This invention relates generally to the fields of population and molecular genetics. In particular, it relates to a method for identifying polymorphic markers in a population.
2. Related Art
The taxonomic classification of species is generally reserved for organisms that are genetically similar and capable of mating productively. Since bacteria are asexual organisms, a bacterial species generally refers to populations that share genetic and biochemical similarity. Although members of a bacterial species are similar, significant diversity can be observed when comparing different populations of a given species. To illustrate, the gut bacterium Escherichia coli consists of approximately 170 different serotypes.
One of the most important tasks of a clinical or industrial microbiologist is the precise determination of what microorganism, if any, is present in a sample. Using some commonly known and simple techniques, the microbiologist can generally deduce the species of the unknown microorganism relatively quickly. However, subspecies or actual strain determination of the microorganism present in the sample frequently requires sophisticated methods of genetic or biochemical analysis. This, of course, translates into higher costs and a slower turnaround time.
Determination of a specific strain of bacteria rather than the mere species that is present in a sample is particularly important to the food industry. For example, of the approximately 170 strains of E. coli, only about 30 of them are pathogenic to humans. Depending on the pathogenic potential of strains or subspecies, processors may often elect to dump a batch contaminated with the species rather than invest time and effort in determining the precise strain or subspecies classification. This is because of the aforementioned costs associated with deducing the actual strain to determine if it is in fact pathogenic. The obvious problem with such "dumping" is that it also has costs associated with it, namely lost revenues. Therefore, it is desirable to have some method of quickly identifying what strain of bacteria may be present in a sample. In order to develop diagnostic tools for the rapid identification of bacterial strains, it is first necessary to identify genetic markers which are characteristic of problematic and less problematic strains.
In addition to the practical application of strain-level classification, understanding genetic characteristics of populations of bacteria is also important for creating safer food environments. Alteration of the genome by gene acquisition, deletion, and mutation, along with new routes of transmission into the food chain and the selective pressures that are imposed in food production environments, are the elements that drive evolution and emergence of foodborne pathogens. Thus, it is increasingly important that new methods be devised for understanding how pathogenic and spoilage organisms enter the food supply, how different populations of pathogenic organisms are affected by selective pressures in food production environments, and how this relates to characteristics that confer increased virulence, spoilage, and/or transmissibility on certain populations. Several molecular genetic approaches have been developed to provide high-resolution information about populations, including random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), octamer-based genome scanning (OBGS), and multi-locus sequencing. Each of these approaches provides only limited coverage of the genome in a single experiment and must therefore be performed in a plurality of iterations to increase genome coverage, particularly in the case of closely related strains. The present invention overcomes this limitation by allowing for coverage of the entire genome in a single experiment and by determination of genetic segments that are specific to relevant populations.
Another bacterium of particular interest to the food industry is Listeria monocytogenes. Although several serotypes of Listeria monocytogenes strains are found in foods and in the environment, most human infections (greater than 95%) are caused by only three serotypes, 1/2a, 1/2b and 4b. These strains belong to two major genetic groups, one of which includes serotype 1/2a while 1/2b and 4b belong to the other group. Most molecular genetic and immunologic studies have used strains from the first genetic group, including 1/2a (strains 10403s, EGD, NCTC7973 Mack) and 1/2c (strain LO28). Strains representing the other group have largely been omitted from molecular genetic studies. However, strains from this group, especially strains of serotype 4b, may be of the most significance to the food industry and public health.
Strains of serotype 4b account not only for a substantial fraction (ca. 40%) of sporadic infections but also for almost all of the common-source outbreaks of listeriosis that have been studied, including the 1985 Jalisco cheese outbreak in Los Angeles and the latest multi-state outbreak in the United States traced to contaminated hot dogs. There is a need for a relatively quick, simple, and inexpensive method for determining unique DNA sequence information for rapidly distinguishing among different subpopulations of L. monocytogenes isolates. Such tests are crucial for the high-throughput analyses necessary for epidemiological studies and risk assessment studies.
Listeria monocytogenes is a ubiquitous gram-positive organism that can cause life-threatening infections, including meningitis, septicemia, and fetal death. Although the incidence of listeriosis is low, the associated morbidity can be quite high, particularly in pregnant women and immunocompromised individuals (Gellin and Broome, 1989).
L. monocytogenes is well known for its robust physiological characteristics and is one of few pathogenic bacteria capable of growth at refrigeration temperatures, under conditions of low pH, and/or high osmolarity (Farber and Brown, 1990; Farber and Peterkin, 1991; Kroll and Patchett, 1992; Miller, 1992; Wilkins et al. 1972).
L. monocytogenes can grow in several types of cultured cells and is capable of intracellular growth and spread to adjacent host cells through the use of host cell cytoskeletal components (Gaillard et al. 1987; Portnoy et al. 1988; Tilney and Portnoy, 1989; Mounier et al. 1990). Genetic analysis of virulence in L. monocytogenes has identified several loci that contribute directly to the series of events that occur during host cell invasion (reviewed in Portnoy et al. 1992; Sheehan et al. 1994). These virulence genes encode adhesins, a cytolytic toxin, an actin polymerizing protein, and phospholipases, which function in host cell entry, vacuole escape, replication, and spread to adjacent host cells.
Several signals, such as temperature and carbohydrates, seem to control regulation of the virulence genes (Leimeister-Wachter et al. 1992; Park and Kroll, 1993), and recent evidence suggests that these are separate pathways that govern expression of the virulence genes (Renzoni et al. 1997). Thus, the virulence gene regulator, called PrfA, may couple transcription of the virulence genes to a variety of cues that could signal entry into a host.
L. monocytogenes strains display serotypic differences in somatic (numbered) and flagellar (lettered) antigens (Seeliger and Hoehne, 1979). Although 13 different serotypes of L. monocytogenes are found in foods and in the environment (Farber and Peterkin, 1991), most clinical isolates are of only 3 serotypes, 1/2a, 1/2b and 4b (Schuchat et al. 1991), suggesting that these serotypes may be particularly virulent for humans or are better able to survive the necessary hurdles for transmission and establishment of infection.
Several studies have been conducted to examine genetic relationships among L. monocytogenes strains. One of the most significant was an early study using Multi-Locus Enzyme Electrophoresis (MLEE), which identified 45 different electropherotypes (ETs: combinations of alleles or protein isomorphs) that were divided between two distinct genetic lineages (Piffaretti et al. 1989). Perhaps one of the more striking results from this study was the finding that nearly all of the strains isolated from large outbreaks comprised only 2 ETs, suggesting that these clones may be highly virulent for humans. In contrast to the clustering of the epidemic strains, strains isolated from sporadic cases were dispersed among many different ETs.
In addition to MLEE, investigators using pulsed-field gel electrophoresis (Brosch et al. 1994), ribotyping (Graves et al. 1994), RFLP analyses of virulence genes (Vines et al. 1992), and DNA sequence analysis of virulence genes (Gutekunst et al. 1992 and Rasmussen et al. 1991) have also demonstrated the existence of two distinct lineages of L. monocytogenes strains. Recent studies of Rasmussen et al. (1995) and Wiedmann et al. (1997) using multilocus sequence analysis of different combinations of virulence-associated genes along with RFLP analyses and ribotyping independently demonstrated the existence of a third lineage of L. monocytogenes. Genetic relationships demonstrated by these methods showed that epidemic strains were confined to lineage I, sporadic strains were found in lineages I and II, while lineage III was devoid of human clinical isolates (Wiedmann et al. 1997). In fact, the genetic distinctiveness led these authors to propose that lineage III strains are largely animal pathogens and should be designated as a new species of Listeria (Wiedmann et al. 1997). Together, these studies, which have employed several different means of genetic analysis, strongly support the notion that virulence, or physiological characteristics that facilitate survival of hurdles necessary to establish infection, are not evenly distributed among the lineages of L. monocytogenes. Studies of several different bacterial pathogens have, in fact, demonstrated that clonal expansion of highly virulent subpopulations, marked by unique combinations of virulence gene alleles, is usually associated with increased spread of disease (see, e.g., Karaolis et al. 1995; Musser and Krause, 1998; reviewed in Musser, 1996). Recently it has been shown that even within apparently clonal populations of E. coli O157:H7, divergent subpopulations exist in the U.S. and appear to have unique ecologies (Kim et al. 1999).
Therefore, the phenomenon of variation in virulence potential appears to be a general characteristic of pathogenic microorganisms.
There are several possibilities, which are not mutually exclusive, that could account for differences in virulence characteristics of L. monocytogenes subpopulations. One of the simplest explanations is that the putatively more virulent subpopulations carry particular combinations of virulence gene alleles that render the strains better able to penetrate host cells and tissues. In other pathogenic species, allele combinations of virulence genes appear to play an important role in the rise and spread of certain clones. Secondly, it is possible that some lineages possess additional genes that contribute to virulence or that they possess unique patterns of virulence gene expression. Strain-specific variations in the modulation of PrfA activity have recently been demonstrated with respect to carbon-source effects on prfA-dependent gene expression in different L. monocytogenes strains (Brehm et al. 1999; Huilett et al. 1999). Lastly, it is also possible that physiological differences among the lineages confer characteristics that make certain lineages better able to survive the necessary hurdles to establish infection.
A method is provided for identifying polymorphic markers in a population comprising genotypically characterizing a first sample of the population, selecting one or more individuals of the first sample based upon the characterization, fabricating a microarray with genomic DNA from each selected individual, genotyping a second sample of the population using each fabricated microarray as a reference, identifying the polymorphic markers in the population, and sorting the markers to identify those characteristic of the population of interest. In one embodiment, the population is a bacterial population. The bacterial population is selected from the group consisting of Listeria monocytogenes, Escherichia coli, Lactobacillus casei, Lactococcus lactis, Salmonella typhimurium, Salmonella enteritidis, and Salmonella typhi.
Also provided is a method for identifying polymorphic markers in a bacterial population comprising phenotypically characterizing a first sample of the population, selecting one or more individuals of the first sample based upon the characterization, fabricating a microarray with genomic DNA from each selected individual, genotyping a second sample of the population using each fabricated microarray as a reference, identifying the polymorphic markers in the population, and sorting the markers to identify those characteristic of the population of interest. In one embodiment, the bacterial population is selected from the group consisting of Listeria monocytogenes, Escherichia coli, Lactobacillus casei, Lactococcus lactis, Salmonella typhimurium, Salmonella enteritidis, and Salmonella typhi.
Also provided is a method for identifying unique bits among a plurality of bit strings including providing a plurality of bit strings wherein each bit string has the same number and position of bits and wherein each bit has a value of 0 or 1, generating a graphical representation, including selectable elements, representing the relatedness of the bit strings, making a selection of a first selectable element, making a selection of a second selectable element, and identifying bits that are present in each bit string represented by the first selectable element and absent in each bit string represented by the second selectable element, or vice versa. In one embodiment, the relatedness of the bit strings is determined by the commonality of bit values at corresponding positions in the bit strings. In both the method and the embodiment of the method, the graphical representation can be a dendrogram and the selectable elements can be leaves and nodes, each leaf representing a single bit string, and each node representing two or more bit strings.
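The comparison step of the bit-string method can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name `unique_bits` and the example bit strings are hypothetical, each selected element is modeled simply as a list of the bit strings it represents, and a bit is treated as "present" when its value is 1 and "absent" when its value is 0.

```python
def unique_bits(group_a, group_b):
    """Find bit positions that separate two groups of equal-length bit strings.

    Returns a pair of lists:
      - positions that are 1 in every string of group_a and 0 in every string of group_b
      - positions that are 0 in every string of group_a and 1 in every string of group_b
    """
    if not group_a or not group_b:
        raise ValueError("each group needs at least one bit string")
    strings = list(group_a) + list(group_b)
    length = len(strings[0])
    # Every string must have the same number of bits, each bit being 0 or 1.
    if any(len(s) != length or set(s) - {"0", "1"} for s in strings):
        raise ValueError("bit strings must have equal length and contain only 0 or 1")

    present_a_absent_b = []
    absent_a_present_b = []
    for i in range(length):
        a_bits = {s[i] for s in group_a}
        b_bits = {s[i] for s in group_b}
        if a_bits == {"1"} and b_bits == {"0"}:
            present_a_absent_b.append(i)
        elif a_bits == {"0"} and b_bits == {"1"}:
            absent_a_present_b.append(i)
    return present_a_absent_b, absent_a_present_b


# Hypothetical example: two strings under the first selected element,
# two under the second.
first_selection = ["1101", "1100"]
second_selection = ["0110", "0111"]
print(unique_bits(first_selection, second_selection))  # ([0], [2])
```

In this toy example, position 0 is present throughout the first selection and absent throughout the second, while position 2 shows the reverse pattern; positions 1 and 3 do not separate the two selections and are not reported.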
Also provided is a computer readable medium having software for identifying unique bits among a plurality of bit strings, including logic configured to provide a plurality of bit strings, each string having the same number and position of bits, each bit having a value of 0 or 1, logic configured to generate a graphical representation, including selectable elements, representing the relatedness of the bit strings, logic configured to make a selection of a first selectable element, logic configured to make a selection of a second selectable element, and logic configured to identify bits that are present in each bit string represented by the first selectable element and absent in each bit string represented by the second selectable element, or that are absent in each bit string represented by the first selectable element and present in each bit string represented by the second selectable element. In one embodiment, the relatedness of the bit strings is determined by the commonality of bit values at corresponding positions in the bit strings. In both the computer readable medium and the embodiment of the computer readable medium, the graphical representation can be a dendrogram and the selectable elements can be leaves and nodes, each leaf representing a single bit string, and each node representing two or more bit strings.
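One way such logic might organize bit strings into a dendrogram of leaves and nodes is sketched below in Python. The text does not specify a clustering algorithm, so the choices here are assumptions made only for illustration: relatedness is scored by the number of differing positions (Hamming distance), clusters are merged by average linkage, and the names `hamming`, `leaves`, `build_dendrogram`, and the strain labels are hypothetical. A leaf is a string label and a node is a nested pair, so every node represents two or more bit strings.

```python
from itertools import combinations


def hamming(a, b):
    """Count positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def leaves(element):
    """List the leaf labels under a selectable element (a leaf or a node)."""
    if isinstance(element, str):
        return [element]
    left, right = element
    return leaves(left) + leaves(right)


def build_dendrogram(bit_strings):
    """Merge labeled bit strings into a binary tree, closest clusters first.

    bit_strings maps a string label to its bit string. Average linkage over
    Hamming distance is an illustrative assumption, not taken from the text.
    """
    if not bit_strings:
        raise ValueError("at least one bit string is required")

    def distance(c1, c2):
        pairs = [(a, b) for a in leaves(c1) for b in leaves(c2)]
        total = sum(hamming(bit_strings[a], bit_strings[b]) for a, b in pairs)
        return total / len(pairs)

    clusters = list(bit_strings)  # start with one leaf per label
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: distance(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append((c1, c2))  # new node representing both children
    return clusters[0]


# Hypothetical presence/absence profiles for four isolates.
strains = {"A": "1100", "B": "1101", "C": "0011", "D": "0111"}
tree = build_dendrogram(strains)
print(tree)             # (('A', 'B'), ('C', 'D'))
print(leaves(tree[0]))  # ['A', 'B']
```

Selecting a node, such as `tree[0]`, yields the set of bit strings it represents through `leaves`, which is the input a subsequent unique-bit comparison between two selected elements would need.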
Other systems, methods, features and advantages of the invention will be or will become apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the accompanying claims.