This invention relates generally to determining the genotype of organisms by hybridization analysis and, more specifically, to establishing the relatedness of individual organisms within a species.
A genotype is the genetic constitution of an individual or group. Variations in genotype are essential for commercial breeding programs, diagnostics, monitoring genetic-based diseases, following spread of pathogens, determining parentage, and the like. While determining the nucleic acid sequence of genomic DNA is one way to unambiguously establish a genotype of an individual, it is not currently practicable to accomplish, especially in organisms with complex genomes.
Genotypes can be more readily described in terms of genetic markers. A genetic marker identifies a specific region or locus in the genome. Thus, the more genetic markers, the more finely the genotype is defined. A genetic marker becomes particularly useful when it is polymorphic, that is, when its alleles differ between organisms, because it then may serve to unambiguously identify an individual.
Many different types of genetic markers have been described and exploited, but all are based upon genomic sequence. Examples of methods to define genetic markers include restriction fragment length polymorphism (RFLP) analysis (Botstein et al., Am J Hum Genet 32: 314, 1980); simple sequence repeat (SSR) analysis (Weber and May, Am J Hum Genet 44: 388, 1989; U.S. Pat. No. 5,874,215); random amplified polymorphic DNA (RAPD); amplified fragment length polymorphism (AFLP) (Vos et al., Nucleic Acids Res 23: 4407, 1995); 5′ nuclease amplifications (U.S. Pat. No. 5,962,233); nucleic acid indexing (U.S. Pat. No. 5,994,068; Guilfoyle et al., Nucl Acids Res 25: 1854, 1997; Unrau and Degau, Gene 145: 163, 1994; U.S. Pat. No. 5,508,169); arbitrarily-primed nucleic acid amplification (U.S. Pat. No. 5,413,909; U.S. Pat. No. 5,861,245); restriction enzyme amplification display system (READS) (U.S. Pat. No. 5,712,126; Prashar and Weissman, Proc Natl Acad Sci USA 93: 659, 1996); consensus sequence primed polymerase chain reaction (CP-PCR) (U.S. Pat. No. 5,437,975); hybridization-based genetic amplification (WO 98/0721); and the like.
All of these genotyping methods suffer from the laborious requirement to analyze only a single organism at a time. A further burden in some of these methods is the need for pre-identification of a polymorphism before analysis of other individuals (U.S. Pat. No. 6,100,030). Still others of these methods depend upon expensive materials and time-intensive gel electrophoresis, resulting in low throughput. Furthermore, those methods that base identity on size suffer from additional difficulties in precisely correlating bands on gels with alleles. One method has attempted to overcome many of these restrictions by performing analysis by hybridization to nucleic acids immobilized on solid-state surfaces (U.S. Pat. No. 6,100,030). In this technique, however, a genotype of an organism is not established. Rather, the analysis yields information regarding a pre-determined polymorphism.
The ability to assign a comprehensive genotype for an individual without requiring sequence information, existing knowledge of polymorphisms, or comparison of fragment lengths is paramount to obtaining the mass of genetic information necessary for breeding, disease analysis, and so forth. Such systems and analyses also demand high throughput for maximal benefit.
The present invention discloses methods and compositions for performing high-throughput genotype determinations by basing analyses on hybridization of unselected nucleic acids to genomic nucleic acids immobilized on solid-state materials, and further provides other related advantages.
The present invention relates to methods and compositions for determining and relating genotypes of organisms. Within one aspect of the present invention, a nucleic acid molecule that contains a polymorphism is identified. Two organisms are selected; one may be referred to as the reference organism and the other as the tester organism. Nucleic acids from each of these organisms are separately amplified. Amplified material from the tester organism is cloned or otherwise separated (e.g., by gel electrophoresis or HPLC), and individual clones or fractions of separated amplified material are placed into an addressable array. The amplified material from the reference organism, which contains a detectable label, is hybridized to the array. Clones on the array that do not evidence detectable hybridization are thus identified as containing a polymorphism.
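The identification step of this first aspect can be illustrated with a minimal sketch. It assumes that the array readout is available as a mapping from array address to label intensity and that "detectable hybridization" is scored as a signal exceeding a multiple of background; the function name, the addresses, the intensities, and the cutoff rule are all hypothetical and are not taken from the disclosure.

```python
def polymorphic_clones(signal, background, min_ratio=2.0):
    """Return addresses of array elements without detectable hybridization.

    An element counts as hybridized when its intensity is at least
    min_ratio times the background (an assumed scoring rule).
    """
    return sorted(
        addr for addr, value in signal.items()
        if value < min_ratio * background
    )


# Hypothetical readout: array address -> label intensity after hybridizing
# the labeled reference material to the arrayed tester clones.
signal = {"A1": 950.0, "A2": 40.0, "A3": 1200.0, "B1": 55.0}

# Clones at these addresses are candidates for containing a polymorphism.
candidates = polymorphic_clones(signal, background=50.0)
print(candidates)
```

In practice the cutoff would be chosen from control elements on the same array; the fixed ratio here is only a placeholder.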
In a second aspect, the genotype of an organism is determined. In this method, nucleic acids from two or more organisms are pooled and used to generate a first diversity panel. In one embodiment, the diversity panel is generated by amplification. In other embodiments, the diversity panel is generated by restriction enzyme digestion, a combination of amplification and restriction digestion, or other means that create a reproducible pattern. The first diversity panel is then separated on the basis of sequence or molecular weight, e.g., by cloning, gel electrophoresis, HPLC, or the like, and individual elements of the diversity panel, e.g., clones, are placed into an addressable array. Nucleic acids from another organism (the selected organism), which may be one of the organisms in the initial pool, are used to generate a second diversity panel.
In one aspect, the polymorphisms detected are caused by insertion elements, such as transposons. The diversity panels are generated by amplification, and in some embodiments amplification in conjunction with restriction enzyme digestion and ligation of adapters. Amplification is performed with a primer pair in which one of the primers anneals to a sequence found in a family of insertion elements.
In certain embodiments, the first and second diversity panels are generated by the same technique and using the same primers, enzymes, or methods. In other embodiments, the techniques differ, and in yet other embodiments, the techniques are the same but the primers or enzymes used to generate the two diversity panels are different.
In a preferred embodiment, the second diversity panel contains a detectable label, such as a fluorochrome, chemiluminescent molecule, radiolabel, enzyme, ligand, or the like.
The array is then hybridized with the second diversity panel. A pattern of hybridization to the array is established. The genotype of the selected organism is thus determined. Briefly, the more elements of the array that hybridize with the diversity panel of the selected organism, the more related the selected organism is to the organisms constituting the array. By generating a diversity panel from each of the organisms in the pool and hybridizing them individually to the array, the genotypes and the relatedness of all the organisms can be determined.
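The relationship between hybridization patterns and relatedness can be sketched as follows. The sketch assumes that each organism's pattern is reduced to a present/absent call per array element by a fixed intensity threshold, and it scores pairwise relatedness with a simple matching coefficient; both the threshold and the choice of coefficient are illustrative assumptions, and all names and values are hypothetical.

```python
from itertools import combinations


def genotype(signal, threshold):
    """Binary hybridization pattern: True where an array element hybridized."""
    return {addr: value >= threshold for addr, value in signal.items()}


def fraction_hybridized(pattern):
    """Share of array elements that hybridized with this organism's panel."""
    return sum(pattern.values()) / len(pattern)


def shared_fraction(p, q):
    """Fraction of array elements on which two patterns agree.

    Assumes both patterns cover the same array addresses.
    """
    return sum(p[addr] == q[addr] for addr in p) / len(p)


# Hypothetical intensities for three organisms hybridized individually
# to the same four-element array.
signals = {
    "org1": {"A1": 900, "A2": 30, "A3": 800, "A4": 700},
    "org2": {"A1": 850, "A2": 20, "A3": 40, "A4": 650},
    "org3": {"A1": 25, "A2": 900, "A3": 35, "A4": 600},
}

patterns = {name: genotype(sig, threshold=100) for name, sig in signals.items()}
pairwise = {
    (a, b): shared_fraction(patterns[a], patterns[b])
    for a, b in combinations(sorted(patterns), 2)
}

for name, pattern in patterns.items():
    print(name, fraction_hybridized(pattern))
for pair, score in pairwise.items():
    print(pair, score)
```

The resulting pairwise scores could be passed to any standard clustering routine to group the organisms by relatedness.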
In a third aspect of this invention, a first diversity panel is generated and placed onto an array as described for the second aspect. The array will thus comprise the genomes of two or more organisms. A second diversity panel is generated from a selected organism, which may or may not be represented in the first diversity panel. The second diversity panel is hybridized to the array, and a pattern of hybridization is detected. The genotype of the selected organism is thereby established.
In one embodiment, third, fourth, and further diversity panels are generated from individual organisms and mixed with the second diversity panel. In this embodiment, each of the second, third, and further diversity panels contains a detectable label that is distinguishable from the labels of the others. The more labels that can be distinguished, the more diversity panels can be mixed together. In certain embodiments, the labels are fluorochromes or mass-spectrometry tags. The mixture of diversity panels is hybridized to the array, and a pattern of hybridization with each diversity panel is detected. The genotypes of the selected organisms are thus determined from the patterns of hybridization.
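Reading several genotypes from one hybridization of mixed, differently labeled diversity panels can be sketched as below. The sketch assumes that the detector reports one intensity per label at each array address and that each label has its own detection cutoff; the label names, addresses, intensities, and cutoffs are hypothetical.

```python
def genotypes_by_label(readout, thresholds):
    """Split a multi-label array readout into one pattern per label.

    readout: address -> {label: intensity}
    thresholds: label -> intensity cutoff for a positive call
    Returns label -> string of '1'/'0' calls in sorted address order.
    """
    addresses = sorted(readout)
    return {
        label: "".join(
            "1" if readout[addr][label] >= cutoff else "0"
            for addr in addresses
        )
        for label, cutoff in thresholds.items()
    }


# Hypothetical two-label readout; each label marks the diversity panel
# of a different selected organism.
readout = {
    "A1": {"dye1": 800, "dye2": 20},
    "A2": {"dye1": 15, "dye2": 700},
    "A3": {"dye1": 650, "dye2": 900},
}
thresholds = {"dye1": 100, "dye2": 100}

result = genotypes_by_label(readout, thresholds)
print(result)
```

Each returned string is the hybridization pattern of one labeled panel, so the number of genotypes obtained per array equals the number of distinguishable labels.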
In a preferred embodiment, genomic nucleic acids from two or more organisms are digested with a restriction enzyme. The restriction enzyme may be an enzyme sensitive to methylation. In such a case, the polymorphisms detected are modifications (methylation) of bases. In one embodiment, fragments are selected on the basis of size to comprise a pool of fragments in a desired size range. The digested fragments are cloned into a vector and placed into an addressable array on a solid surface, such as a glass slide. Genomic nucleic acid from another organism whose genotype is to be determined (called here organism X), which may or may not be the same organism as one in the first group, is digested with the same restriction enzyme. These restriction fragments are amplified. Typically, adapter sequences are ligated to the fragments, and sequences of the adapters are also used as primers for amplification. The amplified fragments are also labeled with one of the labels described below. Labeled fragments are hybridized to the addressable array, nonhybridized fragments are washed off, and the array is then analyzed for the label. In this way a pattern of hybridization is obtained. That pattern is the genotype of organism X. In this example, when an element in the array hybridizes, it indicates that the organisms share sequence similarity for that fragment. When an element in the array does not hybridize, it indicates a polymorphism. In this particular example, the polymorphism is analogous to a restriction fragment length polymorphism and arises because the restriction fragment in organism X is too long to be amplified or too short to hybridize.
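How the loss of a restriction site produces a non-hybridizing array element can be illustrated with a toy model. The sketch assumes short made-up sequences, the EcoRI recognition site GAATTC cut after its first base, an arbitrary amplifiable size window, and hybridization modeled as one sequence containing the other; none of these values or simplifications comes from the disclosure, and real hybridization depends on sequence similarity rather than exact containment.

```python
def digest(sequence, site, cut_offset):
    """Cut sequence at every occurrence of site, cut_offset bases into the site."""
    fragments, start = [], 0
    pos = sequence.find(site)
    while pos != -1:
        fragments.append(sequence[start:pos + cut_offset])
        start = pos + cut_offset
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return [frag for frag in fragments if frag]


def amplifiable(fragments, min_len, max_len):
    """Keep fragments inside the assumed amplifiable size window."""
    return [frag for frag in fragments if min_len <= len(frag) <= max_len]


def hybridizes(element, probes):
    """Toy hybridization: element and a probe overlap by full containment."""
    return any(element in probe or probe in element for probe in probes)


SITE, CUT = "GAATTC", 1  # EcoRI site, cut between G and AATTC

# Made-up genomes: organism X carries a point mutation in the second site.
arrayed_genome = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TTTT"
organism_x = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATAC" + "TTTT"

array_elements = digest(arrayed_genome, SITE, CUT)
probes = amplifiable(digest(organism_x, SITE, CUT), min_len=5, max_len=20)

pattern = {element: hybridizes(element, probes) for element in array_elements}
print(pattern)
```

Because the mutated site is not cut, organism X yields one long fragment that falls outside the assumed size window, so the two array elements flanking that site give no signal.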
In still other aspects, kits and arrays are provided that comprise diversity panels for genotyping.
These and other aspects of the present invention will become evident upon reference to the following detailed description and attached drawings. In addition, various references are set forth below which describe in more detail certain procedures or compositions (e.g., plasmids, etc.), and are therefore incorporated by reference in their entirety.