Transposable elements are divided into two classes. Class 1 elements, or retro-elements, include the most abundant elements in plants, the long terminal repeat (LTR) retrotransposons (such as Tnt1, Opie, Huck, and BARE1), as well as the long interspersed nuclear elements (LINEs, also known as non-LTR retrotransposons) and the short interspersed nuclear elements (SINEs). For all Class 1 elements, it is the element-encoded mRNA, and not the element itself, that forms the transposition intermediate. In contrast, Class 2 or DNA elements are characterized by short terminal inverted repeats (TIRs) and, most importantly, transposition via a DNA intermediate. Plant DNA elements (such as Ac/Ds, Spm/dSpm and Mutator) usually excise from one site and re-insert elsewhere.
A unique Class 2 transposable element was discovered as a 128 base pair insertion in an exon of the maize waxy coding region in the wxB2 mutant allele. Database searches identified related elements (on average about 70% sequence identity) in the introns or the 5′ or 3′ flanking sequences of many maize coding regions. This new family was called Tourist. Almost one third of all sequenced maize coding regions contain a Tourist element, as do the coding regions of other members of the grass tribe including rice, sorghum and barley. An insertion into a Tourist element led to the discovery of another element family called Stowaway, in the coding regions of both monocotyledonous and dicotyledonous plants. Finally, a systematic search of all available rice genomic sequences identified the new element families Gaijin, Castaway, Ditto, Wanderer, and Explorer, and revealed that short inverted-repeat elements were the predominant repeat sequence associated with rice coding regions (Bureau et al. (1996) Proc. Natl. Acad. Sci. USA, 93, 8524-8529). These elements formed a unique collection of inverted-repeat transposons referred to as miniature inverted-repeat transposable elements (MITEs).
MITEs have been identified in all flowering plants that have significant genomic nucleotide sequences present in databases. For instance, MITE families have been found in maize, rice (including Gaijin, Castaway, Ditto, Wanderer, Explorer; Snap, Crackle, and Pop), bell pepper (Alien), and alfalfa (Bigfoot). The first MITE family from Arabidopsis, Emigrant, was recently described. Most characterized MITE families in plants appear to be relatively ancient components of genomes since family members were only distantly related to each other (70% sequence identity on average) and insertion sites were usually not polymorphic among members of the same species (Bureau et al. (1992) The Plant Cell 4, 1283-1294; Bureau et al. (1994) Proc. Natl. Acad. Sci. USA, 91, 1411-1415; Bureau et al. (1994) The Plant Cell, 6, 907-916).
MITEs are not restricted to flowering plants. MITE families have been described in insects (Aedes aegypti, the yellow fever mosquito), C. elegans, and even humans (trigger 1 and 2).
Despite the prevalence of MITEs in plant genomes, little is known about their biology including, for instance, their distribution. This largely reflects the fact that most MITEs have been identified through database searches (see, for instance, Bureau et al. (1992) The Plant Cell, 4, 1283-1294; Bureau et al. (1994) Proc. Natl. Acad. Sci. USA, 91, 1411-1415; Bureau et al. (1994) The Plant Cell, 6, 907-916). For this reason, much of what is known about this important class of elements is restricted to MITE identification, categorization and descriptions of their presence in genic regions. It is not currently known, for example, whether their association with coding regions reflects a true target site preference or whether this is merely an artifact of identifying elements by searching the gene-rich databases. Recently, it was shown that in a 225 kilobase region of the maize genome, putative MITEs were found within genic regions, and not in nongenic regions (Tikhonov et al. (1999) Proc. Natl. Acad. Sci. USA, 96, 7409-7414). However, 225 kilobases represents less than 0.01% of total maize DNA; thus it is unclear if these results can be extrapolated to the entire maize genome.
The investigation of genome structure has been accelerated by the use of in vitro methods that detect variation in the DNA sequence in the genomes between members of a species or closely related species. This variation at different locations in the genome is unique for each individual member of a species. These in vitro methods detect the variation, and produce what is referred to as a DNA fingerprint for an individual. Typically, the more closely related two individuals, the more similar the DNA fingerprint from each individual. DNA sequence differences detected by DNA fingerprinting, including single base pair changes as well as large deletions or additions, are referred to as polymorphisms. A polymorphism provides a marker for a specific location on a chromosome in the individual containing the polymorphism. A marker is typically detected as a DNA fragment.
Since the advent of these in vitro methods in the early 1980s, numerous methods for detecting polymorphisms that mark chromosomes have been developed. For instance, restriction fragment length polymorphism (RFLP), DNA amplification fingerprinting, cleaved amplified polymorphisms, randomly amplified polymorphic DNA, arbitrary primed-polymerase chain reaction, random amplified microsatellite polymorphism, simple sequence repeat, amplified fragment length polymorphism (AFLP) (Zabeau, EP Pat. No. 0 534 858 A1) and sequence-specific amplification polymorphisms (Waugh et al. (1997) Mol. Gen. Genet., 253, 687-694) have made their way into use in plant breeding and genetics. In general, the markers that are produced by each of these methods are randomly distributed throughout the genome and allow saturated genome coverage if enough markers are developed.
Typically, genomes contain nongenic regions, i.e., regions that do not contain coding regions. This is particularly true of plants, where up to 99.5% of the genome can be nongenic. Nongenic regions are made up mainly of repetitive DNA, i.e., regions of DNA having nucleotide sequences that are present multiple times in the genome. Interspersed among the nongenic regions are regions containing coding regions. These regions are referred to as genic regions and are made up of low or single copy regions of DNA. Typically, a large fraction of the markers generated by in vitro methods that detect variation in the DNA sequence are located in nongenic regions. Consequently, there is an increased cost in generating and mapping excessive numbers of markers.
The large plant genomes generally contain genes interspersed with much longer blocks of repetitive DNA. Given this organization, it would be highly desirable to have polymorphic markers that are located preferentially in genic regions. It would be even more desirable if these markers were present in high numbers in the genic regions. The present invention discloses that miniature inverted repeat transposable elements (MITEs) are polymorphic markers located preferentially in genic regions. The invention presents the first analysis of the distribution of MITEs that includes an entire genome, i.e., the analysis is not confined to genic regions or to a limited portion of a genome. This analysis indicates that MITEs are preferentially located in genic regions. This analysis of MITEs also unexpectedly showed that MITEs are polymorphic. The polymorphic nature of MITEs was surprising because other transposable elements associated with genic regions, for instance Alu elements in humans, are usually in the same position in all individuals of a species. The polymorphism of MITEs, coupled with their genic preference, indicates that they are a major factor in generating allelic diversity.
An advantage of the methods of the present invention over other in vitro methods that detect DNA variation is that, since markers generated using MITEs are preferentially located in genic regions, fewer markers must be generated and mapped. Thus, costs are decreased. Also, the high copy number of MITEs is expected to allow genic regions to be saturated with markers using the methods of the present invention.
The present invention provides a method of characterizing the DNA of an individual, for instance by producing a DNA fingerprint of an individual. The method includes digesting the DNA of the individual with a restriction endonuclease, ligating a double stranded adaptor to at least one end of the restriction fragments, and amplifying at least a portion of the restriction fragments with a primer pair. Typically, the DNA is genomic DNA. The nucleotide sequence of one primer of the primer pair is complementary to a portion of a miniature inverted repeat transposable element that is a member of a miniature inverted repeat transposable element family, and the other primer of the primer pair includes at the 5′ end a nucleotide sequence complementary to at least a portion of the adaptor. The amplified fragments are resolved, typically by electrophoresis, to produce a DNA fingerprint. Optionally, at least about 70% of the miniature inverted repeat transposable elements of the miniature inverted repeat family are present in genic regions.
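The digestion, ligation, and amplification steps described above can be illustrated with a short in silico sketch. It is a simplified model under stated assumptions and not the protocol of the invention: the four-base recognition site, the cut position, the primer sequence, the adaptor sequence, and both test sequences are hypothetical; only one strand and one orientation are considered; and the primer is written as the sequence found on the displayed strand.

```python
def digest(dna, site, cut_offset):
    """Cut dna at every occurrence of site; the cut falls cut_offset bases into the site."""
    fragments, start = [], 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    # Drop empty fragments (e.g., a site at the very start with cut_offset 0).
    return [f for f in fragments if f]


def display_products(dna, site, cut_offset, mite_primer, adaptor):
    """Predict lengths of fragments amplified between a MITE primer site and
    an adaptor ligated to the downstream end of each restriction fragment."""
    products = []
    for fragment in digest(dna, site, cut_offset):
        start = fragment.find(mite_primer)
        if start != -1:
            # Product runs from the primer site to the fragment end, plus the adaptor.
            products.append(len(fragment) - start + len(adaptor))
    return sorted(products)


# All sequences below are hypothetical and chosen only for illustration.
SITE = "TTAA"               # assumed recognition site
CUT_OFFSET = 1              # assumed cut after the first base of the site
MITE_PRIMER = "GGCCAGTC"    # assumed MITE-derived primer sequence
ADAPTOR = "GACGATGAGT"      # assumed adaptor sequence

with_mite = "ACGCA" + SITE + "CCG" + MITE_PRIMER + "ACGCACG" + SITE + "GGA"
without_mite = "ACGCA" + SITE + "CCG" + "ACGCACG" + SITE + "GGA"

print(display_products(with_mite, SITE, CUT_OFFSET, MITE_PRIMER, ADAPTOR))
print(display_products(without_mite, SITE, CUT_OFFSET, MITE_PRIMER, ADAPTOR))
```

In this sketch the individual carrying the element yields one predicted fragment, while the individual lacking the element yields none. That difference is the kind that would appear as a band present in one lane and absent from the other after electrophoresis.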
Optionally, the individual can be a plant, including maize and teosinte. When the individual is maize, the miniature inverted repeat transposable element can have at least about 90% identity to SEQ ID NO:26, or the complement thereof, at least about 90% identity to SEQ ID NO:28, or the complement thereof, at least about 90% identity to SEQ ID NO:29, or the complement thereof, or at least about 90% identity to SEQ ID NO:27, or the complement thereof.
Optionally, one of the primers includes a detectable label, for instance a radioactive label, a fluorescent label, a chemiluminescent label, or a combination thereof.
Another aspect of the invention provides a method of detecting at least one polymorphism between the nucleic acid fragments of a first individual and a second individual. The method includes producing a DNA fingerprint of each individual, and comparing the amplified fragments of each individual to detect at least one difference between the amplified fragments of the first individual and the amplified fragments of the second individual. The difference between the amplified fragments of the first individual and the amplified fragments of the second individual indicates the presence of a polymorphism. Optionally, the producing and comparing steps can be repeated with additional individuals, and/or the first individual and second individual can be members of a recombinant inbred line mapping population.
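The comparing step can be sketched as a set difference between two fingerprints. The example below assumes each fingerprint has already been reduced to a list of fragment sizes in base pairs and that sizes match exactly; the function name and the size values are hypothetical, and a real comparison of electrophoretic bands would likely need a size tolerance.

```python
def polymorphic_fragments(fingerprint_a, fingerprint_b):
    """Return fragment sizes present in exactly one of the two fingerprints."""
    a, b = set(fingerprint_a), set(fingerprint_b)
    return {
        "only_first": sorted(a - b),   # fragments seen only in the first individual
        "only_second": sorted(b - a),  # fragments seen only in the second individual
    }


# Hypothetical fragment sizes (bp) for two individuals.
first_individual = [112, 187, 240, 305]
second_individual = [112, 240, 298, 305]

differences = polymorphic_fragments(first_individual, second_individual)
print(differences)
```

Any size listed under either key marks a candidate polymorphism; shared sizes are ignored because they do not distinguish the two individuals.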
The present invention also provides a method of correlating the presence of an amplified fragment to a phenotype. The method includes producing a DNA fingerprint of a first individual that displays a phenotype, and of a second individual that does not display the phenotype. The amplified fragments of each individual are compared to detect at least one difference between the amplified fragments of the first individual and the amplified fragments of the second individual. The presence of the amplified fragments is then related to the display of the phenotype. Optionally, the producing and comparing steps can be repeated with additional individuals, and/or the first individual and second individual can be members of a recombinant inbred line mapping population.
In another aspect, the invention provides a method for generating a set of molecular markers. The method includes producing a DNA fingerprint of a first individual and a DNA fingerprint of a second individual, where the first and the second individuals are members of different recombinant inbred lines that are members of the same mapping population. The amplified fragments of the first individual and the second individual are compared to detect at least one polymorphism, and the producing and comparing steps are repeated with additional individuals, where the additional individuals are members of different recombinant inbred lines that are members of the same mapping population as the first and second individuals. The linkage between the at least one polymorphism and a set of known markers, for instance, RFLP markers or AFLP markers, is then determined.
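One simple way to look for linkage between a new polymorphism and a known marker is to count how often the two disagree in parental origin across the recombinant inbred lines. The sketch below assumes each line has been scored as "A" or "B" for parent of origin at both markers, with None for missing data; the scoring scheme, function name, and data are hypothetical. It reports only the raw fraction of recombinant lines and does not convert that fraction to a map distance.

```python
def recombinant_fraction(marker_scores, reference_scores):
    """Fraction of lines whose parental origin differs between two markers.

    Each score is 'A' or 'B' (parent of origin); None marks a missing score.
    """
    if len(marker_scores) != len(reference_scores):
        raise ValueError("both markers must be scored on the same set of lines")
    informative = [
        (m, r)
        for m, r in zip(marker_scores, reference_scores)
        if m is not None and r is not None
    ]
    if not informative:
        raise ValueError("no lines scored for both markers")
    recombinants = sum(1 for m, r in informative if m != r)
    return recombinants / len(informative)


# Hypothetical scores for ten recombinant inbred lines.
mite_marker = ["A", "A", "B", "B", "A", "B", "A", "B", None, "A"]
known_marker = ["A", "A", "B", "A", "A", "B", "A", "B", "B", "A"]

fraction = recombinant_fraction(mite_marker, known_marker)
print(f"{fraction:.3f}")
```

A fraction near zero suggests the two markers tend to be inherited together, whereas a fraction near one half is what unlinked markers would be expected to show.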
“Nucleic acid fragment” as used herein refers to a linear polymeric form of nucleotides of any length, either ribonucleotides or deoxynucleotides, and includes both double- and single-stranded DNA and RNA. A nucleic acid fragment may include both coding and non-coding regions that can be obtained directly from a natural source (e.g., a plant), or can be prepared with the aid of recombinant or synthetic techniques. An example of a nucleic acid fragment present in an individual is a chromosome. A nucleic acid molecule may be equivalent to this nucleic acid fragment or a nucleic acid molecule can include this fragment in addition to one or more other nucleotides. For example, a nucleic acid molecule of the invention can be a vector, such as an expression or cloning vector. A coding region is a linear form of nucleotides that typically encodes a polypeptide, usually via mRNA.
A “restriction fragment” as used herein is a type of nucleic acid fragment. A restriction fragment results from exposing at least one nucleic acid fragment, for instance the genomic DNA of an individual, to a restriction endonuclease under conditions such that the restriction endonuclease cleaves the DNA.
An “amplified fragment” as used herein is a type of nucleic acid fragment. An amplified fragment is the result of exposing a nucleic acid fragment to at least two primers under conditions such that the primers hybridize to the nucleic acid fragment and the number of copies of a portion of the nucleic acid fragment is increased. The portion of the nucleic acid fragment that is increased in number consists of the nucleotides to which the primers hybridize and the region of the nucleic acid fragment located between the nucleotides to which the primers hybridize.
A “DNA fingerprint” as used herein refers to the pattern of nucleic acid fragments that results when an individual’s DNA is subjected to an in vitro method that detects variation in the DNA sequence, for instance the methods of the present invention. Typically, the more closely related two individuals, the more similar the DNA fingerprint from each individual. A variation between the DNA fingerprints of two individuals, i.e., the presence or absence of a nucleic acid fragment in one individual compared to another, is referred to as a “polymorphism” or a “polymorphic marker.” A polymorphism results from DNA sequence differences, including single base pair changes as well as large deletions or additions, between the two individuals. Such a change in a DNA sequence is typically inherited as expected by Mendel’s laws of inheritance. Thus, polymorphisms can be used as genetic markers to map the nucleotides responsible for a phenotype.
As used herein, an “individual” refers to a single entity, for instance, a single plant, or a single animal.
“Phenotype” is a visible or otherwise measurable property of an individual.
“Genomic DNA” refers to the DNA present in a cell of an individual. The DNA includes chromosomal and extrachromosomal, for instance, plastid, DNA.
“Complement” and “complementary” refer to the ability of two single stranded nucleic acid fragments to base pair with each other, where an adenine on one nucleic acid fragment will base pair to a thymine on a second nucleic acid fragment and a cytosine on one nucleic acid fragment will base pair to a guanine on a second nucleic acid fragment. Two nucleic acid fragments are complementary to each other when a nucleotide sequence in one nucleic acid fragment can base pair with a nucleotide sequence in a second nucleic acid fragment. For instance, 5′-ATGC and 5′-GCAT are complementary. The terms complement and complementary also encompass two nucleic acid fragments where one nucleic acid fragment contains at least one nucleotide that will not base pair to at least one nucleotide present on a second nucleic acid fragment. For instance, the third nucleotide of each of the two nucleic acid fragments 5′-ATTGC and 5′-GCTAT will not base pair, but these two nucleic acid fragments are complementary as defined herein. Typically two nucleic acid fragments are complementary if they hybridize under the conditions referred to herein.
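The two examples in this definition can be checked with a brief sketch that aligns two strands antiparallel and reports which positions fail to pair. The function names are illustrative only, and the sketch handles only equal-length DNA strands; it does not model the hybridization conditions referred to herein.

```python
# Base pairs named in the definition: adenine with thymine, cytosine with guanine.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the strand that pairs perfectly with seq, written 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


def mismatch_positions(strand_a, strand_b):
    """List 1-based positions, counted 5' to 3' on strand_a, that fail to pair
    when the two equal-length strands are aligned antiparallel."""
    if len(strand_a) != len(strand_b):
        raise ValueError("this sketch only handles equal-length strands")
    partner = reversed(strand_b.upper())
    return [
        i
        for i, (a, b) in enumerate(zip(strand_a.upper(), partner), start=1)
        if PAIRS[a] != b
    ]


print(reverse_complement("ATGC"))            # perfect partner of ATGC
print(mismatch_positions("ATGC", "GCAT"))    # fully paired example
print(mismatch_positions("ATTGC", "GCTAT"))  # example with one unpaired position
```

The first pair of fragments shows no mismatches, and the second pair shows a single mismatch at the third nucleotide, in agreement with the examples given in the definition.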
“Correlating” as used herein refers to determining linkage. Linkage can be determined using genetic methods well known in the art including, for instance, recombination analysis. Linkage can also be determined using physical methods including, for instance, chromosome walking.
“Mapping population” as used herein refers to parents and progeny used to establish genetic linkage. Mapping populations, including mapping populations for mice and plants, are well known to the art.
Unless otherwise specified, the indefinite article “a” or “an” means one or more.