Exploration of genomic DNA has long been desired by the scientific, in particular medical, community. Genomic DNA holds the key to identification, diagnosis and treatment of diseases such as cancer and Alzheimer's disease. In addition to disease identification and treatment, exploration of genomic DNA may provide significant advantages in plant and animal breeding efforts, which may provide answers to food and nutrition problems in the world.
Many diseases are known to be associated with specific genetic components, in particular with polymorphisms in specific genes. The identification of polymorphisms in large samples such as genomes is at present a laborious and time-consuming task. However, such identification is of great value to areas such as biomedical research, pharmaceutical product development, tissue typing, genotyping and population studies.
Markers, i.e. genetic markers, have been used for a very long time as a genetic typing method, i.e. to connect a phenotypic trait to the presence, absence or amount of a particular part of DNA (gene). One of the most versatile genetic typing technologies is AFLP, which has been in use for many years and is widely applicable to any organism (for reviews see Savelkoul et al. J. Clin. Microbiol, 1999, 37(10), 3083-3091; Bensch et al. Molecular Ecology, 2005, 14, 2899-2914).
The AFLP technology (Zabeau & Vos, 1993; Vos et al., 1995) has found widespread use in plant breeding and other fields since its invention in the early nineties. This is due to several characteristics of AFLP, of which the most important is that no prior sequence information is needed to generate large numbers of genetic markers in a reproducible fashion. In addition, the principle of selective amplification, a cornerstone of AFLP, ensures that the number of amplified fragments can be brought in line with the resolution of the detection system, irrespective of genome size or origin.
Detection of AFLP fragments is commonly carried out by electrophoresis on slab-gels (Vos et al., 1995) or capillary electrophoresis (van der Meulen et al., 2002). The majority of AFLP markers scored in this way represent (single nucleotide) polymorphisms occurring either in the restriction enzyme recognition sites used for AFLP template preparation or their flanking nucleotides covered by selective AFLP primers. The remainder of the AFLP markers are insertion/deletion polymorphisms occurring in the internal sequences of the restriction fragments, and a very small fraction are single nucleotide substitutions occurring in small restriction fragments (< approximately 100 bp), which for these fragments cause reproducible mobility variations between both alleles that can be observed upon electrophoresis; these AFLP markers can be scored co-dominantly without having to rely on band intensities.
In a typical AFLP fingerprint, the AFLP markers therefore constitute the minority of amplified fragments (less than 50 percent but often less than 20 percent), while the remainder are commonly referred to as constant AFLP fragments. The latter are nevertheless useful in the gel scoring procedure as they serve as anchor points to calculate fragment mobilities of AFLP markers and aid in quantifying the markers for co-dominant scoring. Co-dominant scoring (scoring for homo- or heterozygosity) of AFLP markers currently is restricted to the context of fingerprinting a segregating population. In a panel of unrelated lines, only dominant scoring is possible.
Although the throughput of AFLP is very high due to high multiplexing levels in the amplification and detection steps, the rate limiting step is the resolving power of electrophoresis. Electrophoresis allows unique identification of the majority of amplified fragments based on the combination of restriction enzyme combinations (EC), primer combinations (PC) and mobility, but electrophoresis is only capable of distinguishing the amplified fragments based on differences in mobility. Fragments of similar mobility are often found as so-called ‘stacked bands’ and with electrophoresis, no attention can be given to the information that is contained in so-called ‘constant bands’, i.e. amplified restriction fragments that do not appear to differ between compared species. Furthermore, on a typical gel-based system, or on a capillary system such as a MegaBACE, samples must be run in parallel and only about 100-150 bands per lane on a gel or per capillary can be analysed. These limitations also hamper throughput.
Ideally, the detection system should be capable of determining the entire sequence of the amplified fragments to capture all amplified restriction fragments. However, most high throughput sequencing technologies cannot yet provide sequencing reads that encompass entire AFLP fragments, which are typically 100-500 bp in length.
So far, detection of AFLP markers/sequences by sequencing has not been economically feasible due to, among other limitations, the cost of Sanger dideoxy sequencing technology and other conventional sequencing technologies.
Detection by sequencing instead of mobility determination will increase throughput because:
1) polymorphisms located in the internal sequences will be detected in most (or all) amplified fragments; this will increase the number of markers per PC considerably.
2) no loss of AFLP markers due to co-migration of AFLP markers and constant bands.
3) co-dominant scoring does not rely on quantification of band intensities and is independent of the relatedness of the individuals fingerprinted.
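The loss of markers through co-migration (point 2 above) can be illustrated with a toy sketch. It assumes, as a simplification, that mobility depends on fragment length only; the fragment names and sequences are invented for illustration and do not come from any real fingerprint.

```python
from collections import defaultdict

# Hypothetical amplified fragments (names and sequences are illustrative only).
fragments = {
    "frag_a": "ACGTACGTAA",
    "frag_b": "TTGCATGCAT",    # same length as frag_a, different sequence
    "frag_c": "ACGTACGTAAGG",
}

def bands_by_length(fragments):
    """Group fragments as a length-based detection system would see them."""
    bands = defaultdict(list)
    for name, seq in fragments.items():
        bands[len(seq)].append(name)
    return dict(bands)

bands = bands_by_length(fragments)
distinct_by_sequence = len(set(fragments.values()))

print(bands)                 # fragments sharing a length fall into one band
print(distinct_by_sequence)  # sequence-based detection separates all of them
```

In this sketch the two 10-nucleotide fragments occupy a single band, whereas comparing the sequences themselves keeps all three fragments apart.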
However, detection by sequencing of the entire restriction fragment is still relatively uneconomical. Furthermore, the current state of the art sequencing technology such as disclosed herein elsewhere (from 454 Life Sciences, www.454.com and Solexa, www.solexa.com), despite its overwhelming sequencing power, can only provide sequence reads of limited length. Also, the current methods do not allow for the simultaneous processing of many samples in one run.
Definitions
In the following description and examples a number of terms are used. In order to provide a clear and consistent understanding of the specification and claims, including the scope to be given such terms, the following definitions are provided. Unless otherwise defined herein, all technical and scientific terms used have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The disclosures of all publications, patent applications, patents and other references are incorporated herein in their entirety by reference.
Nucleic acid: a nucleic acid according to the present invention may include any polymer or oligomer of pyrimidine and purine bases, preferably cytosine, thymine, and uracil, and adenine and guanine, respectively (See Albert L. Lehninger, Principles of Biochemistry, at 793-800 (Worth Pub. 1982) which is herein incorporated by reference in its entirety for all purposes). The present invention contemplates any deoxyribonucleotide, ribonucleotide or peptide nucleic acid component, and any chemical variants thereof, such as methylated, hydroxymethylated or glycosylated forms of these bases, and the like. The polymers or oligomers may be heterogenous or homogenous in composition, and may be isolated from naturally occurring sources or may be artificially or synthetically produced. In addition, the nucleic acids may be DNA or RNA, or a mixture thereof, and may exist permanently or transitionally in single-stranded or double-stranded form, including homoduplex, heteroduplex, and hybrid states.
AFLP: AFLP refers to a method for selective amplification of nucleic acids based on digesting a nucleic acid with one or more restriction endonucleases to yield restriction fragments, ligating adaptors to the restriction fragments and amplifying the adaptor-ligated restriction fragments with at least one primer that is (part) complementary to the adaptor, (part) complementary to the remains of the restriction endonuclease, and that further contains at least one randomly selected nucleotide from amongst A, C, T, or G (or U as the case may be). AFLP does not require any prior sequence information and can be performed on any starting DNA. In general, AFLP comprises the steps of:
(a) digesting a nucleic acid, in particular a DNA or cDNA, with one or more specific restriction endonucleases, to fragment the DNA into a corresponding series of restriction fragments;
(b) ligating the restriction fragments thus obtained with a double-stranded synthetic oligonucleotide adaptor, one end of which is compatible with one or both of the ends of the restriction fragments, to thereby produce adaptor-ligated, preferably tagged, restriction fragments of the starting DNA;
(c) contacting the adaptor-ligated, preferably tagged, restriction fragments under hybridizing conditions with one or more oligonucleotide primers that contain selective nucleotides at their 3′-end;
(d) amplifying the adaptor-ligated, preferably tagged, restriction fragment hybridised with the primers by PCR or a similar technique so as to cause further elongation of the hybridised primers along the restriction fragments of the starting DNA to which the primers hybridised; and
(e) detecting, identifying or recovering the amplified or elongated DNA fragment thus obtained.
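The selective step in (c) and (d) can be sketched in silico. The following is a minimal model under stated assumptions, not a description of any laboratory protocol: each fragment is represented only by its top-strand insert, i.e. the sequence lying between the two restriction-site remnants with the EcoRI end first; adaptors are left implicit; and the helper name `is_amplified` and the toy inserts are hypothetical. The second primer anneals to the opposite strand, so its selective nucleotides are compared against the reverse complement of the insert.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def is_amplified(insert, sel_first, sel_second):
    """True if the selective nucleotides of both primers match the fragment.

    insert     -- top-strand sequence between the restriction-site remnants
    sel_first  -- selective nucleotides of the primer at the first end
    sel_second -- selective nucleotides of the primer at the other end,
                  which reads the opposite strand
    """
    return (insert.startswith(sel_first)
            and reverse_complement(insert).startswith(sel_second))

# Toy inserts, invented for illustration.
inserts = ["ACGTTGCA", "AGGTCCTT", "CCGTAAGT", "ATTTGGCA"]

# A +1/+2 amplification: one selective base on one primer, two on the other.
subset = [s for s in inserts if is_amplified(s, "A", "TG")]
print(subset)
```

Only the inserts whose ends match both selective extensions remain in `subset`, which is the sense in which the amplification yields a reproducible subset of the adaptor-ligated fragments.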
AFLP thus provides a reproducible subset of adaptor-ligated fragments. AFLP is described in EP 534858, U.S. Pat. No. 6,045,994 and in Vos et al. Reference is made to these publications for further details regarding AFLP. The AFLP is commonly used as a complexity reduction technique and a DNA fingerprinting technology. Within the context of the use of AFLP as a fingerprinting technology, the concept of an AFLP marker has been developed.
AFLP marker: An AFLP marker is an amplified adaptor-ligated restriction fragment that is different between two samples that have been amplified using AFLP (fingerprinted), using the same set of primers. As such, the presence or absence of this amplified adaptor-ligated restriction fragment can be used as a marker that is linked to a trait or phenotype. In conventional gel technology, an AFLP marker shows up as a band in the gel located at a certain mobility. Other electrophoretic techniques such as capillary electrophoresis may not refer to this as a band, but the concept remains the same, i.e. a nucleic acid with a certain length and mobility. Absence or presence of the band may be indicative of (or associated with) the presence or absence of the phenotype. AFLP markers typically involve SNPs in the restriction site of the endonuclease or the selective nucleotides. Occasionally, AFLP markers may involve indels in the restriction fragment.
Constant band: a constant band in the AFLP technology is an amplified adaptor-ligated restriction fragment that is relatively invariable between samples. Thus, a constant band in the AFLP technology will, over a range of samples, show up at about the same position in the gel, i.e. has the same length/mobility. In conventional AFLP these are typically used to anchor the lanes corresponding to samples on a gel or electropherograms of multiple AFLP samples detected by capillary electrophoresis. Typically, a constant band is less informative than an AFLP marker. Nevertheless, as AFLP markers customarily involve SNPs in the selective nucleotides or the restriction site, constant bands may comprise SNPs in the restriction fragments themselves, rendering the constant bands an interesting alternative source of genetic information that is complementary to AFLP markers.
Selective base: Located at the 3′ end of the primer that contains a part that is complementary to the adaptor and a part that is complementary to the remains of the restriction site, the selective base is randomly selected from amongst A, C, T or G. By extending a primer with a selective base, the subsequent amplification will yield only a reproducible subset of the adaptor-ligated restriction fragments, i.e. only the fragments that can be amplified using the primer carrying the selective base. Selective nucleotides can be added to the 3′ end of the primer in a number varying between 1 and 10. Typically 1-4 suffice. Both primers may contain a varying number of selective bases. With each added selective base, the number of amplified adaptor-ligated restriction fragments in the subset is reduced by a factor of about 4. Typically, the number of selective bases used in AFLP is indicated by +N+M, wherein one primer carries N selective nucleotides and the other primer carries M selective nucleotides. Thus, an Eco/Mse+1/+2 AFLP is shorthand for the digestion of the starting DNA with EcoRI and MseI, ligation of appropriate adaptors and amplification with one primer directed to the EcoRI restricted position carrying one selective base and the other primer directed to the MseI restricted site carrying 2 selective nucleotides. A primer used in AFLP that carries at least one selective nucleotide at its 3′ end is also depicted as an AFLP-primer. Primers that do not carry a selective nucleotide at their 3′ end and which in fact are complementary to the adaptor and the remains of the restriction site are sometimes indicated as AFLP+0 primers.
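The factor-of-4 reduction per selective base can be worked through numerically. The sketch below assumes that the four nucleotides are equally likely at each selective position, which is why the text says "about" 4; the function names are hypothetical and chosen for illustration.

```python
from itertools import product

def parse_selectivity(label):
    """Parse a label such as '+1/+2' into the pair (N, M)."""
    n, m = label.split("/")
    return int(n), int(m)

def expected_fraction(n, m):
    """Expected fraction of fragments amplified with N and M selective bases,
    assuming equal base frequencies at every selective position."""
    return 0.25 ** (n + m)

def selective_extensions(k):
    """All possible selective extensions of length k for one primer."""
    return ["".join(p) for p in product("ACGT", repeat=k)]

n, m = parse_selectivity("+1/+2")
fraction = expected_fraction(n, m)          # one fragment in 4**3 = 64
primer_combinations = len(selective_extensions(n)) * len(selective_extensions(m))

print(n, m, fraction, primer_combinations)
```

Under this assumption a +1/+2 amplification retains roughly one fragment in 64, and the same 64 possible primer combinations together partition the full set of fragments.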
Clustering: with the term “clustering” is meant the comparison of two or more nucleotide sequences based on the presence of short or long stretches of identical or similar nucleotides. Several methods for alignment of nucleotide sequences are known in the art, as will be further explained below. Sometimes the terms “assembly” or “alignment” are used as synonyms.
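As a rough illustration of grouping sequences by shared stretches of identical nucleotides, the sketch below uses an ungapped, position-by-position identity score and a greedy assignment to cluster representatives. This is a deliberately simple stand-in rather than one of the alignment methods known in the art; the 0.8 threshold, the function names and the toy reads are assumptions made for the example.

```python
def identity(a, b):
    """Fraction of identical positions over the shorter sequence (no gaps)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n

def greedy_cluster(seqs, threshold=0.8):
    """Assign each sequence to the first cluster whose representative
    (its first member) is at least `threshold` identical; else start a new one."""
    clusters = []
    for s in seqs:
        for cluster in clusters:
            if identity(s, cluster[0]) >= threshold:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters

# Toy reads, invented for illustration.
reads = ["ACGTACGTAC", "ACGTACGTAT", "TTGGCCAATT", "ACGTACGTAC", "TTGGCCAATA"]
clusters = greedy_cluster(reads)
print(clusters)
```

Reads that differ at a single position end up in the same cluster, so a putative polymorphism can be spotted by comparing the members of one cluster.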
Identifier: a short sequence that can be added to an adaptor or a primer or included in its sequence or otherwise used as label to provide a unique identifier. Such a sequence identifier can be a unique base sequence of varying but defined length uniquely used for identifying a specific nucleic acid sample. For instance, 4 bp tags allow 4^4 = 256 different tags. Typical examples are ZIP sequences, known in the art as commonly used tags for unique detection by hybridization (Iannone et al. Cytometry 39:131-140, 2000). Using such an identifier, the origin of a PCR sample can be determined upon further processing. In the case of combining processed products originating from different nucleic acid samples, the different nucleic acid samples are generally identified using different identifiers.
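The tag arithmetic and the use of an identifier to trace a read back to its sample can be sketched as follows. The example assumes the identifier occupies the first 4 bases of each read; the sample names, tag assignments and reads are hypothetical.

```python
from itertools import product

TAG_LENGTH = 4

# Every possible 4 bp tag over A, C, G, T: 4**4 = 256 in total.
all_tags = ["".join(p) for p in product("ACGT", repeat=TAG_LENGTH)]

# Hypothetical assignment of two tags to two samples.
sample_tags = {"ACGT": "sample_1", "TGCA": "sample_2"}

def demultiplex(reads, sample_tags, tag_length=TAG_LENGTH):
    """Split pooled reads by their leading identifier and strip the tag."""
    by_sample = {}
    for read in reads:
        tag, body = read[:tag_length], read[tag_length:]
        sample = sample_tags.get(tag, "unassigned")
        by_sample.setdefault(sample, []).append(body)
    return by_sample

# Toy pooled reads, invented for illustration.
reads = ["ACGTGGATCC", "TGCAGGATCA", "ACGTTTTTAA", "GGGGACACAC"]
result = demultiplex(reads, sample_tags)

print(len(all_tags))
print(result)
```

Reads whose leading bases match no assigned tag are collected under "unassigned" instead of being forced into a sample.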
Sequencing: The term sequencing refers to determining the order of nucleotides (base sequences) in a nucleic acid sample, e.g. DNA or RNA.
High-throughput screening: High-throughput screening, often abbreviated as HTS, is a method for scientific experimentation especially relevant to the fields of biology and chemistry. Through a combination of modern robotics and other specialised laboratory hardware, it allows a researcher to effectively screen large numbers of samples simultaneously.
Restriction endonuclease: a restriction endonuclease or restriction enzyme is an enzyme that recognizes a specific nucleotide sequence (target site) in a double-stranded DNA molecule, and will cleave both strands of the DNA molecule at or near every target site.
Restriction fragments: the DNA molecules produced by digestion with a restriction endonuclease are referred to as restriction fragments. Any given genome (or nucleic acid, regardless of its origin) will be digested by a particular restriction endonuclease into a discrete set of restriction fragments. The DNA fragments that result from restriction endonuclease cleavage can be further used in a variety of techniques and can for instance be detected by gel electrophoresis.
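Digestion into a discrete set of restriction fragments can be mimicked on a single strand. The sketch below assumes a linear top-strand sequence and ignores the complementary strand and the overhangs; it uses EcoRI, which recognises GAATTC and cuts after the first G. The function name `digest` and the toy sequence are invented for illustration.

```python
def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`.

    The cut falls `cut_offset` bases into the recognition site, so the
    fragments concatenate back to the original sequence.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return [f for f in fragments if f]

# Toy sequence containing two EcoRI sites (GAATTC, cut as G^AATTC).
toy = "CCGAATTCAAGGTTGAATTCTT"
fragments = digest(toy, "GAATTC", 1)

print(fragments)
print([len(f) for f in fragments])
```

The same sequence digested with the same enzyme always gives the same fragments, which is the property that makes the resulting pattern usable as a fingerprint.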
Gel electrophoresis: in order to detect restriction fragments, an analytical method for fractionating DNA molecules on the basis of size can be required. The most commonly used technique for achieving such fractionation is (capillary) gel electrophoresis. The rate at which DNA fragments move in such gels depends on their molecular weight; thus, the distances traveled decrease as the fragment lengths increase. The DNA fragments fractionated by gel electrophoresis can be visualized directly by a staining procedure e.g. silver staining or staining using ethidium bromide, if the number of fragments included in the pattern is sufficiently small. Alternatively further treatment of the DNA fragments may incorporate detectable labels in the fragments, such as fluorophores or radioactive labels, which are preferably used to label one strand of the AFLP product.
Ligation: the enzymatic reaction catalyzed by a ligase enzyme in which two double-stranded DNA molecules are covalently joined together is referred to as ligation. In general, both DNA strands are covalently joined together, but it is also possible to prevent the ligation of one of the two strands through chemical or enzymatic modification of one of the ends of the strands. In that case the covalent joining will occur in only one of the two DNA strands.
Synthetic oligonucleotide: single-stranded DNA molecules having preferably from about 10 to about 50 bases, which can be synthesized chemically are referred to as synthetic oligonucleotides. In general, these synthetic DNA molecules are designed to have a unique or desired nucleotide sequence, although it is possible to synthesize families of molecules having related sequences and which have different nucleotide compositions at specific positions within the nucleotide sequence. The term synthetic oligonucleotide will be used to refer to DNA molecules having a designed or desired nucleotide sequence.
Adaptors: short double-stranded DNA molecules with a limited number of base pairs, e.g. about 10 to about 30 base pairs in length, which are designed such that they can be ligated to the ends of restriction fragments. Adaptors are generally composed of two synthetic oligonucleotides which have nucleotide sequences which are partially complementary to each other. When mixing the two synthetic oligonucleotides in solution under appropriate conditions, they will anneal to each other forming a double-stranded structure. After annealing, one end of the adaptor molecule is designed such that it is compatible with the end of a restriction fragment and can be ligated thereto; the other end of the adaptor can be designed so that it cannot be ligated, but this need not be the case (double ligated adaptors).
Adaptor-ligated restriction fragments: restriction fragments that have been capped by adaptors.
Primers: in general, the term primers refers to DNA strands which can prime the synthesis of DNA. DNA polymerase cannot synthesize DNA de novo without primers: it can only extend an existing DNA strand in a reaction in which the complementary strand is used as a template to direct the order of nucleotides to be assembled. We will refer to the synthetic oligonucleotide molecules which are used in a polymerase chain reaction (PCR) as primers.
DNA amplification: the term DNA amplification will be typically used to denote the in vitro synthesis of double-stranded DNA molecules using PCR. It is noted that other amplification methods exist and they may be used in the present invention without departing from the gist.