There is a rapidly growing interest in the detection of specific nucleic acid sequences. This interest has arisen not only from the recently disclosed draft nucleotide sequence of the human genome and the abundance of single nucleotide polymorphisms (SNPs) present therein, as well as in the genomes of many other organisms, but also from marker technologies such as AFLP. The recognition that single nucleotide substitutions (and other types of genetic polymorphism, such as small insertions/deletions; indels) in genes provide a wide variety of information has also contributed to this increased interest. It is now generally recognised that these single nucleotide substitutions are one of the main causes of a significant number of monogenically and multigenically inherited diseases, for instance in humans, or are otherwise involved in the development of complex phenotypes such as performance traits in plants and livestock species. Thus, single nucleotide substitutions are in many cases also related to, or at least indicative of, important traits in humans, plants and animal species.
Analysis of these single nucleotide substitutions and indels will yield a wealth of valuable information, with widespread implications for medicine and agriculture in the broadest sense. It is, for instance, generally envisaged that these developments will result in patient-specific medication. To analyse these genetic polymorphisms, there is a growing need for adequate, reliable and fast methods that can handle large numbers of samples and large numbers of (predominantly) SNPs in a high-throughput fashion, without significantly compromising the quality of the data obtained.
Even though a wide variety of high-throughput detection platforms for SNPs currently exists (such as fluorometers, DNA microarrays, mass spectrometers and capillary electrophoresis instruments), the major obstacle to cost-effective high-throughput detection is the current lack of a robust and efficient multiplex amplification technique for the non-random selection of SNPs that would allow these platforms to be used efficiently. As a result, these powerful detection platforms are used suboptimally and/or the cost per data point is high. “Throughput”, as used herein, is a relative parameter indicating the number of samples and target sequences that can be analysed per unit of time.
Specifically, common amplification techniques such as PCR can amplify a limited number of target sequences by combining the corresponding primer pairs in a single amplification reaction, but the number of target sequences that can be amplified simultaneously is small, and extensive optimisation may be required to achieve similar amplification efficiencies for the individual target sequences. One solution to multiplex amplification is to use a single primer pair for the amplification of all target sequences, which requires that all targets contain the corresponding primer-binding sites. This principle is incorporated in the AFLP technique (EP-A 0 534 858). In AFLP, the primer-binding sites result from digestion of the target nucleic acid (i.e. total genomic DNA or cDNA) with one or more restriction enzymes, followed by adapter ligation. AFLP essentially targets a random selection of the sequences contained in the target nucleic acid. It has been shown that, using AFLP, a practically unlimited number of target sequences can be amplified in a single reaction, depending on the number of target sequences that contain primer-binding region(s) perfectly complementary to the amplification primers. Combining amplification with a single primer pair with a non-random method for SNP target selection and efficient use of a high-throughput detection platform may therefore substantially increase the efficiency of SNP genotyping; however, such a technology has not yet been provided in the art.
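The AFLP principle of creating shared primer-binding sites by restriction digestion can be sketched in silico. This is a minimal sketch, not the AFLP protocol itself: it assumes EcoRI (G^AATTC) and MseI (T^TAA) as the enzyme pair, considers only the top strand, abstracts adapter ligation away (every EcoRI...MseI fragment is assumed to carry the adapters), and models a hypothetical EcoRI primer with one selective 3' nucleotide. The names `digest` and `amplifiable` are illustrative only.

```python
import re

# Assumed enzyme pair: recognition site and cut offset on the top strand.
ENZYMES = {"EcoRI": ("GAATTC", 1), "MseI": ("TTAA", 1)}


def digest(seq, enzymes=ENZYMES):
    """Cut `seq` at every recognition site and return internal fragments as
    (left_enzyme, fragment, right_enzyme) tuples; the two terminal pieces,
    which lack a cut end on one side, are ignored."""
    cuts = []
    for name, (site, offset) in enzymes.items():
        # A lookahead also finds overlapping occurrences of the site.
        for match in re.finditer(f"(?={site})", seq):
            cuts.append((match.start() + offset, name))
    cuts.sort()
    return [
        (left, seq[start:end], right)
        for (start, left), (end, right) in zip(cuts, cuts[1:])
    ]


def amplifiable(fragments, selective_base):
    """Return EcoRI...MseI fragments amplified by a single primer pair whose
    EcoRI primer carries one selective nucleotide. After cutting G^AATTC the
    fragment starts with AATTC, so the selective position is index 5."""
    return [
        frag
        for left, frag, right in fragments
        if left == "EcoRI" and right == "MseI"
        and len(frag) > 5 and frag[5] == selective_base
    ]


genomic = "CCGAATTCAGGTTAACCGAATTCTTTTAACC"
fragments = digest(genomic)
print(fragments)
print(amplifiable(fragments, "A"))  # ['AATTCAGGT']
```

Changing the selective nucleotide picks out a different subset of fragments, but which sequences end up in that subset is dictated by where the restriction sites happen to fall, not by a prior choice of SNP targets.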
One of the principal methods used for analysing nucleic acids of known sequence is based on annealing two probes to a target sequence and, when the probes are hybridised adjacent to each other on the target sequence, ligating them. The OLA principle (Oligonucleotide Ligation Assay) has been described, amongst others, in U.S. Pat. No. 4,988,617 (Landegren et al.). This publication discloses a method for determining the nucleic acid sequence in a region of a known nucleic acid sequence having a known possible mutation. To detect the mutation, oligonucleotides are selected to anneal to immediately adjacent segments of the sequence to be determined. One of the selected oligonucleotide probes has an end region in which one of the nucleotides is complementary to either the normal or the mutated nucleotide at the corresponding position in the known nucleic acid sequence. A ligase is provided that covalently connects the two probes when they are correctly base-paired and located immediately adjacent to each other. The presence or absence of the linked probes indicates the presence or absence of the known sequence and/or mutation.
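The OLA discrimination rule (ligation only when both probes are correctly base-paired and immediately adjacent, with the allele-specific nucleotide at the junction) can be illustrated with a deliberately simplified string model. As an assumption for readability, each probe is written as the target-strand sequence it hybridises to rather than as its actual complementary sequence, and hybridisation is treated as exact string matching; `ligation_product` and `genotype` are hypothetical names.

```python
def ligation_product(target, left_probe, right_probe, snp_index):
    """Return the ligated probe pair, or None if the ligase cannot join them.

    Simplification: probes are written as their binding sites on `target`.
    The left probe must end exactly at `snp_index` (its terminal nucleotide
    interrogates the polymorphism) and the right probe must start at the
    very next position, i.e. the probes must be immediately adjacent.
    """
    start = snp_index - len(left_probe) + 1
    if start < 0:
        return None
    left_site = target[start:snp_index + 1]
    right_site = target[snp_index + 1:snp_index + 1 + len(right_probe)]
    # Any mismatch (including at the terminal, allele-specific nucleotide)
    # or any gap between the probes prevents ligation in this model.
    if left_site == left_probe and right_site == right_probe:
        return left_probe + right_probe
    return None


def genotype(target, allele_probes, right_probe, snp_index):
    """Return the alleles whose allele-specific probe was ligated."""
    return [
        allele
        for allele, probe in allele_probes.items()
        if ligation_product(target, probe, right_probe, snp_index) is not None
    ]


probes = {"A": "GCTA", "G": "GCTG"}  # differ only in their terminal nucleotide
print(genotype("GGCTAACGTT", probes, "ACGTT", 4))  # ['A']
print(genotype("GGCTGACGTT", probes, "ACGTT", 4))  # ['G']
```

The all-or-nothing rule is a simplification: in practice a terminal mismatch strongly reduces, rather than strictly abolishes, ligation by the ligase.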
Abbot et al. (WO 96/15271) developed a multiplex ligation amplification procedure comprising the hybridisation and ligation of adjacent probes. These probes are provided with an additional length segment, the sequence of which, according to Abbot et al., is unimportant. The deliberate introduction of length differences is intended to facilitate discrimination on the basis of fragment length in gel-based techniques.
WO 97/45559 (Barany et al.) describes a method for the detection of nucleic acid sequence differences by using combinations of ligase detection reactions (LDR) and polymerase chain reactions (PCR). Disclosed are methods comprising annealing allele-specific probe sets to a target sequence and subsequent ligation with a thermostable ligase, optionally followed by removal of the unligated primers with an exonuclease. Amplification of the ligated products with fluorescently labelled primers results in a fluorescently labelled amplified product. Detection of the products is based on separation by size or electrophoretic mobility or on an addressable array.
Detection of the amplified probes is performed on a universally addressable array containing capturing oligonucleotides. These capturing oligonucleotides contain a region capable of annealing to a predetermined region in the amplified probe, a so-called zip region or zip code. Each amplified probe contains a different zip code, and each zip code hybridises to its corresponding capturing oligonucleotide on the array. Detection of the label in combination with the position on the array provides information on the presence of the target sequence in the sample. This method allows a number of nucleic acid sequences in a sample to be detected. However, the design, validation and routine use of arrays for the detection of amplified probes involve many steps (ligation, amplification, optional purification of the amplified material, array production, hybridisation, washing, scanning and data quantification), some of which (particularly hybridisation and washing) are difficult to automate. Array-based detection is therefore a laborious and costly way to analyse large numbers of samples for large numbers of SNPs.
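The decoding logic of a universal (zip-code) array can be sketched as a lookup: each spot carries a capture oligonucleotide complementary to one zip code, and a labelled product is scored at the spot whose capture oligonucleotide it can pair with. This is a minimal sketch under stated assumptions: hybridisation is modelled as exact complementarity to a substring of the product, strand orientation and cross-hybridisation are ignored, and the zip-code sequences and target names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_array(zip_codes):
    """Assign each target a spot carrying the complement of its zip code."""
    return {
        target: (spot, revcomp(zip_code))
        for spot, (target, zip_code) in enumerate(zip_codes.items())
    }


def detect(array, labelled_products):
    """Return {target: spot} for every spot that captures a labelled product,
    i.e. whose capture oligo is complementary to a zip code in a product."""
    found = {}
    for target, (spot, capture) in array.items():
        zip_code = revcomp(capture)
        if any(zip_code in product for product in labelled_products):
            found[target] = spot
    return found


# Hypothetical zip codes, one per allele-specific probe.
ZIPS = {"SNP1_A": "AACCGGTTAC", "SNP1_G": "TTGGCCAATG", "SNP2_C": "CAGTCAGTCA"}
array = build_array(ZIPS)
products = ["GCTA" + ZIPS["SNP1_A"] + "ACGTT"]  # only allele A was ligated and amplified
print(detect(array, products))  # {'SNP1_A': 0}
```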
The LDR oligonucleotide probes in a given set may each generate a product of unique length and thus may be distinguished from other products on the basis of size. For amplification, a primer set is provided in which one of the primers carries a label. Different primers can be provided with different labels to allow products to be distinguished.
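Combining length and label in this way multiplies the number of products that can be told apart in a single run. The arithmetic can be sketched as follows; the detection window, the minimum resolvable length spacing and the number of dyes are hypothetical parameters, not values taken from the cited publications.

```python
def multiplex_slots(min_len, max_len, spacing, n_dyes):
    """Number of distinguishable (length, dye) combinations when products
    must differ by at least `spacing` nucleotides within [min_len, max_len]."""
    if spacing <= 0 or n_dyes <= 0 or max_len < min_len:
        raise ValueError("invalid window, spacing or dye count")
    n_lengths = (max_len - min_len) // spacing + 1
    return n_lengths * n_dyes


# Hypothetical: products of 100-300 nt spaced 2 nt apart, four dye labels.
print(multiplex_slots(100, 300, 2, 4))  # 404
```

Because the number of length positions scales with the width of the window, the achievable product length directly caps the multiplex capacity of length-based discrimination.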
The method and the various embodiments described by Barany et al. have been found to have certain disadvantages. One of the major disadvantages is that the method does not, in principle, provide a truly high-throughput process for determining large numbers of target sequences in short periods of time by reliable and robust methods, without compromising the quality of the data produced or the efficiency of the process.
More particularly, one of the disadvantages of the means and methods disclosed by Barany et al. is the limited multiplex capacity when discrimination is based, inter alia, on the length of the allele-specific probe sets. Discrimination between sequences that differ by only a relatively small length is, in general, not straightforward, and carefully optimised conditions may be required to achieve the desired resolving power. Discrimination between sequences with larger length differences is generally easier to accomplish, which may increase the number of sequences that can be analysed in the same sample. However, providing the necessary longer oligonucleotide probes is a further hurdle. In the art, synthetic nucleotide sequences are produced by conventional chemical step-by-step oligonucleotide synthesis with a yield of about 98.5% per added nucleotide. When longer probes (more than about 60 nucleotides) are synthesised, the overall yield drops, and the reliability and purity of the synthetically produced sequence can become a problem.
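The length limit follows from compounding the stepwise yield. Assuming, as a simplification, that an n-mer requires n - 1 coupling steps at the stated 98.5% per step, the fraction of full-length product is 0.985^(n - 1). The sketch below evaluates this; the helper names are illustrative.

```python
import math

STEP_YIELD = 0.985  # per added nucleotide, as stated in the text


def full_length_fraction(n, step_yield=STEP_YIELD):
    """Fraction of chains reaching full length for an n-mer (n - 1 couplings)."""
    return step_yield ** (n - 1)


def max_length(min_fraction, step_yield=STEP_YIELD):
    """Longest oligonucleotide whose full-length fraction is >= min_fraction."""
    return 1 + math.floor(math.log(min_fraction) / math.log(step_yield))


for n in (20, 60, 100, 150):
    print(n, round(full_length_fraction(n), 3))
print(max_length(0.5))  # 46
```

Under this assumption, fewer than half of the chains of a 60-mer reach full length, and only about one in ten for a 150-mer, which is why purity becomes a problem for long probes.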
These and other disadvantages of the methods disclosed in WO 97/45559 and in other publications based on oligonucleotide ligation assays have led the present inventors to conclude that the methods described therein are less suitable for adaptation to a high-throughput protocol capable of handling a large number of samples, each comprising large numbers of sequences.
The specific problem of providing longer probes has been solved by Schouten et al. (WO 01/61033), which discloses the preparation of longer probes for use in ligation-amplification assays. They provided probes considerably longer than those obtainable by conventional chemical synthesis, in order to avoid a problem associated with the length-based discrimination of amplified products on slab gels or by capillary electrophoresis: when OLA probes are synthesised chemically, only a small part of the detection window (a resolving capacity of up to about 1 kb) is used. With a practical upper limit of around 100-150 bases for chemically synthesised oligonucleotides with current technology, the resulting amplification products are at most about 300 base pairs long, and often much shorter (see Barany et al.). Schouten et al. countered the difficulty of generating such long probes (more than about 150 nucleotides) with sufficient purity and yield by chemical means by obtaining the probes through in vivo enzymatic template-directed polymerisation, for instance by the action of a DNA polymerase in a suitable host cell, such as by using M13 phage.
However, the production and purification of such biological probes require a collection of suitable host strains containing M13 phage that confer the desired length variations, as well as multiple short chemically synthesised oligonucleotides, which makes their use laborious, time-consuming and hence costly, and unsuitable for high-throughput assay development. Furthermore, the use of relatively long probes and relatively large length differences between the amplifiable target sequences may result in differential amplification efficiencies in favour of the shorter target sequences. This adversely affects the overall data quality and hampers the development of a true high-throughput method. Thus, the need for a reliable and cost-efficient solution to multiplex amplification and subsequent length-based detection in high-throughput applications remains.
Other solutions suggested in the art, such as the use of circular (padlock) probes in combination with isothermal amplification methods such as rolling circle amplification (RCA), are regarded as advantageous because of the improved hybridisation characteristics of circular probes and the isothermal character of RCA.
Rolling circle amplification is an amplification method in which a first primer is hybridised to a ligated (connected) circular probe. Subsequent primer elongation by a polymerase with strand displacement activity results in the formation of a long polynucleotide strand containing multiple tandem complementary copies of the connected circular probe. Such a long strand of concatemers of the connected probe is subsequently detected using hybridisation probes, which can be labelled. Exponential amplification of the ligated probe can be achieved by hybridising a second primer to the concatemeric strand and elongating it. (Exponential) rolling circle amplification ((E)RCA) is described, inter alia, in U.S. Pat. No. 5,854,033, U.S. Pat. No. 6,143,495, WO 97/19193 and Lizardi et al., Nature Genetics 19(3):225-232 (1998).
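The concatemer produced by linear RCA can be modelled as repeated copying of the circular template. The sketch below is a simplified string model, assuming an idealised strand-displacing polymerase that copies the circle a fixed number of times without errors; `rca_product` and its parameters are hypothetical names.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def rca_product(circle, start, laps):
    """Linear RCA product synthesised from a circular template written 5'->3'.

    `start` is the first template index copied after the primer. The
    polymerase reads the template 3'->5' (towards lower indices), wraps
    around the circle `laps` times and adds the complement of each template
    base, so the product is a tandem repeat of the reverse complement of
    the circle.
    """
    n = len(circle)
    return "".join(COMPLEMENT[circle[(start - i) % n]] for i in range(laps * n))


circle = "ACGTTG"  # hypothetical ligated padlock probe (real probes are longer)
product = rca_product(circle, start=len(circle) - 1, laps=3)
print(product)  # CAACGTCAACGTCAACGT
```

The product length grows linearly with the number of laps, so an unrestricted RCA reaction yields very long strands rather than discrete, length-resolvable fragments.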
U.S. Pat. No. 5,876,924, WO 98/04745 and WO 98/04746 by Zhang et al. describe a ligation reaction using two adjacent probes, one of which is a capture probe carrying a binding element such as biotin. After ligation, the unligated probes are removed and the ligated, captured probe is detected using paramagnetic beads carrying a ligand (biotin)-binding moiety. Zhang et al. also disclose the amplification of circular probes with PCR primers by rolling circle amplification, using a DNA polymerase with strand displacement activity, which generates a long concatemer of the circular probe starting from extension of the first primer. A second PCR primer subsequently hybridises to the long concatemer, and its elongation provides a second generation of concatemers, enabling exponential amplification. Detection is generally based on the hybridisation of labelled probes.
However, these methods have proven less suitable for high-throughput use. One reason is that, for a high-throughput method based on length discrimination, (E)RCA results in the formation of long concatemers, which are not suitable for high-throughput length-based detection.
U.S. Pat. No. 6,221,603 discloses a circular probe that contains a restriction site. The probe is amplified by (E)RCA and the resulting concatemers are cleaved at the restriction site. The restriction fragments are then separated by length and detected. Separation and detection are performed on a capillary electrophoresis platform, such as the MegaBACE equipment available from Molecular Dynamics Amersham-Pharmacia. For detection, labelled dNTPs may be incorporated into the fragments during amplification, or the fragments may be detected by staining or by labelled detection probes. However, partial digestion by the restriction enzyme may affect the reliability of the method, because incompletely cleaved concatemers yield fragments that are multiples of the expected length. Furthermore, the fragment-labelling methods disclosed in U.S. Pat. No. 6,221,603 do not allow full use of the MegaBACE's capacity for simultaneous multicolour detection.
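The effect of partial digestion can be illustrated with a simple simulation. Assuming, as a simplification, that a concatemer consists of `n_units` identical repeats separated by restriction sites, each cleaved independently with probability `cut_prob`, complete digestion yields only unit-length monomers, whereas partial digestion also yields dimers, trimers and so on. The function and parameter names are hypothetical.

```python
import random


def digest_concatemer(n_units, unit_len, cut_prob, rng):
    """Return fragment lengths after digesting a concatemer of `n_units`
    repeats of length `unit_len`; each of the n_units - 1 internal sites
    is cut independently with probability `cut_prob`."""
    lengths = []
    current = unit_len
    for _ in range(n_units - 1):
        if rng.random() < cut_prob:   # site cleaved: close the current fragment
            lengths.append(current)
            current = unit_len
        else:                         # site missed: fragment grows by one repeat
            current += unit_len
    lengths.append(current)
    return lengths


rng = random.Random(42)  # fixed seed for a reproducible example
complete = digest_concatemer(10, 60, 1.0, rng)
partial = digest_concatemer(10, 60, 0.7, rng)
print(sorted(set(complete)))  # [60]
print(sorted(set(partial)))   # multiples of 60
```

Each missed site shifts signal into a peak at a multiple of the unit length, so the fraction of signal in the expected peak depends on how complete the digestion is.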
The present inventors have set out to eliminate or at least diminish the existing problems in the art while at the same time attempting to maintain the advantageous aspects thereof, and to further improve the technology. Other problems in the art and solutions provided thereto by the present invention will become clear throughout the description, the figures and the various embodiments and examples.