Nucleic acid sequence analysis has become a cornerstone of many activities in biology, biotechnology and medicine. The ability to determine nucleic acid sequences has become increasingly important as efforts have commenced to determine the sequences of the large genomes of humans and other higher organisms and also, for example, in single nucleotide polymorphism detection and screening and gene expression monitoring. The genetic information provided by nucleic acid sequencing has many applications in areas such as drug target discovery and validation, disease diagnosis and risk scoring, and organism identification and characterization.
The first step in such applications is the determination of the actual chemical composition of the nucleic acids of interest, more precisely the determination of the sequence of occurrence of the four bases adenine (A), cytosine (C), guanine (G) and thymine (T) or uracil (U) which comprise nucleic acids. However, such applications require the sequencing of nucleic acids on a large scale, making high throughput methods of nucleic acid sequencing extremely desirable.
Methods of nucleic acid sequencing are documented in the art. The two most commonly used are the chemical cleavage technique by Maxam and Gilbert which relies on base-specific chemistry and the now more popular Sanger sequencing technique which relies on an enzymatic chain terminating principle and is now used on a routine basis for nucleic acid sequencing.
In Sanger sequencing, each nucleic acid to be sequenced is replicated in a reaction involving DNA polymerase, deoxynucleotide triphosphates (dNTPs) and dideoxynucleotide triphosphates (ddNTPs). The DNA polymerase can incorporate both dNTPs and ddNTPs into the growing DNA strand. However, once a ddNTP is incorporated, the 3′ end of the growing DNA strand lacks a hydroxyl group and is no longer a substrate for chain elongation, thus terminating the nucleic acid chain. Hence, in a particular reaction including one type of ddNTP, a mixture of nucleic acids of different lengths is produced, all terminating with the same ddNTP. Usually separate reactions are set up for each of the four types of ddNTP and the distribution of lengths of the nucleic acid fragments produced is analysed by denaturing gel electrophoresis (which resolves nucleic acid fragments according to their size) or, more recently, by mass spectrometry. Usually, one or more of the deoxynucleotide triphosphates in the reaction mixture is labelled to enable detection of the fragments of different lengths.
The above-described methods are disadvantageous because each nucleic acid to be sequenced has to be processed individually during the biochemical reaction. Gel electrophoresis is cumbersome, labour intensive and intrinsically slow, even when capillary electrophoresis is used, and is not well suited to large scale high throughput sequencing. In addition, the subsequent determination of the sequence is cumbersome. Mass spectrometry is still at the prototype level, requires very expensive apparatus, and each sample has to be analysed individually.
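The chain-termination principle described above can be illustrated with a minimal sketch. It assumes an idealised reaction in which every possible termination position is represented among the fragments, and it works directly on the sequence of the newly synthesised strand. The function names `termination_lengths` and `read_from_ladders` are hypothetical and chosen only for this illustration.

```python
def termination_lengths(new_strand, dd_base):
    """Fragment lengths expected in a reaction containing one ddNTP type.

    Assumption: idealised reaction, so a fragment ends at every position
    where the synthesised strand carries the base matching the ddNTP.
    """
    return [pos for pos, base in enumerate(new_strand, start=1) if base == dd_base]


def read_from_ladders(new_strand):
    """Rebuild the sequence by ordering fragments from all four reactions by size."""
    # One "lane" per ddNTP type, as in four separate reactions.
    lanes = {base: termination_lengths(new_strand, base) for base in "ACGT"}
    # Size separation: each fragment length identifies the base at that position.
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[length] for length in sorted(base_at_length))


if __name__ == "__main__":
    strand = "GATTACA"
    print(termination_lengths(strand, "A"))  # lengths of fragments ending in ddA
    print(read_from_ladders(strand))
```

Reading the four size ladders from shortest to longest fragment recovers the order of the bases, which is the step performed by gel electrophoresis in the method described.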
One way to increase throughput is to process many samples in parallel. Methods using DNA hybridization of nucleic acid probes are in use and allow for some multiplexing of the process during the biochemical and the electrophoretic processes, but at the cost of lengthy additional manipulations.
More recently, methods based on DNA chips and DNA hybridization have become available (Thomas and Burke Exp. Opin. Ther. Patents 8: 503–508 (1998)). These methods are disadvantageous because, for each application, a DNA chip has to be designed and manufactured first: this is a lengthy operation and the price of an individual chip drops only when very large numbers of the chip are required. Also, the chips are not reusable and each chip can process only one sample of nucleic acids at a time, e.g. one patient to be diagnosed. Finally, the extent of sequence which can be analysed by such a chip is limited to less than 100,000 bases, and use of the chips is limited to certain applications such as DNA genotyping and gene expression profiling.
In most known techniques for nucleic acid sequence analysis, amplification of the nucleic acids of interest is a prerequisite step in order to obtain the nucleic acid in a quantity sufficient for analysis.
Several methods of nucleic acid amplification are well known and documented in the art. For example, nucleic acids can be amplified by inserting the nucleic acid of interest into an expression vector construct. Such vectors can then be introduced into suitable biological host cells and the vector DNA, including the nucleic acid of interest, amplified by culturing the biological host using well established protocols.
Nucleic acids amplified by such methods can be isolated from the host cells by methods well known and documented in the art. However, such methods have the disadvantage of being generally time consuming, labour intensive and difficult to automate.
The technique of DNA amplification by the polymerase chain reaction (PCR) was disclosed in 1985 (Saiki et al. Science 230, 1350–1354) and is now a method well known and documented in the art. A target nucleic acid fragment of interest can be amplified using two short oligonucleotide sequences (usually referred to as primers) which are specific to known sequences flanking the DNA sequence that is to be amplified. The primers hybridize to opposite strands of the double-stranded DNA fragment after it has been denatured, and are oriented so that DNA synthesis by the DNA polymerase proceeds through the region between the two primers, with the primer sequences being extended by the sequential incorporation of nucleotides by the polymerase. The extension reactions create two double-stranded target regions, each of which can again be denatured ready for a second cycle of hybridisation and extension. The third cycle produces two double-stranded molecules that comprise precisely the target region in double-stranded form. By repeated cycles of heat denaturation, primer hybridisation, and extension, there follows a rapid exponential accumulation of the specific target fragment of DNA. Traditionally, this method is performed in solution and the amplified target nucleic acid fragment purified from solution by methods well known in the art, for example by gel electrophoresis.
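The cycle-by-cycle accumulation described above can be checked with a short counting sketch. It assumes an idealised PCR with one starting double-stranded molecule and 100% efficiency in every cycle, and it tracks three classes of strand: the original long strands, strands with one end defined by a primer, and strands of exactly target length. The function name `pcr_strand_counts` and the class names are hypothetical.

```python
def pcr_strand_counts(cycles):
    """Count strand classes after a number of idealised PCR cycles.

    Assumptions: one starting duplex, every strand is copied once per cycle.
    """
    long_strands = 2    # the two original template strands
    half_defined = 0    # one end fixed by a primer, the other end undefined
    target_length = 0   # both ends fixed by primers
    precise_duplexes = 0
    for _ in range(cycles):
        # A target-length strand copied in this cycle yields a duplex made
        # of two target-length strands.
        precise_duplexes = target_length
        new_half = long_strands                    # copies of long strands
        new_target = half_defined + target_length  # copies of the other classes
        half_defined += new_half
        target_length += new_target
    total_strands = long_strands + half_defined + target_length
    return {
        "duplexes": total_strands // 2,
        "precise_duplexes": precise_duplexes,
        "target_length_strands": target_length,
    }


if __name__ == "__main__":
    for n in (1, 2, 3, 10):
        print(n, pcr_strand_counts(n))
```

Under these assumptions the third cycle is the first to yield duplexes consisting only of the target region (two of the eight molecules), and after that point such duplexes dominate the mixture.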
More recently, however, methods have been disclosed which use one primer grafted to a surface in conjunction with free primers in solution. These methods allow the simultaneous amplification and attachment of a PCR product onto the surface (Oroskar, A. A. et al., Clinical Chemistry 42:1547 (1996)).
WO96/04404 and WO98/36094 (Mosaic Technologies, Inc. et al) disclose a method of detecting a target nucleic acid in a sample which potentially contains the target nucleic acid. The method involves the induction of a PCR based amplification of the target nucleic acid only when the target nucleic acid is present in the sample being tested. For the amplification of the target sequence, both primers are attached to a solid support, which results in the amplified target nucleic acid sequences also being attached to the solid support. The amplification technique disclosed in these documents is sometimes referred to as the “bridge amplification” technique. In this technique the two primers are, as for conventional PCR, specifically designed so that they flank the particular target DNA sequence to be amplified. Thus, if the particular target nucleic acid is present in the sample, the target nucleic acid can hybridise to the primers and be amplified by PCR. The first step in this PCR amplification process is the hybridisation of the target nucleic acid to the first specific primer attached to the support (“primer 1”). A first amplification product, which is complementary to the target nucleic acid, is then formed by extension of the primer 1 sequence. On subjecting the support to denaturation conditions, the target nucleic acid is released and can then participate in further hybridisation reactions with other primer 1 sequences which may be attached to the support. The first amplification product, which is attached to the support, may then hybridise with the second specific primer (“primer 2”) attached to the support, and a second amplification product comprising a nucleic acid sequence complementary to the first amplification product can be formed by extension of the primer 2 sequence and is also attached to the support.
Thus, the target nucleic acid and the first and second amplification products are capable of participating in a plurality of hybridisation and extension processes, limited only by the initial presence of the target nucleic acid and the number of primer 1 and primer 2 sequences initially present and the result is a number of copies of the target sequence attached to the surface.
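The dependence of this process on the initial target and on the number of attached primers can be illustrated with a toy counting model. This is only a sketch under strong assumptions: every hybridisation and extension succeeds, any attached product can reach any free primer (surface geometry is ignored), and each template is used once per cycle. The function name `bridge_amplification` and its parameters are hypothetical and do not come from the cited documents.

```python
def bridge_amplification(target_copies, primer1_sites, primer2_sites, cycles):
    """Toy model of support-bound amplification with two attached primers.

    Returns (first_products, second_products) attached to the support.
    """
    free1, free2 = primer1_sites, primer2_sites
    first, second = 0, 0  # products extended from primer 1 and primer 2
    for _ in range(cycles):
        # Free target hybridises to primer 1 and is copied, then released
        # on denaturation, so target_copies is not consumed.
        from_target = min(target_copies, free1)
        free1 -= from_target
        # Second products bridge to primer 1, giving more first products.
        from_second = min(second, free1)
        free1 -= from_second
        # First products bridge to primer 2, giving second products.
        from_first = min(first, free2)
        free2 -= from_first
        first += from_target + from_second
        second += from_first
    return first, second


if __name__ == "__main__":
    print(bridge_amplification(1, 10, 10, 2))   # early cycles
    print(bridge_amplification(1, 10, 10, 50))  # primers exhausted
    print(bridge_amplification(0, 10, 10, 50))  # no target, no product
```

In this model the yield saturates when the attached primers are used up, and no product forms at all without target, which is the property the detection method relies on.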
Since, on carrying out this process, amplification products are only formed if the target nucleic acid is present, monitoring the support for the presence or absence of one or more amplification products is indicative of the presence or absence of a specific target sequence.
The Mosaic technique can be used to achieve an amount of multiplexing in that several different target nucleic acid sequences can be amplified simultaneously by arraying different sets of first and second primers, specific for each different target nucleic acid sequence, on different regions of the solid support.
The disadvantage of the Mosaic process is that, as the first and second primer sequences have to be specific for each target nucleic acid to be amplified, it can only be used to amplify known sequences. In addition, the throughput is limited by the number of different sets of specific primers and subsequently amplified target nucleic acid molecules which can be arrayed in distinct regions of a given solid support, and by the time taken to array the nucleic acids in distinct regions. Also, the Mosaic process requires that two different primers are homogeneously attached by the 5′ end to the support within the distinct region where the amplification product is formed. This cannot be achieved with presently available DNA chip manufacturing technology and has to be achieved by some means of sample dispensing. Thus, the density that can be achieved by this approach has the same limitation as other classical arraying technologies. A further limitation is the speed of monitoring the individual distinct regions of the support for the presence or absence of the amplified target nucleic acids.
Arraying of DNA samples is classically performed on membranes (e.g., nylon or nitrocellulose membranes). The use of suitable robotics (e.g., Q-bot™, Genetix Ltd, Dorset BH23 3TG UK) means that it is possible to obtain a density of up to 10 samples/mm². In such methods, the DNA is covalently linked to a membrane by physicochemical means (e.g., UV irradiation) and the arraying of large DNA molecules (e.g. molecules over 100 nucleotides long) as well as smaller DNA molecules such as oligonucleotide primers is possible.
Other techniques are known whereby higher density arrays of oligonucleotides can be obtained. For example, approaches based on pre-arrayed glass slides wherein arrays of reactive areas are obtained by ink-jet technology (Blanchard, A. P. and L. Hood, Microbial and Comparative Genomics, 1:225 (1996)) or arrays of reactive polyacrylamide gels (Yershov, G. et al., Proceedings of the National Academy of Science, USA, 93:4913–4918 (1996)) allow in theory the arraying of up to 100 samples/mm².
Higher sample densities still are achievable by the use of DNA chips (Fodor, S. P. A. et al., Science 251:767 (1991)). Currently, chips with 625 oligonucleotide probes/mm² are used in molecular biology techniques (Lockhart, D. J. et al., Nature Biotechnology 14:1675 (1996)). Probe densities of up to 250,000 samples/cm² (2,500/mm²) are claimed to be achievable (Chee, M. et al., Science 274:610 (1996)). However, at present up to 132,000 different oligonucleotides can be arrayed on a single chip of approximately 2.5 cm². Importantly, these chips are manufactured in such a way that the 3′OH end of the oligonucleotide is attached to the solid surface. This means that oligonucleotides attached to chips in such a way cannot be used as primers in a PCR amplification reaction.
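The density figures quoted above are given in mixed units, so a short conversion sketch may help in comparing them. It uses only the numbers stated in the text; the helper name `density_per_mm2` and the dictionary labels are hypothetical.

```python
MM2_PER_CM2 = 100  # 1 cm2 = 10 mm x 10 mm


def density_per_mm2(count, area_cm2):
    """Convert a feature count on an area given in cm2 to features per mm2."""
    return count / (area_cm2 * MM2_PER_CM2)


# Densities in features per mm2, taken from the figures quoted in the text.
densities = {
    "robotic arraying on membranes": 10.0,
    "ink-jet slides or gel pads (theoretical)": 100.0,
    "chip with 132,000 probes on about 2.5 cm2": density_per_mm2(132_000, 2.5),
    "chips in current use": 625.0,
    "claimed achievable chip density": density_per_mm2(250_000, 1.0),
}

if __name__ == "__main__":
    for name, value in sorted(densities.items(), key=lambda item: item[1]):
        print(f"{value:8.1f} per mm2  {name}")
```

On these figures, 132,000 oligonucleotides on roughly 2.5 cm² corresponds to about 528 per mm², which is of the same order as the 625 probes per mm² quoted for chips in use and well below the claimed 2,500 per mm².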
Importantly, when PCR products are linked to the vessel in which PCR amplification takes place, the density of the resultant array of PCR products is limited by the available vessel. Currently available vessels are only in 96-well microtiter plate format. These allow only around 0.02 samples of PCR products/mm² of surface to be obtained.
For example, using the commercially available Nucleolink™ system (Nunc A/S, Roskilde, Denmark) it is possible to achieve simultaneous amplification and arraying of samples at a density of 0.02 samples/mm² in wells on the surface of which oligonucleotide primers have been grafted. However, technical problems mean that it is unlikely that a significant increase in this sample density will be achieved with this approach.
Thus, it can be seen that in order to increase throughput there is a need in the art for new methods of nucleic acid amplification which allow the simultaneous amplification and arraying of nucleic acid samples at a higher density and which, furthermore, allow the monitoring of samples at a faster rate, preferably in parallel.
In addition, it is apparent that there is a need in the art for new methods of sequencing which allow large numbers of samples to be processed and sequenced in parallel, i.e. there is a need for methods of sequencing which allow significant multiplexing of the process. Significant multiplexing of the sequencing process would in turn lead to a higher throughput than that achievable with the methods of sequencing known in the art. Such new methods would be even more desirable if they could achieve such high throughput sequencing at a reasonable cost and with less labour than conventional sequencing techniques.