Technical Field
The present invention generally relates to the field of nucleic acid amplification, detection and sequencing. More specifically, the present invention relates to improved compositions, methods, apparatus, and kits for high-throughput nucleic acid amplification, detection and sequencing.
Description of the Related Art
High-throughput technologies have become a cornerstone in many areas of modern molecular biology, biotechnology and medicine. For example, efforts to rapidly, accurately, and economically determine gene expression levels (e.g., microarrays) and nucleic acid sequence (e.g., parallel sequencing) have intensified over the past few years. The information provided by such advances has furthered genome analyses for several plant and animal species, including humans, non-human primates and others, and has also assisted drug target discovery and validation, disease diagnosis and risk scoring, and the identification and characterization of multiple organisms.
A number of methods of nucleic acid sequencing are well known and documented in the art. The two most commonly used methods are the Maxam and Gilbert technique and the Sanger sequencing technique.
In Sanger sequencing, each nucleic acid molecule to be sequenced is used as a template that is replicated in a reaction employing DNA polymerase as a catalytic enzyme. The reaction contains the deoxynucleotide triphosphates (dNTPs) dATP, dCTP, dGTP and dTTP as precursors to be incorporated into a DNA complement of the template, and dideoxynucleotide triphosphates (ddNTPs) of adenine (A), guanine (G), cytosine (C) and thymine (T) as chain terminators. The DNA polymerase can incorporate both dNTPs and ddNTPs into the growing DNA strand. The incorporation of a ddNTP, however, terminates the nucleic acid chain extension because the ddNTP lacks a 3′ hydroxyl group, so the terminated strand is no longer a substrate for further chain elongation. For example, in a particular template-directed Sanger sequencing reaction in which only one type of ddNTP (e.g., ddCTP) is present, a mixture of nucleic acids of different lengths is produced, all terminating with the same ddNTP (e.g., ddCTP). Typically, either separate reactions are set up for each of the four types of ddNTPs or the four ddNTPs are differentially labeled and used in a single reaction, and size distribution of the nucleic acid fragment products is analyzed by denaturing gel electrophoresis or by mass spectrometry. In the single-reaction format, each of the ddNTPs in the reaction mixture is labeled with a different fluorophore so that fragments of different lengths can be detected and their terminating bases distinguished.
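The chain-termination logic described above can be illustrated with a minimal sketch. It is not a model of any real sequencing chemistry or instrument: it assumes ideal termination at every position, ignores the primer, and uses hypothetical function names (`synthesized_strand`, `termination_lengths`, `read_sequence`) chosen only for this illustration.

```python
# Illustrative sketch only: idealized Sanger chain termination.
# Assumptions (not from the source text): the template is written 5'->3',
# synthesis starts at its 3' end, and every position yields a terminated fragment.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template):
    """Return the strand a polymerase would synthesize (5'->3') against the template."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def termination_lengths(strand):
    """For each ddNTP, list the fragment lengths that terminate with that base."""
    lengths = {base: [] for base in "ACGT"}
    for position, base in enumerate(strand, start=1):
        lengths[base].append(position)
    return lengths


def read_sequence(lengths):
    """Order all fragments by length and read off the terminating base of each."""
    base_by_length = {n: base for base, sizes in lengths.items() for n in sizes}
    return "".join(base_by_length[n] for n in sorted(base_by_length))


strand = synthesized_strand("GATTACA")
fragments = termination_lengths(strand)
print(fragments["C"])            # lengths of fragments ending in ddCTP
print(read_sequence(fragments))  # sequence recovered from the size ordering
```

Sorting fragments by length plays the role that size separation (e.g., by gel electrophoresis) plays in the method described above.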
The above described methods are disadvantageous because each nucleic acid to be sequenced has to be processed individually during the sequencing reaction. Gel electrophoresis is not well suited for large scale high throughput sequencing. It is cumbersome, labor intensive, and intrinsically slow, even when capillary gel electrophoresis is used. In addition, following electrophoretic separation of reaction products, the subsequent analysis of electrophoretograms for determination of the sequence is time-consuming and can generate equivocal results due to confounding artifacts. Mass spectrometry offers more promise for expediting sequence determination, but it is still at the prototype level, requires very expensive apparatus and labor-intensive instrument maintenance, and each sample must be analyzed individually.
More recently, nucleic acid sequencing methods based on solid-phase DNA chips and DNA hybridization have become available. These methods are not without shortcomings, however, because DNA chips have to be carefully designed, fastidiously manufactured, and subjected to rigorous quality control testing. These processes are lengthy and require significant expertise, which drives up the price of individual chips. Moreover, the chips are often not reusable, and thus each chip can process only one nucleic acid sample (e.g., one patient to be diagnosed) at a time.
In many currently practiced techniques for nucleic acid sequence analysis, amplification of the nucleic acids of interest is a prerequisite for obtaining the nucleic acid in a quantity sufficient for analysis. Several methods of nucleic acid amplification are well known and documented in the art. For example, nucleic acids can be amplified by inserting a nucleic acid of interest into an expression vector construct. Such vectors can then be introduced into suitable biological host cells and the vector DNA, including the nucleic acid of interest, is amplified by culturing the biological host using well established protocols. However, such methods have the disadvantage of being time-consuming, labor intensive, and difficult to automate.
The technique of DNA amplification by the polymerase chain reaction (PCR) is a widely used and well documented method. In PCR, a target nucleic acid fragment of interest can be amplified using one or two short oligonucleotide sequences (usually referred to as primers) that specifically hybridize (e.g., by Watson-Crick base-pairing) to known sequences flanking the DNA sequence that is to be amplified. By repeated cycles of heat denaturation, primer hybridization, and extension, the target nucleic acid is exponentially amplified. Traditionally, this method is performed in solution and the amplified target nucleic acid fragment is purified from solution by methods well known in the art, for example, by gel electrophoresis.
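The exponential character of PCR can be made concrete with a short sketch. Under the idealized assumption that every target molecule is copied once per cycle, the copy number doubles each cycle. The `efficiency` parameter and the function name `pcr_copies` are assumptions introduced here for illustration and do not come from the text above.

```python
# Illustrative sketch only: idealized PCR copy number after a number of cycles.
# Assumption: each cycle multiplies the copy number by (1 + efficiency),
# where efficiency = 1.0 means perfect doubling.

def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Return the expected number of target copies after the given cycles."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


for n in (10, 20, 30):
    print(n, pcr_copies(1, n))
```

In practice amplification plateaus as reagents are depleted, which this sketch deliberately does not model.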
More recently, nucleic acid amplification methods have been disclosed which employ an immobilized primer grafted to a solid-phase surface in conjunction with free primers in solution. These methods allow the simultaneous amplification and attachment of a PCR product onto the surface.
Some known methods of nucleic acid analysis involve PCR-based amplification of a target nucleic acid only when the target nucleic acid is present in the sample being tested. For the amplification of the target sequence, primers can be attached to a solid support, which results in the amplified target nucleic acid sequences also being attached to the solid support. This amplification technique is often referred to as the “bridge amplification” technique. In this technique, conventional PCR primers can be designed to hybridize specifically to polynucleotide sequences flanking the particular target nucleic acid sequence to be amplified. If the target nucleic acid is present in the sample, it hybridizes to the primers and is amplified by PCR. The first step in this PCR amplification process is thus the hybridization of the target nucleic acid to a first specific primer attached to the support (“primer 1”). A first amplification product, which is complementary to the target nucleic acid, is then formed by extension of the primer 1 sequence. Denaturation conditions release the target nucleic acid, which can then either participate in further hybridization reactions with other primer 1 sequences attached to the support or be removed from the solid support. The first amplification product, which is attached to the support, can then hybridize with a second specific primer (“primer 2”) attached to the support and a second amplification product comprising an attached nucleic acid sequence complementary to the first amplification product can be formed by extension of the primer 2 sequence. Thus, the target nucleic acid and the first and second amplification products are capable of participating in a plurality of hybridization and extension reactions, which are limited by the initial presence or absence of the target nucleic acid and by the number of primer 1 and primer 2 sequences initially attached to the solid support.
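The dependence of bridge amplification on the presence of the target and on the number of attached primers can be sketched as a simple counting model. This is a rough illustration under stated assumptions, not a description of any actual protocol: every attached strand is assumed to bridge to a free primer of the opposite type in each cycle if one is available, the original target is assumed to be washed away after the first extension, and the name `bridge_amplify` is hypothetical.

```python
# Illustrative sketch only: counting model of solid-phase bridge amplification.
# Assumptions (not from the source text): perfect efficiency per cycle, and the
# free target is removed after seeding a single primer 1 extension product.

def bridge_amplify(primer1_sites, primer2_sites, cycles, target_present=True):
    """Return per-cycle counts of (first products, second products) on the support."""
    # The target hybridizes to one primer 1, which is extended into a first product.
    first = 1 if target_present and primer1_sites > 0 else 0
    second = 0
    free1 = primer1_sites - first
    free2 = primer2_sites
    history = [(first, second)]
    for _ in range(cycles):
        # First products bridge to free primer 2; second products to free primer 1.
        new_second = min(first, free2)
        new_first = min(second, free1)
        free2 -= new_second
        free1 -= new_first
        first += new_first
        second += new_second
        history.append((first, second))
    return history


print(bridge_amplify(4, 4, 5))
print(bridge_amplify(4, 4, 5, target_present=False))
```

The counts stop growing once all attached primers are consumed, and remain zero when no target is present, which mirrors the two limits named in the paragraph above.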
A bridge amplification technique can be used to amplify several different target nucleic acid sequences simultaneously by arraying different sets of first and second primers, each set being specific for a different target nucleic acid sequence, on different or overlapping regions of the solid support. A further application of the bridge amplification technique is to amplify fragments using immobilized primers which are complementary to a universal sequence located at the ends of a collection of templates of different sequence. Thus the primer 1 and primer 2 sequences may be complementary to a nucleic acid sample with known ends, for example, ends that have been attached to the sample by ligation of a universal adapter sequence. The templates may be applied to the solid support as single strands where the ends of each strand are complementary to, and hybridize with, the primer 1 and/or primer 2 sequences. Primer 1 can be extended to form an extension product where the end of the extension product is complementary to the primer 2 sequence. Likewise, primer 2 can be extended to form an extension product where the end of the extension product is complementary to the primer 1 sequence. The hybridized targets can be denatured and removed from the support. The first extension products can be hybridized with the primer 1 and primer 2 sequences and extended to form second extension products wherein the second extension products are complementary copies of the first extension products. The first and second extension products can be amplified via cycles of denaturation, hybridization and extension to produce multiple copies of each of the first and second extension products. The amplification may give rise to a population of nucleic acid clusters attached to the support where each cluster is derived from a single template, but adjacent clusters on the solid support contain different template sequences.
In the era of high-throughput technology, amassing the highest yield of interpretable data at the lowest cost per effort remains a significant challenge. Cluster-based methods of nucleic acid sequencing, such as those that utilize bridge amplification for cluster formation, have made a valuable contribution toward the goal of increasing the throughput of nucleic acid sequencing. These cluster-based methods rely on sequencing a dense population of nucleic acids immobilized on a solid support, and typically involve the use of image analysis software to deconvolute optical signals generated in the course of simultaneously sequencing multiple clusters situated at distinct locations on a solid support.
However, such solid-phase nucleic acid cluster-based sequencing technologies still face considerable obstacles that limit the amount of throughput that can be achieved. For example, in cluster-based sequencing methods, determining the nucleic acid sequences of two or more clusters that are physically too close to one another to be resolved spatially, or that in fact physically overlap on the solid support, can pose an obstacle. In particular, current image analysis software can require valuable time and computational resources for determining from which of two overlapping clusters an optical signal has emanated. As a consequence, compromises are inevitable for a variety of detection platforms with respect to the quantity and/or quality of nucleic acid sequence information that can be obtained.
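The spatial-resolution problem described above can be illustrated with a minimal sketch that flags pairs of clusters whose centers lie closer together than a resolvable distance. The coordinates, the `min_separation` threshold, and the function name `unresolved_pairs` are hypothetical and chosen only for illustration; actual image analysis software works on optical signals rather than known cluster centers.

```python
# Illustrative sketch only: flag cluster pairs too close to resolve spatially.
# Assumptions (not from the source text): cluster centers are known (x, y)
# positions and a single distance threshold defines resolvability.

from itertools import combinations
from math import dist


def unresolved_pairs(centers, min_separation):
    """Return index pairs of clusters closer together than min_separation."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(centers), 2)
        if dist(a, b) < min_separation
    ]


centers = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
print(unresolved_pairs(centers, min_separation=2.0))
```

This pairwise comparison grows quadratically with the number of clusters, which gives a sense of why resolving dense cluster populations can be computationally costly.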
High density nucleic acid cluster-based genomics methods extend to other areas of genome analysis as well. For example, nucleic acid cluster-based genomics can be used in sequencing applications, diagnostics and screening, gene expression analysis, epigenetic analysis, genetic analysis of polymorphisms, and the like. Each of these nucleic acid cluster-based genomics technologies, too, is limited when there is an inability to resolve data generated from closely proximate or spatially overlapping nucleic acid clusters.
Clearly there remains a need for increasing the quality and quantity of nucleic acid sequencing data that can be obtained rapidly and cost-effectively for a wide variety of uses, including for genomics (e.g., for genome characterization of any and all animal, plant, microbial or other biological species or populations), pharmacogenomics, transcriptomics, diagnostics, prognostics, biomedical risk assessment, clinical and research genetics, personalized medicine, drug efficacy and drug interactions assessments, veterinary medicine, agriculture, evolutionary and biodiversity studies, aquaculture, forestry, oceanography, ecological and environmental management, and other purposes. The presently disclosed invention embodiments provide compositions and methods that address these and similar needs, including compositions and methods to increase the level of throughput in high-throughput nucleic acid sequencing technologies, and offer other related advantages. These and other aspects of the present invention will become apparent upon reference to the following detailed description.