There are two methods in common use to sequence DNA: the chemical-degradation method (e.g., Maxam et al., 1977) and the chain-termination method (e.g., Sanger et al., 1977). Efforts to improve DNA sequencing efficiency have resulted in numerous improvements to the chain-termination method, and automating many steps in the process has significantly increased sequencing throughput. Nevertheless, templates are still sequenced one at a time.
Attempts have been made to introduce some parallel processing steps into the sequencing method. For example, Church (1990) and Church et al. (1992) teach a strategy in which multiple templates are fragmented in a single tube by either the chain-termination or the chemical-degradation sequencing method. The fragments are separated on a gel and transferred to a solid membrane. Each template carries a unique tag, and the fragments are visualized by hybridization with an oligonucleotide probe specific to each tag. The pattern of fragments that hybridize to one specific oligonucleotide probe represents the sequence information from one template. Removing the first probe and then hybridizing a second probe reveals the sequence pattern of a different template. This method is limited by the requirement to maintain the pattern of fragments in order to extract the sequence information. Therefore, only one sequence can be read at a time; that is, this step in the method is sequential rather than parallel, which imposes inherent time constraints. In addition, a membrane can be “stripped” and reprobed only a limited number of times. For these reasons, the method is limited in practice to collections of fewer than 50 templates.
Other methods described in the art attempt to introduce parallelism into different stages of the sequencing protocol. Van Ness et al. (1997) describe the use of mass tags that can be detected by mass spectrometry. Different tags are attached to the 5′-ends of sequencing primers, and each tagged primer is used to sequence a different template by the chain-termination method. The different reactions are pooled and fractionated by size (i.e., sequencing products are collected from the end of a capillary electrophoresis device). The tags present in each fraction are assayed by mass spectrometry, and this information is deconvoluted to reproduce the “sequence ladders” of the different templates. The method is limited by the number of different tags that can be synthesized. More importantly, the method is not parallel until the sequencing reactions are pooled.
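The deconvolution step shared by these tag-based methods can be sketched in Python. This is a minimal illustration, not the procedure of Van Ness et al.: it assumes, hypothetically, that the reactions for each terminating base are pooled and fractionated separately, that each fraction holds fragments of exactly one length, and that the tag assay reports which template tags are present in each fraction. Reading one template's tag across the four size-ordered fraction series then reconstructs that template's sequence ladder. The function name `deconvolute` and the clone names are illustrative only.

```python
from typing import Dict, List, Set


def deconvolute(pools: Dict[str, List[Set[str]]]) -> Dict[str, str]:
    """Rebuild one sequence per template tag from size-fractionated pools.

    Hypothetical data layout: pools[base][i] is the set of template tags
    detected in fraction i of the pooled reactions terminated with `base`;
    fraction i is assumed to hold fragments i + 1 nucleotides past the primer.
    """
    n_fractions = max(len(fractions) for fractions in pools.values())
    # Every tag seen in any fraction identifies one template in the pool.
    templates = set().union(*(tags for fractions in pools.values() for tags in fractions))
    sequences: Dict[str, str] = {}
    for template in sorted(templates):
        calls = []
        for i in range(n_fractions):
            # Bases whose reactions produced a fragment of this length for this template.
            hits = [base for base, fractions in pools.items()
                    if i < len(fractions) and template in fractions[i]]
            # An unambiguous position has exactly one terminating base; otherwise call "N".
            calls.append(hits[0] if len(hits) == 1 else "N")
        sequences[template] = "".join(calls)
    return sequences


# Two hypothetical clones whose true sequences are GATC and TTAG.
pools = {
    "A": [set(), {"clone1"}, {"clone2"}, set()],
    "C": [set(), set(), set(), {"clone1"}],
    "G": [{"clone1"}, set(), set(), {"clone2"}],
    "T": [{"clone2"}, {"clone2"}, {"clone1"}, set()],
}
print(deconvolute(pools))  # {'clone1': 'GATC', 'clone2': 'TTAG'}
```

Because each tag is read independently, the number of templates that can be resolved this way is bounded by the number of distinguishable tags, which is the limitation noted above.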
A variation of the Van Ness method is described by Wong (1999), who replaces the chemical tags attached to the 5′-end of a primer with nucleic acid tags. Again, individual sequencing reactions are pooled and fractionated by size. Instead of being detected by mass spectrometry, the tags in each fraction are amplified and labeled in vitro (e.g., by PCR) and then hybridized to an array of oligonucleotides. Each location in the array hybridizes to a different tag, so a positive hybridization signal at a location indicates that the corresponding tag is present in the fraction. This information is deconvoluted to reveal the sequence ladders of the different templates. Far more distinct tags can be attached to sequencing primers with Wong's method than with the Van Ness method. However, Wong's method is still not parallel until the sequencing reactions are pooled. Consequently, much of the labor associated with traditional sequencing protocols remains: DNA must be prepared from individual clones, and a separate sequencing reaction must be performed on each template. In a second embodiment, Wong attempts to introduce some parallelism into these steps. He attaches the tags to several different sequencing primers, each of which hybridizes to a different vector. Instead of sequencing one clone at a time, he constructs a separate library in each vector, pools one clone from each library, and sequences the pool with the pooled primers. The sequencing products from different pools are then combined and fractionated by size. Each clone still requires its own uniquely tagged primer, but fewer sequencing reactions are needed. In theory, the same strategy can be applied to the Van Ness mass-tag method, as described by Schmidt et al. (1999). Presumably, the strategy will work for very small pools of primers, but as the collection of primers and vectors grows, mispriming events and failed sequences will predominate.
In addition, single clones are still handled one at a time, so considerable resources must be dedicated simply to producing, cataloging, and storing the sequencing templates.
Rabani (1996 and 1997) describes a sequencing method that employs the same tagged sequencing vectors used by Church (1990). A pool of templates carrying substantially different tags is sequenced with one primer, as described in the Church patent. A label is incorporated into either the primer or the chain terminator. The sequencing products are fractionated by size and immediately hybridized to an array of oligonucleotides (analogous to the array in Wong's method). Detection of the label at a particular location in the array indicates the presence of the corresponding tag in the fraction. The sequence ladders are deconvoluted as above. Although the method is parallel at each step, in practice only a small number of samples can be pooled, because only a small amount of labeled material is available in each fraction for hybridization to the array. This amount determines the rate of hybridization and the limit of detection. A very sensitive oligonucleotide array can detect about 0.1 femtomoles of a complementary polynucleotide (see Lockhart et al., 1996). Assuming each tag is distributed over about 1000 bands of a sequencing ladder, and each band must reach this detection limit, at least 1000 × 0.1 femtomoles, or 0.1 picomoles, of any tagged clone must be present in the pool before sequencing. A typical sequencing reaction uses about 0.5 picomoles of DNA. Dividing 0.5 picomoles by the 0.1 picomoles required per clone suggests that a starting pool of only about five clones can be sequenced according to Rabani's method.
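The pool-size estimate for Rabani's method can be reproduced, and its assumptions varied, with a few lines of Python. The default parameter values are the figures quoted in the text; the function name `max_pool_size` is hypothetical, and the model assumes, as the text does, that detection is limited only by the amount of labeled material in each band and that one reaction's DNA budget is shared evenly among the pooled clones.

```python
from typing import Tuple


def max_pool_size(detection_limit_fmol: float = 0.1,
                  bands_per_tag: int = 1000,
                  reaction_dna_pmol: float = 0.5) -> Tuple[float, float]:
    """Return (pmol of each tagged clone required, approximate clones per pool)."""
    # Every band of a tag's ladder must reach the array's detection limit.
    required_fmol = detection_limit_fmol * bands_per_tag
    required_pmol = required_fmol / 1000.0  # 1 pmol = 1000 fmol
    # One sequencing reaction's DNA budget is divided among the pooled clones.
    return required_pmol, reaction_dna_pmol / required_pmol


required, clones = max_pool_size()
print(f"{required:.2f} pmol per clone, about {clones:.0f} clones per pool")
# 0.10 pmol per clone, about 5 clones per pool
```

Lowering the detection limit tenfold, for example, raises the estimate to about 50 clones, which shows how directly the achievable pool size depends on array sensitivity.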
Thus, there is a need in the art for a highly parallel sequencing method that is not limited by the sequential “bottlenecks” described above. Such a method would significantly improve sequencing throughput and substantially reduce the cost of sequencing.
To sequence very large genomes, the DNA must first be broken down into smaller, more manageable clones. Determining the overlap relationships among these clones simplifies the reconstruction of the entire sequence. The method most frequently used for this purpose is “Sequence Tagged Site” (STS) content mapping. This method involves finding many small regions of single-copy DNA (i.e., STSs) and determining which clones contain the same STSs. Two clones that contain the same STS must overlap. The STSs are detected by amplifying pools of clones with the polymerase chain reaction. This mapping process is very expensive and time-consuming.
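The overlap rule of STS content mapping can be expressed as a small Python function. This sketch assumes the STS content of each clone has already been determined (for example, from PCR assays of clone pools) and only derives the implied overlaps; it does not model the pooling design or order the clones into a contig. The function and clone names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def overlapping_pairs(sts_content: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Return every pair of clones that share at least one STS."""
    # Invert the clone -> STSs mapping into STS -> clones.
    clones_by_sts: Dict[str, Set[str]] = defaultdict(set)
    for clone, stss in sts_content.items():
        for sts in stss:
            clones_by_sts[sts].add(clone)
    # Any two clones containing the same single-copy STS must overlap.
    pairs: Set[Tuple[str, str]] = set()
    for clones in clones_by_sts.values():
        pairs.update(combinations(sorted(clones), 2))
    return pairs


content = {
    "cloneA": {"sts1", "sts2"},
    "cloneB": {"sts2", "sts3"},
    "cloneC": {"sts4"},
    "cloneD": {"sts3", "sts4"},
}
print(sorted(overlapping_pairs(content)))
# [('cloneA', 'cloneB'), ('cloneB', 'cloneD'), ('cloneC', 'cloneD')]
```

Chaining these pairwise overlaps (A–B, B–D, D–C) is what allows the clones to be assembled into a contiguous map; every edge in that chain, however, requires its own STS to be found and assayed.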
Ultimately, the physical mapping and sequencing of organisms are intended to hasten the discovery of gene function. A common step in this process is to observe the phenotype of the null mutant. Through “reverse genetics,” it is possible to “knock out” the function of a cloned gene to produce the null phenotype. Usually, gene knockouts are produced one at a time, at great expense, by introducing foreign DNA into the gene. Even efforts to apply reverse genetics to many cloned genes simply scale up this serial, one-by-one approach.
For these reasons, there is a need in the art for a method that introduces massive parallelism into the processes of sequencing, physical mapping and the production of gene knockouts. The present invention provides these and other advantages, as described in greater detail below.