Through the advances in DNA technology that occurred in the 1980s, it is now possible to map the human genome, identify all the genes it contains, and study their functions. Once all the genes are known, the analysis of how they cooperate will be possible, leading to enormous advances in the understanding of diseases. Indeed, medicine in the next century can be expected to focus increasingly on DNA. The ability to synthesize, sequence, and interpret DNA will be an essential tool in the clinical laboratory, the drug industry, and the research laboratory.
The development of detailed genetic maps of each of the 24 chromosomes will aid in the pursuit of specific disease genes. Physical maps of the genome are an essential intermediate step to obtaining the gene itself. Libraries of overlapping cloned DNA for which the position of each gene in the chromosome is known constitute appropriate physical maps. Once genes have been localized to individual clones, they will then be sequenced. As the wild-type sequences become available, this baseline information can be used in presymptomatic diagnosis of genetically based diseases.
Key to the success of the process described above is the availability of rapid, accurate and affordable DNA sequencing techniques. One of the most widely acclaimed methods currently proposed for DNA analysis is a microchip that can be used for sequencing-by-hybridization (SBH) (Bains et al, J. Theor. Biol. 135:303 (1988); Drmanac et al, Genomics 4:114 (1989); Khrapko et al, FEBS Lett. 256:118 (1989)). In theory, a chip bearing a complete set of n-mer oligonucleotides would permit the sequencing of DNA by duplex formation with their Watson-Crick complements in a target DNA, with the only limitations being repeats of the same sequence and runs of identical bases longer than n.
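The idealized SBH readout described above can be illustrated with a short Python sketch. It assumes perfect-match-only hybridization and a chip carrying every n-mer; the function names are hypothetical and are not taken from the cited references. The sketch also shows why repeats are limiting: the readout is a set of positive probes, so a repeated n-mer in the target contributes only one spot.

```python
from itertools import product

# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the strand that forms a Watson-Crick duplex with seq."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def all_probes(n):
    """Enumerate the complete set of 4**n n-mer probes on an ideal chip."""
    return ["".join(bases) for bases in product("ACGT", repeat=n)]


def hybridizing_probes(target, n):
    """Set of probes that would light up for target, assuming only
    perfect complements hybridize (an idealization)."""
    windows = {target[i:i + n] for i in range(len(target) - n + 1)}
    return {reverse_complement(window) for window in windows}


if __name__ == "__main__":
    print(len(all_probes(3)))                        # 64 probes for n = 3
    print(sorted(hybridizing_probes("ACGTT", 3)))    # one probe per distinct 3-mer
    # "ACG" occurs twice in this target but yields a single positive probe.
    print(sorted(hybridizing_probes("ACGACG", 3)))
```

Real hybridization data are noisier than this set-membership model, which is the difficulty the following paragraphs address.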
In theory, the SBH method could be applied either with the probes immobilized and the target in solution (reverse blot) or vice versa. In fact, the initial experiments toward demonstrating this technology have used both. Light-directed synthesis (U.S. Pat. No. 5,143,854; Fodor et al, Science 251:767 (1991)), however, which permits the chemistry of DNA synthesis to be conducted in parallel at thousands of locations, requires probe immobilization. The number of sequences prepared using this technology far exceeds the number of chemical reactions required. In fact, for light-directed DNA synthesis of oligomers of length "l", the number of sequences prepared is 4^l but the number of steps required is only 4 × l.
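The counting argument above (4 to the power l sequences from only 4 times l coupling steps) can be checked with a few lines of Python. The function names are illustrative only; the arithmetic follows directly from the text.

```python
def sequences_prepared(length):
    """Number of distinct oligomers of the given length: 4**length."""
    return 4 ** length


def coupling_steps(length):
    """Number of light-directed coupling steps: one per base per position."""
    return 4 * length


if __name__ == "__main__":
    for length in (4, 8, 10):
        print(length, sequences_prepared(length), coupling_steps(length))
    # 4 256 16
    # 8 65536 32
    # 10 1048576 40
```

The gap widens rapidly: a full octamer array holds 65,536 sequences yet needs only 32 coupling steps.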
Recently, Pease et al (Proc. Natl. Acad. Sci. USA 91:5022 (1994)) have reported the results of initial efforts in preparing DNA chips through light-directed synthesis and in using them for mock sequencing experiments. Using phosphoramidite chemistry modified by the inclusion of the MeNPOC photoremovable group, Pease and colleagues have prepared arrays of 256 octamers (4 mixed nucleotide positions flanked by two CG clamps). They find that fluorescently labeled target DNA binds selectively to its complement within the array. Some single-base mismatches, however, show as much as 20% of the fluorescence hybridization signal of the perfect complement. This arises because hybridization depends on the exact sequences of the probes, the hybridization conditions, and the location of the mismatches (fraying at the 5' end is common in mishybridization (Wood et al, Proc. Natl. Acad. Sci. USA 82:1585 (1985))). While mishybridization is readily manageable for targets that have only one complement within an array, it could make interpretation of the hundreds of hybridization spots that will be produced with full octamer or decamer arrays and kilobase-length DNA targets very challenging.
A second novel method for high-throughput DNA sequencing has arisen based on reversible termination of primer extension. In accordance with this method, a DNA polymerase reaction is conducted with a primer, template, and four terminators that are conventional deoxynucleotides with a blocking group at the 3' end. No dNTPs are included. Only one blocked deoxynucleotide is incorporated based on the template/primer sequences and the fidelity of the polymerase. The identity of the incorporated terminators can then be determined by tagging them with differently colored fluorophores. The blocking group is removed (under conditions that do not damage DNA) in order to provide a free 3' end for another polymerase cycle. A reasonable strategy is the incorporation of the base-specific color into the blocking group.
This method has a number of experimental pitfalls, the greatest being the reversibility of DNA polymerization in the presence of enzyme. This does not refer to the 5'→3' exonuclease activity that has been removed from many of the commercial polymerase preparations by engineering, but rather to the natural reversibility of chemical reactions. Under normal primer extension conditions, this reversal is invisible because the dNTPs that are present permit the degraded strand to be built back up. Even if there is only a small amount of reversal in each cycle, the accumulation of such errors becomes significant over the hundreds or thousands of cycles needed to sequence a template of significant length.
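The accumulation argument above can be made concrete with a deliberately simple model. The sketch below assumes (this is an illustrative assumption, not a measured result) that in each cycle a fixed fraction of strands suffers a reversal and falls out of phase permanently, so the in-phase fraction after N cycles is (1 - p)^N. The per-cycle rates used are hypothetical values chosen only to show the trend.

```python
def in_phase_fraction(reversal_per_cycle, cycles):
    """Fraction of strands still in phase after the given number of cycles,
    assuming an independent, fixed reversal probability per cycle
    (a simplifying assumption for illustration only)."""
    return (1.0 - reversal_per_cycle) ** cycles


if __name__ == "__main__":
    # Hypothetical per-cycle reversal rates; not taken from any experiment.
    for rate in (0.001, 0.01):
        for cycles in (100, 1000):
            fraction = in_phase_fraction(rate, cycles)
            print(f"rate={rate} cycles={cycles} in_phase={fraction:.3f}")
```

Under this model, even a 1% reversal per cycle leaves only about 37% of strands in phase after 100 cycles, which is why small per-cycle losses dominate at the read lengths the text describes.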
The present invention provides a new approach to the photochemical synthesis of nucleic acids and to the preparation of high quality arrays of oligomers that permit the rapid analysis of genes, including those wherein mutations result in disease. The invention also provides new photochemically removable protecting groups that can be used in the present approach to nucleic acid synthesis.