1. Field of the Invention
The present invention generally relates to the field of molecular biology. The invention particularly provides novel methods and compositions to enable highly efficient sequencing of nucleic acid molecules. The methods of the invention are suitable for sequencing long nucleic acid molecules, including chromosomes and RNA, without cloning or subcloning steps.
2. Description of the Related Art
Nucleic acid sequencing forms an integral part of scientific progress today. Determining the sequence, i.e. the primary structure, of nucleic acid molecules and segments is important in regard to individual projects investigating a range of particular target areas. Information gained from sequencing impacts science, medicine, agriculture and all areas of biotechnology. Nucleic acid sequencing is, of course, vital to the human genome project and other large-scale undertakings, the aim of which is to further our understanding of evolution and the function of organisms and to provide an insight into the causes of various disease states.
The utility of nucleic acid sequencing is evident. For example, the Human Genome Project (HGP), a multinational effort devoted to sequencing the entire human genome, is in progress at various centers. However, progress in this area is generally both slow and costly. Nucleic acid sequence is usually determined on polyacrylamide gels that separate DNA fragments in the range of 1 to 500 bp, differing in length by one nucleotide. The actual determination of the sequence, i.e., the order of the individual A, G, C and T nucleotides, may be achieved in two ways: firstly, using the Maxam and Gilbert method of chemically degrading the DNA fragment at specific nucleotides (Maxam & Gilbert, 1977), or secondly, using the dideoxy chain termination sequencing method described by Sanger and colleagues (Sanger et al., 1977). Both methods are time-consuming and laborious.
More recently, other methods of nucleic acid sequencing have been proposed that do not employ an electrophoresis step. These methods may be collectively termed Sequencing By Hybridization or SBH (Drmanac et al., 1991; Cantor et al., 1992; Drmanac & Crkvenjakov, U.S. Pat. No. 5,202,231). Development of certain of these methods has given rise to new solid support type sequencing tools known as sequencing chips. The utility of SBH in general is evidenced by the fact that U.S. Patents have been granted on this technology. However, although SBH has the potential for increasing the speed with which nucleic acids can be sequenced, all current SBH methods still suffer from several drawbacks.
SBH can be conducted in two basic ways, often referred to as Format 1 and Format 2 (Cantor et al., 1992). In Format 1, oligonucleotides of unknown sequence, generally of about 100-1000 nucleotides in length, are arrayed on a solid support or filter so that the unknown samples themselves are immobilized (Strezoska et al., 1991; Drmanac & Crkvenjakov, U.S. Pat. No. 5,202,231). Replicas of the array are then interrogated by hybridization with sets of labeled probes of about 6 to 8 residues in length. In Format 2, a sequencing chip is formed from an array of oligonucleotides with known sequences of about 6 to 8 residues in length (Southern, WO 89/10977; Khrapko et al., 1991; Southern et al., 1992). The nucleic acids of unknown sequence are then labeled and allowed to hybridize to the immobilized oligonucleotides.
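The principle common to both formats, reading a sequence from the set of short probes that hybridize to it, may be illustrated by the following minimal Python sketch. It is offered only as an idealized illustration and not as the method of any cited reference: hybridization is assumed to be error-free and to require a perfect match, each probe is recorded as the target-strand subsequence it detects rather than as its complement, and the function names `all_probes`, `hybridizing_probes` and `assemble` are hypothetical. Probes of 4 residues are used for brevity in place of the 6 to 8 residues described above.

```python
from itertools import product


def all_probes(k):
    """Return every possible probe of length k (4**k sequences)."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def hybridizing_probes(target, k):
    """Idealized hybridization: a probe scores positive only if it occurs
    in the target as an exact match (assumption: no mismatches, no noise)."""
    present = {target[i:i + k] for i in range(len(target) - k + 1)}
    return {p for p in all_probes(k) if p in present}


def assemble(probes, start):
    """Extend the sequence from `start` by following (k-1)-residue overlaps.
    Stops at the first dead end or ambiguity (zero or several candidates).
    Returns the assembled sequence and the probes left unused."""
    k = len(start)
    remaining = set(probes) - {start}
    seq = start
    while True:
        suffix = seq[-(k - 1):]
        candidates = [p for p in remaining if p.startswith(suffix)]
        if len(candidates) != 1:
            break
        seq += candidates[0][-1]
        remaining.remove(candidates[0])
    return seq, remaining


if __name__ == "__main__":
    target = "ATGCAGGTCC"                    # hypothetical unknown sequence
    positives = hybridizing_probes(target, 4)
    sequence, unused = assemble(positives, "ATGC")
    print(len(positives), sequence, len(unused))
```

In this sketch the assembly halts wherever an overlap of k-1 residues is shared by more than one positive probe, so a repeated subsequence in the target stops the reconstruction early.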
Unfortunately, both of these SBH formats have several limitations, particularly the requirement for prior DNA cloning steps. In Format 1, other significant problems include attaching the various nucleic acid pieces to be sequenced to the solid surface support or preparing a large set of longer probes. In Format 2, major problems include labeling the nucleic acids of unknown sequence, the high noise-to-signal ratios that generally result, and the fact that only short sequences can be determined. Further problems of Format 2 include the secondary structure formation that prevents access to some targets and the different conditions that are necessary for probes with different GC contents. Therefore, the art would clearly benefit from a new procedure for nucleic acid sequencing, and particularly, one that avoids the tedious processes of cloning and/or subcloning.