1. Field of the Invention
The present invention relates to sequencing of DNA and more particularly to the sequencing of the entire human genome.
2. Description of the Background
A human being has 23 pairs of chromosomes containing a total of about 100,000 genes, which together make up the human genome. A single defective gene may cause an inheritable disease, such as Huntington's disease, Tay-Sachs disease or cystic fibrosis. The human chromosomes consist of large linear organic molecules of double-stranded DNA (deoxyribonucleic acid) with a total length of about 3.3 billion "base pairs". The base pairs are the chemical units that encode information along the DNA. A typical gene may have about 30,000 base pairs. By correlating the inheritance of a "marker" (a distinctive segment of DNA) with the inheritance of a disease, one can locate a mutant (abnormal) gene to within one or two million base pairs. This opens the way to clone the DNA segment, test its activity, follow its inheritance, and diagnose carriers and future disease victims.
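The figures above can be combined into a rough average gene spacing, which shows how many genes a marker-defined interval might contain. This is a back-of-the-envelope sketch that assumes genes are spread uniformly along the genome. Real chromosomes do not strictly follow that assumption.

```latex
\frac{3.3\times 10^{9}\ \text{bp}}{1\times 10^{5}\ \text{genes}} = 3.3\times 10^{4}\ \text{bp per gene},
\qquad
\frac{(1\ \text{to}\ 2)\times 10^{6}\ \text{bp}}{3.3\times 10^{4}\ \text{bp per gene}} \approx 30\ \text{to}\ 60\ \text{genes}
```

Under that uniformity assumption, an interval located by marker correlation would still hold on the order of a few dozen genes.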
Mapping the human genome means accurately determining the location and composition of each of the 3.3 billion bases. The complexity and large scale of such a mapping place it among the largest and most important projects of the 1990's and beyond, in terms of cost, effort and scientific potential.
The problem of DNA sequence analysis is that of determining the order of the four bases on the DNA strands. The present status of techniques for determining such sequences is described in some detail in an article by Lloyd M. Smith published in the American Biotechnology Laboratory, Volume 7, Number 5, May 1989, pp 10-17. Since the early 1970's, two methods have been developed for determining DNA sequence: (1) the enzymatic method, developed by Sanger and Coulson; and (2) the chemical degradation method, developed by Maxam and Gilbert. Both techniques are based on similar principles and employ gel electrophoresis to separate DNA fragments of different lengths with high resolution. These gels can separate a DNA fragment 600 bases long from one 601 bases long.
The two sequencing methods differ in the techniques employed to produce the DNA fragments, but are otherwise similar. In the Maxam-Gilbert method, four different base-specific reactions are performed on portions of the DNA molecules to be sequenced, producing four sets of radiolabeled DNA fragments. These four fragment sets are loaded into adjacent lanes of a polyacrylamide slab gel and separated by electrophoresis. Autoradiographic imaging of the pattern of radiolabeled DNA bands in the gel reveals the relative sizes of the fragments in each lane, as indicated by their band mobilities. The DNA sequence is then deduced from this pattern.
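The final step, deducing the sequence from the band pattern, can be sketched in code. This is a minimal, idealized model and does not reproduce the reaction chemistry. It assumes that each of the four lanes corresponds to exactly one base. It also assumes that a band of length n in lane X means the base at position n (counted from the labeled end) is X. The helper names `simulate_lanes` and `read_ladder` are hypothetical.

```python
from typing import Dict, List

BASES = "ACGT"


def simulate_lanes(sequence: str) -> Dict[str, List[int]]:
    """Hypothetical helper: build an idealized four-lane band pattern.

    A band of length n appears in the lane for the base found at
    position n (1-indexed, counted from the labeled end). Only the
    characters A, C, G and T are accepted.
    """
    lanes: Dict[str, List[int]] = {base: [] for base in BASES}
    for position, base in enumerate(sequence, start=1):
        lanes[base].append(position)
    return lanes


def read_ladder(lanes: Dict[str, List[int]]) -> str:
    """Read the sequence by ordering all bands from shortest to longest.

    On a gel the shortest fragments migrate farthest, so reading the
    bands in order of increasing length walks along the strand.
    """
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    lengths = [length for length, _ in bands]
    # A clean read needs exactly one band at every length 1..N.
    if lengths != list(range(1, len(lengths) + 1)):
        raise ValueError("band pattern has gaps or overlapping bands")
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = simulate_lanes("GATTACA")
    print(lanes)
    print(read_ladder(lanes))
```

In this example, `simulate_lanes("GATTACA")` places bands at lengths 2, 5 and 7 in the A lane. Reading all four lanes from shortest to longest band recovers "GATTACA". Real gels also show band compression, background and other artifacts, which this sketch ignores.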
At least one of these two techniques is employed in essentially every laboratory concerned with molecular biology, and together they have been used to sequence more than 26 million bases of DNA. Currently a skilled biologist can produce about 30,000 bases of finished DNA sequence per year under ideal conditions. With presently available equipment and trained personnel, sequencing the human genome would require about 100 years of total effort if no other sequencing projects were undertaken. While very useful, the present sequencing methods are extremely tedious and expensive, and they require the services of highly skilled scientists. Moreover, these methods use hazardous chemicals and radioactive isotopes, which have inhibited their consideration for wider use and their further development. Large-scale sequencing projects, such as that of the human genome, thus appear to be impractical using these well-established techniques.
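The throughput figures above can be checked with simple arithmetic. Dividing the genome length by one biologist's annual output gives the total labor required. The source does not state the size of the workforce; the figure below is only inferred from the 100-year estimate.

```latex
\frac{3.3\times 10^{9}\ \text{bases}}{3\times 10^{4}\ \text{bases per person-year}} = 1.1\times 10^{5}\ \text{person-years},
\qquad
\frac{1.1\times 10^{5}\ \text{person-years}}{100\ \text{years}} \approx 1.1\times 10^{3}\ \text{full-time sequencers}
```

On these numbers, the 100-year figure would correspond to roughly a thousand skilled biologists working on nothing but the human genome.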
In addition to being slow, the present DNA sequencing techniques involve a large number of cumbersome handling steps which are difficult to automate. Recent improvements include replacing the radioactive labels with fluorescent tags. These developments have improved the speed of the process and have removed some of the tedious manual steps, although present technology continues to employ the relatively slow gel electrophoresis technique for separating the DNA fragments.
Mass spectrometry is a well-known analytical technique that can provide fast and accurate molecular weight information on relatively complex mixtures of organic molecules. Historically, however, mass spectrometry has had neither the sensitivity nor the resolution to be useful for analyzing mixtures at high mass. A series of articles published in 1988 by Hillenkamp and Karas does suggest that large organic molecules of about 10,000 to 100,000 Daltons may be analyzed in a time-of-flight mass spectrometer, although the resolution at lower molecular weights is not as sharp as that of conventional magnetic field mass spectrometry. Moreover, the Hillenkamp and Karas technique is very time-consuming and requires complex and costly instrumentation.
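The time-of-flight principle mentioned above can be sketched numerically. Consider an idealized linear instrument. An ion of mass m and charge ze accelerated through a potential U gains kinetic energy zeU = mv²/2. It then crosses a field-free drift region of length L in a time t = L * sqrt(m / (2zeU)). The 20 kV accelerating voltage and 1 m drift length below are illustrative assumptions, not the settings of any Hillenkamp and Karas instrument. The model also ignores initial velocity spread and other real-world effects.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs (exact SI value)
DALTON = 1.66053906660e-27           # kilograms per dalton (unified atomic mass unit)


def flight_time(mass_da: float, charge: int = 1,
                voltage: float = 20e3, drift_length: float = 1.0) -> float:
    """Idealized flight time (s) of an ion through a field-free drift tube.

    From z*e*U = m*v**2 / 2, the ion velocity is v = sqrt(2*z*e*U / m),
    so the flight time is t = L / v. The default voltage (V) and drift
    length (m) are assumed illustrative values, not real instrument settings.
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE * voltage / mass_kg)
    return drift_length / velocity


def mass_from_time(t: float, charge: int = 1,
                   voltage: float = 20e3, drift_length: float = 1.0) -> float:
    """Invert flight_time: m = 2*z*e*U * (t / L)**2, returned in daltons."""
    mass_kg = 2 * charge * ELEMENTARY_CHARGE * voltage * (t / drift_length) ** 2
    return mass_kg / DALTON


if __name__ == "__main__":
    t1 = flight_time(10_000)
    t2 = flight_time(10_001)
    print(f"10,000 Da ion: {t1 * 1e6:.2f} microseconds")
    print(f"1 Da heavier ion arrives {(t2 - t1) * 1e9:.2f} ns later")
```

With these assumed settings, a singly charged 10,000 Da ion arrives after about 51 microseconds. An ion only 1 Da heavier arrives roughly 2.5 ns later. This gap shows why resolving mixtures at high mass demands very precise timing.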