1. Field of the Invention
The present invention relates to a method and apparatus for determining the base sequence of DNA or the like of biological or non-biological origin and, more particularly, to a so-called ultrahigh-speed DNA sequencer used to sequence the bases of DNA or the like at high speed. The sequencer consists mainly of a transmission electron microscope (TEM) that produces magnified visible images of DNA molecules and their parts, and its operation involves specimen adjustment, TEM imaging, and image analysis.
2. Description of the Related Art
The prior art DNA base sequencing is a wet chemistry technique consisting principally of DNA length separation by electrophoresis. This technique is based on DNA cutting, fluorescence labeling, and reading of a separation pattern (electrophoresis pattern). All existing sequencing techniques have been developed from this key technique. In recent years, Celera Genomics, U.S., is said to have successfully sequenced 3×10¹⁰ DNA bases (human genome from one person) in one year. This achievement was made by operating in parallel 200 DNA sequencers, each having tens of electrophoresis lanes. Although this was accomplished by massive parallelization of machines of the same construction, the analysis speed of each electrophoresis lane is not very high, because the separation efficiency of electrophoresis is low. Today, it is considered that at most 10⁴ bases can be sequenced per lane per day. The efficiency of DNA electrophoresis is being improved by shortening the electrophoresis distance and introducing microscopy. The target is ten times the present speed, i.e., 10⁵ bases per lane per day. However, this is considered to be the limit of the method. This speed directly determines the cost of determining the base sequence. Currently, the cost is estimated to be about 10 yen per base. Accordingly, the present cost of sequencing the genome of a person is estimated to be 3×10¹⁰ bases × 10 yen/base = 3×10¹¹ yen = 30 billion yen (about 250 million dollars).
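The aggregate throughput of such a parallel installation can be sketched as a simple product. The following is only an illustrative calculation, not part of the described apparatus. The number of sequencers (200) and the per-lane rate (10^4 bases per day) are taken from the text above; the value of 50 lanes per sequencer is an assumption standing in for "tens of lanes", and the function and variable names are hypothetical.

```python
def yearly_throughput(sequencers, lanes_per_sequencer, bases_per_lane_per_day, days=365):
    """Total bases read in `days` days by identical sequencers run in parallel."""
    return sequencers * lanes_per_sequencer * bases_per_lane_per_day * days


SEQUENCERS = 200                  # from the text
LANES_PER_SEQUENCER = 50          # ASSUMPTION: the text only says "tens of lanes"
BASES_PER_LANE_PER_DAY = 10**4    # from the text (upper bound per lane)

total_bases_per_year = yearly_throughput(
    SEQUENCERS, LANES_PER_SEQUENCER, BASES_PER_LANE_PER_DAY
)
print(f"{total_bases_per_year:.2e} bases per year")
```

Under this assumed lane count, the product is of the same order of magnitude as the figure of 3×10^10 bases in one year quoted above; a different lane count scales the result proportionally.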
Under these circumstances, the present cost is so high that a genome information industry cannot emerge. It is essential to develop a method of reducing the cost by three to four orders of magnitude. In an attempt to achieve this, various methods have been proposed. One known method targets a specific genomic DNA and detects its presence probabilistically, as with the DNA chips that are already available on the market. Another known method uses a scanning probe microscope, especially a scanning tunneling microscope, to produce magnified images of DNA molecules, from which base sequences can be read (for example, Biological Physics (in Japanese), Hiroyuki Tanaka et al., Vol. 40, No. 5, pp. 336-340, 2000). However, the former technique has the problem that its accuracy is low; intrinsically, it is not a technique for sequencing unknown DNAs. The latter method has the problem that its data throughput is low, which is a limitation inherent in scanning. Thus, neither method surpasses the existing electrophoresis.
In retrospect, various procedures were attempted in the 1960s, before the existing DNA sequencer had been invented by Gilbert and Sanger. Among them, electron microscopy was considered to be the most promising because of its high spatial resolution. Research was conducted across a wide range of fields, from biology to physics. However, it has been difficult, even using electron microscopy, to determine the sequence of double-helical DNA. Computer simulation of TEM images of DNA reveals that the four bases, i.e., adenine (A), thymine (T), guanine (G), and cytosine (C), cannot be distinguished from each other even at a spatial resolution of 0.05 nm, for example. In contrast, use of single-chain DNA yields several advantages: i) the spacing (about 0.7 nm) between successive bases when elongated is twice as wide as the spacing in double-helical DNA; ii) it is easy to label each base with a certain heavy atom; and iii) the method can be applied to RNA sequencing, as well as to DNA sequencing.
As pioneers in exploiting these advantages, Evangelos N. Moudrianakis et al. attempted a method in which an organic substance (diazotized 2-amino-p-benzene disulfonic acid) is selectively bonded to guanine (G) bases and the active groups (two sulfonic acid groups) of the organic substance are labeled with heavy uranium atoms (Proc. Natl. Acad. Sci. USA, Vol. 53, pp. 564-571 (1965)). That is, if a TEM capable of resolving a single uranium atom is used, base G can be identified from a pair of uranium atoms.
However, this method did not function as anticipated. One reason is that the guanine-specific bonding of the organic substance was problematic. Another reason is that the TEM used at that time did not have sufficient capability to identify a single uranium atom. It is considered that this poor capability arose not from low spatial resolution but from low image contrast; a single atom could not be distinguished from the background, i.e., noise originating from the carbon film or polymeric film used as the specimen support substrate. It seems that the results were all artifacts.
As described thus far, the DNA sequencing speed is unlikely to exceed 10⁵ bases per lane per day as long as a DNA sequencer based on the prior art electrophoresis is used. With the existing method, it takes about one year to read the genome of a person, and the cost is about 250 million dollars. It is required that the genome be read in one week at a cost of about 1 million dollars. That is, the genome must be read at ultrahigh speed and much more economically. This will permit a genome information industry to emerge. For this purpose, development of an epoch-making, ultrahigh-speed DNA sequencer is essential.
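The gap between the one-week target and the per-lane limit of electrophoresis can be made concrete with a short calculation. This is a sketch under stated assumptions only: it takes the figure of 3×10^10 bases per genome quoted earlier in this description and the per-lane limit of 10^5 bases per day, and asks how many lanes would have to run in parallel to finish in seven days. All names are hypothetical.

```python
GENOME_BASES = 3 * 10**10           # figure quoted earlier in this description
TARGET_DAYS = 7                     # "read in one week"
LANE_LIMIT_BASES_PER_DAY = 10**5    # stated limit of electrophoresis per lane

# Throughput the whole system must sustain to meet the one-week target.
required_bases_per_day = GENOME_BASES / TARGET_DAYS

# Number of electrophoresis lanes needed at the per-lane limit (ceiling division).
lane_days_capacity = TARGET_DAYS * LANE_LIMIT_BASES_PER_DAY
lanes_needed = -(-GENOME_BASES // lane_days_capacity)

print(f"required throughput: {required_bases_per_day:.2e} bases/day")
print(f"lanes needed at the electrophoresis limit: {lanes_needed}")
```

Even at the projected limit, tens of thousands of lanes would be needed under these figures, which illustrates why a measuring method with a much higher intrinsic data rate is sought.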
To determine the base sequence of DNA at ultrahigh speed, it is essential to develop a measuring method for extracting a large amount of primary data about the sequence at ultrahigh speed. The TEM is considered to be the means that satisfies this requirement best today, because the instrument provides a magnification of one million and offers two-dimensional images. However, in view of the history of failures occurring more than thirty years ago, novel means including elongation of single-chain DNA, labeling of bases on a single-chain DNA with specific heavy elements, a high-resolution and high-contrast TEM capable of discriminating heavy elements, and an image analysis system for determining DNA sequences are required.
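To illustrate what a magnification of one million means for elongated single-chain DNA, the following sketch converts the base spacing of about 0.7 nm mentioned earlier into a distance in the magnified image. It is an illustrative calculation only; the function name is hypothetical, and the double-helix spacing is simply taken as half the single-chain value, as implied by the statement that the elongated spacing is twice as wide.

```python
NM_TO_MM = 1e-6  # 1 nm = 1e-6 mm


def image_distance_mm(object_distance_nm, magnification):
    """Distance in the magnified image (mm) for a given specimen distance (nm)."""
    return object_distance_nm * magnification * NM_TO_MM


MAGNIFICATION = 1_000_000                  # "a magnification of one million"
SINGLE_CHAIN_SPACING_NM = 0.7              # elongated single-chain DNA (from the text)
DOUBLE_HELIX_SPACING_NM = SINGLE_CHAIN_SPACING_NM / 2  # implied by "twice as wide"

single_mm = image_distance_mm(SINGLE_CHAIN_SPACING_NM, MAGNIFICATION)
double_mm = image_distance_mm(DOUBLE_HELIX_SPACING_NM, MAGNIFICATION)

print(f"single-chain base spacing in image: {single_mm:.2f} mm")
print(f"double-helix base spacing in image: {double_mm:.2f} mm")
```

At this magnification, adjacent bases on an elongated single chain are separated by roughly 0.7 mm in the image, twice the separation obtained for the double helix.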
We have examined the foregoing items from various angles and completed a novel system for determining the sequence of bases by integrating the above-described methods and instruments. The novel system does not use electrophoresis but is based on direct observation through a TEM.