1. Field of the Invention
The present invention relates to a method for determining a base sequence of a DNA which is one type of nucleic acid, and more particularly to a method for displaying a base sequence trace (waveform) of a DNA fragment for use in a DNA base sequencing using a DNA sequencer.
2. Description of the Related Art
A constituent of a nucleic acid is a nucleotide, which is composed of a base, pentose, and phosphoric acid. The phosphoric acid and a nucleoside combine to form the nucleotide. Nucleosides are cross-linked by the phosphoric acid, so that a DNA (deoxyribonucleic acid) and an RNA (ribonucleic acid), which are polymers, are generated.
The base forming the nucleic acid comprises two types, a purine and a pyrimidine. The purine comprises an adenine A and a guanine G, while the pyrimidine comprises a cytosine C and a thymine T.
The DNA has a polynucleotide-chain structure, in which adenine A, guanine G, cytosine C, and thymine T are aligned as a string. If the DNA were extracted from one set of chromosomes of a human cell and linked end to end, it would extend for about one meter and contain about three billion bases in sequence.
Accordingly, determining the sequence of the four bases adenine A, guanine G, cytosine C, and thymine T allows genetic information to be analyzed. The DNA sequencing technology for determining a base sequence progresses as techniques in other fields advance. Its progress correlates closely with progress made in related technical fields, such as the discovery of restriction enzymes and other enzymes acting on nucleic acids, DNA cloning, and nucleic acid chemistry.
In recent years, computer technology has frequently been applied to DNA sequencing. Since computer technology enables the entry and accumulation of amounts of data far beyond the capability of a human being, a computer is used as an essential tool for determining a base sequence.
As described above, the DNA has a structure of a primary base sequence linked in the form of a chain. The DNA chain has directionality. That is, a base sequence ATGCACGA→ is different from a base sequence ATGCACGA← (that is, AGCACGTA→).
Both ends of the DNA chains are named. The end where a hydroxyl group is linked to a sugar at a location 3' is called the 3' end, while the other end where a phosphoric acid group is linked to a sugar at a location 5' is called the 5' end. Normally, a DNA chain is described so that the 5' and 3' ends are arranged at the left and right sides respectively.
The DNA exists in a double-stranded state where two complementary base sequences in different directions are united. There is a definite relationship between the two complementary base sequences which face each other. That is, the adenine A only faces the thymine T, while the guanine G only faces the cytosine C. An example of a double-stranded DNA is given below.
ATGCATGCTAGCTAGCT→ ("a" strand)
TACGTACGATCGATCGA← ("b" strand)
As shown above, the "a" strand pairs with the "b" strand. The "b" strand is complementary to the "a" strand. Accordingly, the "b" strand can also be represented as follows:
AGCTAGCTAGCATGCAT→
The DNA is genetically defined by making a pair of two complementary base sequences. If one of the two complementary base sequences is determined, the other of the two can be determined. This means that the base sequence of the DNA can be determined.
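The pairing rule and the directionality described above can be sketched in a few lines of Python. This is only an illustration using the example strands given in the text; the function names complement and reverse_complement are chosen here for clarity and do not come from the original description.

```python
# Pairing rule stated in the text: A faces only T, and G faces only C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the base facing each position, kept in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """Return the complementary strand rewritten in the opposite reading direction."""
    return complement(strand)[::-1]

a_strand = "ATGCATGCTAGCTAGCT"
b_facing = complement(a_strand)            # "b" strand as it faces the "a" strand
b_forward = reverse_complement(a_strand)   # "b" strand read in the forward direction
print(b_facing)
print(b_forward)
```

Applying reverse_complement to the forward form of the "b" strand gives back the "a" strand, which is why determining one strand is sufficient to determine the other.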
In a DNA sequencer for automatically reading a base sequence of a DNA, the dideoxy method (also known as the Sanger method) is used to determine the base sequence. Normally, when DNA synthesis is performed using a portion of one of the complementary double strands of a DNA as a primer for initiating the DNA synthesis, adding a dideoxynucleotide halts the DNA synthesis. As a result, fragments of the DNA having a variety of lengths can be obtained. Therefore, by adding the dideoxynucleotide corresponding to each of the bases G, A, T, and C at the time of the DNA synthesis reaction, DNA fragments having a variety of lengths, whose chains end at each location of each of the bases, can be obtained.
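The relationship between fragment lengths and the base sequence can be illustrated with a small, idealized Python sketch. It assumes that synthesis halts exactly once at every position and that every fragment is recovered, which real reactions only approximate; the function names are hypothetical.

```python
def termination_lengths(synthesized: str) -> dict[str, list[int]]:
    """Map each base to the lengths of the fragments that end at that base.

    Idealized assumption: one fragment is produced for every position.
    """
    lengths = {base: [] for base in "GATC"}
    for position, base in enumerate(synthesized, start=1):
        lengths[base].append(position)
    return lengths

def read_sequence(lengths: dict[str, list[int]]) -> str:
    """Recover the sequence by ordering all fragments from shortest to longest."""
    ordered = sorted(
        (length, base) for base, group in lengths.items() for length in group
    )
    return "".join(base for _, base in ordered)

fragments = termination_lengths("ATGCACGA")
recovered = read_sequence(fragments)
print(fragments)
print(recovered)
```

Sorting the fragments by length corresponds to what gel electrophoresis does physically: shorter fragments are separated from longer ones, and the base at which each fragment ends is read in order of length.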
FIG. 1 is a schematic diagram showing a process for generating DNA fragments cleaved at locations of one particular nucleotide, the adenine A in this case. As shown in this figure, a chemical process for removing one nucleotide, that is, the adenine A, from a DNA 1 whose 5' end is labelled with ³²P is performed. As a result, the DNA 1 is separated into radioactively labelled fragments 2, which have a phosphoric acid group at the 5' end on the left side of the figure, and fragments 3, which are non-labelled. Then, these fragments are isolated by gel electrophoresis. The radioactive fragments 2 are detected by autoradiography at locations respectively corresponding to the lengths (or molecular weights) of the fragments.
With the DNA sequencer, DNA fragments generated by a reaction of the dideoxy method are fluorescence-labelled. The fluorescence-labelled DNA fragments having a variety of chain lengths are isolated by gel electrophoresis. As the DNA fragments migrate through the gel, their fluorescent dye is excited by laser irradiation at a fixed location on the gel, and the emitted fluorescence is detected by an optical detector. By detecting the fluorescence continuously and simultaneously with the electrophoresis, data of the electrophoretic patterns of the DNA fragments corresponding to each of the bases G, A, T, and C can be obtained. The data thus obtained is analyzed by a computer and converted into base sequence data.
Output data from the DNA sequencer includes a DNA base sequence itself, and trace data (waveform data) used to determine a base sequence. The trace data corresponds to data of a gel electrophoretic pattern, and a location of a peak in each of the traces (waveforms) of the bases G, A, T, and C corresponds to a location where a corresponding base exists.
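The correspondence between trace peaks and bases can be sketched as a simple peak search over the four traces. This is a minimal illustration on synthetic data, not the analysis actually performed by any sequencer; the helper synthetic_traces, the threshold value, and the data layout (one list of intensities per base) are all assumptions made for the example.

```python
def synthetic_traces(sequence: str, spacing: int = 4) -> dict[str, list[float]]:
    """Build four artificial traces with one small triangular peak per base."""
    length = spacing * (len(sequence) + 1)
    traces = {base: [0.0] * length for base in "GATC"}
    for k, base in enumerate(sequence):
        center = spacing * (k + 1)
        traces[base][center - 1] = 0.4
        traces[base][center] = 1.0
        traces[base][center + 1] = 0.4
    return traces

def call_bases(traces: dict[str, list[float]], threshold: float = 0.5) -> list[tuple[int, str]]:
    """Return (sample index, base) wherever the strongest trace has a local peak."""
    n = len(next(iter(traces.values())))
    calls = []
    for i in range(1, n - 1):
        # Pick the trace with the highest intensity at this sample.
        base = max(traces, key=lambda b: traces[b][i])
        signal = traces[base]
        # Accept it only if it is a local maximum above the threshold.
        if signal[i] >= threshold and signal[i - 1] < signal[i] >= signal[i + 1]:
            calls.append((i, base))
    return calls

calls = call_bases(synthetic_traces("ATGCACGA"))
called_sequence = "".join(base for _, base in calls)
print(called_sequence)
```

With real trace data, peaks overlap and vary in height, so a fixed threshold and a strict local-maximum test would not be sufficient on their own.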
However, since the number of bases included in a base sequence of a DNA is generally very large, it is difficult to determine the whole of the base sequence at one time using the DNA sequencer. Accordingly, the entire base sequence is determined by separating the DNA to be determined into a plurality of fragments, determining a base sequence of each of the plurality of fragments, and linking the base sequences. For this purpose, the DNA is fragmented so that both ends of each fragment overlap the adjacent fragments, and a base sequence of each of the fragments is obtained.
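Linking fragment sequences through their overlapping ends can be sketched as follows. This is a simplified illustration that assumes error-free reads and an exact overlap between the end of one fragment and the start of the next; the function name merge_by_overlap and the min_overlap parameter are hypothetical.

```python
from typing import Optional

def merge_by_overlap(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two fragments where the end of `left` matches the start of `right`.

    Returns None when no exact overlap of at least `min_overlap` bases exists.
    """
    longest = min(len(left), len(right))
    # Try the longest possible overlap first, then shrink.
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None

merged = merge_by_overlap("ATGCATGCTAG", "CTAGCTAGCT")
unrelated = merge_by_overlap("AAAA", "CCCC")
print(merged)
print(unrelated)
```

Here the two fragments share the four bases CTAG, and the merged result is the "a" strand used as an example earlier. Reads containing ambiguous or misread bases would need an approximate comparison rather than the exact match used in this sketch.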
For the process for determining a base sequence using the DNA sequencer, the number of bases read at one time is limited, as described above. Furthermore, the contents of read sequence data may be quite ambiguous depending on the accuracy of experiments conducted by using gel electrophoresis.
FIG. 2 shows an output example of trace data obtained from a DNA sequencer. In each of the graphs shown in this figure, the vertical axis indicates fluorescent intensity, while the numerical values on the horizontal axis indicate base numbers in a DNA sequence. Since the traces can be enlarged for display as needed, as shown in the four graphs in this figure, the base sequence can be read from the peak locations of the respective traces in these graphs.
For a DNA whose base sequence is to be determined as described above, editing operations such as enlarging a fragment sequence, linking fragment sequences, removing a base that is difficult to identify, and inserting a base are performed as needed to assemble a base sequence. In this case, it is desirable that the editing and assembling operations be performed more accurately and more quickly to obtain a desired base sequence.
When fragment data read by the DNA sequencer is linked or edited in order to assemble a base sequence, it is often the case that a character sequence is extracted from the data read by the sequencer to perform linking and editing operations, and at the same time, trace data is referenced, depending on need. Conventionally, only trace data corresponding to a fragment is displayed for referencing the trace data. Such a display does not enable a study by making a comparison between traces. As described above, peak intervals of trace data may differ depending on experimental data due to non-uniformity of quality of gels used in electrophoresis, slight differences in experimental conditions, etc. Therefore, as long as traces are simply displayed, there is a difficulty in finding visual correspondence between portions of traces to be compared, and the display is not helpful for accurately assembling a base sequence.
Furthermore, in a conventional method, as the molecular weight of a DNA fragment increases, its travel distance in gel electrophoresis becomes shorter. As a result, the base intervals in a trace become irregular.
FIG. 3 shows a graph where such irregularities of base intervals exist. For example, the interval between base numbers 100 and 200 is different from that between base numbers 600 and 700 in this figure. That is, it indicates that the base intervals are not regular. Due to slight differences in experimental conditions, locations of traces corresponding to the same base number are different. Accordingly, with an editing operation using such a graph, a simple comparison between traces cannot easily be made.
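The irregularity of base intervals can be made concrete with a short sketch that measures the distance along the trace axis between two base numbers. The peak positions below are purely hypothetical values generated for illustration (the spacing per base is assumed to shrink in steps as the base number grows); they are not taken from FIG. 3.

```python
def span_between(peak_positions: list[int], first_base: int, last_base: int) -> int:
    """Distance along the trace axis between two base numbers (1-indexed)."""
    return peak_positions[last_base - 1] - peak_positions[first_base - 1]

# Hypothetical peak positions for 700 bases: the spacing per base is assumed
# to drop by one sample every 200 bases, starting from 12 samples.
positions = []
x = 0
for n in range(1, 701):
    x += 12 - n // 200
    positions.append(x)

early_span = span_between(positions, 100, 200)
late_span = span_between(positions, 600, 700)
print(early_span, late_span)
```

Because the two spans differ although each covers 100 bases, two traces plotted on a common sample axis do not line up base for base, which is the difficulty described above.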