1. Field of the Invention
The present invention relates to a nucleic-acid base sequence determining method and an inspecting system therefor for determining the nucleic-acid base sequence by interpreting fluorescent-light intensity waveform data obtained from a nucleic-acid sample.
2. Description of the Related Art
In recent years, research to decipher human genetic information (i.e., the human genome) has been pursued vigorously all over the world. Along with this, it is becoming clear that various human diseases are caused by mutations in the nucleic-acid (i.e., DNA) base sequence. As with physical characteristics, the nucleic-acid base sequences of individual humans differ from one another at many locations. These differences in the base sequences are referred to as “polymorphisms”. A polymorphism is defined as a base-sequence variation that exists with a frequency of more than 1% in the population. Examples of polymorphisms are as follows: polymorphisms where 1 base has been replaced by another base (i.e., Single Nucleotide Polymorphisms: SNPs), polymorphisms where 1 base to several tens of bases have been deleted or inserted, and polymorphisms where the number of repetitions of a repeated base sequence of 2 bases to several tens of bases differs among individual humans.
It has been estimated that, within the 3 billion bases of the human genome, mutations exist at a rate of about 1 mutation every 500 to 1000 bases. Accordingly, 3 million or more 1-base mutations (i.e., SNPs) are considered to exist. The gene diagnosis method (i.e., the DNA marker method) in which such SNPs or the like are employed as markers is expected to be utilized for the search for disease genes, the judgement of disease susceptibilities, and the development of suitable medicines (i.e., tailor-made therapy).
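The estimate above follows from simple division. The following is a minimal sketch of that arithmetic; the function name and constant are illustrative assumptions, and the figures are only the approximate values stated in the text.

```python
# Approximate human genome size stated in the text (3 billion bases).
GENOME_SIZE_BASES = 3_000_000_000


def estimated_mutation_count(genome_size: int, bases_per_mutation: int) -> int:
    """Return the number of mutations if one occurs every `bases_per_mutation` bases."""
    return genome_size // bases_per_mutation


# One mutation every 1000 bases gives the lower bound,
# one mutation every 500 bases gives the upper bound.
low = estimated_mutation_count(GENOME_SIZE_BASES, 1000)
high = estimated_mutation_count(GENOME_SIZE_BASES, 500)
print(low, high)
```

The lower bound corresponds to the "3 million or more" figure given above.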
Many methods for detecting such polymorphisms easily and at low cost have been developed to date. In all of these methods, however, the sizes of nucleic-acid fragments are compared, and the mutations are thereby recognized only indirectly. Consequently, in many cases a high-reliability base sequence determination, which allows direct detection of the mutation locations, is performed as the final confirmation. Conventionally, this base sequence determination has widely been performed by the DNA sequencing method, which combines a technique for labeling nucleic-acid fragments with fluorescent light, a high-sensitivity fluorescent-light detecting technique, and a high-resolution gel electrophoresis technique.
In the above-described DNA sequencing method, a DNA whose base sequence is to be determined (i.e., a template DNA) is prepared first. Usually, one of the following nucleic-acid fragments is employed as the template DNA: a nucleic-acid fragment formed by incorporating a DNA whose base sequence is unknown into a plasmid (i.e., a DNA that exists in the cytoplasm, outside the nucleus, of a cell of a bacterium or the like, and that mainly has only replication starting information), or a nucleic-acid fragment whose base sequence has been directly amplified by the polymerase chain reaction (PCR) method. Next, the template DNA and a primer DNA (i.e., an oligonucleotide whose base sequence is complementary to the base sequence in a specific portion of the template DNA, and which corresponds to the oligonucleotide used as one of the reactants when the PCR method is used) are mixed in a solution inside a test tube. The temperature is then controlled so that the primer and the template form a complementary double strand via hybridization (i.e., annealing).
The sequencing method then proceeds to a step of replicating the DNA with this primer employed as the starting point. This replication is performed using the enzyme called “DNA polymerase” as the catalyst. The following 2 types of substances are added to the reaction solution in a predetermined proportion and mixed to a predetermined concentration: dNTPs (deoxynucleotide triphosphates) needed for the synthesis of the DNA, i.e., monomers of the respective types of bases: adenine (A), cytosine (C), guanine (G), and thymine (T) (or uracil (U)), and 4 types of ddNTPs (dideoxynucleotide triphosphates), i.e., terminators of A, C, G, and T (or U). While the DNA is being synthesized, a ddNTP in this mixture can be incorporated into the growing strand in place of a dNTP, which prevents the DNA synthesis from proceeding any further.
As a result, nucleic-acid fragments are produced which have a ddNTP at their ends and whose syntheses have been stopped at various lengths (i.e., base lengths). Here, the ddNTPs are labeled in advance with fluorescent-light pigments whose colors differ depending on the respective bases. Consequently, each nucleic-acid fragment is labeled with the fluorescent-light color corresponding to the base positioned at its end. In addition, after the solution containing the nucleic-acid fragments produced in this way has been concentrated and purified, the nucleic-acid fragments are denatured into single strands. Finally, the nucleic-acid fragments are separated by base length using a gel electrophoresis apparatus.
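The outcome of the termination reaction can be illustrated with a small model. The following is a minimal sketch, assuming an idealized reaction in which a terminated fragment is produced at every position. The dye colors, the function names, and the example sequence are hypothetical; the text states only that the colors differ for each base.

```python
# Hypothetical dye colors; the text only says the colors differ per base.
DYE_COLOR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}


def terminated_fragments(extension: str, primer_length: int) -> list[tuple[int, str, str]]:
    """Model one terminated fragment per position of the synthesized strand.

    `extension` is the strand synthesized downstream of the primer.
    Each fragment is described as (base length, terminal base, dye color).
    """
    fragments = []
    for i, base in enumerate(extension, start=1):
        fragments.append((primer_length + i, base, DYE_COLOR[base]))
    return fragments


def read_sequence(fragments: list[tuple[int, str, str]]) -> str:
    """Recover the sequence by ordering fragments from shortest to longest."""
    ordered = sorted(fragments)  # lengths are unique, so this sorts by length
    return "".join(base for _, base, _ in ordered)


frags = terminated_fragments("GATTACA", primer_length=20)
print(frags[0])
print(read_sequence(frags))
```

Ordering the fragments by length and reading off the terminal base of each one reproduces the synthesized sequence, which is the principle the separation step relies on.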
Hereinafter, the explanation will be given for the case where a capillary electrophoresis apparatus is used as one example of the above-described gel electrophoresis apparatus. At first, a capillary (i.e., a narrow glass tube) is filled with a viscous macromolecular polymer. Next, a voltage is applied across both ends of the capillary, thereby causing the nucleic-acid fragments, which have negative electric charges, to be introduced from one end of the capillary and electrophoresed.
Since the nucleic-acid fragments are chain-like polymerized macromolecules, the fragments move through the polymer at speeds that are inversely proportional to their molecular weights. Namely, a shorter (i.e., smaller molecular weight) nucleic-acid fragment moves fast, while a longer (i.e., larger molecular weight) nucleic-acid fragment moves slowly. This makes it possible to separate the nucleic-acid fragments by base length. Moreover, the nucleic-acid fragments labeled with fluorescent-light pigments are irradiated with laser light at a position near the terminal end of the capillary (i.e., a position at which the respective nucleic-acid fragments have become separable by a one-base difference in length). Furthermore, a detector measures the fluorescent light emitted from the fluorescent-light pigment labeling the base positioned at the end of each nucleic-acid fragment. As described earlier, the fragments emit fluorescent light in order, starting from the shortest nucleic-acid fragment. This makes it possible to obtain a fluorescent-light intensity curve for each of the 4 base types. Then, by comparing the fluorescent-light intensities of the 4 base types at each peak position or the like, the sequence of the base types (i.e., A, C, G, and T (U)) can be determined.
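The comparison of the 4 intensity curves at each peak position can be sketched as follows. This is a simplified illustration under stated assumptions, not the processing of any actual sequencer: the function name, the `min_intensity` parameter, the simple local-maximum test, and the sample data are all hypothetical, and real waveform data would need baseline and color-crosstalk correction first.

```python
def call_bases(traces: dict[str, list[float]], min_intensity: float = 0.0) -> str:
    """Call one base per peak by picking the strongest of the 4 channels.

    `traces` maps each base type to its intensity samples over time;
    all channels must have the same number of samples.
    """
    n = len(next(iter(traces.values())))
    # Envelope: the strongest channel intensity at each time point.
    envelope = [max(trace[t] for trace in traces.values()) for t in range(n)]
    called = []
    for t in range(1, n - 1):
        is_peak = envelope[t] > envelope[t - 1] and envelope[t] >= envelope[t + 1]
        if is_peak and envelope[t] > min_intensity:
            # The base type whose channel is strongest at the peak position.
            called.append(max(traces, key=lambda base: traces[base][t]))
    return "".join(called)


# Hypothetical sample data: four well-separated peaks, one per channel.
example = {
    "A": [0, 5, 0, 0, 0, 0, 0, 0, 0],
    "C": [0, 0, 0, 6, 0, 0, 0, 0, 0],
    "G": [0, 0, 0, 0, 0, 4, 0, 0, 0],
    "T": [0, 0, 0, 0, 0, 0, 0, 7, 0],
}
print(call_bases(example))
```

Raising `min_intensity` suppresses weak peaks, which shows in miniature how a weak signal leads to missing bases.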
FIG. 2 illustrates an example 2a of fluorescent-light intensity waveform data, and an example 2b of a base sequence (SEQ. ID No. 4) determined by interpreting this waveform data. In the drawing, the vertical axis denotes the fluorescent-light intensity, and the horizontal axis denotes the electrophoresis time. Although data equivalent to several hundreds of bases can actually be obtained in one measurement, only a part thereof is illustrated here for explanation.
The heights of the peaks appearing in the fluorescent-light intensity waveform data 2a reflect the quantity of nucleic-acid fragments having a given length. Usually, a longer nucleic-acid fragment exhibits its peak at a later electrophoresis time. Also, the peak intervals tend to become wider as the nucleic-acid fragments become longer. In view of this, there are some cases where a correction is made, using a parameter determined by electrophoresis conditions such as the electrophoresis voltage, so that the time axis of the display becomes proportional to the base length.
In the nucleic-acid base sequence determining method according to the above-described prior art, a fluorescent-light intensity waveform has frequently been obtained from which it is difficult to determine the nucleic-acid base sequence. The following factors, among others, can be considered as the causes: (a) the quantity of the nucleic-acid fragments is small, which makes the signal intensity weak, (b) the nucleic-acid fragments form secondary structures by themselves, which generates an extra signal component, (c) the degree of purification of the nucleic-acid sample whose base sequence is to be determined is low, which produces nucleic-acid fragments that generate an extra signal component, and (d) the conditions at the time of the sequence reaction and the electrophoresis cause distortions to occur in the signal.
In many cases, these problems can be solved as follows: the base sequence of a nucleic acid that is complementary to the nucleic-acid sample whose base sequence is to be determined is also determined, and the 2 mutually complementary base sequences thus obtained are compared and checked against each other. This method, however, requires the preparation of 2 samples and 2 separate base sequence determinations, which doubles the time and labor. What is more, depending on the samples, there are some cases where it is impossible to obtain data on the 2 mutually complementary base sequences.
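The comparison of the 2 mutually complementary base sequences can be sketched as follows. This is a minimal illustration, assuming that both reads cover exactly the same region and have equal length; the function names and the example sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def disagreements(forward_read: str, reverse_read: str) -> list[int]:
    """Return 0-based positions where the two reads do not confirm each other.

    Assumes both reads cover the same region (an assumption of this sketch).
    """
    expected = reverse_complement(reverse_read)
    if len(expected) != len(forward_read):
        raise ValueError("reads must cover the same region")
    return [i for i, (f, r) in enumerate(zip(forward_read, expected)) if f != r]


print(disagreements("ACGTTG", "CAACGT"))  # reads that agree
print(disagreements("ACGTTG", "CAACGA"))  # reads that differ at one position
```

Positions reported by `disagreements` are those where one of the 2 determinations is suspect and the waveform data would need to be examined again.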
When a completely unknown base sequence is deciphered, the above-described problems frequently become obstacles. In the actual base sequence determination of a nucleic-acid sample, however, there are many cases where, as in the case of inspecting a mutation at a certain specific location of the base sequence, at least a part of the base sequence of the nucleic-acid sample is already known. When an already-known base sequence to which reference can be made exists in this way, the nucleic-acid fragment detection data is interpreted by making reference to the already-known base sequence by one method or another. Namely, an already-known fluorescent-light intensity waveform and the already-known base sequence corresponding thereto (i.e., an arrangement of the base-type characters A, C, G, and T) are prepared. Then, these are compared and checked against a newly-acquired fluorescent-light intensity waveform of the nucleic-acid fragments, thereby allowing the base sequence to be determined.
However, the above-described comparison/checking among the newly-acquired fluorescent-light intensity waveform of the nucleic-acid fragments, the already-known fluorescent-light intensity waveform, and the already-known base sequence corresponding thereto needs to be made visually. This has resulted not only in the problem that the checking requires time and labor, but also in the problem that different judges make mutually different determinations if the comparison criterion is indefinite.
Moreover, if the conditions at the time of the sequence reaction and the electrophoresis change from measurement to measurement, causing different distortions to occur in the signal, the following problem has existed: making the judgement itself becomes difficult when it is based on only one example of the fluorescent-light intensity waveforms to be compared. Also, when examples of a plurality of fluorescent-light intensity waveforms are compared and checked simultaneously, a large number of waveforms are displayed in a limited space. This has resulted in the problem that the comparison itself becomes difficult, or the problem that the redundancy of the comparison criterion increases.