1. Field of the Invention
The instant invention relates to a method and an apparatus for determining a nucleotide sequence (base sequence), and to a computer program product executed by the apparatus to determine the nucleotide sequence.
2. Description of the Related Art
The human genome is composed of approximately three billion genetic codes (bases). The “Human Genome Project” currently underway aims to determine the entire genetic code (nucleotide sequence). In the process, it is becoming clear that many differences exist in the genetic code (nucleotide sequence) of individual human beings. Differences in human genome nucleotide sequences (polymorphisms) are classified into single nucleotide polymorphism (SNP), in which one base is substituted with another base; variable number of tandem repeats (VNTR or microsatellite polymorphism), arising from the deletion or insertion of between one and several thousand bases; and other types. Currently, single nucleotide polymorphism (SNP) is drawing particular attention among these types of polymorphism. A single nucleotide polymorphism (SNP) is a difference in one base of the DNA nucleotide sequence and is the smallest unit underlying individual human traits, such as tolerance to alcohol and whether a drug has a strong effect. Among the three billion base pairs in the human genome, approximately three million (a ratio of 1 per 500 to 1000 base pairs) to ten million single nucleotide polymorphism bases are thought to exist, and these bring about differences between people (physical traits), such as the inability to make particular proteins or the production of proteins different from those of other people, as well as racial differences. With respect to research into genetic individual differences in human beings, it is said that analyzing single nucleotide polymorphisms and investigating susceptibility to diseases and responses to medicines will make made-to-order medical treatment possible, in which medicine suited to the patient and with few side effects is administered; accordingly, research into single nucleotide polymorphism (SNP) analysis is progressing.
For plants, such analysis makes it possible to identify the mechanisms of disease and pest resistance that a plant inherently possesses and to enhance those functions.
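The SNP frequencies quoted above can be checked with simple arithmetic. The short Python sketch below (the function name `expected_snps` is hypothetical) applies the stated ratio of one SNP per 500 to 1000 base pairs to a genome of three billion base pairs. The ratio alone gives roughly three to six million SNPs, so the upper estimate of ten million is not derived from this ratio.

```python
GENOME_BP = 3_000_000_000  # approximately three billion base pairs

def expected_snps(bp_per_snp: int, genome_bp: int = GENOME_BP) -> int:
    """SNPs expected if one SNP occurs every `bp_per_snp` base pairs."""
    return genome_bp // bp_per_snp

low = expected_snps(1000)   # one SNP per 1000 bp -> sparsest case
high = expected_snps(500)   # one SNP per 500 bp -> densest case
print(f"{low:,} to {high:,} SNPs")  # 3,000,000 to 6,000,000 SNPs
```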
One reason attention is focused on single nucleotide polymorphism (SNP) is growing interest in the relationship between disease and SNPs, since improved analysis techniques now make it possible to analyze a wide variety of SNPs. This research covers a wide range of subjects, including disease-related genes, individual differences in drug metabolism, and chronic diseases. A relationship with SNPs has already been explained for some cases of drug metabolism and lipid metabolism, and the links between SNPs and these other issues are expected to be clarified gradually.
Molecular biological techniques such as SNP analysis involve a vast number of manipulations on an extremely large number of samples. Those manipulations are frequently complex and time-consuming, and they generally require a high level of precision. For many techniques, insufficient sensitivity, specificity, or reproducibility limits their application.
For example, problems of sensitivity and specificity have thus far limited the practical applications of nucleic acid hybridization. “Hybridization” refers to the formation of double-stranded hybrid molecules from complementary single-stranded nucleic acids, and is used as a method for studying the primary structure of nucleic acids, that is, the homology of nucleotide sequences, and for detecting nucleic acids having homologous nucleotide sequences. It exploits the tendency of nucleic acids to form a two-stranded double helix, in which hydrogen bonds form between complementary bases on opposing strands, that is, between adenine (A) and thymine (T) and between guanine (G) and cytosine (C). In general, nucleic acid hybridization analysis involves using a probe to detect an extremely small number of specific target nucleic acids (DNA or RNA) among a large volume of non-target nucleic acids. To maintain high specificity, hybridization is carried out under the most stringent conditions, ordinarily achieved through various combinations of temperature, salts, detergents, solvents, chaotropic agents, and denaturants. Most samples, and in particular human genomic DNA samples, are extremely complex. When a sample consists of a large number of sequences closely resembling a specific target sequence, many partial hybridizations with non-target sequences occur even with the most specific probes. In some cases, the hybridization kinetics between the probe DNA and its specific target (sample DNA) are also unfavorable. Even under the most favorable conditions, the majority of hybridization reactions are carried out with relatively low concentrations of probe DNA and target molecules (sample DNA). In addition, the probe DNA often competes with complementary sequences for binding to the sample DNA.
There is also the problem that probe DNA, which has an affinity for almost any substance, generates high levels of non-specific background signal. Individually or in combination, these problems reduce the sensitivity and specificity of nucleic acid hybridization.
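The base-pairing rules and the partial-hybridization problem described above can be illustrated with a minimal Python sketch. It assumes DNA written 5' to 3', ignores hybridization thermodynamics (temperature, salt, and the other conditions listed above), and uses the hypothetical helper names `reverse_complement` and `mismatches`. A probe designed against one allele differs by only a single base from the sequence carrying the other allele of an SNP, which is why such closely resembling non-target sequences can still partially hybridize.

```python
# Base-pairing rules: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the strand (5'->3') that pairs fully with `seq`."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def mismatches(probe: str, target: str) -> int:
    """Count unpaired positions when `probe` hybridizes to `target` end to end."""
    if len(probe) != len(target):
        raise ValueError("probe and target must have the same length")
    return sum(p != e for p, e in zip(probe.upper(), reverse_complement(target)))

probe = reverse_complement("GATTACA")    # probe for the target allele: "TGTAATC"
print(mismatches(probe, "GATTACA"))      # 0: perfect match with the target
print(mismatches(probe, "GATCACA"))      # 1: non-target differing by one SNP
```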
Under these circumstances, the present inventors have previously proposed methods (refer to published unexamined patent application 2004-125777) that determine whether an SNP (single nucleotide polymorphism) is of the homo type or the hetero type by testing the signal magnitudes for a significant difference, for example with a t-test. In the method described in published unexamined patent application 2004-125777, however, a hetero-type determination is not made unless the two types of signal values match nearly completely, which is practically impossible in actual measurements.
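The prior method is only summarized here, so the following Python sketch is a hypothetical illustration of a t-test-based call, not the procedure of application 2004-125777. It assumes replicate signal intensities for two allele-specific probes (called X and Y here), computes Welch's t statistic with the standard library, and calls the sample hetero-type only when the difference is not significant. The names `welch_t` and `call_genotype` and the fixed threshold `t_crit` are assumptions.

```python
import math
from statistics import mean, variance

def welch_t(x: list[float], y: list[float]) -> float:
    """Welch's two-sample t statistic comparing replicate signals x and y."""
    diff = mean(x) - mean(y)
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    if se == 0:
        # Identical replicates: either no difference or an unbounded one.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se

def call_genotype(signal_x: list[float], signal_y: list[float],
                  t_crit: float = 2.776) -> str:
    """Hypothetical SNP call: hetero unless the allele signals differ significantly.

    t_crit is an assumed fixed threshold, borrowed from the two-sided 5%
    critical value of a pooled test with three replicates per allele
    (4 degrees of freedom); a full test would compute a p-value instead.
    """
    t = welch_t(signal_x, signal_y)
    if abs(t) <= t_crit:
        return "hetero"
    return "homo-X" if t > 0 else "homo-Y"

# Nearly equal signals with noisy replicates: no significant difference.
print(call_genotype([100, 102, 98], [99, 101, 100]))            # hetero
# Signals only 5% apart but with very tight replicates: called homozygous.
print(call_genotype([100.0, 100.1, 99.9], [95.0, 95.1, 94.9]))  # homo-X
```

The second example reflects the drawback described above: when replicates are tight, even a small imbalance between the allele signals counts as significant, so a hetero-type call requires the two signals to agree almost exactly.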
Thus, although genotyping algorithms for determining the nucleotide sequence of nucleic acids exist in the prior art, their determination accuracy is problematic.