It is known that DNA (deoxyribonucleic acid) is a chain of nucleotides of four types, A (adenine), T (thymine), C (cytosine), and G (guanine), and that RNA (ribonucleic acid) is a chain of nucleotides of four types, A, U (uracil), G, and C. It is also known that these five types of nucleotides bind specifically to one another in combinations known as complementary base pairs: adenine (A) pairs with thymine (T) (in RNA, adenine (A) pairs with uracil (U)), and cytosine (C) pairs with guanine (G). Two strands whose bases pair in this way form a double strand. The formation of this double strand is referred to as “hybridization”, and forming a double strand is referred to as “to hybridize”. Moreover, a nucleotide sequence that can bind to a given nucleotide sequence to form a double strand is referred to as a “complementary sequence”.
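The pairing rules above can be sketched in a few lines of code. This is a minimal illustration, not part of the described method: the function names are hypothetical, and it assumes the common convention that sequences are written 5'->3', so that the complementary sequence is the base-by-base complement read in reverse (the two strands of a double strand run antiparallel).

```python
# Complementary base pairing as described above (DNA: A-T, C-G; RNA: A-U, C-G).
DNA_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}
RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}


def complementary_sequence(seq: str, rna: bool = False) -> str:
    """Return the sequence that hybridizes with `seq`, written 5'->3'.

    Strands pair antiparallel, so the base-by-base complement is reversed.
    """
    pairs = RNA_PAIR if rna else DNA_PAIR
    return "".join(pairs[base] for base in reversed(seq.upper()))


def is_fully_complementary(a: str, b: str, rna: bool = False) -> bool:
    """True if every base of `b` pairs with the opposite base of `a` (both 5'->3')."""
    return len(a) == len(b) and complementary_sequence(a, rna) == b.upper()


print(complementary_sequence("ATGCGT"))          # ACGCAT
print(complementary_sequence("AUGC", rna=True))  # GCAU
print(is_fully_complementary("ATGC", "GCAT"))    # True
```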
In some cases, this complementarity of nucleotides is used to determine whether or not DNA or RNA of interest is present in a sample that is likely to contain predetermined types of DNA or RNA (hereinafter, in the present invention, a nucleotide sequence that is the target of testing is referred to as a target nucleotide sequence). There may also be a need to prepare a chain of nucleotides, referred to as a probe in the present invention, that specifically binds to a given region of DNA or RNA, and to determine whether or not this chain binds to, that is, hybridizes with, the target sequence. This determination is referred to as binding determination or hybridization determination. In other words, a probe is a complementary sequence used to detect a target nucleotide sequence.
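In its most idealized form, binding determination asks whether the target contains a region to which the probe is perfectly complementary. The sketch below assumes exact matching only (hypothetical function names, 5'->3' notation); real hybridization also tolerates partial mismatches, so this is the strictest possible criterion rather than a practical test.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Complementary sequence of a DNA strand, written 5'->3'."""
    return "".join(PAIR[b] for b in reversed(seq.upper()))


def hybridizes_exact(probe: str, target: str) -> bool:
    """Idealized binding determination: the probe is taken to hybridize only if
    the target contains the probe's complementary sequence with no mismatches."""
    return reverse_complement(probe) in target.upper()


target = "GGATCCATGCGTTAGC"                 # hypothetical target nucleotide sequence
print(hybridizes_exact("ACGCAT", target))  # True: target contains ATGCGT
print(hybridizes_exact("TTTTTT", target))  # False
```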
The binding determination described above is used for various purposes. For example, a DNA chip is an information processing chip that exploits this property of hybridization. In many cases, such a chip carries sequences complementary to the nucleotide sequences of many types of DNA or RNA and performs a large number of hybridization determinations simultaneously, thereby executing the process of interest. Likewise, PCR (Polymerase Chain Reaction) is a method for determining and evaluating a DNA sequence in which sequences complementary to two portions of the DNA sequence are generated and the region flanked by these complementary sequences is copied in large quantities.
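The PCR principle above (two complementary sequences, usually called primers, flanking the region to be copied) can be illustrated with a toy "in silico PCR". This is a hedged sketch: the function names and sequences are hypothetical, only exact primer matches on one strand are considered, and the copy count assumes perfect doubling in every cycle, which real reactions do not achieve.

```python
from typing import Optional

PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Complementary sequence of a DNA strand, written 5'->3'."""
    return "".join(PAIR[b] for b in reversed(seq.upper()))


def in_silico_pcr(template: str, forward: str, reverse: str) -> Optional[str]:
    """Region of `template` flanked by two primers (primers included), or None.

    `forward` has the same sequence as the given strand; `reverse` anneals to the
    given strand, so that strand carries the reverse primer's complement downstream.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


def ideal_copies(cycles: int, initial: int = 1) -> int:
    """Copy number if every cycle exactly doubles the product (an idealization)."""
    return initial * 2 ** cycles


template = "TTACCGTAGGCTAAGCTTCGATGCAAGT"
print(in_silico_pcr(template, "ACCGTA", "TGCATC"))  # ACCGTAGGCTAAGCTTCGATGCA
print(ideal_copies(30))                             # 1073741824
```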
In many determinations and evaluations that use these complementary sequences, DNA or RNA sequences other than those of interest may be mixed into the actual sample. In such a case, ensuring that the probe to be prepared or provided does not bind to these mixed-in nucleotide sequences improves the efficiency, precision, and reliability of the determination and evaluation. In some cases, a DNA or RNA synthesizer is used to prepare a probe P that is specific to a given DNA. It is therefore considered that efficiently eliminating probes P other than the one of interest significantly increases the efficiency of protein synthesis, screening, and the like.
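The elimination of unsuitable probes described above can be sketched as a filter over candidate probes. This is a simplified illustration under stated assumptions: the names are hypothetical, the comparison is ungapped (mismatches only, no insertions or deletions), the brute-force scan is quadratic, and the 80% identity cut-off is an arbitrary example threshold rather than a value taken from any specific method.

```python
from typing import Iterable, List

PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Complementary sequence of a DNA strand, written 5'->3'."""
    return "".join(PAIR[b] for b in reversed(seq.upper()))


def min_mismatches(site: str, background: str) -> int:
    """Fewest mismatches between `site` and any equal-length window of `background` (no gaps)."""
    k = len(site)
    best = k
    for i in range(len(background) - k + 1):
        window = background[i:i + k]
        best = min(best, sum(x != y for x, y in zip(site, window)))
    return best


def screen_probes(candidates: Iterable[str], contaminants: Iterable[str],
                  max_identity: float = 0.8) -> List[str]:
    """Keep probes whose binding site is below `max_identity` identity to every
    window of every mixed-in sequence (hypothetical, ungapped criterion)."""
    backgrounds = [c.upper() for c in contaminants]
    kept = []
    for probe in candidates:
        site = reverse_complement(probe)
        k = len(site)
        fewest = min((min_mismatches(site, bg) for bg in backgrounds), default=k)
        if (k - fewest) / k < max_identity:
            kept.append(probe)
    return kept


mixed = ["GGATCCATGCGTTAGC"]  # hypothetical mixed-in sequence
print(screen_probes(["ACGCAT", "ACGCAA", "TTTTTT"], mixed))  # ['TTTTTT']
```

Note that "ACGCAA" is rejected even though its complementary sequence does not occur exactly in the mixed-in sequence: it differs from the window ATGCGT at only one of six bases.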
FIG. 18 is a view showing the relationship between a target nucleotide sequence and a probe. In the figure, the target nucleotide sequence is represented by the symbol T, and the probe by the symbol P. The target nucleotide sequence T can be, for example, a long nucleotide chain of several thousand nucleotides (hereinafter, in the present invention, the number of nucleotides is expressed in base pairs (bp)). Ideally, the probe 102 shown in FIG. 18 is a sequence completely complementary to the region Tp of the target nucleotide sequence T.
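The ideal case in FIG. 18, a probe completely complementary to region Tp, amounts to taking the complementary sequence of a slice of T. The sketch below is illustrative only: the function name is hypothetical, the region is assumed to be given as 0-based, end-exclusive coordinates, and sequences are assumed to be written 5'->3'.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Complementary sequence of a DNA strand, written 5'->3'."""
    return "".join(PAIR[b] for b in reversed(seq.upper()))


def ideal_probe(target: str, start: int, end: int) -> str:
    """Probe completely complementary to region Tp = target[start:end]."""
    if not 0 <= start < end <= len(target):
        raise ValueError("region Tp must lie inside the target sequence T")
    return reverse_complement(target[start:end])


T = "GGATCCATGCGTTAGC"      # hypothetical (and unrealistically short) target T
P = ideal_probe(T, 6, 12)   # Tp = "ATGCGT"
print(P, len(P), "bp")      # ACGCAT 6 bp
```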
In reality, however, a given nucleotide sequence can also bind to another sequence that is not completely complementary, for example when the two sequences are 80% to 90% identical. Moreover, nucleotide sequence determination devices such as sequencers can introduce analytical errors. It is therefore not appropriate to dismiss a sequence as irrelevant merely because it is not 100% identical to the complementary sequence of the probe. To confirm that the probe P does not bind to the target nucleotide sequence, it has conventionally been necessary to compare both sequences with a high-precision alignment algorithm such as the Smith-Waterman method and to verify that the target nucleotide sequence contains no sequence similar to the complementary sequence of the probe.
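For reference, a compact version of the Smith-Waterman local alignment mentioned above is sketched below. The scoring parameters (match +2, mismatch -1, linear gap -2) and all names are assumptions chosen for illustration. The example shows a 90%-identical site that exact matching misses but local alignment finds; the full dynamic-programming table costs time proportional to the product of the two sequence lengths, which is why applying it to long targets is expensive.

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Best local alignment of `a` and `b` with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell ends the alignment.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of alignment columns with identical bases (gaps count as differences)."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return same / len(aligned_a) if aligned_a else 0.0


site = "ACGTACGTAC"        # hypothetical complementary sequence of a probe
target = "TTACGTTCGTACTT"  # hypothetical target holding a one-mismatch copy
score, x, y = smith_waterman(site, target)
print(site in target, score, x, y, identity(x, y))  # False 17 ACGTACGTAC ACGTTCGTAC 0.9
```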
High-speed search algorithms such as BLAST (Altschul, S. F., Gish, W., Miller, W., Myers, E. W., Lipman, D. J., “Basic local alignment search tool”, J. Mol. Biol., 1990 Oct. 5; 215(3), 403-410) and FASTA (Pearson, W. R., Lipman, D. J., “Improved tools for biological sequence comparison”, Proc. Natl. Acad. Sci. USA, 1988 Apr.; 85(8), 2444-2448) have been proposed as means of searching for nucleotide sequences that are similar to each other. However, these algorithms are heuristics that start from short exact word matches, so even with them it is not possible to guarantee that all similar partial sequences in the target nucleotide sequence are discovered. They are therefore not suitable for screening that must assure that the probe does not bind to the target sequence.
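Why a word-based heuristic can miss a similar sequence can be shown with a toy version of the seeding stage alone. This is not BLAST or FASTA: the names are hypothetical, only exact k-word seeds are computed (no extension or scoring), and the word length of 11 is an assumption for illustration. A query that is 80% identical to the subject, but with its differences spread every fifth base, shares no exact 11-base word with it and therefore produces no seed at all.

```python
from typing import Dict, List, Tuple


def kmer_index(seq: str, k: int) -> Dict[str, List[int]]:
    """Map every length-k word of `seq` to the positions where it starts."""
    index: Dict[str, List[int]] = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def seed_hits(query: str, subject: str, k: int = 11) -> List[Tuple[int, int]]:
    """(query_pos, subject_pos) pairs sharing an exact length-k word."""
    index = kmer_index(subject, k)
    hits = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits


# Hypothetical 30 bp subject and a query that differs at every 5th base.
subject = "ACGTTGCAAGCTTACGGATCCATGCGTTAG"
swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
query = "".join(swap[b] if i % 5 == 4 else b for i, b in enumerate(subject))

same = sum(x == y for x, y in zip(query, subject))
print(same / len(subject))                      # 0.8
print(seed_hits(query, subject, k=11))          # []: no shared 11-base word, no seed
print(len(seed_hits(query, subject, k=4)) > 0)  # True: shorter words do seed
```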