1. Field of the Invention
The present invention relates to a method of designing a probe from a polynucleotide group comprising a plurality of polynucleotides.
2. Description of the Related Art
A “probe” is a material that specifically binds to a target material. Nucleic acid probes are generally designed by one of two methods, depending on whether the probe is a common probe or a specific probe.
A common probe is designed to find a common or consensus sequence among various species and gene families. The first step in designing a common probe is to find conserved genes. This is achieved by performing a keyword or homology search against a public database (e.g. GenBank and Medline) or by performing homology analysis between one of the genes and all of the sequences in the database. The second step is to retrieve all of the conserved genes. The third step is to perform a multiple alignment analysis using a commercially available program such as DNASIS (Hitachi Software of Brisbane, Calif.). In this step, a polynucleotide common to the sequences is identified. Subsequently, the sequence of the common polynucleotide is input into the program and the presence of a secondary structure is detected, so that candidate probes having no secondary structure at a given Tm (melting temperature) are selected. The selected candidate probes are compared with all of the sequences stored in a public database (e.g. GenBank) to determine the presence of a sequence causing cross hybridization, and a sequence that does not cause cross hybridization is selected as the final probe.
A specific probe is designed to find a unique sequence among various species, gene families and published sequences of a database. A specific probe hybridizes with one specific gene. The first step in selecting the probe is to find related genes, for example, by performing a keyword search on a public database (e.g. GenBank and Medline), and to obtain all information on the related genes. The second step is to find a common region and a unique region by performing a homology search on the obtained genes and the sequences published in a database (e.g. GenBank). Then, the obtained candidate probes are input into a program (such as DNASIS) and the presence of a secondary structure is determined, so that probes having no secondary structure at a given Tm (melting temperature) are selected. Finally, the obtained candidate probes are compared with sequences published in a database (e.g. GenBank) to determine the presence of an identical sequence, and a probe having no sequence identical to a published sequence is selected.
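The screening step shared by both procedures (keeping candidates within a Tm range and without secondary structure) can be illustrated with a minimal Python sketch. It is not the procedure used by DNASIS or any other program named above. The function names, the stem and loop lengths, and the Tm window are all hypothetical. The Tm estimate uses the Wallace rule, Tm = 2(A+T) + 4(G+C), a rough approximation intended for short oligonucleotides, and the hairpin test is only a crude stand-in for real secondary structure prediction.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def wallace_tm(probe: str) -> int:
    """Rough Tm estimate in degrees C (Wallace rule): 2*(A+T) + 4*(G+C)."""
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def has_hairpin_stem(probe: str, stem: int = 4, min_loop: int = 3) -> bool:
    """Crude hairpin test (assumed parameters): True if any window of
    `stem` bases has its reverse complement at least `min_loop` bases
    downstream, so that the probe could fold back on itself."""
    for i in range(len(probe) - 2 * stem - min_loop + 1):
        arm = reverse_complement(probe[i:i + stem])
        if probe.find(arm, i + stem + min_loop) != -1:
            return True
    return False


def screen(candidates, tm_min, tm_max):
    """Keep candidates inside the Tm window that show no hairpin stem."""
    return [
        p for p in candidates
        if tm_min <= wallace_tm(p) <= tm_max and not has_hairpin_stem(p)
    ]


if __name__ == "__main__":
    # Hypothetical candidates: the first can fold (GGGG ... CCCC), the second cannot.
    print(screen(["GGGGAAAACCCC", "AACCAACCAACCAACCAACC"], 50, 70))
```

A production tool would replace both heuristics with thermodynamic models; the sketch only shows where the two filters sit in the selection step.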
However, the above-described conventional methods involve a multiple alignment analysis. In the multiple alignment analysis, a plurality of target polynucleotides are aligned such that nucleotides correspond to each other under a specific condition, and a probe is selected by comparing a completely matched region with a mismatched region among the polynucleotides. Because the polynucleotides must be aligned before their sequences are compared, the analysis takes a long time, and the alignment accuracy may vary depending on the alignment conditions. For example, the alignment accuracy may vary depending on a gap condition determining an allowed interval between nucleotides. Further, whenever a probe must be designed again, the alignment of the polynucleotides must be repeated.
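The sensitivity of an alignment to its gap condition can be seen even with two short sequences. The following Python sketch is a textbook global (Needleman-Wunsch style) pairwise alignment, not the multiple alignment algorithm of any program named above. It treats the gap condition as a linear gap penalty, which is an assumption, and the scoring values and example sequences are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment with a linear gap penalty (illustrative)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Fill the dynamic-programming table.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back one optimal path from the bottom-right corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Same sequences, two gap penalties: the alignments differ.
    print(global_align("ACGTAC", "CGTACA", gap=-1))
    print(global_align("ACGTAC", "CGTACA", gap=-10))
```

With the mild penalty the two sequences are shifted against each other so that their shared run lines up; with the severe penalty they are aligned column by column without gaps, and the shared run is no longer visible as a matched region.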
In addition to these conventional methods, oligoprobe designation has been used to design a common probe or a specific probe (U.S. Pat. No. 5,556,749; Hitachi). Although this method involves the comparison of two sequences, it can be used to design a common probe or a specific probe if it is repeatedly applied to a plurality of sequences.
The Hitachi method rapidly compares two sequences A and B to determine whether sequence A, or a sequence identical to sequence A except for at most a set number of base pairs, is present in sequence B. For example, it rapidly finds whether sequence B contains a sequence identical to a sequence A having a total length of 20 bp, or a sequence identical to all but 1 or 2 bp of sequence A. In this method, a subsequence of sequence A, called a “tuple”, is produced and compared with sequence B. If a sequence identical to the subsequence is present in sequence B, the subsequence is extended one bp at a time until its length equals the total length of sequence A. If the number of mismatches between A and B exceeds an allowed value (set by the user), it is concluded that sequence A is not present in sequence B within the allowed number of mismatched bp. If sequence A has 18 bp and the number of allowed mismatched bp is 2, then for sequence A to be found in sequence B, sequence B must contain a sequence completely identical to a subsequence of sequence A having at least 6 bp, because two mismatches divide the remaining 16 matching bp into at most three runs, at least one of which must be 6 bp or longer. Such a subsequence is called a “k_tuple”. The k_tuple is compared with sequence B and, if it is completely identical to a portion of sequence B, it is extended one bp at a time.
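The k_tuple idea can be sketched in Python as follows. This is a simplified illustration and not the implementation of U.S. Pat. No. 5,556,749. The function names are hypothetical, only substitutions are counted as mismatches, and instead of extending a matched k_tuple one bp at a time the sketch verifies the whole window around each exact hit by counting mismatches directly. The k_tuple length follows from the counting argument in the 18 bp example: for a length L and m allowed mismatches, some exact run has at least ceil((L - m) / (m + 1)) bp.

```python
from math import ceil


def min_exact_run(length: int, max_mismatches: int) -> int:
    """Shortest guaranteed exact run when `length` bases contain at most
    `max_mismatches` substitutions (pigeonhole argument)."""
    return ceil((length - max_mismatches) / (max_mismatches + 1))


def find_with_mismatches(a: str, b: str, max_mismatches: int) -> list[int]:
    """Return the start offsets in b at which a occurs with at most
    `max_mismatches` substituted bases."""
    k = min_exact_run(len(a), max_mismatches)
    hits = set()
    for i in range(len(a) - k + 1):
        k_tuple = a[i:i + k]
        pos = b.find(k_tuple)
        while pos != -1:
            start = pos - i  # where a would begin in b for this exact hit
            if 0 <= start <= len(b) - len(a):
                window = b[start:start + len(a)]
                mismatches = sum(1 for x, y in zip(a, window) if x != y)
                if mismatches <= max_mismatches:
                    hits.add(start)
            pos = b.find(k_tuple, pos + 1)
    return sorted(hits)


if __name__ == "__main__":
    a = "ACGTACGTACGTACGTAC"                # 18 bp query (hypothetical)
    b = "TT" + "ACGTATGTACGAACGTAC" + "GG"  # copy of a with 2 substitutions
    print(min_exact_run(18, 2))
    print(find_with_mismatches(a, b, 2))
    print(find_with_mismatches(a, b, 1))
```

Seeding with exact k_tuples means that most positions of sequence B are never examined in full, which is where the speed of this kind of comparison comes from.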
Even though the above method involves the comparison of two sequences, it can be used to find a common sequence or a specific sequence by repeatedly performing such a comparison on a group of sequences having a certain length.
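Repeating a pairwise comparison over a group of sequences can be sketched as below. The sketch is illustrative only: the names are hypothetical, and for brevity each pairwise comparison is an exact substring test rather than a mismatch-tolerant search. A window of the target sequence is reported as common if it occurs in every other sequence and as specific if it occurs in none of them.

```python
def windows(seq: str, length: int) -> list[str]:
    """All subsequences of the given length, in order of position."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def classify_windows(target: str, others: list[str], length: int):
    """Split the windows of `target` into common and specific candidates
    by one pairwise comparison against each sequence in `others`."""
    common, specific = [], []
    # dict.fromkeys removes duplicate windows while preserving order.
    for w in dict.fromkeys(windows(target, length)):
        present = [w in other for other in others]
        if all(present):
            common.append(w)
        elif not any(present):
            specific.append(w)
    return common, specific


if __name__ == "__main__":
    # Hypothetical sequences chosen so that only ACGT is shared by all three.
    common, specific = classify_windows(
        "ACGTTGCA", ["TTACGTAA", "GGACGTCC"], length=4
    )
    print(common)
    print(specific)
```

The number of comparisons grows with the number of windows multiplied by the number of sequences in the group, which is why the approach becomes slow for large groups.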
Nevertheless, there is still a demand for a method of rapidly and accurately designing a common probe or a specific probe from a plurality of polynucleotides.