1. Field of the Invention
The present invention relates to an information processing method for extracting, from multiple pieces of sequence information, an element (such as a character or a symbol at a specific position) that characterizes a specific sequence, and more particularly to a technique for processing sequence information such as base sequences or amino acid sequences.
2. Related Background Art
In recent years, SNP analysis, polymorphism analysis, and the like have created a need to discriminate and identify different nucleic acid sequences by methods such as hybridization or PCR. To that end, an element that enables discrimination and identification must first be extracted, and a probe (or primer) containing that element must then be selected.
When, for example, one wishes to compare amino acid sequences or base sequences that relate to completely different proteins, an increase in the number of nucleic acid or amino acid sequences has so far posed almost no problem, because the individual sequences differ from each other sufficiently to be discriminated. For such sequences, probes could therefore be selected manually with a general-purpose alignment tool or BLAST. However, when one wishes to discriminate the same protein across different organisms, or to distinguish closely similar targets such as the genus, species, and strain of a fungus or polymorphisms in the human HLA region, working with such tools is tedious and complicated.
Furthermore, when the target sequences are extremely similar to each other, they usually cannot be identified by a mutation at a single position. In many cases, they can be identified only after a set of mutations at several different alignment positions has been extracted.
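The point that a single mutation position may be insufficient while a combination of positions succeeds can be illustrated with a toy example. The following is a minimal sketch, not the method of the present invention: it assumes the sequences are already aligned to equal length, and the function names and the four sample sequences are hypothetical. It exhaustively searches for the smallest set of alignment columns whose characters give every sequence a distinct signature.

```python
from itertools import combinations


def discriminates(seqs, cols):
    """Return True if the characters at `cols` give each sequence a distinct key."""
    keys = {tuple(s[c] for c in cols) for s in seqs}
    return len(keys) == len(seqs)


def smallest_discriminating_set(seqs):
    """Exhaustively find the fewest alignment columns that tell all sequences apart.

    Assumes `seqs` are pre-aligned strings of equal length (hypothetical helper).
    """
    length = len(seqs[0])
    # Only columns where at least two sequences differ can contribute.
    variable = [c for c in range(length) if len({s[c] for s in seqs}) > 1]
    for k in range(1, len(variable) + 1):
        for cols in combinations(variable, k):
            if discriminates(seqs, cols):
                return cols
    return None  # e.g. duplicate sequences can never be separated


# Hypothetical aligned sequences: columns 0 and 3 each split them only in half.
aligned = ["ACGTAC", "ACGAAC", "TCGTAC", "TCGAAC"]
print(smallest_discriminating_set(aligned))  # (0, 3)
```

Neither column 0 nor column 3 alone separates the four sequences, but the pair does. Because the search tries every combination, its cost grows combinatorially with the number of variable columns, which is consistent with the difficulty the background describes for large databases.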
Moreover, the volume of data in databases storing the target sequences grows year by year, making it increasingly difficult to extract such a set of mutations by conventional methods.
In addition, Japanese Patent Application Laid-Open No. 2003-038160 discloses an algorithm that classifies the base sequences of known biopolymers into a common region, whose sequence is the same regardless of the kind of biopolymer, and a mutation region containing mutations, and then designs auxiliary probes separately for the common region and the mutation region thus determined. However, that algorithm is intended for designing multiple auxiliary probes to capture unknown genes (or DNA fragments); it is not intended for extracting a set of probes that can completely discriminate similar sequences by identifying as few mutation positions as possible.