1. Field of the Invention
The present invention relates to a pattern matching method and apparatus for performing Dynamic Programming (DP) matching of symbol sequences and to a speech information retrieval system.
2. Description of the Related Art
DP matching is a well-known pattern matching technique used in the fields of information retrieval and speech recognition (see, for example, Japanese Patent Application Laid-Open No. 11-282488). This approach calculates the level of similarity between two symbol sequences by assigning a penalty to each discrepancy, that is, to every operation other than a coincidence (correct correspondence), such as an insertion, deletion, or substitution.
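The penalty-based DP matching described above can be sketched as a standard edit-distance recurrence. In this minimal sketch the function name `dp_match` and the unit penalty weights are illustrative assumptions rather than values taken from the cited publication; a smaller total penalty corresponds to a higher similarity.

```python
def dp_match(query: str, reference: str,
             ins: float = 1.0, dele: float = 1.0, sub: float = 1.0) -> float:
    """Return the minimum total penalty for aligning query with reference.

    Hypothetical helper; the penalty weights are assumptions (unit costs by default).
    """
    m, n = len(query), len(reference)
    # d[i][j] holds the minimum penalty for aligning query[:i] with reference[:j]
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + dele  # query symbol with no counterpart: deletion
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ins   # surplus reference symbol: insertion
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if query[i - 1] == reference[j - 1] else sub
            d[i][j] = min(d[i - 1][j - 1] + cost,  # coincidence or substitution
                          d[i - 1][j] + dele,      # deletion
                          d[i][j - 1] + ins)       # insertion
    return d[m][n]


print(dp_match("kitten", "sitting"))  # 3.0 (two substitutions, one insertion)
```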
The DP matching method described above calculates the similarity between two entire symbol sequences. Consequently, if the two symbol sequences differ in length and one of them contains the other, the similarity is calculated to be low, because the surplus symbols are counted as insertion errors. For example, if “aaabbb” and “aaa” are matched against each other, a penalty is incurred for the insertion of “bbb.” Matching by the DP method is therefore unsuitable for determining whether or not “aaa” is contained in “aaabbb.”
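The insertion penalty in this example can be checked with a compact, single-row form of the same recurrence. This is a sketch assuming unit penalties; `dp_penalty` is a hypothetical name. Although “aaa” is fully contained in “aaabbb,” the whole-sequence comparison still charges one penalty per surplus symbol.

```python
def dp_penalty(query: str, reference: str) -> int:
    """Whole-sequence DP matching with unit insertion, deletion, substitution penalties."""
    prev = list(range(len(reference) + 1))  # row for the empty query prefix
    for i, q in enumerate(query, start=1):
        cur = [i] + [0] * len(reference)
        for j, r in enumerate(reference, start=1):
            cur[j] = min(prev[j - 1] + (q != r),  # coincidence (0) or substitution (1)
                         prev[j] + 1,             # deletion
                         cur[j - 1] + 1)          # insertion
        prev = cur
    return prev[-1]


print(dp_penalty("aaa", "aaa"))     # 0
print(dp_penalty("aaa", "aaabbb"))  # 3: "bbb" is counted as three insertions
```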
One possible approach to matching such symbol subsequences is simply to disregard insertion errors. In this case, however, the similarity may be the same whether the matching symbols appear contiguously as a subsequence or appear apart from each other. For example, the same score (similarity) is obtained when determining whether “ab” is contained in “acccb” and when determining whether “ab” is contained in “abccc.”
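The loss of discrimination described above can be reproduced by making insertions free in the DP recurrence. In this sketch, `dp_penalty_free_insertion` is a hypothetical name, and keeping unit penalties for deletion and substitution is an assumption for illustration. Because the surplus “ccc” costs nothing wherever it occurs, the contiguous and the scattered occurrences of “ab” receive the same score.

```python
def dp_penalty_free_insertion(query: str, reference: str) -> int:
    """DP matching in which insertions (surplus reference symbols) cost nothing.

    Deletions and substitutions keep an assumed unit penalty.
    """
    prev = [0] * (len(reference) + 1)  # empty query matches any reference prefix for free
    for i, q in enumerate(query, start=1):
        cur = [i] + [0] * len(reference)
        for j, r in enumerate(reference, start=1):
            cur[j] = min(prev[j - 1] + (q != r),  # coincidence or substitution
                         prev[j] + 1,             # deletion
                         cur[j - 1])              # insertion: no penalty
        prev = cur
    return prev[-1]


print(dp_penalty_free_insertion("ab", "acccb"))  # 0 (symbols apart)
print(dp_penalty_free_insertion("ab", "abccc"))  # 0 (symbols contiguous)
```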
To deal with this problem, a method is known for matching symbol subsequences by DP matching in which the calculation is repeated while the matching range is shifted along the longer sequence. This method, however, requires a significant amount of calculation.
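The shifting-range method can be sketched as follows, assuming (as a simplification not stated in the text) that each matching range is a window of the same length as the query and that each window is scored by ordinary unit-penalty DP matching. The function names are hypothetical. The sketch distinguishes the two cases from the previous example, but it also counts the DP cells evaluated, which grows roughly as (n − m + 1) · m² for a sequence of length n and a query of length m.

```python
def dp_penalty(query: str, reference: str) -> int:
    """Whole-sequence DP matching with unit penalties (used for each window)."""
    prev = list(range(len(reference) + 1))
    for i, q in enumerate(query, start=1):
        cur = [i] + [0] * len(reference)
        for j, r in enumerate(reference, start=1):
            cur[j] = min(prev[j - 1] + (q != r), prev[j] + 1, cur[j - 1] + 1)
        prev = cur
    return prev[-1]


def sliding_window_match(query: str, text: str) -> tuple[int, int]:
    """Repeat DP matching over every window of len(query) symbols in text.

    Returns (best penalty, number of DP cells evaluated). The window length
    is an illustrative assumption.
    """
    m = len(query)
    best, cells = None, 0
    for start in range(max(1, len(text) - m + 1)):
        window = text[start:start + m]
        cells += m * len(window)  # cells filled for this window's DP table
        score = dp_penalty(query, window)
        if best is None or score < best:
            best = score
    return best, cells


print(sliding_window_match("ab", "abccc"))  # (0, 16)
print(sliding_window_match("ab", "acccb"))  # (1, 16)
```

Every shift recomputes a full DP table from scratch, so none of the work done for one window is reused for the next; this is the source of the calculation cost noted above.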