1. Field of the Invention
The present invention relates to an information processing apparatus, an information processing method, and a program.
2. Description of the Related Art
In recent years, various kinds of information have been digitized and stored, and the amount of digitized information continues to increase. Under these circumstances, a method for efficiently retrieving necessary information from stored information sources is desired.
Various methods for encoding a source of raw information have been devised recently. Accordingly, approximate string matching is applied to searches and analyses not only of documents but also of information in a wide range of fields, including encoded multimedia such as voice, music, images, and video. In searches and analyses of encoded information, the string distance metric used in approximate string matching greatly affects the efficiency of the search process and the adequacy of the search results obtained.
Examples of widely used string distance metrics include the Hamming distance, which is applicable to two strings having the same length, and the Levenshtein distance (also referred to as “edit distance”), which is applicable to strings having different lengths.
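By way of illustration only, the two metrics mentioned above may be sketched in Python as follows. The function names hamming_distance and levenshtein_distance are hypothetical and are chosen for this example; the Levenshtein computation shown is the commonly used dynamic-programming formulation with unit costs, which is assumed here and is not taken from any particular reference.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn a into b (unit cost per edit, assumed)."""
    # previous[j] is the distance between the prefix of a processed so far
    # (initially empty) and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]  # turning the first i characters of a into "" takes i deletions
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # substitute x with y, or match
            ))
        previous = current
    return previous[-1]
```

For example, hamming_distance("karolin", "kathrin") evaluates to 3, and levenshtein_distance("kitten", "sitting") evaluates to 3, whereas hamming_distance("kitten", "sitting") raises an error because the lengths differ.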
In this context, other metrics reflecting the dispersion and positions of unmatched portions have been introduced to approximate string matching in order to improve the accuracy of the distance and to separate strings efficiently. Examples of such metrics include an entropy metric and the N-gram method (WO 2009/085555).
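By way of illustration only, one common form of N-gram comparison may be sketched in Python as follows. This sketch assumes character N-grams compared by the Jaccard index over sets; it is not asserted to be the method of the cited reference, and the names ngrams and ngram_distance are hypothetical.

```python
def ngrams(s: str, n: int = 2) -> set:
    """Return the set of contiguous character N-grams of s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_distance(a: str, b: str, n: int = 2) -> float:
    """Jaccard distance between the N-gram sets of a and b (assumed form).

    0.0 means the two strings share all of their N-grams;
    1.0 means they share none.
    """
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        # Both strings are shorter than n; treat them as indistinguishable.
        return 0.0
    shared = len(grams_a & grams_b)
    total = len(grams_a | grams_b)
    return 1.0 - shared / total
```

Because each N-gram spans adjacent characters, a set of mismatches scattered across a string disturbs more N-grams than the same number of mismatches grouped together, so a distance of this kind is sensitive to how the unmatched portions are distributed.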