In the field of natural language processing, a similarity between character strings may be determined and used as a basis for many applications, such as text clustering and information retrieval.
In the related art, the similarity between character strings may be determined by calculating an edit distance between two character strings. Specifically, each of the two character strings is segmented into individual characters. One or more of a deletion operation, an insertion operation, or a replacement operation (collectively referred to, together with other editing operations, as “conversion operations”) may then be performed on the characters of one character string so that it is converted into the other character string. The minimum number of conversion operations required to convert the one character string into the other is calculated and taken as the edit distance between the two character strings. Finally, the similarity between the two character strings is calculated according to the edit distance.
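For illustration only, the character-level edit distance described above may be sketched as follows. This is a minimal sketch assuming the standard dynamic-programming (Levenshtein) formulation with unit cost for each deletion, insertion, and replacement. The function names `edit_distance` and `similarity` are hypothetical, and the normalization used in `similarity` (one minus the distance divided by the longer length) is one common choice assumed here, since the text does not specify how the similarity is derived from the edit distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character deletions, insertions, and
    replacements needed to convert string a into string b."""
    # prev[j] is the distance between the prefix of a processed so far
    # and the first j characters of b; initially that prefix is empty.
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        # Converting the first i characters of a into "" takes i deletions.
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,         # delete ch_a
                curr[j - 1] + 1,     # insert ch_b
                prev[j - 1] + cost,  # replace ch_a with ch_b (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(a, b) / longest
```

Under these assumptions, converting “kitten” into “sitting” requires two replacements and one insertion, so `edit_distance("kitten", "sitting")` evaluates to 3, and the assumed similarity is 1 − 3/7.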