Text used in daily work and life often contains erroneous character strings, such as wrongly written or mispronounced characters and misspelled words. How to have a computer recognize and correct such erroneous character strings is a technical problem to be solved in the field of information processing.
At present, there exist text error correction programs based on language rules.
Specifically, in such programs, the language rules of the target language (i.e., the language adopted by the target document), such as word collocation rules and word spelling rules, are summarized in advance. For example, when the target language is Chinese, the word collocation rules of Chinese are summarized in advance. The text to be processed is then evaluated against the pre-summarized language rules to judge whether it conforms to them. When the evaluation result shows that the conformity of the text to be processed with the pre-summarized language rules does not meet a predetermined requirement, the program performs error correction on the text to be processed according to those rules.
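The rule-based flow described above (evaluate against pre-summarized rules, compare the conformity with a predetermined requirement, then correct) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of any particular program: the rule tables, the whitespace tokenization, the conformity formula, and the threshold are all hypothetical, and English is used instead of Chinese to keep the example short.

```python
# Hypothetical, hand-written rule tables (illustrative only, not from any real program).
SPELLING_RULES = {"recieve": "receive", "teh": "the"}
COLLOCATION_RULES = {("make", "homework"): ("do", "homework")}

# Assumed "predetermined requirement": every token must conform to the rules.
CONFORMITY_THRESHOLD = 1.0


def find_violations(tokens):
    """Return (start, length, replacement_tokens) for each violated rule."""
    violations = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in COLLOCATION_RULES:
            violations.append((i, 2, list(COLLOCATION_RULES[pair])))
            i += 2
        elif tokens[i] in SPELLING_RULES:
            violations.append((i, 1, [SPELLING_RULES[tokens[i]]]))
            i += 1
        else:
            i += 1
    return violations


def conformity(tokens, violations):
    """Fraction of tokens not covered by any violated rule (assumed metric)."""
    if not tokens:
        return 1.0
    bad = sum(length for _, length, _ in violations)
    return 1.0 - bad / len(tokens)


def correct(text):
    """Evaluate the text against the rules and correct it only if it fails."""
    tokens = text.split()  # naive whitespace tokenization (assumption)
    violations = find_violations(tokens)
    if conformity(tokens, violations) >= CONFORMITY_THRESHOLD:
        return text  # conformity meets the requirement; leave the text as-is
    # Apply replacements right to left so earlier indices stay valid.
    for start, length, replacement in reversed(violations):
        tokens[start:start + length] = replacement
    return " ".join(tokens)


if __name__ == "__main__":
    print(correct("I recieve teh letter"))
    print(correct("I make homework"))
```

Even in this toy form, every correction the program can make has to be written into a rule table by hand, which is where the cost of the rule-based approach comes from.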
It can be seen that the conventional text error correction program based on language rules requires a large number of personnel with a strong language background to summarize a mass of language rules. Moreover, because of the complex structure of language itself, language rules are not easy to summarize, and different summarized rules often conflict with one another. Therefore, both the error recall rate and the error correction accuracy of the text error correction program based on language rules are low.