Because computer systems are widely used, processing all kinds of documentation by computer has become a standard operation in contemporary business activities. In the processing of computerized documentation, the correctness of the content is always required. As a result, ensuring correctness has become an important task in the field of computerized document processing.
In the processing of documentation files containing Chinese characters, such as Chinese text files (in simplified or traditional Chinese characters) and Japanese text files, "errors" are always found in the files, regardless of whether the files are input from a keyboard, a phonetic recognizer or an OCR (optical character recognizer), or are retrieved from an existing text file.
Here, the term "error" generally pertains to two categories: typographic errors and wrong selections. In this invention, "typographic error" generally means omissions, deformations and misplacements of character strokes during the recognition or handwriting of characters, as well as omissions, additions, duplications and mistakes in key striking. Both kinds occur during the input of blocked characters such as Chinese characters. "Wrong selection" means selecting a wrong character instead of the correct character during input. In addition, wrong selections also occur during conversion between simplified Chinese characters, as used in Mainland China, and traditional Chinese characters, as used in Taiwan and Hong Kong. All of these are collectively called "errors" in the following description of this invention.
In the past, errors contained in text files were detected and corrected by human reviewers who read through the whole content of the files. Since such reviewing is time consuming, the prior art has provided methods and devices to detect and correct errors by computer, automatically or semi-automatically, so that the number of errors in a text file may be reduced.
Taiwan patent number 59572 describes an "Automatic Wrong Character Detection Method for Chinese Language and its Detection Device". By using the method disclosed in this prior art, the errors contained in a Chinese text file may be detected automatically so that users may correct the errors based on the results of detection. The error detection method of this patent included: preliminary segmentation of the sentence being processed, based on certain statistical data; selection of single-character terms that are infrequently used; and determination that these low-frequency terms are errors. This patent disclosed an error detection method which can detect almost all the errors contained in a sentence. However, it cannot suggest how to correct the sentence, and most of the "errors" it detected were not real errors. Furthermore, the "table of combinations of character streams" used in this patent contained a huge amount of data, which resulted in low processing speed.
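The three steps described above (segmentation, selection of low-frequency single-character terms, and flagging them as errors) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method of the cited patent: the patent segments the sentence using statistical data, whereas the sketch substitutes a simple longest-match lookup against a toy lexicon. The names `segment`, `flag_low_frequency_singles`, `LEXICON`, `CHAR_FREQ` and the threshold value are all hypothetical.

```python
# Toy data, invented for illustration only.
LEXICON = {"今天", "天氣", "很好"}          # known multi-character terms
CHAR_FREQ = {"天": 900, "汽": 3}            # assumed single-character usage counts


def segment(sentence, lexicon, max_len=4):
    """Greedy longest-match segmentation (a stand-in for the patent's
    statistics-based preliminary segmentation). Returns (position, term) pairs."""
    terms = []
    i = 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # A single character is always accepted as a fallback term.
            if size == 1 or candidate in lexicon:
                terms.append((i, candidate))
                i += size
                break
    return terms


def flag_low_frequency_singles(sentence, lexicon, char_freq, threshold):
    """Flag single-character terms whose assumed frequency is below threshold."""
    return [
        (pos, term)
        for pos, term in segment(sentence, lexicon)
        if len(term) == 1 and char_freq.get(term, 0) < threshold
    ]


# "汽" is typed in place of "氣"; it ends up as an isolated low-frequency term.
flagged = flag_low_frequency_singles("今天天汽很好", LEXICON, CHAR_FREQ, threshold=10)
print(flagged)
```

Note that the sketch only reports a suspicious position and proposes no correction. Any rare but legitimate single-character term would be flagged in the same way.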
Taiwan patent application number 83103817 describes a "Method and Device for the Automatic Correction of Errors in Chinese Text Files". This application disclosed a method to correct errors contained in a Chinese text file, in which all the characters of a sentence are converted into a series of similar-character clusters and the sentence is segmented according to the result of the conversion. The combinations (linkages) of the character streams in the sentence are assessed according to a "table of combinations for character streams" and given scores. Errors are detected based on the scores so obtained, and corrections are suggested. Although this application provided a useful method for the correction of errors, the collection of similar characters and the table of combinations for character streams contained, again, a huge amount of data. As a result, the segmentation and the assessment of the linkage scores cannot be processed at high speed.
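The idea of expanding each character into a similar-character cluster and scoring the linkages between characters can be illustrated with the following minimal sketch. It is a simplification under stated assumptions and not the method of the cited application: linkages are scored only between adjacent characters, candidates are enumerated exhaustively, and the tables `SIMILAR` and `LINKAGE` as well as the function names are hypothetical toy data invented for the example.

```python
from itertools import product

# Toy tables, invented for illustration only.
SIMILAR = {"汽": ["氣"]}                     # similar-character clusters
LINKAGE = {                                  # assumed adjacent-character linkage scores
    ("今", "天"): 5, ("天", "天"): 1, ("天", "氣"): 5,
    ("氣", "很"): 2, ("很", "好"): 5,
}


def linkage_score(chars, linkage):
    """Sum the linkage scores of all adjacent character pairs."""
    return sum(linkage.get(pair, 0) for pair in zip(chars, chars[1:]))


def suggest(sentence, similar, linkage):
    """Return the best-scoring candidate sentence and the list of
    (position, original, replacement) corrections."""
    # Each cluster lists the original character first, so ties keep the original.
    clusters = [[ch] + [c for c in similar.get(ch, []) if c != ch] for ch in sentence]
    best = max(product(*clusters), key=lambda cand: linkage_score(cand, linkage))
    corrections = [
        (i, old, new)
        for i, (old, new) in enumerate(zip(sentence, best))
        if old != new
    ]
    return "".join(best), corrections


corrected, corrections = suggest("今天天汽很好", SIMILAR, LINKAGE)
print(corrected, corrections)
```

Because every combination of cluster members is scored, the number of candidates in this sketch grows multiplicatively with the size of each cluster, so it is suitable only for short illustrative inputs.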
There is thus an urgent need in the field of document processing for a method and a device for error detection and correction in computerized text files that can detect almost all the errors contained in a text file, can distinguish the real errors, and can operate automatically. There is also a need for a method and a device for error detection and correction in computerized blocked-character text files that can detect errors arising from a variety of causes.