The present disclosure relates generally to computer systems, artificial intelligence and intelligence analysis, and more particularly to correcting errors using fact repositories.
Computers are used to transcribe speech and handwriting. They are also used to convert scanned images of text into text. Examples of such processing include optical character recognition (OCR), which converts scanned paper documents into digital form, speech recognition, which converts voice into text, and handwriting recognition. Inevitably, errors occur in such computerized processing of text, voice and other input. Errors also originate from other sources, e.g., mistyped data and other mistakes made by people entering the data.
Existing systems correct errors based on a “language model”, i.e., an encoding of statistical information about the co-occurrence of words or word patterns. For instance, existing solutions correct some spelling errors or grammatical errors. However, they do not contemplate correcting other types of errors, such as contextual and/or semantic errors. Similarly, many repositories of data, such as relational and extensible markup language (XML) databases and textual and video archives, have errors, either in their content or in associated metadata. Other than for simple cases such as a mismatch between a zip code and a town name, current automated error-correcting computer systems or software do not correct such errors.
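By way of illustration only, correction based on word co-occurrence statistics can be sketched as follows. This is a minimal sketch, not the method of any particular existing system: the toy corpus, the candidate words, and the function names `train_bigrams` and `pick_candidate` are all hypothetical and are introduced here solely for the example.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count how often each pair of adjacent words occurs (hypothetical helper)."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def pick_candidate(prev_word, candidates, counts):
    """Return the candidate that most often follows prev_word in the corpus."""
    return max(candidates, key=lambda w: counts[(prev_word.lower(), w.lower())])


# Toy corpus standing in for the statistics a language model would encode.
corpus = [
    "the zip code of the town",
    "the town hall is open",
    "a small town near the river",
]
counts = train_bigrams(corpus)

# Suppose a recognizer produced the non-word "towm" after "the",
# and a spelling module proposed these two candidate replacements.
best = pick_candidate("the", ["torn", "town"], counts)
print(best)
```

Such a model prefers whichever word sequence is statistically more common. It has no way to tell that a fluent, correctly spelled statement is factually wrong, which is the kind of contextual or semantic error noted above.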