Optical character recognition (OCR) systems transform pixelized images of documents into ASCII-coded text, which facilitates searching, substitution, reformatting of documents, etc. in a computer system. One aspect of OCR functionality is to convert handwritten and typewritten documents, books, medical journals, etc. into, for example, Internet- or Intranet-searchable documents. Generally, the quality of information retrieval and document searching is considerably enhanced if all documents are electronically retrievable and searchable. For example, a company Intranet system can link together all old and new documents of an enterprise through extensive use of OCR functionality implemented as part of the Intranet (or as part of the Internet if the documents are of public interest).
However, the quality of OCR functionality is limited by the considerable complexity of an OCR system. It is difficult to provide OCR functionality that can solve every problem encountered when converting images of text into computer-coded text. One problem that often occurs is that the OCR system cannot distinguish correctly between characters because their images in the text appear to be identical. For example, the character ‘c’ can easily be interpreted as an ‘e’, or vice versa, if the distinguishing details are blurred, which may be due to dirt, aging, etc. of the page comprising the characters. Such problems are usually identified by the OCR program, since the OCR system can establish, for example, a probability (or score value) for the certainty of the recognition of a specific character. For example, when two or more characters have substantially equal probabilities of being the correct identification of an image of a character, these alternative candidate characters are reported, for example in a list that is part of the OCR output data, together with a corresponding list of the words comprising the uncertainly recognized characters. Sometimes several characters are uncertainly recognized in the same word, which amplifies the problem of identifying which candidate characters are the correct ones, and thereby which words are correct.
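The reporting of candidate characters described above may be illustrated by the following minimal Python sketch. It is not taken from any particular OCR system: the function names `ambiguous_candidates` and `candidate_words`, the `margin` threshold used to decide when scores are "substantially equal", and all score values are assumptions chosen purely for illustration.

```python
from itertools import product

def ambiguous_candidates(scores, margin=0.1):
    """Return the candidates whose score lies within `margin` of the best score.

    `scores` maps a candidate character to a hypothetical recognition score.
    `margin` is an assumed threshold for "substantially equal" scores.
    """
    best = max(scores.values())
    return sorted(c for c, s in scores.items() if best - s <= margin)

def candidate_words(char_scores, margin=0.1):
    """Enumerate every word that can be formed from the retained candidates."""
    per_position = [ambiguous_candidates(s, margin) for s in char_scores]
    return ["".join(chars) for chars in product(*per_position)]

# Illustrative scores for a four-character word image in which the first
# and last characters are blurred, so 'c' and 'e' score almost equally.
word = [
    {"c": 0.48, "e": 0.46, "o": 0.06},
    {"a": 0.97, "o": 0.03},
    {"r": 0.95, "n": 0.05},
    {"c": 0.45, "e": 0.50, "o": 0.05},
]

print(candidate_words(word))
```

With two uncertainly recognized characters having two candidates each, the sketch yields four candidate words, which illustrates how the number of alternatives multiplies with each additional uncertain character in the same word.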