OCR
Optical Character Recognition (OCR) technology is widely used to recover text data from printed documents. OCR is essentially a software technology that converts image data into binary-coded text.
OCR is not perfectly reliable, but it continues to improve. Despite its imperfections, OCR is very useful as a data acquisition tool in electronic text editing. It does not follow, however, that OCR is well suited to serve as a communication channel between machines. There are three arguments against it.
First, the graphic shapes of letters and digits are designed for human readers. Recognizing them by machine consumes far more processing resources than recognizing graphic symbols purposely designed for machine readability would. This extra effort is justified only where it adds value to text systems that are primarily meant for human readers. Second, the amount of data per unit area of printed human-readable text falls far short of the limits of the technologies involved. Third, not every byte value (0 to 255) can be represented by a character. Only about one third of the 256 byte values are unambiguously assigned to characters; others are assigned to characters in non-standard ways, and some have no character assignment at all. Writing each byte as two hexadecimal digits would solve this problem, but because every byte then takes two printed characters, it would reduce the achievable data density even further.
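The third argument can be illustrated with a short sketch. It assumes that "unambiguously assigned" corresponds to the printable 7-bit ASCII range (0x20 to 0x7E), that 0x00 to 0x1F and 0x7F are control codes with no printable glyph, and that 0x80 to 0xFF are the code-page-dependent values; this classification and the function names `classify_byte` and `to_hex_text` are illustrative assumptions, not taken from the text. The sketch counts each class and then shows how hexadecimal spelling makes any byte printable at the cost of two characters per byte.

```python
# A minimal sketch, assuming "unambiguously assigned" means printable 7-bit
# ASCII (0x20-0x7E). The class names and helper functions are hypothetical.

def classify_byte(b: int) -> str:
    """Roughly classify a byte value under the assumed scheme."""
    if 0x20 <= b <= 0x7E:
        return "printable ASCII"      # same glyph in virtually all code pages
    if b < 0x20 or b == 0x7F:
        return "control"              # no printable glyph at all
    return "code-page dependent"      # 0x80-0xFF: glyph varies by encoding


def to_hex_text(data: bytes) -> str:
    """Spell each byte as two uppercase hexadecimal digits."""
    return data.hex().upper()


# Count how many of the 256 byte values fall into each class.
counts = {}
for b in range(256):
    kind = classify_byte(b)
    counts[kind] = counts.get(kind, 0) + 1

print(counts)
# {'control': 33, 'printable ASCII': 95, 'code-page dependent': 128}

# Hex spelling makes arbitrary bytes printable but doubles the character count.
payload = bytes([0x00, 0x41, 0x7F, 0xE9, 0xFF])
text = to_hex_text(payload)
print(text)                       # 00417FE9FF
print(len(text) / len(payload))   # 2.0
```

Under these assumptions, 95 of 256 values (about 37%) are printable ASCII, which is consistent with the "about one third" estimate. The hexadecimal encoding then needs exactly two printed characters per byte, which is the loss in data density that the text describes.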