OCR (optical character recognition) is a technology that converts images of printed text (e.g. scanned documents or photographs) into machine-editable, searchable text. Binarization is an important step in preparing an image for OCR: the better the binarization, the higher the quality of the OCR results. Each pixel of a binarized image can take only one of two values: black or white.
Each pixel of a grayscale image, by contrast, is characterized by a brightness value ranging from 0 (black) to 255 (white). To binarize a grayscale image, a brightness threshold must be established for each pixel: a pixel whose brightness is above its threshold is considered white, and a pixel whose brightness is below it is considered black. A key difficulty of binarization lies in choosing these thresholds so that the image is not distorted and no valuable information (i.e. the text to be recognized) is lost.
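The per-pixel thresholding rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular OCR system: the function names `binarize`, `global_thresholds` and `local_mean_thresholds`, and the `radius` and `offset` parameters, are hypothetical. Two ways of producing the threshold map are shown: a single global value for every pixel, and a local-mean rule (one common adaptive approach, swapped in here for illustration) in which each pixel's threshold is the mean brightness of its neighborhood minus an offset. The text does not say how a pixel exactly equal to its threshold is treated; the sketch assumes it becomes black.

```python
from typing import List

# Grayscale image as rows of brightness values: 0 = black, 255 = white.
Image = List[List[int]]
ThresholdMap = List[List[float]]

BLACK, WHITE = 0, 255


def binarize(gray: Image, thresholds: ThresholdMap) -> Image:
    """Map each pixel to WHITE if its brightness is above its own threshold,
    otherwise to BLACK (assumption: a pixel equal to its threshold is black)."""
    return [
        [WHITE if p > t else BLACK for p, t in zip(row, trow)]
        for row, trow in zip(gray, thresholds)
    ]


def global_thresholds(gray: Image, t: float) -> ThresholdMap:
    """Hypothetical helper: use the same threshold t for every pixel."""
    return [[t] * len(row) for row in gray]


def local_mean_thresholds(gray: Image, radius: int = 1,
                          offset: float = 0.0) -> ThresholdMap:
    """Hypothetical helper: threshold each pixel at the mean brightness of the
    (2*radius+1) x (2*radius+1) window around it, clipped at the image border,
    minus `offset`. A positive offset keeps uniform background white."""
    h = len(gray)
    w = len(gray[0]) if h else 0
    out: ThresholdMap = []
    for y in range(h):
        row: List[float] = []
        for x in range(w):
            total, count = 0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += gray[yy][xx]
                    count += 1
            row.append(total / count - offset)
        out.append(row)
    return out


if __name__ == "__main__":
    gray = [[200, 210, 40],
            [190, 30, 220]]
    print(binarize(gray, global_thresholds(gray, 128)))
    # → [[255, 255, 0], [255, 0, 255]]
```

A single global threshold is simple but fails when lighting varies across the page; computing a separate threshold for each pixel from its surroundings is one way to address the difficulty described above, at the cost of extra parameters (`radius`, `offset`) that must themselves be tuned.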