Optical character recognition (OCR) is a technique for converting images of text (e.g., a scanned document, a photo of a document, or an image that includes text) into machine-encoded text (e.g., text data). OCR is widely used to digitize printed material for data entry, editing, searching, and electronic storage. The accuracy of OCR can vary depending on the language of the text. For English text, for example, OCR is relatively accurate. For other languages, such as languages written in Arabic script, OCR can be highly inaccurate. In Arabic script, for example, individual letters change shape depending on their position within a word, so a single letter can appear in several visually distinct forms.
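
The positional shape change can be illustrated with Unicode itself, which encodes the position-dependent glyphs of Arabic letters as separate compatibility characters in the Arabic Presentation Forms-B block. The following is a minimal sketch rather than part of any OCR system described here: the helper name `positional_forms` is hypothetical, and the use of the Unicode presentation forms is an assumption made purely for illustration. It lists the forms that exist for one base letter.

```python
import unicodedata

# Arabic Presentation Forms-B block (U+FE70..U+FEFF), which holds the
# position-dependent glyph variants as compatibility characters.
FORMS_B = range(0xFE70, 0xFF00)


def positional_forms(base_letter):
    """Hypothetical helper: map a position name to the presentation-form
    character for a single base Arabic letter."""
    forms = {}
    for cp in FORMS_B:
        ch = chr(cp)
        name = unicodedata.name(ch, "")  # "" for unassigned code points
        if not name.endswith(" FORM"):
            continue
        # Compatibility normalization folds a positional form back to its
        # base letter, which is how the variants are matched here.
        if unicodedata.normalize("NFKC", ch) == base_letter:
            position = name.split()[-2]  # ISOLATED, FINAL, INITIAL or MEDIAL
            forms[position] = ch
    return forms


if __name__ == "__main__":
    beh = "\u0628"  # ARABIC LETTER BEH
    for position, ch in positional_forms(beh).items():
        print(f"{position:<9} U+{ord(ch):04X} {unicodedata.name(ch)}")
```

For the letter beh (U+0628) this yields four distinct forms (isolated, final, initial, and medial), whereas alef (U+0627) has only an isolated and a final form. A recognizer working on images of Arabic text therefore has to handle several glyph shapes per letter, which is one way to picture the difficulty noted above.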