Optical Character Recognition (OCR) systems are typically used to digitize paper assets, that is, to “scan” or “on-ramp” physical items (usually documents). However, they can also be used to “re-digitize” documents that already exist in electronic form. Many documents are currently stored in rather spartan representations, for example as completely flat PDFs (page images with no text layer), as images, or even as videos. Feeding such documents to an OCR engine as “input” lets them be enriched with the same data extraction and metadata creation that scanned paper documents receive.