Optical character recognition (OCR) is a powerful technique for converting an image of text into machine-readable text. OCR algorithms have been developed to be robust and accurate. However, OCR is most accurate when applied to a page of regularly spaced, well-aligned text. An image that includes multiple typefaces, multiple text sizes, arbitrary text placement within the page, and the like is much more difficult for OCR algorithms to recognize accurately. These difficulties pose significant problems when applying OCR algorithms to some text documents.