Paper document processing has traditionally been a time-consuming task. Businesses have long handled paper documents manually as a daily practice, expending several man-hours on the repetitive process of classifying forms and finding desired information thereon. Examples include processing tax returns for income calculation in the lending industry, processing sales invoices for bookkeeping, and comparing quotations from different suppliers. Since the advent of scanning and optical character recognition (OCR) technologies, a number of manual, paper-based tasks have been automated, e.g., for archiving purposes. Challenges remain, however, in the fields of scanned document image recognition and parsing.