Many different types of forms are used by businesses and governmental entities, including educational institutions. Examples include transcripts, invoices, and other business forms. Forms generally are classified by their content as structured, semi-structured, or non-structured. Within each classification, forms can be further divided into groups, including frame-based forms, white space-based forms, and forms having a mix of frames and white space. Forms contain characters, such as alphabetic characters, numbers, symbols, punctuation marks, words, graphic characters or graphics, and/or other characters. Text is one example of one or more characters.
Automated processes attempt to identify the type of a form and/or the form's content. For example, one conventional process performs optical character recognition (OCR) on an entire page of a document and attempts to identify text on the page. However, this process, when used alone, is time-consuming and processor-intensive. In another conventional approach, image registration compares the actual images of two forms. In this approach, the process starts with a blank document and compares it to a document having text to identify the differences between the two documents. Image registration requires a significant amount of storage and processing power because the images typically are stored in large files.
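The blank-versus-filled comparison described above can be illustrated with a minimal sketch. It is not taken from any particular conventional system. It assumes that both pages are already aligned and binarized (0 for a white pixel, 1 for a black pixel), so the alignment step of registration itself is omitted, and the names `diff_mask` and `bounding_box` are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumption: each page is a binarized, already-aligned grid of pixels,
# where 0 is a white pixel and 1 is a black pixel.
Image = List[List[int]]


def diff_mask(blank: Image, filled: Image) -> Image:
    """Mark pixels that are black in the filled form but white in the blank form."""
    same_shape = len(blank) == len(filled) and all(
        len(b_row) == len(f_row) for b_row, f_row in zip(blank, filled)
    )
    if not same_shape:
        raise ValueError("images must have identical dimensions")
    return [
        [1 if f and not b else 0 for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(blank, filled)
    ]


def bounding_box(mask: Image) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of the marked pixels, or None if there are none."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for row in mask for c, value in enumerate(row) if value]
    return (min(rows), min(cols), max(rows), max(cols))


# A tiny framed form: the blank copy holds only the frame,
# the filled copy adds three "ink" pixels inside it.
blank_form = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
]
filled_form = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 1],
]

added = diff_mask(blank_form, filled_form)
region = bounding_box(added)
```

Even in this simplified form, both images must be held in full and every pixel must be visited, which is consistent with the storage and processing cost noted above for full-size scanned pages.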
These approaches are ineffective when used alone, are time-consuming, and require a large amount of processing power. Moreover, some of the processes require the location of the data to be known before documents are processed. Therefore, improved systems and methods are needed to automatically process documents.