Form documents are usually designed to collect input information. For instance, a medical form may be used to collect patient information, and a test sheet may be used to collect a student's response to a set of test questions. Traditionally, such forms are reviewed and stored in various hardcopy formats. The current trend in handling these forms is to digitize them for review, storage and distribution. Such digitization typically occurs using an optical scanner, as is well known in the art.
Some specialized applications use scanned form documents to perform paper-interactive tasks and therefore employ Optical Mark Recognition (OMR), Optical Character Recognition (OCR) or other content extraction techniques. However, scanners may distort and degrade the image of a form document, and such degradation or distortion increases the failure rate of OMR and OCR. Even where advanced image processing algorithms can be used to automate OMR, increasing the reliability of such processes presents a significant challenge to the efficient and effective use of content extraction.
One approach to performing OMR and other content extraction techniques digitizes a blank form document and compares it to the filled-in form document. The difference between the two forms may then yield the marks made on the filled-in form. Unfortunately, this approach has two significant disadvantages. One is that an extra step is needed to digitize and store a blank form document. The other is that the digital blank form document will inevitably bear characteristics of the digitizer or scanner used to generate it. In particular, scanner image quality degradation in the form of blurring and distortion may yield a blank form image that differs from an image of the same blank form produced by another scanner. Such differences will contribute to errors during the recognition stage.
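The blank-form comparison described above can be illustrated with a minimal sketch. It assumes the two images are already aligned, have the same dimensions, and are 8-bit grayscale (0 is black, 255 is white) stored as nested lists. The function name `extract_marks` and the `threshold` parameter are hypothetical and chosen only for illustration; they do not come from any particular OMR system.

```python
def extract_marks(blank, filled, threshold=64):
    """Return a binary mask of pixels that are darker in `filled` than in `blank`.

    Assumptions (illustrative only): both images are aligned, equally sized,
    8-bit grayscale grids given as lists of rows.
    """
    same_rows = len(blank) == len(filled)
    if not same_rows or any(len(b) != len(f) for b, f in zip(blank, filled)):
        raise ValueError("images must have the same dimensions")
    # A pixel counts as a mark when it is darker than the blank form
    # by more than `threshold` gray levels.
    return [
        [1 if (b - f) > threshold else 0 for b, f in zip(blank_row, filled_row)]
        for blank_row, filled_row in zip(blank, filled)
    ]


# Toy 3x3 example: the center pixel is pre-printed form content.
blank_form = [
    [255, 255, 255],
    [255,   0, 255],
    [255, 255, 255],
]
# The filled-in form adds a dark mark at the top right; the 250 at the
# bottom left stands in for minor scanner noise below the threshold.
filled_form = [
    [255, 255,  40],
    [255,   0, 255],
    [250, 255, 255],
]

mask = extract_marks(blank_form, filled_form)
```

In this sketch, pre-printed content cancels out only because both images agree on it exactly. If the blank form had been blurred or shifted by a different scanner, the pre-printed pixels would no longer match and could exceed the threshold, which is the source of the recognition errors noted above.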
Therefore, what is needed are systems and methods for facilitating content extraction such as OMR and OCR from a document, and for maximizing the accuracy of such content extraction techniques by reducing quality degradation of the document caused by printers, digitizers, scanners, and the like.