Document-receiving organizations often receive vast quantities of printed forms, such as insurance forms, financial forms, magazine subscription forms, and change-of-address forms, containing user-provided or user-submitted information. These documents and forms are typically received in high volume, in random and unsorted order, and in a wide variety of conditions. Upon receipt, each of these physical documents and forms is scanned to generate an electronic document image for further processing and organization. The scanning process often captures image artifacts representing coffee stains, ink smudges, and/or typed and handwritten information provided by a user. Moreover, the scanning process, such as facsimile transmission, often distorts the electronic document image by introducing image skew, rotation, and translation. These variations make known comparison techniques based on pixel and location checking difficult to apply and further complicate the task of processing and organizing the electronic document and form images.
Furthermore, because these documents and forms are received in random order and include an unknown number of form types, known clustering routines are inapplicable.