Field of the Disclosure
This application relates generally to a system for automatically identifying a document type. More particularly, this application relates to improvements in comparing and identifying a digital form and/or a digital document.
Description of the Related Art
Today, documents and forms (e.g., W-2 or other tax forms and documents, patent filing and examination related forms and documents, immigration related forms and documents, etc.) are acquired in digital form and distributed or processed for various purposes. Such digital documents can be emailed and stored in a database. The digital documents can contain user inputs such as personal information, a signature, a photo, etc. Often, the user inputs are extracted for verification purposes, data processing, automatic data filling, etc. However, before the user inputs can be extracted, the type of document or form under consideration must first be identified.
To enable form identification and automatic data extraction from the digital documents, a computer or circuitry is configured to perform a form recognition process. However, form recognition is not a trivial process. There can be more than a million different types of forms and documents, each having a different format, structure, or layout and containing different user inputs. As such, comparing a digital document against millions of standard or known documents (also referred to as master documents) in order to identify it can be time consuming for a computer, processor, etc., let alone for a person doing so manually.
A typical form recognition process includes searching each and every form stored in a database and comparing the stored forms with the digital document under consideration. Typically, one or more features such as an image, a form structure, etc. are extracted from the digital document and compared with the stored forms. However, such a form recognition process is slow, may return a large number of matches (e.g., more than 100), and may not be accurate. As such, a user may have to manually browse through a large number of forms to identify the type of form being evaluated. Thus, a form recognition process having high accuracy and quick searching capability is required to save time, manual effort, and cost.
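For illustration only, the exhaustive comparison described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the process of any particular system: the feature representation (a set of field labels), the Jaccard similarity measure, the threshold values, and the names `jaccard`, `exhaustive_match`, and `masters` are all hypothetical and do not come from the disclosure.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Similarity of two feature sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def exhaustive_match(
    query: Set[str],
    masters: Dict[str, Set[str]],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Compare the query against every stored master form, one by one.

    Returns every master whose similarity meets the threshold, best first.
    The number of comparisons grows linearly with the number of stored forms.
    """
    matches = []
    for form_id, features in masters.items():
        score = jaccard(query, features)
        if score >= threshold:
            matches.append((form_id, score))
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches


# Hypothetical master forms, each reduced to an assumed set of field labels.
masters = {
    "W-2": {"employer", "wages", "ssn", "tax_withheld"},
    "1099-MISC": {"payer", "recipient", "ssn", "tax_withheld"},
    "I-9": {"employee", "citizenship", "ssn", "signature"},
}

# Features assumed to have been extracted from the document under consideration.
query = {"employer", "wages", "ssn", "signature"}
results = exhaustive_match(query, masters, threshold=0.3)
```

Even in this three-form example, two candidates pass the threshold. With millions of master forms, the same loop performs millions of comparisons per query and can return a long candidate list that a user would still have to review.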