While the use of data management systems has increased significantly over the past decade, one long-standing problem, and barrier to entry, for providers of data management systems is how to provide potential users with the functionality and features of those systems without requiring significant user data entry, and/or other significant user interaction, with the data management systems.
Current data management systems include, but are not limited to, any of the following: a computing system implemented, or Internet-based, personal and/or business financial transaction management system; a computing system implemented, or Internet-based, personal and/or business financial management system; a computing system implemented, or Internet-based, personal and/or business asset management system; a computing system implemented, or Internet-based, personal and/or business accounting system; a computing system implemented, or Internet-based, point of sale system; a computing system implemented, or Internet-based, personal and/or business tax preparation system; a computing system implemented, or Internet-based, healthcare management system; and/or any of the numerous computing system implemented, or Internet-based, financial management systems known to those of skill in the art.
Efforts to minimize user data entry associated with the data management systems are often complicated by the problem of data extraction from various user documents. Data extraction from documents, both structured and unstructured, has inherent and long-standing problems and complications that make potential users hesitant to use data management systems. One current method of data extraction is to generate various data extraction templates used to identify data fields within documents.
A data extraction template contains location and contextual details of where data fields of importance, i.e., data fields containing desired data, are present in the document. The data extraction template is then used as a map to obtain the desired data, i.e., to extract the desired data. Since each type of source document includes desired data in different locations within the source document, a specific data extraction template typically must be generated and used with each specific type of source document.
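By way of illustration only, the use of a data extraction template as a map can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation drawn from any particular data management system: the names `FieldRule`, `ExtractionTemplate`, and `extract`, the line-index-plus-label representation of "location and contextual details," and the sample invoice text are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical rule describing where one data field is found."""
    line: int   # location detail: zero-based line index within the document
    label: str  # contextual detail: label text that precedes the value


@dataclass
class ExtractionTemplate:
    """Hypothetical template: a map from field names to field rules."""
    document_type: str
    fields: Dict[str, FieldRule]


def extract(template: ExtractionTemplate, document_lines: List[str]) -> Dict[str, str]:
    """Use the template as a map to pull the desired data out of a document."""
    result: Dict[str, str] = {}
    for name, rule in template.fields.items():
        if rule.line >= len(document_lines):
            continue  # the document is shorter than the template expects
        text = document_lines[rule.line]
        if rule.label in text:
            # The desired data is whatever follows the label on that line.
            result[name] = text.split(rule.label, 1)[1].strip()
    return result


# Assumed sample data: an invoice layout the template was created for.
template = ExtractionTemplate(
    document_type="acme_invoice",
    fields={
        "invoice_number": FieldRule(line=1, label="Invoice No:"),
        "total": FieldRule(line=2, label="Total Due:"),
    },
)
invoice_lines = ["ACME Supplies", "Invoice No: 1001", "Total Due: $250.00"]

# Same data in a different layout, as a different document type might have.
other_layout_lines = ["Invoice No: 1001", "ACME Supplies", "Total Due: $250.00"]

matched = extract(template, invoice_lines)
mismatched = extract(template, other_layout_lines)
```

In this sketch, `matched` contains both fields, while `mismatched` contains only the total, because the invoice number is no longer at the location the template expects. This is one way to see why a specific data extraction template typically must be used with each specific type of source document.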
However, it is not practical for a provider of a data management system to create data extraction templates for every type and format of document the data management system may encounter. Consequently, in many cases, the provider of a data management system may encourage users of the data management system to take part in the creation of data extraction templates for unknown document formats, such as tax documents, that often have a long tail of unstructured formats.
While this user contribution approach can be effective, the user contribution is entirely voluntary, and a single user may not contribute everything that is necessary to create a full data extraction template that can extract all required fields/desired data in a given document. In addition, it may also be the case that not all fields are present in the document that the user is using as a reference for data extraction template creation. For example, one invoice from a given vendor for which a user is creating a data extraction template may not have a “terms” field, while another invoice from the same vendor may have a “terms” field.
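The coverage gap in the invoice example above can be sketched as a simple set comparison. This is an illustrative sketch only: the set of required fields, the field names other than “terms,” and the function `missing_fields` are assumptions introduced for the example.

```python
from typing import Set

# Hypothetical set of fields the data management system wants from an invoice.
REQUIRED_FIELDS: Set[str] = {"invoice_number", "total", "terms"}


def missing_fields(template_fields: Set[str], required: Set[str]) -> Set[str]:
    """Return the required fields that a user-contributed template does not cover."""
    return required - template_fields


# Template created from an invoice that had no "terms" field.
template_from_invoice_a: Set[str] = {"invoice_number", "total"}

# Template created from another invoice, same vendor, that did have "terms".
template_from_invoice_b: Set[str] = {"invoice_number", "total", "terms"}

gap_a = missing_fields(template_from_invoice_a, REQUIRED_FIELDS)
gap_b = missing_fields(template_from_invoice_b, REQUIRED_FIELDS)
```

Here `gap_a` contains the “terms” field and `gap_b` is empty, so two templates contributed for the same vendor can differ in which required fields they are able to extract.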
As multiple data extraction templates are created for the same vendor, and/or source document type, it becomes imperative to manage these data extraction templates, identify the most relevant data extraction templates, and discard redundant, and/or outdated, data extraction template data. However, currently, there is no efficient, effective, and user-friendly means or mechanism for doing this.