Enterprise Information Management (EIM) is a field within information technology (IT). EIM tools and techniques are used for designing, cataloguing, organizing, and securing data records (including content found in databases, transaction systems, data warehouses, documents, and media) and for making them available to consumers, subject to security constraints. These tools and techniques create and maintain a consistent interpretation of structured and unstructured data. The tools can include extract, transform, and load (ETL) tools. ETL involves extracting data from data sources, transforming the data as needed to change its format, augment it, improve its quality, and the like, and then loading the data into a target data source.
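By way of illustration only, the extract, transform, and load stages described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular ETL tool: the in-memory lists standing in for the data source and the target, the field names `name` and `postal_code`, and the function names are all hypothetical.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def extract(source: Iterable[Record]) -> List[Record]:
    """Extract: copy records out of a source (here, a hypothetical in-memory list)."""
    return [dict(row) for row in source]


def transform(rows: List[Record]) -> List[Record]:
    """Transform: normalize formatting and drop rows that fail a basic quality check."""
    cleaned: List[Record] = []
    for row in rows:
        name = row.get("name", "").strip().title()
        postal_code = row.get("postal_code", "").strip()
        if not name:
            # Illustrative quality rule: a record without a name is discarded.
            continue
        cleaned.append({"name": name, "postal_code": postal_code})
    return cleaned


def load(rows: List[Record], target: List[Record]) -> int:
    """Load: append the transformed rows to the target and report how many were written."""
    target.extend(rows)
    return len(rows)


source_rows = [
    {"name": "  alice smith ", "postal_code": "94105 "},
    {"name": "", "postal_code": "10001"},
]
target_table: List[Record] = []
loaded = load(transform(extract(source_rows)), target_table)
```

In this sketch the second source record is rejected during the transform stage, so only one normalized record reaches the target.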
While the following description describes various embodiments related to an ERP system, one of ordinary skill in the art will recognize that the claims should not be limited to merely ERP embodiments, as the solution described herein could apply to other systems such as Customer Relationship Management (CRM) systems, Supplier Relationship Management (SRM) systems, and general databases.
Enterprise resource planning (ERP) systems allow for the integration of internal and external management information across an entire organization, including financial/accounting, manufacturing, sales and service, customer relationship management, and the like. The purpose of ERP is to facilitate the flow of information between business functions inside the organization and to manage connections to outside entities. Data within an ERP system, however, may not always be valid. For example, an employee record may have a number of fields, including social security number, address, and postal code. Through profiling, it may be discovered that some of these fields contain incomplete, inaccurate, incorrect, or invalid data, or at least are suspected of containing such data. In such cases, it is beneficial to clean up this bad data and to prevent bad values from being entered in future records. Validation and cleansing rules can be used to do this, but currently such rules require a lot of manual effort.
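As a non-limiting illustration of the profiling and validation described above, the following sketch counts suspect values per field and then applies the same rules to individual records. The rule set, the field names, and the United States formats assumed for the social security number and postal code are hypothetical and are chosen only for the example; hand-writing such rules is the kind of manual effort referred to above.

```python
import re
from typing import Callable, Dict, List

Record = Dict[str, str]
Rule = Callable[[str], bool]

# Hypothetical, hand-written validation rules (US formats assumed for illustration).
RULES: Dict[str, Rule] = {
    "ssn": lambda v: re.fullmatch(r"\d{3}-\d{2}-\d{4}", v) is not None,
    "address": lambda v: bool(v.strip()),
    "postal_code": lambda v: re.fullmatch(r"\d{5}(-\d{4})?", v) is not None,
}


def profile(records: List[Record], rules: Dict[str, Rule]) -> Dict[str, int]:
    """Profiling: count, per field, how many existing records fail that field's rule."""
    suspect = {field: 0 for field in rules}
    for record in records:
        for field, rule in rules.items():
            if not rule(record.get(field, "")):
                suspect[field] += 1
    return suspect


def validate(record: Record, rules: Dict[str, Rule]) -> List[str]:
    """Validation: return the names of the fields in one record that fail their rule."""
    return [field for field, rule in rules.items() if not rule(record.get(field, ""))]


employees = [
    {"ssn": "123-45-6789", "address": "1 Main St", "postal_code": "94105"},
    {"ssn": "123456789", "address": "", "postal_code": "9410"},
]
report = profile(employees, RULES)
rejected_fields = validate(employees[1], RULES)
```

Here `profile` summarizes data already in the system, while `validate` could be applied to a new record before it is saved, so that a record with failing fields is flagged at entry time.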
A content type is a table “column” attribute that identifies the semantics, or meaning, of the data values stored in a column. Content type identification uses core Data Cleanse parsing technology and Cleansing Packages (parsing dictionaries), along with some additional custom logic (e.g., field proximity), to identify the content of a field of data. It also uses context and metadata information, along with analysis of the data itself, to establish an understanding of the data.
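The general idea of combining analysis of the data values with metadata can be sketched as follows. This is a simplified, hypothetical stand-in and does not reproduce the Data Cleanse parsing technology or the Cleansing Packages: the regular expressions, the column-name hints, the 0.1 metadata bonus, and the 0.8 threshold are all assumptions made for the example.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical value patterns standing in for parsing dictionaries.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "PHONE": re.compile(r"\+?[\d\s().-]{7,}"),
    "POSTAL_CODE": re.compile(r"\d{5}(-\d{4})?"),
}

# Hypothetical metadata hints: substrings of a column name that suggest a content type.
NAME_HINTS: Dict[str, Tuple[str, ...]] = {
    "EMAIL": ("email", "mail"),
    "PHONE": ("phone", "tel"),
    "POSTAL_CODE": ("zip", "postal"),
}


def identify_content_type(
    column_name: str, values: List[str], threshold: float = 0.8
) -> Optional[str]:
    """Return the best-scoring content type for a column, or None if none is convincing."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return None
    best_type: Optional[str] = None
    best_score = 0.0
    for content_type, pattern in PATTERNS.items():
        # Data analysis: fraction of non-empty values that fully match the pattern.
        matches = sum(1 for v in non_empty if pattern.fullmatch(v))
        score = matches / len(non_empty)
        # Metadata: a small bonus when the column name hints at this content type.
        if any(hint in column_name.lower() for hint in NAME_HINTS[content_type]):
            score += 0.1
        if score > best_score:
            best_type, best_score = content_type, score
    return best_type if best_score >= threshold else None


values = ["94105", "10001", "60601", "n/a"]
with_metadata = identify_content_type("ZIP", values)
without_metadata = identify_content_type("FIELD_7", values)
```

In this sketch three of the four values look like postal codes, which alone falls short of the assumed threshold; the column name `ZIP` supplies the additional evidence needed to assign the content type, whereas the uninformative name `FIELD_7` does not.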