Data quality monitoring tools allow a user to define data rules that describe either conditions that data should fulfill to meet predefined quality standards or how data is transformed from a source to a target. A data rule is typically an expression describing a condition or a transformation that can involve one or more columns or one or more variables. Data rules can be complex and usually use a language, such as SQL or a proprietary language, to define the logic of the data rule. For example, typical data rule expressions could be: “EMPLOYEE.AGE>0”, specifying that the column AGE of the table EMPLOYEE must have positive values; “EMPLOYEE.ID UNIQUE”, specifying that each value in the column ID must be unique; or an even more complex condition, such as “IF CUSTOMERS.AGE<18 THEN CUSTOMERS.MARITAL_STATUS=‘child’”, specifying that, depending on the value of a specific column, the value of another column is constrained.
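By way of illustration only, the three example rules above can be sketched as plain Python checks over in-memory rows. The sample rows, function names, and the choice to report violating row indices are assumptions made for this sketch and are not taken from any particular data quality tool.

```python
from collections import Counter

# Hypothetical sample rows standing in for the EMPLOYEE and CUSTOMERS tables.
EMPLOYEE = [
    {"ID": 1, "AGE": 34},
    {"ID": 2, "AGE": 0},
    {"ID": 2, "AGE": 51},
]
CUSTOMERS = [
    {"AGE": 12, "MARITAL_STATUS": "child"},
    {"AGE": 16, "MARITAL_STATUS": "married"},
    {"AGE": 40, "MARITAL_STATUS": "married"},
]


def employee_age_positive(row):
    """EMPLOYEE.AGE>0: the AGE value of a row must be positive."""
    return row["AGE"] > 0


def employee_id_duplicates(rows):
    """EMPLOYEE.ID UNIQUE: return indices of rows whose ID occurs more than once."""
    counts = Counter(r["ID"] for r in rows)
    return [i for i, r in enumerate(rows) if counts[r["ID"]] > 1]


def customers_minor_is_child(row):
    """IF CUSTOMERS.AGE<18 THEN CUSTOMERS.MARITAL_STATUS='child'."""
    # An implication "IF a THEN b" holds when a is false or b is true.
    return row["AGE"] >= 18 or row["MARITAL_STATUS"] == "child"


# Indices of the rows that violate each rule.
age_violations = [i for i, r in enumerate(EMPLOYEE) if not employee_age_positive(r)]
id_violations = employee_id_duplicates(EMPLOYEE)
status_violations = [
    i for i, r in enumerate(CUSTOMERS) if not customers_minor_is_child(r)
]
```

Note that each check names its table and column directly in its body, mirroring the way the rule expressions themselves do.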
As may be seen, data rules are usually bound to the physical names of the sources upon which they operate. Thus, in existing solutions, the definition of the rule contains the names of the tables or columns to which the rule is applied. In the example above, the data rule “EMPLOYEE.AGE>0” implies that the column being tested is a column named AGE contained in a table EMPLOYEE. In conventional solutions, data rules are not generally defined to be portable. That is, if a user decides to apply a given data rule to a different column, then the data rule needs to be redefined. Therefore, if a user wants to apply the original data rule, “EMPLOYEE.AGE>0,” to a new column of data such as “PRODUCTS.WEIGHT,” the user must redefine the original data rule to include the new location of the data. A new data rule, “PRODUCTS.WEIGHT>0,” must be created. Additionally, as may be appreciated, the new data rule is not related to the original data rule.
In like manner, if a user decides to apply the original data rule against a different value, then the data rule needs to be redefined. Therefore, if a user wants to apply the original data rule, “EMPLOYEE.AGE>0,” to a new value such as “65,” the user must redefine the original data rule to include the new value. A new data rule, “EMPLOYEE.AGE>65,” must be created. Additionally, as above, the new data rule is not related to the original data rule.
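The redefinition problem described above may be easier to see in code. The following is a minimal sketch, assuming a hypothetical representation in which each conventional rule is a function that hard-codes its table name, column name, and comparison value; the `tables` dictionary and the function names are illustrative only.

```python
# Hypothetical in-memory stand-in for two physical tables.
tables = {
    "EMPLOYEE": [{"AGE": 70}, {"AGE": 30}, {"AGE": -1}],
    "PRODUCTS": [{"WEIGHT": 2.5}, {"WEIGHT": 0.0}],
}


def rule_employee_age_gt_0(tables):
    """EMPLOYEE.AGE>0: table, column, and value are all fixed in the rule."""
    return [row["AGE"] > 0 for row in tables["EMPLOYEE"]]


def rule_products_weight_gt_0(tables):
    """PRODUCTS.WEIGHT>0: same logic, but written again for a new column."""
    return [row["WEIGHT"] > 0 for row in tables["PRODUCTS"]]


def rule_employee_age_gt_65(tables):
    """EMPLOYEE.AGE>65: same column, but written again for a new value."""
    return [row["AGE"] > 65 for row in tables["EMPLOYEE"]]


# Per-row pass/fail results, keyed by the rule expression each function encodes.
results = {
    "EMPLOYEE.AGE>0": rule_employee_age_gt_0(tables),
    "PRODUCTS.WEIGHT>0": rule_products_weight_gt_0(tables),
    "EMPLOYEE.AGE>65": rule_employee_age_gt_65(tables),
}
```

Although the three functions share the same comparison logic, nothing in this representation records that they are related: each is a separate definition that must be written and maintained on its own.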
Therefore, systems and methods for portable data management are presented herein.