1. Field
The present invention relates to a method, system, and article of manufacture for using a data mining algorithm to discover data rules.
2. Description of the Related Art
Data records in a database may be processed by a rule evaluation engine that applies data rules to identify records having column or field values that deviate from the values expected by the rules. In the current art, a user manually codes data rules by first analyzing the data, either visually or with a profiling tool, to obtain an understanding of the pattern of a well-formed record. Next, the user builds logical expressions that define a set of rules describing the normal characteristics of records in the set. These rules are then repeatedly executed against data sets to flag records that fail the conditions specified by the data rules and to report on trends in failure rates over time.
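By way of illustration only, the rule evaluation described above may be sketched as follows. The sketch assumes hypothetical record fields (`age`, `country`) and hypothetical hand-written rules (`age_in_range`, `country_present`); none of these names are drawn from a particular rule evaluation engine. Each rule is a logical expression that a well-formed record satisfies, and the engine flags the records that fail each rule and computes a failure rate per rule.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Rule = Callable[[Record], bool]

# Hypothetical hand-coded data rules; rule names and fields are illustrative assumptions.
RULES: Dict[str, Rule] = {
    "age_in_range": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
    "country_present": lambda r: bool(r.get("country")),
}


def evaluate(records: List[Record], rules: Dict[str, Rule]) -> Dict[str, List[int]]:
    """Return, for each rule, the indexes of the records that fail it."""
    failures: Dict[str, List[int]] = {name: [] for name in rules}
    for index, record in enumerate(records):
        for name, rule in rules.items():
            if not rule(record):
                failures[name].append(index)
    return failures


def failure_rates(failures: Dict[str, List[int]], total: int) -> Dict[str, float]:
    """Return the fraction of records failing each rule (0.0 for an empty data set)."""
    return {name: (len(idx) / total if total else 0.0) for name, idx in failures.items()}


# Illustrative data set: records 1 and 3 have out-of-range ages, record 2 lacks a country.
records = [
    {"age": 34, "country": "DE"},
    {"age": -5, "country": "US"},
    {"age": 51, "country": ""},
    {"age": 200, "country": "FR"},
]

failures = evaluate(records, RULES)
rates = failure_rates(failures, len(records))
```

Running the same rules against successive data sets and recording `rates` for each run would give the trend in failure rates over time mentioned above.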
A user may use a rule editor user interface to create new data rules or modify existing rules. Rules may be expressed in a rule language, such as BASIC or the Structured Query Language (SQL). The user may then save rules in a rule repository in the rule language or in a common rule format. The user may then select rules from the rule repository, together with a data set of records, and provide them to the rule evaluation engine, which executes the selected rules against the selected data records to validate the data, capture the results, and display the results to the user.
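The workflow above, in which rules expressed in SQL are saved in a repository, selected, and executed against a chosen data set, may be sketched as follows. This is a minimal sketch under stated assumptions: the repository is a plain in-memory mapping, the table `customers` and its columns are hypothetical, and Python's standard `sqlite3` module stands in for whatever database and rule evaluation engine are actually used.

```python
import sqlite3
from typing import Dict, List

# Hypothetical rule repository: rule name -> SQL condition that a well-formed record satisfies.
RULE_REPOSITORY: Dict[str, str] = {
    "age_in_range": "age BETWEEN 0 AND 120",
    "country_present": "country IS NOT NULL AND country <> ''",
}


def run_rules(conn: sqlite3.Connection, table: str, rule_names: List[str]) -> Dict[str, List[int]]:
    """Execute the selected rules against a table and return the ids of failing rows per rule.

    The table name and rule conditions are interpolated into the query, so they are
    assumed to come from a trusted repository rather than from untrusted input.
    """
    results: Dict[str, List[int]] = {}
    for name in rule_names:
        condition = RULE_REPOSITORY[name]
        # A row violates the rule when the condition evaluates to false or NULL.
        query = f"SELECT id FROM {table} WHERE NOT COALESCE(({condition}), 0) ORDER BY id"
        results[name] = [row[0] for row in conn.execute(query)]
    return results


# Illustrative data set held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, age INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 34, "DE"), (2, -5, "US"), (3, 51, None), (4, 200, "FR")],
)

# Select rules from the repository, execute them, and display the captured results.
results = run_rules(conn, "customers", ["age_in_range", "country_present"])
for rule_name, failing_ids in results.items():
    print(rule_name, failing_ids)
```

Storing each rule as a condition that valid records satisfy, rather than as a complete query, lets the same repository entry be reused against any table that has the referenced columns.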
Developing data rules can require a significant amount of user time, effort and skill to analyze patterns in data, especially for large data sets having millions of records with hundreds of columns. For this reason, a data user typically does not develop and deploy rules until after bad data records result in recognizable business problems or setbacks. Consequently, data rules are often defined reactively, after a problem is experienced, and may be tailored to address the last experienced problem instead of future problems that may arise with the data records.
There is a need in the art to provide improved techniques for generating and using data rules.