The present invention relates generally to the field of data mining, and more particularly, to the field of rule determination for data mining.
Data mining is the process of extracting patterns from large data sets. It allows knowledge to be extracted from enormous amounts of data that, because of their structure and/or volume, are not suitable for human interpretation or evaluation. A common problem in data mining is that non-hypothesis-driven approaches tend to be slow and therefore cannot be used interactively. Hypothesis-driven approaches, based for example on online analytical processing (OLAP) cubes, typically require less computing power but depend on the existence and use of a hypothesis. Often, however, no such hypothesis is known, and an aim of data mining is to determine plausible hypotheses automatically and then to execute a drill-down analysis based on a particular hypothesis. Because non-hypothesis-driven approaches require enormous processing power, there is a need for real-time, interactive, non-hypothesis-driven data mining approaches that can process huge amounts of data.
U.S. Patent Publication No. 2010/0235335 discloses a method for providing a column store database system that supports high-throughput read performance. U.S. Patent Publication No. 2005/0278286 discloses a method for providing a data mining interface during the construction of a query that filters database columns, and for displaying the filtered information to the user. Prior art columnar database systems cannot identify new candidate rules interactively and in real time. Examples of known columnar databases include Vertica™, ParAccel™, Infobright™, and Sybase IQ™.