Large-scale data mining, sometimes referred to as ‘Big Data’, typically calls for real-time maintenance of massive, enterprise-level databases and the use of numerous data analysis programs to extract currently meaningful information from those databases. Enterprise-level databases typically store large numbers of relational tables that provide basic relational attributes for system-tracked data objects (e.g., customers, products, employees, sales transactions). Data mining often calls for identifying complex correlations between system-tracked data objects (e.g., which employees satisfactorily serviced which customers in a select class of sales transactions?).
These kinds of analyses typically call for selective joining of data from multiple database tables. Emerging challenges in this area include increasing the rate at which Big Data mining results are produced, despite the growing sizes of the databases, and making efficient use of finite data processing resources. One method of achieving these goals is pre-computing, in which computational operations that are likely to be required when the data analysis programs execute are carried out before program execution, so that the results are immediately available to currently executing programs. One form of pre-computing is the pre-join operation, in which tables that are to be selectively joined together inside an analysis program are joined ahead of time.
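By way of a non-limiting illustration, the general idea of a pre-join can be sketched as follows. The sketch uses Python's standard `sqlite3` module with an in-memory database. The table names, column names, sample rows, and function names are all hypothetical and chosen only to mirror the employee/customer/sales example above; they are not taken from any particular system, and the sketch shows only a conventional pre-join, not the improved method disclosed herein.

```python
import sqlite3

# Hypothetical join over three tables: which employee handled which
# customer in which sale, and whether the customer was satisfied.
JOIN_SQL = """
    SELECT s.sale_id, e.emp_name, c.cust_name, s.satisfied
    FROM sales AS s
    JOIN employees AS e ON e.emp_id = s.emp_id
    JOIN customers AS c ON c.cust_id = s.cust_id
"""


def build_database() -> sqlite3.Connection:
    """Create a tiny in-memory database with illustrative sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, emp_name TEXT);
        CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_name TEXT);
        CREATE TABLE sales (
            sale_id   INTEGER PRIMARY KEY,
            emp_id    INTEGER REFERENCES employees(emp_id),
            cust_id   INTEGER REFERENCES customers(cust_id),
            satisfied INTEGER  -- 1 if the customer was satisfied, else 0
        );
        INSERT INTO employees VALUES (1, 'Ana'), (2, 'Ben');
        INSERT INTO customers VALUES (10, 'Acme'), (11, 'Globex');
        INSERT INTO sales VALUES
            (100, 1, 10, 1),
            (101, 2, 10, 0),
            (102, 2, 11, 1);
    """)
    return conn


def query_with_runtime_join(conn: sqlite3.Connection) -> list:
    """Analysis query that performs the join each time it executes."""
    sql = (
        f"SELECT emp_name, cust_name FROM ({JOIN_SQL}) "
        "WHERE satisfied = 1 ORDER BY sale_id"
    )
    return conn.execute(sql).fetchall()


def precompute_join(conn: sqlite3.Connection) -> None:
    """Pre-join: store the join result as a table before any analysis runs."""
    conn.execute(f"CREATE TABLE sales_prejoined AS {JOIN_SQL}")


def query_with_prejoin(conn: sqlite3.Connection) -> list:
    """The same analysis query, reading the stored join result instead."""
    sql = (
        "SELECT emp_name, cust_name FROM sales_prejoined "
        "WHERE satisfied = 1 ORDER BY sale_id"
    )
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_database()
    precompute_join(conn)
    print(query_with_runtime_join(conn))
    print(query_with_prejoin(conn))
```

Both queries return the same rows; the difference is that the second reads a stored result rather than joining three tables at execution time. In this sketch the pre-joined table is a snapshot, so it would need to be rebuilt or updated whenever the underlying tables change.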
Traditional database pre-join techniques exhibit poor performance when the number of tables increases significantly. An improved method and system are disclosed here.