The present invention relates generally to the field of information warehouse systems and, more specifically, to data filtering in ETL (Extract, Transform, Load) processes.
Enterprises are building increasingly large information warehouses to enable advanced information analytics and to improve “business values” of information. The data in these warehouses are loaded via ETL processes. Today's information warehouses typically deal with complex data and complex ETL processes. Given the complexity of both the data and the analytics, users often need to filter the data during the ETL process, i.e., only a subset of the data is selected to be loaded according to the users' interests. The complexity of the data and of the ETL process brings new challenges to the data filtering task, so it is often necessary to support data filtering within the ETL process itself.
It is critical to build a general data filtering framework that can be applied in various phases of the ETL process and that supports various data types and filtering semantics. When there are multiple filters in an ETL process, it can also be important to automatically find an optimal (e.g., a more time-efficient) way to execute the filters.