The present invention relates generally to the field of database management, and more particularly to updating database statistics using dynamic profiles during the extract, transform, and load (ETL) process.
ETL (extract, transform, and load) systems facilitate extracting data from various sources, transforming the extracted data to fit operational requirements, and loading the transformed data into a data repository, such as a database at a target location. In many cases, the data that is extracted and accumulated is in a different format than what is ultimately needed in the target data repository. The process of acquiring this data and converting it into useful, compatible, and accurate data is referred to as an ETL process, as in extract, transform, and load.
In an ETL process, the extract phase acquires data from the source system(s). Data extraction can be as simple as copying a flat file from a database or as sophisticated as setting up interdependencies with remote systems that then supervise the transportation of source data to the target system. The extracted source data is typically stored as one or more relational database tables. The transform phase in the ETL process is typically made up of several stages and includes parsing data, converting data formats, and merging extracted source data to create data in a format suitable for the data repository, or target database(s). The load phase of the ETL process includes depositing the transformed data into the new data store (e.g., the data repository, Warehouse, mart, etc.). When the data repository is a relational database, the load process is often accomplished with structured query language (SQL) commands (e.g., IMPORT), utilities (e.g., LOAD), or other SQL tools.
Relational database systems store data in tables that are organized into rows and columns. This allows large volumes of data to be accessed efficiently and conveniently in response to SQL (Structured Query Language) statements, such as SELECT, INSERT, DELETE, and UPDATE. Since SQL is a declarative language, i.e., it only specifies what data is to be accessed, not how that data is to be accessed, database systems include optimizers that formulate different plans for accessing data. Database administrators regularly perform updates to database statistics in order to facilitate query optimization. Database object statistics are used by the query engine to determine access paths. Current database statistics allow the query engine to create the optimal access pathway for retrieving specific data from relational databases through SQL commands. An optimal access pathway provides for improved query performance, and reduces the burden on system resources.