Structure Query Language (SQL) is a computer language for retrieving and managing data in a database system. SQL statements are used to query a database system for data. When compiling SQL statements, a query compiler determines estimated costs for different possible execution plans and compares these plans based on their estimated costs so that the lowest costing execution plan is chosen. These estimated costs are derived from operator-cost-estimation formulas and are accumulated over all the operations in a possible execution plan. These formulas use table, view, and nickname (remote table in a federated database system which is also called a heterogeneous distributed database system) statistics along with the database and system configuration information to calculate the operation costs including input/output (I/O), central processing unit (CPU), communication, buffer usage, and total time of execution costs.
In a federated system, remote tables are mapped on objects called nicknames. Users' SQL statements are received at a local system to be compiled and executed using local and remote system resources. The SQL statements refer to the remote tables using the nicknames explicitly in the statement, and the query compiler uses the nickname-to-remote table mapping information to generate a good query execution plan using the local and remote resources efficiently. Since the query compiler executes locally, it thus would require statistics on remote tables stored locally in these nicknames to make proper execution plans and estimates. For example, if a table is recorded as having a statistic of 1,000 pages and the operation is to perform a table scan on the table, the estimated I/O costs will require 1,000 pages being transferred from disk. However, if the table is actually stored on 1,000,000 pages on a disk, the table scan's actual transfer cost could be as much as 1,000 times more than the estimated cost. Assuming a 1,000 page statistic, the table scan is estimated to be lower than an index scan, and the operations contain an inequality predicate on the index key. It is possible that using the real number of pages of 1,000,000 would have made the table scan cost much higher than the index scan. Many examples exist where inaccurate statistics lead to inaccurate estimates and consequently where the query compiler selects suboptimal execution plans.
Because the query compiler's plan selection depends on having good database statistics, having accurate statistics for the database at the local server is imperative. Over time, the database could be modified such that its real estate deviates from previously collected statistics. Ensuring statistics are current for recurring SQL statement compilations is crucial.
Accordingly, what is needed is an improved method and system for collecting statistics on database tables. The present invention addresses such a need.