A database is a collection of stored data that is logically related and that is accessible by one or more users or applications. A popular type of database system is the relational database management system (RDBMS), which stores data in relational tables made up of rows and columns (also referred to as tuples and attributes, respectively). Each row represents an occurrence of an entity defined by a table, with an entity being a person, place, thing, or other object about which the table contains information.
To extract data from, or to update, a relational table in an RDBMS, queries are submitted in a standard database query language (e.g., Structured Query Language, or SQL). Examples of SQL query statements include INSERT, SELECT, UPDATE, and DELETE.
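By way of illustration only, the four statement types named above can be exercised against a small in-memory table. The sketch below uses Python's built-in sqlite3 module rather than any particular commercial RDBMS, and the table name `employee`, its columns, and the sample values are hypothetical.

```python
import sqlite3


def run_demo():
    """Run INSERT, UPDATE, DELETE, and SELECT against a hypothetical table."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()
    # Hypothetical relational table: each row is one employee entity.
    cur.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
    )
    # INSERT adds rows (tuples) to the table.
    cur.execute("INSERT INTO employee (id, name, dept) VALUES (?, ?, ?)",
                (1, "Ada", "ENG"))
    cur.execute("INSERT INTO employee (id, name, dept) VALUES (?, ?, ?)",
                (2, "Grace", "OPS"))
    # UPDATE changes a column (attribute) value in matching rows.
    cur.execute("UPDATE employee SET dept = ? WHERE id = ?", ("ENG", 2))
    # DELETE removes matching rows.
    cur.execute("DELETE FROM employee WHERE id = ?", (1,))
    # SELECT extracts the remaining rows.
    rows = cur.execute(
        "SELECT id, name, dept FROM employee ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_demo())
```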
As applications become increasingly sophisticated and data storage needs grow, higher-performance database systems are used. One such database system is the TERADATA® database management system from NCR Corporation. The TERADATA® database systems are parallel processing systems capable of handling relatively large amounts of data. In some arrangements, a database system includes multiple nodes that manage access to multiple portions of data to enhance concurrent processing of data access and updates. In TERADATA® database management systems, concurrent data processing is enhanced by the use of virtual processors, referred to as access module processors (AMPs), to further divide database tasks. Each AMP is responsible for a logical disk space. In response to a query, one or more of the AMPs are invoked to perform database accesses, updates, and other manipulations.
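The general idea of dividing data among several workers so that each one handles only its own portion can be sketched as follows. This is not a description of how any TERADATA® system or AMP is implemented: the number of workers, the CRC32-based assignment of rows to workers, and all function names are assumptions made purely for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 4  # hypothetical number of workers


def worker_for(key):
    """Assign a row key to a worker (assumed scheme: CRC32 modulo workers)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_WORKERS


def partition(rows):
    """Split (key, value) rows into one portion per worker."""
    parts = [[] for _ in range(NUM_WORKERS)]
    for key, value in rows:
        parts[worker_for(key)].append((key, value))
    return parts


def local_sum(part):
    """Work done by one worker on its own portion only."""
    return sum(value for _, value in part)


def parallel_total(rows):
    """Run the per-portion work concurrently, then combine the results."""
    parts = partition(rows)
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        return sum(pool.map(local_sum, parts))


if __name__ == "__main__":
    sample = [(f"k{i}", i) for i in range(100)]
    print(parallel_total(sample))
```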
One of the goals of a database management system is to optimize the performance of queries for access and manipulation of data stored in the database. Given a target environment, an optimal query plan is selected, with the optimal query plan being the one with the lowest cost (e.g., response time) as determined by an optimizer. The response time is the amount of time it takes to complete the execution of a query on a given system.
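The selection step described above reduces, in its simplest form, to picking the candidate plan with the smallest estimated cost. The following is a minimal sketch under that assumption; the `Plan` class, the plan names, and the cost figures are hypothetical, and a real optimizer would also have to enumerate the candidates and estimate their costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str                       # hypothetical label for a candidate plan
    estimated_response_time: float  # estimated cost, e.g., in seconds


def choose_plan(candidates):
    """Return the candidate with the lowest estimated cost."""
    if not candidates:
        raise ValueError("at least one candidate plan is required")
    return min(candidates, key=lambda p: p.estimated_response_time)


if __name__ == "__main__":
    # Made-up cost estimates for illustration only.
    plans = [
        Plan("full_table_scan", 12.0),
        Plan("index_lookup", 0.8),
        Plan("hash_join_first", 3.5),
    ]
    print(choose_plan(plans).name)
```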
Typically, an optimizer calculates cost and/or other useful metrics based on statistics of one or more columns (or attributes) of each table. In some cases, statistics are stored in the form of a histogram. In database systems that store large tables, the cost of collecting statistics for such large tables can be quite high, especially if all rows of a table need to be scanned to collect the statistics. As a result, some database users may choose not to collect statistics for columns of tables over a certain size. The lack of statistics for some tables may adversely affect operation of certain components in the database system, such as the optimizer and other tools.
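To make the role of a histogram concrete, the sketch below builds a simple equal-width histogram over one column and uses it to estimate how many rows satisfy a range condition. The equal-width bucket layout, the uniform-spread assumption within a bucket, and the function names are illustrative assumptions rather than the format used by any particular database system. Note that building the histogram here visits every value, which is the full-scan cost discussed above.

```python
def build_histogram(values, num_buckets):
    """Scan every value once and count how many fall into each bucket."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / num_buckets or 1.0
    counts = [0] * num_buckets
    for v in values:
        # Clamp so the maximum value lands in the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return lo, width, counts


def estimate_rows_at_most(hist, x):
    """Estimate the number of rows whose column value is <= x."""
    lo, width, counts = hist
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        if x >= bucket_hi:
            total += count  # whole bucket qualifies
        elif x > bucket_lo:
            # Assume values are spread evenly inside the bucket.
            total += count * (x - bucket_lo) / width
    return total


if __name__ == "__main__":
    column = list(range(100))  # hypothetical column values
    hist = build_histogram(column, 10)
    print(estimate_rows_at_most(hist, 49.5))
```

An optimizer could use such an estimate in place of rescanning the table each time it needs to cost a plan.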