Large-scale Database Management Systems (DBMSs) often include voluminous tables with numerous columns and rows that span those columns. The DBMSs can also be distributed across multiple servers. As a result, the ability to efficiently process a query against any of the DBMS tables involves a lot more than simply processing the query conditions defined by a user.
A query optimizer provides a mechanism for query efficiency and exists with most DBMSs. The query optimizer generates a query plan based on statistics generated for the DBMS, the tables, the rows, the columns, the partitions, the data that populated the DBMS, histories for processing queries and accessing tables, and for a variety of other aspects associated with the DBMS.
However, for large DBMS tables, it becomes very difficult and expensive to collect statistical information. Moreover, maintaining and analyzing this statistical information can be a resource nightmare for any enterprise having a DBMS.
Further, when a DBMS supports queries that include or integrate foreign objects or data external to the DBMS environment, such as Hadoop® tables, any statistics associated with those foreign objects or data is not available to the DBMS and correspondingly the query optimizer.