Computing systems and associated networks have revolutionized our world. The interconnection of such computing systems into networks has resulted in explosive growth in the ability to communicate data, ushering in what is now called the “information age”. Information is often stored, managed, and analyzed in datasets. In recent years, the volume of data stored in datasets has grown rapidly, giving rise to technology often referred to as “big data”. Such data growth is multi-dimensional, including increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources).
Datasets may exist in many forms. Commercial datasets often use parallel database management systems for large quantities of data, such that the data may be stored and distributed across multiple servers, multiple storage devices, and/or multiple partitions of a storage device. To reveal meaningful information from such a large quantity of data, data management systems often provide query interfaces that can receive and interpret queries issued by users against the system's data. Query results are generated by accessing a relevant dataset and manipulating it in a way that yields the requested data.
Since dataset structures are complex, the response data for any given query can often be collected from a dataset using any of a variety of different ways of executing the query. Each possible query execution typically requires different computing resources, such as processing time, memory, network bandwidth, storage channel bandwidth, and so forth. For instance, processing times of the same query may have a large variance, from a fraction of a second to hours, depending on the selected execution of that query. This is especially true as datasets grow larger.
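By way of illustration only, the following minimal Python sketch shows how two executions of the same join query may return the same response data while performing very different amounts of work. The function names, the in-memory `orders` and `customers` lists, and the use of a comparison or probe count as a stand-in for processing time are all assumptions made for this example and are not drawn from any particular data management system.

```python
def nested_loop_join(left, right):
    """Compare every left row with every right row on the first field."""
    comparisons = 0
    out = []
    for l in left:
        for r in right:
            comparisons += 1
            if l[0] == r[0]:
                out.append((l, r))
    return out, comparisons


def hash_join(left, right):
    """Build a hash table on the right input, then probe it once per left row."""
    table = {}
    for r in right:
        table.setdefault(r[0], []).append(r)
    probes = 0
    out = []
    for l in left:
        probes += 1
        for r in table.get(l[0], []):
            out.append((l, r))
    return out, probes


# Hypothetical data: 1000 orders, each referring to one of 100 customers.
orders = [(i % 100, f"order-{i}") for i in range(1000)]
customers = [(i, f"customer-{i}") for i in range(100)]

rows_a, work_a = nested_loop_join(orders, customers)
rows_b, work_b = hash_join(orders, customers)
print(len(rows_a), work_a)
print(len(rows_b), work_b)
```

In this sketch both executions yield the same rows, yet the nested-loop execution performs one comparison per pair of input rows while the hash-based execution performs one probe per left-hand row. The gap between the two widens as the inputs grow.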
Thus, query optimization is used to find a way to process a given query in less time. For instance, a query is typically compiled into a query tree of operators. The query tree is then improved via a query optimizer. The optimized query tree is then executed to yield the requested data. By optimizing the query tree, query performance can be improved.
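As a non-limiting sketch of the compile, optimize, and execute sequence described above, the Python example below represents a query as a small tree of operators and applies a single rewrite rule that moves a filter beneath a join. The operator classes (`Scan`, `Filter`, `Join`), the `optimize` and `execute` functions, the table contents, and the choice of this particular rule are hypothetical and chosen only for illustration. An actual query optimizer may use many rules together with cost estimates.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Scan:
    table: str


@dataclass
class Filter:
    column: str          # qualified name, e.g. "customers.region"
    value: object
    child: "Node"


@dataclass
class Join:
    left: "Node"
    right: "Node"
    left_key: str
    right_key: str


Node = Union[Scan, Filter, Join]

# Hypothetical in-memory tables; keys are qualified column names.
TABLES = {
    "orders": [
        {"orders.id": 1, "orders.cust": 10},
        {"orders.id": 2, "orders.cust": 11},
        {"orders.id": 3, "orders.cust": 10},
    ],
    "customers": [
        {"customers.id": 10, "customers.region": "EU"},
        {"customers.id": 11, "customers.region": "US"},
    ],
}


def tables_of(node):
    """Return the set of table names read beneath a node."""
    if isinstance(node, Scan):
        return {node.table}
    if isinstance(node, Filter):
        return tables_of(node.child)
    return tables_of(node.left) | tables_of(node.right)


def optimize(node):
    """Push an equality filter below a join when it touches only one input."""
    if isinstance(node, Filter):
        child = optimize(node.child)
        if isinstance(child, Join):
            table = node.column.split(".")[0]
            if table in tables_of(child.left):
                pushed = optimize(Filter(node.column, node.value, child.left))
                return Join(pushed, child.right, child.left_key, child.right_key)
            if table in tables_of(child.right):
                pushed = optimize(Filter(node.column, node.value, child.right))
                return Join(child.left, pushed, child.left_key, child.right_key)
        return Filter(node.column, node.value, child)
    if isinstance(node, Join):
        return Join(optimize(node.left), optimize(node.right),
                    node.left_key, node.right_key)
    return node


def execute(node):
    """Evaluate a query tree bottom-up and return the resulting rows."""
    if isinstance(node, Scan):
        return list(TABLES[node.table])
    if isinstance(node, Filter):
        return [row for row in execute(node.child)
                if row[node.column] == node.value]
    left_rows, right_rows = execute(node.left), execute(node.right)
    return [{**l, **r} for l in left_rows for r in right_rows
            if l[node.left_key] == r[node.right_key]]


# Query tree as compiled: join first, then filter the joined rows.
plan = Filter("customers.region", "EU",
              Join(Scan("orders"), Scan("customers"),
                   "orders.cust", "customers.id"))
optimized = optimize(plan)
print(optimized)
print(execute(optimized))
```

In this sketch the optimized tree filters the customer rows before the join, so the join processes fewer rows while the rows returned remain the same as for the original tree.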
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.