In the context of data storage and management, query optimization attempts to determine the most efficient way to execute a given query. Because structured query language (SQL) is declarative, there are typically multiple ways to execute a given query (query plans), and each yields different performance while arriving at the same results. A query plan is an ordered sequence of steps used to access or modify information in SQL-based data systems. A typical query optimizer considers the possible query plans for a given query and determines which of them will be most efficient (that is, will incur the least latency).
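The selection step described above can be illustrated with a minimal sketch. It assumes each candidate plan already carries an estimated latency; the `QueryPlan` type, the `choose_plan` function, the plan names, and the latency figures are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class QueryPlan:
    """A candidate plan: an ordered sequence of steps plus an estimated latency."""
    name: str
    steps: Tuple[str, ...]
    estimated_latency_ms: float  # assumed to come from some cost model


def choose_plan(plans: List[QueryPlan]) -> QueryPlan:
    """Return the candidate with the lowest estimated latency."""
    if not plans:
        raise ValueError("at least one candidate plan is required")
    return min(plans, key=lambda plan: plan.estimated_latency_ms)


# Hypothetical candidates for one query; every plan returns the same rows.
CANDIDATES = [
    QueryPlan("full_scan", ("scan table", "filter rows"), 120.0),
    QueryPlan("index_scan", ("probe index", "fetch matching rows"), 15.0),
    QueryPlan("hash_join_first", ("build hash table", "probe", "filter rows"), 48.0),
]

best = choose_plan(CANDIDATES)
print(best.name, best.steps)
```

In this sketch the optimizer reduces to a minimum over estimated latencies; real optimizers differ mainly in how candidates are enumerated and how their costs are estimated.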
In distributed data systems, where data is stored across many nodes of a cluster, it can be difficult to determine a single optimal query plan for the entire system. For example, depending on how the data is distributed across the cluster, one query plan may be optimal for some nodes, while a different query plan may be optimal for others.
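The following sketch shows how the best plan can differ from node to node. It uses a toy cost model that is purely an assumption: a full scan costs one unit per stored row, and an index lookup costs four units per matching row. The node names, row counts, and function names are hypothetical.

```python
from typing import Dict, Tuple


def estimate_costs(total_rows: int, matching_rows: int) -> Dict[str, float]:
    """Toy cost model (assumed): sequential reads cost 1 unit per row,
    index lookups cost 4 units per matching row."""
    return {
        "full_scan": total_rows * 1.0,
        "index_lookup": matching_rows * 4.0,
    }


def best_plan_per_node(node_stats: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    """Pick the cheapest plan independently for each node."""
    choices = {}
    for node, (total_rows, matching_rows) in node_stats.items():
        costs = estimate_costs(total_rows, matching_rows)
        choices[node] = min(costs, key=costs.get)
    return choices


# Hypothetical per-node statistics: (rows stored, rows matching the predicate).
NODE_STATS = {
    "node-a": (1000, 10),   # few matches: the index lookup is cheaper
    "node-b": (1000, 900),  # most rows match: the full scan is cheaper
    "node-c": (1000, 100),
}

print(best_plan_per_node(NODE_STATS))
```

Under these assumed statistics, the same query is cheapest as an index lookup on two nodes and as a full scan on the third, so no single plan is optimal for the whole cluster.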