When a query statement such as an SQL (structured query language) query statement is received from a client by a database, steps such as syntax analysis, semantic analysis, and optimization need to be performed on the query statement, so as to generate an execution structure. An important component affecting execution efficiency of an SQL statement in a database management system (DBMS) is the optimizer which is configured to output a query plan during compilation. An executor performs a data operation according to the generated query plan.
Most optimizers nowadays are cost based. The query plan includes multiple steps where certain database operators are applied such as scanning a table (possibly filtering rows based on certain conditions at the same time), doing aggregation, and joining two tables, etc. The optimizer estimates a cost for each step based on an input (such as, estimated cardinality) to an operator and a cost model of the operator. The cost of the query plan is simply the sum of all the step costs. The optimizer searches the plan space, computes the cost for candidate plans, and picks the one with the lowest cost.
In a cost based optimizer, there are three major components: cardinality estimation (CE), cost model, and query plan search. And among them, the cardinality estimation is the most critical. It is also the hardest one to get right and the estimates can be off by orders of magnitude, resulting in sub-optimal plan.