Enterprises now track all aspects of their business electronically. Every customer transaction, along with information about customers, inventory, capital, expenses, and so on, is captured, indexed, and stored in the enterprise's data warehouse. The warehouse quickly grows enormous, and operations against it can be time-consuming even with the most robust architectures and database techniques.
Often an enterprise wants to gather statistics about its data warehouse to help it better manage and organize the database tables within the warehouse.
One particular technique is called selectivity estimation. Selectivity estimation computes the probability that any particular row in a database table will be returned for a given predicate (a search condition with a comparison operation). A predicate is said to be highly selective if its selectivity is less than 10% (0.10), indicating that fewer than 10% of the rows of a database table are likely to be returned when that predicate is used in a search query.
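As an illustration only, the definition above can be sketched in Python by measuring selectivity directly as the fraction of rows that satisfy a predicate. The function names, the sample rows, the column name market_segment, and the choice to treat the 0.10 boundary as exclusive are assumptions made for this sketch; a real optimizer would estimate the value from collected statistics rather than scan the table.

```python
# Hypothetical sketch: selectivity measured as matching rows / total rows.
HIGHLY_SELECTIVE_THRESHOLD = 0.10  # the 10% figure from the text


def selectivity(rows, predicate):
    """Return the fraction of rows for which predicate(row) is true."""
    if not rows:
        return 0.0
    matches = sum(1 for row in rows if predicate(row))
    return matches / len(rows)


def is_highly_selective(sel, threshold=HIGHLY_SELECTIVE_THRESHOLD):
    """True when the selectivity falls below the threshold.

    Treating the boundary value itself as not highly selective is an
    assumption of this sketch.
    """
    return sel < threshold


# Made-up sample table: 1 FURNITURE row out of 20.
sample_rows = (
    [{"market_segment": "FURNITURE"}] * 1
    + [{"market_segment": "BUILDING"}] * 19
)

furniture_selectivity = selectivity(
    sample_rows, lambda row: row["market_segment"] == "FURNITURE"
)
```

With this sample, one row in twenty matches, so the predicate falls under the 10% threshold and would be classed as highly selective.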
For non-equality predicates, traditional multi-column statistics cannot be used to estimate the combined selectivity of all predicates. For example, the selectivity of the predicate “market-segment IN (‘FURNITURE’, ‘MACHINERY’) AND account-balance BETWEEN 100.23 AND 200.0” cannot be accurately estimated using the current multi-column statistics on (account-balance, market-segment). Therefore, the combined selectivity is computed either by using a fudge factor or by simply multiplying the individual selectivities. The former is a pure guess, and the latter serves well only when the two columns are truly independent. When the columns are not independent, multiplication can cause severe underestimation. The problem, then, is to identify the cases where two columns are independent, so that the combined selectivity can be computed by multiplying the two individual selectivities.
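The effect of the independence assumption can be seen in a small Python sketch that compares the product of the individual selectivities with the fraction of rows that actually satisfy both predicates. The two sample tables, the column names market_segment and account_balance, and the helper names are all invented for illustration and do not come from any particular database system.

```python
# Hypothetical sketch: multiplying selectivities vs. the true combined value.

def selectivity(rows, predicate):
    """Return the fraction of rows for which predicate(row) is true."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if predicate(row)) / len(rows)


def in_segment(row):
    # market-segment IN ('FURNITURE', 'MACHINERY')
    return row["market_segment"] in ("FURNITURE", "MACHINERY")


def in_balance_range(row):
    # account-balance BETWEEN 100.23 AND 200.0 (inclusive bounds)
    return 100.23 <= row["account_balance"] <= 200.0


def compare_estimates(rows):
    """Return (product_estimate, actual_combined_selectivity)."""
    product = selectivity(rows, in_segment) * selectivity(rows, in_balance_range)
    actual = selectivity(rows, lambda r: in_segment(r) and in_balance_range(r))
    return product, actual


# Correlated columns: every FURNITURE row also has a balance in range.
correlated_rows = (
    [{"market_segment": "FURNITURE", "account_balance": 150.0}] * 20
    + [{"market_segment": "BUILDING", "account_balance": 500.0}] * 80
)

# Independent columns: every segment/balance combination is equally common.
independent_rows = (
    [{"market_segment": "FURNITURE", "account_balance": 150.0}] * 25
    + [{"market_segment": "FURNITURE", "account_balance": 500.0}] * 25
    + [{"market_segment": "BUILDING", "account_balance": 150.0}] * 25
    + [{"market_segment": "BUILDING", "account_balance": 500.0}] * 25
)

correlated_product, correlated_actual = compare_estimates(correlated_rows)
independent_product, independent_actual = compare_estimates(independent_rows)
```

On the correlated sample, each predicate alone matches 20% of the rows, so the product estimate is about 0.04, while 20% of the rows actually satisfy both predicates, a fivefold underestimate. On the independent sample, the product and the actual value are both 0.25.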
Thus, improved mechanisms for independent column detection for use in selectivity estimation are needed.