In large commercial database systems statistics are often gathered for the tables and other constructs of the database. These statistics are useful in identifying costs, which are used to generate an optimal plan for a given query. That is, large scale databases include query optimizers that determine a most efficient way to execute a given query by considering multiple different query plans and the cost of each individual query plan. The statistics are then vital to resolving costs for a given query, and it is the costs that the optimizers use to determine the optimal plan for a given query. Thus, inefficient or inaccurate statistics can adversely impact the performance of the database.
In many cases, the approach used by most database systems is to include statistics for a base table when evaluating a query. However, this is not always an optimal practice, especially when the base table is extremely large (which is common in the industry), which makes collecting statistics prohibitively expensive and when there are other conditions that are crucial to generate an optimal plan for a given query.
Overgeneralizing the statistics for a given query causes the costs of the query to be too high and does not adequately provide the database optimizer with needed information to optimally determine an appropriate query plan.
Another approach has been to use the statistics available in a given query for a join index or use no statistics at all. Such a technique requires converting selectivity estimation on the base table to selectivity estimation on the join index, which requires a lot of computations, including predicate mapping; this approach also lacks flexibility.
However, when the rows are from a same set, the histograms on the same fields are exchangeable; so, statistics can be inherited from one database object (construct) to another. Thus, when statistics on one object are missing, the statistics inherited from the other object can be transparently used, such as statistics on non-sparse join index (a materialized view with no predicates) and its base table.
When a query condition relationship is a superset or a subset, statistics inheritance encounters challenges, since the row filtering condition on the subset side needs to be considered. Furthermore, a solution is needed for combining the selectivity based on statistics inherited from object containing subset rows. From the superset object's view, the row filtering condition on subset object is applied to all of the selectivity estimated using statistics inherited from subset object. Also, when there are multiple objects with subset rows, there is a need to choose the best one or the ones that can coordinate with each other.