Some embodiments of the present disclosure are directed to an improved approach for validating database table partitioning schemes using stratified random sampling. Some commercial deployments have approached the task of validating database table partitioning schemes using specialized tools or modules, sometimes referred to as “partition advisers”.
Earlier attempts at database table partitioning relied on exhaustively enumerating candidate partitioning schemes and then evaluating each candidate against a query workload. A candidate partitioning scheme was evaluated with respect to the other candidates based on the cost (e.g., empirical runtime cost or estimated runtime cost) of running the complete workload under each candidate scheme, and the costs were then compared to find the lowest-cost scheme for the given workload. Thus, identifying an optimal partitioning scheme can become very time- and resource-consuming as the number of candidate partitioning schemes grows and as the number of queries in the workload grows. In modern practice, the overall resource cost of finding an optimal solution has become prohibitively high.
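The exhaustive evaluation described above can be illustrated with a minimal sketch. The function name, the `cost` callback, and the evaluation counter are hypothetical and introduced only for illustration; the disclosure does not prescribe this interface. The sketch shows why the effort grows with the product of the number of candidate schemes and the number of queries in the workload.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Scheme = TypeVar("Scheme")
Query = TypeVar("Query")


def cheapest_scheme_exhaustive(
    candidates: Sequence[Scheme],
    workload: Sequence[Query],
    cost: Callable[[Scheme, Query], float],  # hypothetical cost model (measured or estimated)
) -> Tuple[Scheme, float, int]:
    """Return (best scheme, its total cost, number of cost evaluations)."""
    if not candidates:
        raise ValueError("at least one candidate scheme is required")
    best_scheme, best_cost = candidates[0], float("inf")
    evaluations = 0
    for scheme in candidates:
        # Run (or estimate) the complete workload under this candidate.
        total = 0.0
        for query in workload:
            total += cost(scheme, query)
            evaluations += 1
        # Keep the candidate with the lowest total workload cost.
        if total < best_cost:
            best_scheme, best_cost = scheme, total
    return best_scheme, best_cost, evaluations
```

In this sketch the number of cost evaluations is always the number of candidates multiplied by the number of queries, which is the growth that makes the exhaustive approach expensive.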
What is needed is a way of evaluating partitioning schemes that dramatically improves performance while concurrently:
Improving manageability.
Improving availability.
Performing partitioning in a manner that is transparent to the applications.
By validating candidate partitioning schemes against a much smaller, yet statistically representative, set of samples from the workload (e.g., drawn using stratified random sampling), it is possible to significantly reduce the resource intensity of evaluating partitioning schemes in order to make a partitioning scheme recommendation, without compromising the quality of the recommendation. As noted above, legacy technologies, especially those involving exhaustive enumeration for evaluation under large workloads, become impractical as the enumeration grows. Reliance on such technologies can lead to incorrect partitioning recommendations. An improved approach is needed.
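As a rough illustration of drawing a smaller but representative sample from a workload, the following sketch applies stratified random sampling with proportional allocation. It is a sketch under stated assumptions, not the disclosed implementation: the `stratum_of` callback (how queries are grouped into strata), the `fraction` parameter, and the rule of keeping at least one query per stratum are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, TypeVar

Query = TypeVar("Query")


def stratified_sample(
    workload: Sequence[Query],
    stratum_of: Callable[[Query], Hashable],  # hypothetical: maps a query to its stratum
    fraction: float,
    seed: int = 0,
) -> List[Query]:
    """Sample roughly `fraction` of each stratum, keeping at least one query per stratum."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # seeded so the sample is reproducible

    # Group the workload into strata.
    strata = defaultdict(list)
    for query in workload:
        strata[stratum_of(query)].append(query)

    # Draw a simple random sample, without replacement, inside each stratum.
    sample: List[Query] = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample
```

Candidate schemes could then be costed against the returned sample instead of the complete workload. Keeping at least one query from every stratum is one way to prevent rare query types from disappearing from the sample; other allocation rules are equally possible.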