1. Field of the Invention
The present invention generally relates to data processing, and more specifically to executing queries against a partitioned database.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. A relational database management system (RDBMS) is a database management system (DBMS) that uses techniques for storing and retrieving data. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways.
Databases are typically partitioned to improve availability, performance, and scalability. Partitioning a database involves dividing the database or its constituent elements into distinct individual parts. For example, a database may be partitioned by building smaller separate databases, each with its own tables, indexes, transaction logs, etc. or by splitting a selected element, for example a field of a table. The database may be partitioned within a single server, or distributed or replicated across multiple servers. Therefore, database partitioning provides multiple benefits including scalability to support large databases, the ability to handle complex workloads, and increased parallelism.
When queries are run against a partitioned database, the query may be run against each partition. The results from each partition may then be integrated to provide the result for the query. To further improve performance of querying a database, the query may not be run against one or more partitions which are known to not contain results for the query. For example, a database may be partitioned based on location. The locations, for example, may be divided into 4 partitions, each partition being associated with data from one of the eastern states, western states, northern states, and southern states.
If a query containing a condition STATE=‘MAINE’ is run against the database, the query need not be run against partitions containing data for southern and western states. Therefore, by eliminating the number of partitions against which a query is executed, the performance may be improved. However, even with elimination of partitions, the query may still be run against multiple partitions. For example, the above query may be executed against the northern states partition and the eastern states partition.
One problem with running a query against multiple partitions is that the response time for each partition may be different. For example, if the above query is run against the partitions containing data for the northern and eastern states, the northern states partition may take longer time to retrieve results than the southern states partition. Therefore, the response time of the query is governed by the slowest partition returning results to satisfy the query. Because some partitions may execute queries faster than others, critical resources may be unnecessarily used at the faster partitions. In other words, dedicating the critical resources to the above query, which is done at the expense of executing other queries, is inefficient because the complete result set for the above query is not returned to the user until the results from the slower partition are available. Furthermore, a significant amount of time may be wasted while waiting for the slower partition to retrieve results. Therefore overall query throughput may be adversely affected.
Accordingly, what is needed are improved methods, systems, and articles of manufacture for improving query throughput in a partitioned database environment.