1. Field of the Invention
The present invention generally relates to data processing and, more particularly, to executing queries against a partitioned database.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. A relational database management system (RDBMS) is a database management system (DBMS) that uses techniques for storing and retrieving data. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways.
Databases are typically partitioned to improve availability, performance, and scalability. Partitioning a database involves dividing the database or its constituent elements into distinct individual parts. For example, a database may be partitioned by building smaller separate databases, each with its own tables, indexes, transaction logs, etc. or by splitting a selected element, for example a field of a table. The database may be partitioned within a single server, or distributed or replicated across multiple servers. Therefore, database partitioning provides multiple benefits including scalability to support large databases, the ability to handle complex workloads, and increased parallelism.
When queries are run against a partitioned database, the query may be run against each partition. The results from each partition may then be integrated to provide a complete query result for the query. To further improve performance of querying a database, the query may not be run against one or more partitions which are known to not contain results for the query. For example, an underlying database may be partitioned based on location. The locations, for example, may be divided into four partitions, each partition being associated with data from one of the eastern states, western states, northern states, and southern states.
If a query containing a condition STATE=‘ALABAMA’ is run against the underlying database, the query need not be run against partitions containing data for northern and western states. Therefore, by eliminating the number of partitions against which a query is executed, the performance may be improved. However, even with elimination of partitions, the query may still be run against multiple partitions. For example, the above query may be executed against the southern states partition and the eastern states partition.
Furthermore, in the given example a complete query result may be obtained for particular queries from a single partition. For example, assume that a given query is run against the partitions containing data for the southern and eastern states in order to identify different age groups of individuals driving a particular automobile. Assume further that only five data records that are ordered in ascending order by location should be returned in a corresponding query result. Assume now that data related to ten distinct age groups is included with the underlying database and that the southern states partition is slower than the eastern states partition. In other words, determination of a result set from the southern states partition takes significantly longer than determination of a result set from the eastern states partition.
By way of example, assume now that a result set can quickly be returned from the eastern states partition having five data records related to distinct age groups in the state Alabama. Accordingly, the result set represents a complete query result so that data returned from the southern states partition would not be required to complete this query result. Thus, execution of the query against the underlying database is inefficient because the complete result set for the above query is not returned to the user until the results from the slower partition, i.e., the southern states partition, are available. Furthermore, a significant amount of time may be wasted while waiting for the slower partition to retrieve results. Therefore overall query throughput may be adversely affected.
Therefore, there is a need for an efficient technique for managing execution of queries against partitioned databases.