1. Field of the Invention
The present invention generally relates to data processing and, more particularly, to managing operation of user-defined functions on a partitioned database.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. A relational database management system (RDBMS) is a database management system (DBMS) that uses techniques for storing and retrieving data. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways.
Databases are typically partitioned to improve availability, performance, and scalability. Partitioning a database involves dividing the database or its constituent elements into distinct individual parts. For example, a database may be partitioned by building smaller separate databases, each with its own tables, indexes, transaction logs, etc. or by splitting a selected element, for example a field of a table. The database may be partitioned within a single server, or distributed or replicated across multiple servers. Therefore, database partitioning provides multiple benefits including scalability to support large databases, the ability to handle complex workloads, and increased parallelism.
When queries are run against a partitioned database, the query may be run against each partition. The results from each partition may then be integrated to provide a complete query result for the query. To further improve performance of querying a database, the query may not be run against one or more partitions which are known to not contain results for the query. For example, an underlying database may be partitioned based on location. The locations, for example, may be divided into 4 partitions, each partition being associated with data from one of the eastern states, western states, northern states, and southern states.
If a query containing a condition STATE=‘MAINE’ is run against the underlying database, the query need not be run against partitions containing data for southern and western states. Therefore, by eliminating the number of partitions against which a query is executed, the performance may be improved. However, even with elimination of partitions, the query may still be run against multiple partitions. For example, the above query may be executed against the northern states partition and the eastern states partition.
Furthermore, queries that are executed against a partitioned database can be associated with user-defined functions that are configured to perform predefined operations on data returned from the partitioned database in response to execution of the queries. For instance, if a given query requests data related to prices of selected products, an associated user-defined function (UDF) can be configured to change retrieved prices if a predefined condition is satisfied. By way of example, the UDF could be configured for computing discounted prices for senior citizens living in the state ‘MAINE’.
One problem with running a query having an associated UDF against multiple partitions is that a given UDF may not be necessary suitable for execution on each data partition. For example, assume that in the given example the above query is associated with the above UDF and run against the partitions containing data for the northern and eastern states. Assume further that the eastern states partition does not include data records related to senior citizens and that determination of a result set from the eastern states partition takes significantly longer than determination of a result set from the northern states partition. Accordingly, execution of the query against the eastern partition may be unnecessarily slowed down lead to a meaningless result set as the UDF is executed against data that does not need to be modified not executed on a corresponding result set returned from the eastern partition. Thus, execution of the query against the underlying database is inefficient because the complete result set for the above query is not returned to the user until the results from the slower partition, i.e., the eastern states partition, are available. Furthermore, a significant amount of time may be wasted while waiting for the slower partition to complete. Therefore overall query throughput may be adversely affected.
Therefore, there is a need for an efficient technique for managing operation of user-defined functions on partitioned databases.