1. Field of the Invention
The present invention relates to systems and methods for performing queries on data stored in a database, and in particular to a method and system for executing SQL from within user defined functions.
2. Description of the Related Art
The ability to manage massive amounts of information has become a virtual necessity in business today. The information and data are often stored in related files. A set of related files is referred to as a database. A database management system (DBMS) creates and manages one or more databases. Today, DBMSs can manage any form of data including text, images, sound and video. Further, large-scale integrated DBMSs provide an efficient, consistent, and secure means for storing and retrieving the vast amounts of data.
Certain computer languages have been developed and utilized to interact with and manipulate the data. For example, SQL (Structured Query Language) is a language used to interrogate and process data in a relational database (a database in which relationships are established between files and information stored in the database). Originally developed for mainframes, SQL is now supported by most database systems designed for client/server environments. SQL commands can be used to work interactively with a database or can be embedded within a programming language to interface to a database. Thus, methods and functions may embed and utilize SQL commands.
In view of the vast amounts of data and types of data that have become popular, wider varieties of methods and functions for manipulating and working with the data have become a necessity. Such functions and methods are often written independently from (and without knowledge of) the underlying DBMS. Users often write such functions and methods themselves, in which case they are referred to as user defined functions (UDFs), and the UDFs often contain embedded SQL commands.
To optimize the processing time for working with and manipulating the data, some DBMSs have distributed the data and provided for parallel processing of the data. Thus, the UDFs utilized to manipulate and work with the data are executed in parallel on the parallelized/distributed data. Some UDFs are associated directly with certain types of data on a particular data server (storage location for the data). However, these UDFs may attempt to manipulate and retrieve information from data not located on the data server where the UDF is located. Accordingly, it is difficult to start up parallel execution of a query from a UDF that resides on any one data server.
Further, since the UDFs may be written independently from (and without knowledge of) the parallelized data system, it is difficult to provide results to the UDF in a clean manner. In other words, when a UDF operates on or requests data (e.g., using SQL commands), the interface through which the results are returned is difficult to establish and maintain without exposing the parallelism to the UDF. What is needed is a system and method for efficiently and cleanly executing SQL statements from within UDFs on a parallelized DBMS.
To address the requirements described above, the present invention discloses a method, apparatus, and an article of manufacture for parallel execution of SQL operations from within user defined functions.
The method comprises providing the user defined function (UDF) with a C++ class (hereinafter referred to as the "dispatcher") that can take an SQL query and start parallel execution of the query. The query is optimized and parallelized. The dispatcher executes the query, sets up the communication links between the various operators in the query, and ensures that all the results are sent back to the data server that originated the query request. Further, the dispatcher merges the results of the parallel execution and produces a single stream of tuples that is fed to the calling UDF. To provide the single stream to the calling UDF, one or more embodiments of the invention utilize a class that provides the UDF with a simple and easy-to-use interface to access the results of the nested SQL execution. In one or more embodiments of the invention, a C++ class such as the TOR InputStream class, available from NCR Corporation, the assignee of the present invention, is utilized.
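By way of illustration only, the dispatch-and-merge behavior described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the names Tuple, DataServer, ResultStream, and Dispatcher are hypothetical, each data server is modeled as a callable that returns the tuples of its local partition, query optimization and inter-operator communication links are omitted, and ResultStream is a simplified stand-in that does not reproduce the interface of the TOR InputStream class.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tuple type: one row of a query result.
using Tuple = std::vector<std::string>;

// Hypothetical stand-in for a data server: given the query text, it
// returns the tuples produced from its own local partition of the data.
using DataServer = std::function<std::vector<Tuple>(const std::string&)>;

// Hypothetical single-stream view handed to the calling UDF. The UDF pulls
// tuples one at a time and never sees how many servers produced them.
class ResultStream {
public:
    explicit ResultStream(std::vector<Tuple> tuples)
        : tuples_(std::move(tuples)) {}

    // Copies the next tuple into 'out' and returns true, or returns false
    // once the stream is exhausted.
    bool next(Tuple& out) {
        if (pos_ >= tuples_.size()) return false;
        out = tuples_[pos_++];
        return true;
    }

private:
    std::vector<Tuple> tuples_;
    std::size_t pos_ = 0;
};

// Hypothetical dispatcher: starts the query on every data server in
// parallel, then merges the partial results into one stream.
class Dispatcher {
public:
    explicit Dispatcher(std::vector<DataServer> servers)
        : servers_(std::move(servers)) {}

    ResultStream execute(const std::string& sql) const {
        // Launch one asynchronous task per data server.
        std::vector<std::future<std::vector<Tuple>>> pending;
        pending.reserve(servers_.size());
        for (const DataServer& server : servers_) {
            pending.push_back(std::async(std::launch::async, server, sql));
        }

        // Collect the partial results in server order and concatenate them.
        std::vector<Tuple> merged;
        for (auto& result : pending) {
            std::vector<Tuple> part = result.get();
            merged.insert(merged.end(),
                          std::make_move_iterator(part.begin()),
                          std::make_move_iterator(part.end()));
        }
        return ResultStream(std::move(merged));
    }

private:
    std::vector<DataServer> servers_;
};
```

In this sketch a UDF would construct a Dispatcher over the available data servers, call execute with its SQL text, and loop on next until it returns false. The merge here simply concatenates partial results in server order; a merge that preserves a sort order or streams tuples as they arrive would be an alternative design.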