1. Field of the Invention
The present invention relates to systems and methods for performing queries on data stored in a database, and in particular to a method and system for executing SQL trigger actions in parallel.
2. Description of the Related Art
A trigger in a database provides that upon the occurrence of an event (such as an update of a record or tuple in a database), various actions are executed (referred to as trigger actions). Trigger actions can contain further modifications to a database. However, in a distributed and/or parallel database system, records and tuples may be spread out across various data servers and data storage units and the event or trigger may be fired from anywhere in the system. In such a situation, the prior art does not provide the ability to execute the trigger actions across the necessary data servers from anywhere in the system (e.g., where the triggering event (or tuple modification that caused the trigger to fire) is located). These problems may be better understood by describing databases and triggers.
The ability to manage massive amounts of information has become a virtual necessity in business today. The information and data are often stored in related files. A set of related files is referred to as a database. A database management system (DBMS) creates and manages one or more databases. Today, DBMSs can manage any form of data including text, images, sound and video. Further, large-scale integrated DBMSs provide an efficient, consistent, and secure means for storing and retrieving the vast amounts of data.
Certain computer languages have been developed and utilized to interact with and manipulate the data. For example, SQL (Structured Query Language) is a language used to interrogate and process data in a relational database (a database in which relationships are established between files and information stored in the database). Originally developed for mainframes, most database systems designed for client/sever environments support SQL. SQL commands can be used to interactively work with a database or can be embedded within a programming language to interface to a database. Thus, methods and functions may embed and utilize SQL commands.
To expedite the processing of information, databases may be parallelized such that data is distributed across multiple locations or data servers. When a query is executed, the query may be parallelized and run on each of the data servers. Such query parallelization provides for running the same query on multiple small portions of data in parallel rather than running the one query over one large portion of data sequentially.
A trigger provides that whenever any tuple or record in a DBMS is updated, some condition is checked. If the condition is true, then a xe2x80x9ctrigger actionxe2x80x9d is executed. A trigger action can be any arbitrary SQL operation, stored procedure (see description below), user defined function (see description below), or any action desired. For example, in a Teradata(copyright) Object Relational (TOR) database (discussed in detail below), the trigger may use any of TOR""s object relational features. Further, each trigger action in turn can cause its own triggers to fire. On a parallel or distributed DBMS the event (e.g., the update of a tuple) occurs on a particular data server (i.e., the data server where the tuple is located). Where the conditions are checked and where the trigger action is fired or invoked from may be anywhere in the system. However, the trigger action may need to execute a query on the entire database that is distributed across other data servers. The prior art does not provide a method for executing trigger actions across multiple data servers when the trigger is fired anywhere in a system. For example, when a trigger is fired from a particular data server, the prior art does not provide any method for executing the trigger action across all of the data servers.
Triggers may invoke the same set of commands/actions (or the same set of commands/actions with different parameters) at varying times and locations of a program. In such a situation, the set of commands/trigger actions may be placed into a stored procedure. A stored procedure is a batch of SQL statements stored in a database/on a server, that may be partially or fully processed/compiled before it is stored (or upon its first invocation). Additionally, a stored procedure is a method or procedure written in any programming language that is partially or fully processed/compiled before it is stored (or upon its first invocation).
Stored procedures may be called directly from a client or from a database trigger and are often stored on the server. A database trigger, as described above, is a user defined mechanism that causes a stored procedure to automatically initiate and execute upon the occurrence of the user specified events in the database (i.e., when the trigger xe2x80x9cfiresxe2x80x9d). Thus, the trigger may not fire unless the event(s) specified by the user occurs. For example, a user may define a trigger to automatically fire whenever a user updates, deletes, or inserts data of a specific tuple.
Since stored procedures are typically stored on a data server, they are available to all clients and do not need to be replicated in each client. Further, by storing the stored procedure on the server, when the stored procedure is modified, all clients automatically get/have access to the new version. This saves programming effort especially when different client user interfaces and development systems are used. Further, this allows stored procedures to be an easy mechanism for sharing complex queries and functions between multiple applications. Additionally, SQL and stored procedures may call other stored procedures or cause other triggers to fire and may be written independently from (and without knowledge of) the underlying DBMS.
A stored procedure may be partially or completely processed/compiled before it is stored on the database. Consequently, the stored procedure does not have to be parsed and compiled each time it is invoked. Further, because a stored procedure is stored in a compiled format, it executes faster than if its constituent commands were executed individually.
Alternatively, a stored procedure may not be compiled prior to storage but may be automatically compiled the first time the procedure is invoked. As part of such a compilation, a query execution plan may be generated. The query execution plan describes the order in which tables are to be accessed and the indexes to be used. Further, the query execution plan is optimized for the stored procedure parameters and data in the database tables at the time the stored procedure is first executed.
A stored procedure may be invoked by its name. The caller can pass parameters to and receive results from the stored procedure. A user can create and name a stored procedure to execute specific database queries and perform other database tasks. For example, a user may create a stored procedure that returns the number of videos of a particular movie remaining in a video store for the video title that is specified at the time the stored procedure is called.
Stored procedures may also maintain the integrity of the database and prevent unauthorized users from modifying certain entries. For example, a user may be given the right to call a stored procedure that updates a table or set of tables but denied the right to update the tables directly.
Stored procedures may be created using a variety of mechanisms. The following format may be utilized to declare a stored procedure:
create proc {procedure name}
as
{statement of block of statements}
For example the following stored procedure called myproc will return the number of Casablanca videos left in a video store as well as other movie titles, the rental price of those movie titles, and the location of those videos when Humphrey Bogart is an actor in the movie:
create proc myproc
as
begin
select inv_videos
from video
where title_id=xe2x80x9cCasablancaxe2x80x9d
select title_id, rental_price, location
from video
where actor=xe2x80x9cHumphrey Bogartxe2x80x9d
end
As described above, stored procedures can also be passed parameters. Parameters may be defined as part of the stored procedure creation statement. The syntax of a xe2x80x9ccreate procxe2x80x9d command with parameters is:
create proc {proc name}
(@{param_name} {param_type},
@{param_name} {param_type},
)
as
{statement}
For example, the following stored procedure may be passed the @mytitle and @myactor parameters for use in the select query:
create proc myproc
(@mytitle char,
@myactor char
)
as
begin
select inv_videos, title_id, rental_price, location
from video
where title_id=@mytitle
or actor=@myactor
end
Once a stored procedure has been created, a user can invoke the stored procedure using the following syntax:
exec [database.owner.]{procname} {opt params}
In the above syntax, xe2x80x9cdatabasexe2x80x9d and xe2x80x9cownerxe2x80x9d will default to the current database and the current dbo (database owner). For example, the following command may invoke the stored procedure xe2x80x9cmyprocxe2x80x9d defined above:
exec myproc xe2x80x9cCasablancaxe2x80x9d, xe2x80x9cHumphrey Bogartxe2x80x9d
In this example, the user would see the same results as if the following command were utilized:
select inv_videos, title_id, rental_price, location
from video
where title_id=xe2x80x9cCasablancaxe2x80x9d
or actor=xe2x80x9cHumphrey Bogartxe2x80x9d
Additionally, the xe2x80x9cexecxe2x80x9d portion of an invocation of a stored procedure is not necessary if the stored procedure call is the first line in a batch.
Stored procedures can also have a xe2x80x9creturnxe2x80x9d status. A xe2x80x9creturnxe2x80x9d statement returns from the stored procedure with an optional status parameter. The return status is zero for success, or negative otherwise. Negative values between xe2x88x921 and xe2x88x9299 are reserved. For example, the following stored procedure returns a negative value (xe2x88x92999) if no rows/records are in the result set and a 0 if rows/records are in the result set:
create myproc (@mytitle char, @myactor char)
as
begin
select title_id, rental_price, location
from video
where title_id=@mytitle
or actor=@myactor
if @@rowcount=0
return xe2x88x92999
else
return 0
end
The following commands illustrate the invocation of the above stored procedure with a return status:
declare @mystatus int
exec @mystatus=myproc xe2x80x9cCasablancaxe2x80x9d, xe2x80x9cHumphrey Bogartxe2x80x9d
if @mystatus !=0
begin
{do error processing}
end
else
begin
{do normal processing}
end
In addition to providing parameters when invoking a stored procedure, default values for the parameters may be specified:
create proc myproc
(@myparam1 int=0,
@myparam2 int=null,
@myparam3 char(20)=xe2x80x9cmydefaultxe2x80x9d
)
as . . .
The above stored procedure provides a default value of 0 for @myparam1, a default value of null for @myparam2, and a default value of xe2x80x9cmydefaultxe2x80x9d for @myparam3. To invoke the stored procedure, the user may specify the values for one or more parameters, if desired, or may allow one or more of the default values to be utilized:
exec myproc 1/* params 2 and 3 default */
In the above invocation, only @myparam1 is specified in the execution line. Consequently, the specified default values for @myparam2 and @myparam3 are utilized (i.e., null and xe2x80x9cmydefaultxe2x80x9d).
Additionally, stored procedures can have output parameters as demonstrated below:
create proc myproc
(@myparam1 int
@myoutparam int output
)
as
begin
select @myoutparam=count(*)
from titles
where x greater than @myparam1
end
The output for the above stored procedure is the number of rows (i.e., the count) in the titles table where the value of x is greater than @myparam1.
Thus, since stored procedures may modify or access tuples in a database, a stored procedure may cause a trigger to fire and result in the execution of additional trigger actions.
In view of the vast amounts of data and types of data that have become popular, wider varieties of methods and functions for manipulating and working with the data have become a necessity. Such functions and methods are often written independently from (and without knowledge of) the underlying DBMS. Further, users often write such functions and methods (referred to as user defined functions (UDFs)). Such functions and methods often contain embedded SQL commands. Further, a UDF may be invoked as part of a trigger action.
Users can specify and attach arbitrary UDFs to existing datatypes in data servers. Consequently, when the datatype or data changes, a trigger may invoke the UDF. Typically, UDFs are utilized or invoked within a RDBMS (Relational Database Management System). Further, UDFs may operate on ADTs and can be written in any programming language including C ++. For example, a user can define an ADT for fingerprint images, and a UDF that analyzes fingerprint images. As part of an UPDATE query (an SQL command that provides the ability to modify data stored in a database), a UDF can extract and store in the database an alphanumeric feature vector representing each fingerprint in the database. Subsequent SELECT queries (queries to select certain records that meet certain specified conditions) can use these feature vectors to qualify rows for retrieval. Such UDFs are called xe2x80x9cUpdatexe2x80x9d UDFs. For example, the following UPDATE query illustrates the use of an Update UDF:
UPDATE FINGERPRINT_TBL
SET ADT_FNGR_PRNT_IMG=UDF_EXTRCT_VCTR
The above command updates the fingerprint table (FNGERPRINT_TBL) by setting (SET) each entry in the column containing abstract data types for fingerprint images (ADT_FNGR_PRNT_IMG) equal to the value provided in a UDF (UDF_EXTRCT_VCTR). In addition, a UDF can be passed parameters (ADT or standard data types) to utilize in its operations.
Alternatively, a UDF can be used directly in a SELECT query predicate to extract features from images and qualify rows for retrieval. The predicate of a SELECT query specifies the conditions or criteria for the records or information to be retrieved. Thus, the UDF is utilized as an operator to determine the information to be retrieved. For example, suppose the following SELECT query is utilized:
SELECT NAME, CITY, STATE, ZIPCODE
FROM CUSTOMER
WHERE UDF_BUYINGPOWER (NAME)=EXCELLENT
The type of SQL commandxe2x80x94SELECTxe2x80x94is specified followed by the information to be retrieved (e.g., NAME, CITY, STATE, and ZIPCODE) from the CUSTOMER table. The predicate is specified by the WHERE statement. Thus, in the above SQL query, the SELECT query predicate provides that only records wherein a buying power of a particular person is excellent will be retrieved. A UDF, UDF_BUYINGPOWER is utilized in the predicate as an operator to filter records or determine which information is to be retrieved. UDFs can also utilize ADTs. For example, instead of using the NAME parameter, an ADT may be utilized. When a UDF is utilized in the predicate, the UDF is called a xe2x80x9cPredicatexe2x80x9d UDF. In either case (i.e., Update or Predicate UDFs), the user is retrieving rows based on an object""s semantic content called xe2x80x9cquery-by-contentxe2x80x9d.
There is a third type of UDF called a xe2x80x9cProjectionxe2x80x9d UDF. A projection is a basic query operation that provides a subset (in a list) of the columns in a table. A projection UDF typically operates on objects in the projection list of a query. For example, if an image was stored in data storage device 132 in Tagged Image File Format (TIFF) but had to be delivered to the client in Joint Photographics Experts Group OPEG) format, this could be accomplished by executing a UDF that performed the TIFF to JPEG conversion on the images obtained in a query projection list before transferring it back to the client. The following SQL illustrates a Projection UDF:
SELECT UDF_TIFF_TO_JPEG(ADT_IMGS_TIFF)
FROM IMG_TBL
The above SELECT command selects the ADT_IMGS_TIFF column from the table (IMG_TBL). However, the SELECT command utilizes the UDF called UDF_TIFF_TO_JPEG to transform the ADT_IMGS_TIFF column into JPEG format. In the above example, the UDF is passed a parameter consisting of a column (e.g., the ADT_IMGS_TIFF column).
UDFs can be supplied from many sources. As the name infers, the database users can write UDFs. This is important because they are often quite application specific. UDFs can be supplied by the DBMS owner such as NCR Corporation, the assignee of the present invention. For example, UDFs can be utilized for text search and retrieval and for image manipulation. Alternatively, UDFs can also be supplied by third-party vendors who would like their object data types and processing algorithms to be incorporated into a DBMS.
To address the requirements described above, the present invention discloses a method, apparatus, and an article of manufacture for parallel execution of trigger actions.
One or more embodiments of the invention comprise providing a C++ class (hereinafter referred to as xe2x80x9cdispatcherxe2x80x9d) that can take an SQL query or trigger action and start parallel execution of the trigger action. The trigger action is optimized and parallelized. The dispatcher executes the trigger action (or query of a trigger action), sets up the communication links between the various operators in the trigger action, and ensures that all the results are sent back to the trigger.