A database is defined as a collection of data items, organized according to a data model, and accessed via queries. The present invention applies to any data model. The invention is illustrated using a relational database model.
A relational database may be organized into a two dimensional table containing rows and columns of information. In a relational database or relation, data values are organized into columns or fields. Each column of the relation corresponds to a particular attribute and has a domain which comprises the data values of that attribute. Each row of a relation, which includes one value from each attribute, is known as a record or tuple.
FIG. 1 shows an exemplary relational database (prior art) consisting of two relations A and B. The relation A of FIG. 1 contains data pertaining to a population group. The relation A has six attributes or columns 2-1, 2-2, 2-3, 2-4, 2-5 and 2-6, for storing, respectively, name, age, weight, height, social security number and telephone extension data values of the population. The database also has twelve records or tuples 3-1, 3-2, 3-3, . . . , 3-12. Each tuple 3-1, 3-2, 3-3, . . . , 3-12 has one data value from each attribute. For instance, the tuple 3-10 has the name attribute value "lee", the age attribute value 40, the weight attribute value 171, the height attribute value 180, the social security number attribute value 999-98-7654 and the telephone extension attribute value 0123. Furthermore, the relational database has a second relation B which contains three attributes 4-1, 4-2, an 4-3 for storing, respectively, social security number, course number and grades. The relation B contains seven tuples 5-1, 5-2, 5-3, . . . , 5-7.
Often, it is desirable to identify and/or retrieve tuples which meet criteria of interest. Queries are used to retrieve tuples of interest from a relation using selection operations. Selection operations incorporate predicates. Predicates are logical or mathematical expressions for specifying the criteria which must be satisfied by the tuples in order to be selected. For instance, it may be desired to select all tuples of a relation R having an attribute I value which is the same as some constant J. Such a select operation is denoted R.multidot.I=J or S.sub.R.I=J. The selection operation is specified by the predicate "I=J". The predicate, in turn, incorporates the selection operator "equals" for specifying the desired criteria that the selected tuples must satisfy. Other selection operators include "greater than," "less than," etc. Additionally, individual predicate terms may be combined with logical operators "AND," "OR" "NOT" etc. to form complex predicates.
A transaction is defined as a program that includes database operations. The collection of tuples obtained through select operations of a transaction against the database is known as a transactions read-set. The set of tuples either modified, added, or deleted through modification, addition or deletion operations of a transaction against the database is known as a transaction writeset.
A database system is an apparatus which maintains a database and executes queries against the database. Advantageously, the database system supports certain advanced functions and features. For instance, the database system preferably includes a device which performs a "database mirror" function or a device which performs a "database snapshot" function, or both. Furthermore, the database system also illustratively supports the "database trigger" and "relational join" features. Each of these functions and features is described below.
A database mirror is a device which dynamically maintains an up-to-date back-up copy of the database and optimally is located in a separate physical remote location. The database mirror is preferably a separate physical device from the primary store/source of the database. In addition to maintaining an up-to-date copy of the database, the database mirror assumes the function of the primary database store/source in the event the primary database store/source fails.
A data base snapshot is a copy of all or part of the database that is isolated from update activity. Database snapshots are often used when a series of queries are to be processed and where the database must not change during the processing of those queries. Database snapshots are also used to permit simultaneous query processing against a copy of a database in situations where the processing capacity of the primary copy is utilized entirely to support transaction processing. Database snapshots may contain the entire database, or may contain a subset of the database as defined by specific selection predicates.
Database trigger procedures are procedures that are periodically applied to a database and whose results may vary depending on changes to the database. For instance, the database may include continually updated stock price data. A trigger procedure may be provided which causes a buy or sell order to occur in the event specific stock prices rise above or fall below certain thresholds.
A relational join operation operates on two or more argument relations with a join predicate to form a third relation. A common use of the join operation is to form an "intersection relation." That is, if two relations have a common attribute, called the join attribute, a join operation can be used to construct a third relation containing tuples formed from tuples of each of the two argument relations that satisfy the join predicate. For example, consider the join operation:
as applied to the argument relations A and B as defined in FIG. 1. This operation joins attributes A.Name, A.Age, A.Soc.sub.-- Sec., and A.Tel from the tuples of Relation A with attributes B.Course and B.Grade of the relation B where the value of the A.Soc.sub.-- Sec join attribute in tuples of Relation A equals the value of the B.Soc.sub.-- Sec join attribute in tuples of Relation B. As a result, a relation C is constructed as shown in FIG. 2 (prior art). The relation C has six attributes: 6-1, 6-2, . . . 6-6 for C.Name, C.Age, C.Soc.sub.-- Sec, C.Tel, C.Course and C.Grade, respectively, and has seven tuples 7-1, 7-2, . . . , 7-7. Each tuple of Relation C is formed from attributes in Relation A tuples and attributes in Relation B tuples which have the same Soc.sub.-- Sec join attribute values.
A broadcast database system is a system for storing a database in one computing environment, for broadcasting all or parts of the database to other computing environments, and for performing database selection operations at those other computing environments in order to obtain the read-sets for queries. See T. BOWEN, G. GOPAL, G. HERMAN, T. HICKEY, K. C. LEE, W. MANSFIELD, JR., J. RAITZ & A. WEINRIB, "The Datacycle Architecture," COMMUNICATIONS OF THE ACM, SPECIAL ISSUE ON INFORMATION FILTERING (1992). FIG. 3 shows a broadcast database system called the "Datacycle" which supports database management, query processing and transaction processing functions. As shown, a general purpose computer 10 is connected to a broadcast database system 30 via a bus 18. By means of this interconnection, the general purpose computer 10 and the broadcast database system 30 can communicate with each other. For instance, the general purpose computer 10 may transfer queries inputted by a user at the general purpose computer 10 to the broadcast database system 30 for execution. The broadcast database system 30 can return the read-set of the query to the general purpose computer 10 for output to the user. In addition, the general purpose computer 10 can transfer database management instructions to the broadcast database system 30. For instance, the general purpose computer 10 can transfer a write-set of tuples and instructions for adding to, deleting from, or modifying, the database using the write-set of tuples.
As shown, the broadcast database system 30 includes a processor 34 connected to a database memory 32 by a connection 33. The database memory 32 stores the database. By means of the connection 33, the processor 34 can access the database memory 32 for purposes of adding tuples to, deleting tuples from, or modifying tuples in, the database.
The database memory 32 is connected via a connection 36 to a selection system 40 such as the selection system disclosed in U.S. Pat. No. 5,230,073. In the selection system 40, a transmitter 41 cyclically retrieves the contents of the memory 32. The transmitter 41 transmits over a downstream broadcast channel or shared distribution medium 42 a bitstream containing the tuples of the database.
A plurality of filter subsystems 43-1, 43-2, . . . , 43-n, such as those disclosed in U.S. Pat. No. 5,050,075, are connected to the shared distribution medium 42. As depicted, each filter subsystem 43-1, 43-2, . . . , 43-n is connected in parallel to the shared distribution medium 42. Each filter subsystem 43-1, 43-2, . . . , 43-n illustratively comprises a high speed comparator and sufficient arithmetic circuitry for performing on the fly, i.e., real time, arithmetic and logical computations on the tuples broadcasted on the shared distribution medium. Illustratively, the processor 34 is capable of programming the filter subsystems 43-1, 43-2, . . . , 43-n so that they select tuples in accordance with queries received from the general purpose processor 10. The filter subsystems 43-1, 43-2, . . . , 43-n may thus retrieve selected tuples or simply perform computations on tuples and output resulting values. Tuples or resulting values outputted by the filter subsystems 43-1, 43-2, . . . , 43-n are transferred to the processor 34 via a communications channel 44.
In operation, the general purpose computer 10 transfers a query for execution to the processor 34. The processor 34 programs the filter subsystems 43-1, 43-2, . . . , 43-n for performing computations on each tuple broadcasted on the shared distribution medium 42 in accordance with the query. The transmitter 41 then cyclically retrieves the tuples stored in the database memory 32 and broadcasts the retrieved tuples in the form of a bitstream on the shared distribution medium 42. The filters 43-1, 43-2, . . . , 43-n monitor the shared distribution medium 42 and retrieve each tuple as it is broadcasted. The filters 43-1, 43-2, . . . , 43-n apply predicates and arithmetic functions to the tuples. Thereafter, the filter subsystems 43-1, 43-2, . . . , 43-n transfer tuples which satisfy the query, identifiers to tuples which satisfy the query, or other values via the communication channel 44 to the processor 34. The processor 34 may further process the tuples or values and transmit further instructions for selecting tuples which satisfy the query. The selected tuples form the read-set of the query and are transferred from the processor 34 to the general purpose processor 10 via the bus 18.
According to the Datacycle architecture, the transmitter 41, database memory 32 and processor 34 form a "pump." The purpose of the pump is to cyclically broadcast the database and to maintain, i.e., add tuples to, delete tuples from, or modify tuples in, the database. Each filter subsystem 43-1, 43-2, . . . , 43-n is a type of passive "data access manager." A data access manager receives tuples of a relation and performs selection operations on the tuples to determine if the tuples satisfy received queries.
The broadcast database architecture of FIG. 3 has two significant design criteria. First, the elements which make up the broadcast database system 30 are each highly specialized devices. That is, the pump (i.e., processor 34, memory 32 and transmitter 41, collectively) is specially designed to broadcast relations but cannot execute queries. On the other hand, the data access managers (i.e., the filter subsystems 43-1, 43-2, . . . , 43-n) are highly specialized passive "data filters"--they can execute queries against relations but must rely on the pump to broadcast the relations. Such specialization allows each element to be constructed to perform its intended task efficiently. Second, the broadcast database architecture of FIG. 3 employs strict synchronization between query execution in the data access managers and the cyclical broadcast of the tuples by the pump. In other words, the data access managers must complete execution of queries against each received tuple before the next tuple is received from the pump. Stated another way, if the tuples are each sequentially broadcasted in successive fixed length time intervals T, the data access managers must complete execution of the query for each tuple within the time interval T.
However, these design criteria can also be disadvantageous. First, the strict synchronization criterion constrains the per tuple query execution time to be equal to (or less than) the tuple broadcast, time. Under strict synchronization, a data access manager can only execute queries which can be completed, for a worst case tuple, before the next tuple is broadcasted.
This "coupling" of query execution time to broadcast time prevents optimization of the query execution, for example, by taking advantage of spare processing time through short-circuit boolean evaluation, or by skipping execution of query instructions which follow a predicate that evaluates false, or both. Short circuit-boolean evaluation increases the speed of predicate evaluation for complex predicates that have multiple conjunctive terms (i.e., predicate terms connected by logical AND) or multiple disjunctive terms (i.e., predicate terms connected by logical OR). Using short-circuit boolean evaluation, it is advantageous to evaluate conjunctive terms in the order of highest selectivity to lowest selectivity. Here, "selectivity" is an inverse measure of the number of tuples, in the particular database against which a conjunctive term is applied, which satisfy that conjunctive term. That is, a first term has a higher selectivity in a particular database than a second term if fewer tuples in that particular database satisfy the first term than satisfy the second term.
For example, consider the following complex query predicate with conjunctive terms:
WHERE (age=15) AND (sex=male) AND (sport=basketball) wherein, against a database of interest, 10% of the tuples satisfy the first term of the predicate (age=15), 50% of the tuples satisfy the second term of the predicate (sex=male) and 0.1% of the tuples satisfy the third term in the predicate (sport=basketball). Advantageously, the third term in the predicate is first executed against each tuple. Then, the first term in the predicate is illustratively executed against only the 0.1% of the tuples that satisfy the third term in the predicate. Finally, the second term in the predicate is only executed against the tuples which satisfy both the first and third terms in the predicate.
The optimization technique, wherein instructions following a complex predicate that evaluates false are not executed, is useful where a query includes one or more predicates followed by a variety of instructions. Such instructions could be complex arithmetic instructions, instructions for selecting a record and transferring attributes thereof, or instructions involved in projection operations, modifications/transformations of the data or hash functions (described in detail below), etc. (Generally speaking, this technique is frequently implemented to save useful query execution time in an environment where such saved time can be used to execute queries against subsequent tuples. However, for reasons discussed below, this technique cannot save such useful query execution time in the prior art broadcast database system 30 of FIG. 3). Initially, only the predicate terms of the complex predicate are executed against attribute information of a tuple. If the tuple has attribute information which satisfies these predicate terms, the more complex and time consuming instructions are executed. On the other hand, if the tuple's attribute information fails to satisfy the initially executed predicate terms, the complex time consuming instructions are not executed. Rather, the query is executed against the attribute information of the next tuple.
For example, consider the query:
wherein the predicate portion (which follows "WHERE") illustratively satisfies 0.05% of the tuples and uses, at most, three broadcast cycles (of the broadcast database system 30 of FIG 3) to execute, and wherein the complex instruction PRODUCT (age, weight) illustratively uses sixteen broadcast cycles to execute. By executing the complex instruction PRODUCT (age, weight) for only those tuples which satisfy the predicate portion of the query, a substantial overall savings in useful query execution time is achieved.
In either optimization technique, the query execution time varies from tuple to tuple. Tuples having attribute information that satisfies the predicate of the query ("interesting tuples") utilize a larger execution time than tuples having attribute information that fails to satisfy the predicates of the query ("uninteresting tuples"). However, the data access managers 43-1 to 43-n of the broadcast database system 30 of FIG. 3 are not capable of executing only part of their instructions in the case attribute information of a particular tuple fails to satisfy a predicate thereof. Rather, due to design constraints in the prior art, the data access managers 43-1 to 43-n must execute all of their instructions for each tuple. Moreover, using an optimization technique such as described above, a complicated query may be devised which has a worst case per tuple execution time that is greater than the per tuple broadcast time but an average execution time which is less than or equal to the broadcast time. Nevertheless, the data access managers 43-1 to 43-n cannot execute such a query if the query cannot be executed against a worst case tuple in a time which is less than or equal to the broadcast time.
Second, the strict specialization of function of the pump 32,34,41 and data access managers 43-1 to 43-n prevents the execution of other search optimization strategies which require elements that can perform both functions. For instance, it is often desirable for a data access manager to download a snapshot of the database and to operate on this snapshot independently--in particular, asynchronously with--the broadcast of the relation by the pump.
Often, it is merely desired to create a snapshot which stores a temporary relation created from attribute values selected in executing a query. For example, it may be desirable to execute a subsequent query against the temporary relation. However, in order to (effectively) operate in such a fashion, the data access manager must have some other pump mechanism for storing the snapshot and for broadcasting the snapshot to the data access manager. The broadcast database system 30 cannot perform such operations.
As a practical consideration of these two limitations, consider the execution of join operations. The execution time of join operations using the broadcast database system 30 of FIG. 3 depends on the number of tuples in the joined relations and the number off unique join attribute values. Thus, the per tuple query execution time may be greater than the broadcast time. Furthermore, join operations are often intended to form a resultant joined relation against which another query is to be executed. In order to execute such a query, the resultant joined relation must first be transmitted to the pump 32,34,41 of the broadcast database system 30 and stored thereat. This imposes great delays in the execution of such a query. Furthermore, the memory 32 is of a fixed size and cannot store an arbitrary number of temporary relations that would result from an arbitrary number of join operations, and thus presents an impediment to the scalability of the overall query processing.
It is therefore an object of the present invention to overcome the disadvantages of the prior art.