The ability to share information easily and in a timely fashion is a crucial requirement for any business environment. Consequently, information sharing has been supported by many mechanisms, such as discussions, mail, books, periodicals, and computer technology. Many computer-based technologies have evolved to promote the goal of information sharing, such as reports/statements, replication and messaging.
In a system that uses messaging, messages containing information about events are propagated through a network of nodes. Typically, the events involve activities that create or modify data, and messages transmitted to the network of nodes contain data about the information created or modified.
A network that uses messaging may be represented by a directed graph of nodes. Edges join the nodes, each edge representing a flow of messages from a “source” node to an adjacent “destination” node. For a given node, multiple edges can emanate from or terminate at the node. In addition, the graph can contain cycles, which represent messages flowing from a source node back to that same node along a path that may include one or more other nodes.
For each event that occurs at a node, messages about the event are sent to other nodes. A node that receives a message performs some action in response, such as updating data stored on the node or forwarding the message to other nodes. Furthermore, a node may send a given message to some of its connected nodes but not to others.
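As an illustration only, the directed-graph view of a messaging network can be sketched in Python. The class name `MessageNetwork`, its methods, and the node names A through D are hypothetical. The sketch stores each edge as a source-to-destination flow and checks whether a message leaving a node can return to it along a cycle.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class MessageNetwork:
    """Hypothetical directed graph: each edge is a flow of messages."""

    def __init__(self) -> None:
        self.edges: DefaultDict[str, Set[str]] = defaultdict(set)

    def add_edge(self, source: str, destination: str) -> None:
        # Record a flow of messages from `source` to an adjacent `destination`.
        self.edges[source].add(destination)

    def destinations(self, node: str) -> List[str]:
        # All nodes that `node` sends messages to, in a stable order.
        return sorted(self.edges.get(node, set()))

    def can_return_to(self, node: str) -> bool:
        """True if a message leaving `node` can flow back to it along a cycle."""
        stack = list(self.edges.get(node, set()))
        seen: Set[str] = set()
        while stack:
            current = stack.pop()
            if current == node:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self.edges.get(current, set()))
        return False


net = MessageNetwork()
net.add_edge("A", "B")
net.add_edge("B", "C")
net.add_edge("C", "A")  # closes the cycle A -> B -> C -> A
net.add_edge("A", "D")  # D has no outgoing edges
```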
The required message flow may differ from one system to another. Various types of messaging systems allow users to configure the message flow between nodes in a network. One type of messaging system, a rule-based messaging system, allows a user to specify rules that govern the flow of messages.
A rule specifies a condition and an action to perform if the condition is met. In general, rules conform to a rules language, which is similar to a computer language. Messaging systems that use rules expose information about events through variables or attributes that the rules can reference. The condition in a rule may be expressed as a boolean expression that references these variables and attributes. Rules may be used to select the events for which messages are sent to other nodes, and to determine what to do with a message received from another node.
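The condition/action structure of a rule can be sketched as follows. This is a minimal Python sketch, not any particular rules language: the `Rule` class and `apply_rules` function are hypothetical, and a Python predicate stands in for the boolean expression over the event attributes a messaging system exposes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]  # attributes exposed about an event (hypothetical shape)


@dataclass
class Rule:
    name: str
    condition: Callable[[Event], bool]  # stands in for a boolean expression
    action: Callable[[Event], None]     # performed only if the condition is met


def apply_rules(rules: List[Rule], event: Event) -> List[str]:
    """Evaluate every rule against the event; return the names of rules that fired."""
    fired = []
    for rule in rules:
        if rule.condition(event):
            rule.action(event)
            fired.append(rule.name)
    return fired


outbox: List[Event] = []
rules = [
    Rule(
        name="send_sales_changes",
        condition=lambda e: e.get("source_table") == "sales",
        action=outbox.append,  # "send" the event by queuing it locally
    ),
]
fired_for_sales = apply_rules(rules, {"source_table": "sales", "city": "Chicago"})
fired_for_other = apply_rules(rules, {"source_table": "inventory"})
```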
Database systems are one context in which messaging systems are implemented. In such a messaging system, each node is a database server, and the events for which messages are propagated correspond to data manipulation changes (“DML changes”) and data definition changes (“DDL changes”) made by a database server. In one implementation of a rule-based messaging system for a database system, a capture process scans a redo log for DML or DDL changes and stages them in a queue for subsequent propagation to other database servers. Changes are captured, and the queued messages are propagated to other database servers, according to rules provided by the user.
The terms “capture” and “capture events” are used herein to refer to selecting events and adding messages about them to a staging area for subsequent processing. A capture process captures events.
The phrase “propagate a message” is used herein to refer to the activity of distributing messages generated by a capture process to other nodes for subsequent processing. Message propagation may entail distributing messages from a staging area for one node to another staging area for subsequent processing at another node. A propagation process propagates messages. Propagating an event refers to propagating a message about the event.
The following is an illustration of a rule-based messaging system implemented for a database system. The illustrative messaging system includes three nodes: database server New York, database server Pittsburgh, and database server Chicago. Database server New York has a table sales. DML changes (e.g., updates, deletes, and inserts) and DDL changes made by database server New York are propagated via the illustrative messaging system to database servers Pittsburgh and Chicago so that the changes may be reflected in their respective sales tables (not shown).
Database servers Pittsburgh and Chicago do not receive all DDL and DML changes made to table sales in database server New York. The particular changes propagated to database servers Pittsburgh and Chicago depend on the value of a column city (not shown). Changes to rows where the value of city equals ‘Pittsburgh’ are propagated to database server Pittsburgh; changes to rows where the value of city equals ‘Chicago’ are propagated to database server Chicago.
A capture process NY captures changes to table sales at database server New York and, for each change, adds a message to a message queue NY. Capture process NY captures the changes by scanning a redo log (not shown) maintained by database server New York. A redo log contains records that specify changes to rows in tables maintained by database server New York. Capture process NY scans the redo log for records specifying changes to rows in table sales, adding a message to message queue NY for each such change.
A propagate process PGH propagates messages queued in message queue NY to database server Pittsburgh and a propagate process CHI propagates messages queued in message queue NY to database server Chicago. Messages reflecting changes to a row having city value equal to ‘Pittsburgh’ are propagated to database server Pittsburgh. Messages reflecting changes to a row having a city value equal to ‘Chicago’ are propagated to database server Chicago.
Rules are used to decide which changes to capture and which messages to propagate. To determine which changes to which rows are captured, capture process NY evaluates capture rules NY. The condition for a rule in capture rules NY is expressed as a SQL expression that can reference a variable specifying the “source table” affected by a change, or a column of that table. The following predicate expression is used in a rule to capture changes to source table sales:
source_table="sales"
Propagate processes PGH and CHI evaluate propagate rules PGH and CHI, respectively, to determine how to propagate messages from message queue NY. A rule condition specifying that messages reflecting changes to a row with city value equal to ‘Pittsburgh’ are propagated to database server Pittsburgh may be expressed using the following predicate expression:
source_table="sales" AND city="Pittsburgh"
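A minimal sketch of capture process NY, assuming redo log records can be modeled as dictionaries with a hypothetical `source_table` key and a `row` payload, and modeling message queue NY as a `collections.deque`. The capture rule is a Python rendering of the predicate above, not an actual SQL evaluation.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Iterable

RedoRecord = Dict[str, Any]  # hypothetical model of one redo log record


def capture_rule_ny(record: RedoRecord) -> bool:
    # Python rendering of the capture predicate: source_table="sales"
    return record.get("source_table") == "sales"


def capture(
    redo_log: Iterable[RedoRecord],
    rule: Callable[[RedoRecord], bool],
    queue: Deque[RedoRecord],
) -> None:
    """Scan redo records in order and enqueue one message per matching change."""
    for record in redo_log:
        if rule(record):
            queue.append(dict(record))  # the message carries a copy of the change data


redo_log = [
    {"source_table": "sales", "op": "insert", "row": {"id": 1, "city": "Pittsburgh"}},
    {"source_table": "inventory", "op": "update", "row": {"id": 7}},
    {"source_table": "sales", "op": "update", "row": {"id": 2, "city": "Chicago"}},
]
message_queue_ny: Deque[RedoRecord] = deque()
capture(redo_log, capture_rule_ny, message_queue_ny)
```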
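Using a similar hypothetical message model, propagate processes PGH and CHI can be sketched as filters over message queue NY. The helper names are illustrative. The sketch assumes each propagate process reads the queue without removing messages, so both processes see every message; the city ‘Boston’ in the sample data is made up to show a message that neither rule selects.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]  # hypothetical message layout, as in a captured change


def make_propagate_rule(city: str) -> Callable[[Message], bool]:
    # Python rendering of: source_table="sales" AND city="<city>"
    def rule(message: Message) -> bool:
        return message.get("source_table") == "sales" and message["row"].get("city") == city
    return rule


def propagate(queue: List[Message], rule: Callable[[Message], bool]) -> List[Message]:
    """Select, without removing, the queued messages this process would send."""
    return [message for message in queue if rule(message)]


message_queue_ny = [
    {"source_table": "sales", "op": "insert", "row": {"id": 1, "city": "Pittsburgh"}},
    {"source_table": "sales", "op": "update", "row": {"id": 2, "city": "Chicago"}},
    {"source_table": "sales", "op": "delete", "row": {"id": 3, "city": "Boston"}},
]
propagate_rule_pgh = make_propagate_rule("Pittsburgh")
propagate_rule_chi = make_propagate_rule("Chicago")
to_pittsburgh = propagate(message_queue_ny, propagate_rule_pgh)
to_chicago = propagate(message_queue_ny, propagate_rule_chi)
```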
While rules offer flexibility by allowing users to control message flow through user-supplied rules, rule-based messaging systems have many drawbacks. First, in many situations the set of rules needed to define a desired message flow can become very complicated and burdensome to develop and maintain. For example, it is often desirable to capture all changes to rows in table sales, subject to various exceptions. Specifically, changes to rows with city equal to ‘Pittsburgh’ are captured except for rows belonging to particular customers, where a column customer in table sales contains values representing customers. The typical rule-based approach to propagating messages in this situation is to generate a rule for each customer. If there are many customers, then many rules must be implemented. Furthermore, whenever rows for a new customer are inserted into table sales and changes for that customer are to be propagated, another rule must be added. Over time, many new rules would have to be added to support additional customers, and the burden of maintaining rules to accommodate them can be onerous.
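The growth in rule count under the one-rule-per-customer approach can be illustrated with a short sketch that generates one rule condition per customer whose changes should be propagated. The customer names and the `per_customer_rules` function are hypothetical, and the generated strings only mimic the predicate style used earlier.

```python
from typing import Iterable, List


def per_customer_rules(customers: Iterable[str], excluded: Iterable[str]) -> List[str]:
    """Generate one rule condition per customer whose changes are to be propagated."""
    excluded_set = set(excluded)
    return [
        f'source_table="sales" AND city="Pittsburgh" AND customer="{customer}"'
        for customer in sorted(customers)
        if customer not in excluded_set
    ]


customers = ["Acme", "Globex", "Initech", "Umbrella"]  # hypothetical customers
rules = per_customer_rules(customers, excluded=["Globex"])

# A newly inserted customer is not covered until another rule is added.
rules_with_new_customer = per_customer_rules(customers + ["Hooli"], excluded=["Globex"])
```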
Another drawback is that propagating DML and DDL changes to other database servers may require a kind of action that cannot be effected using the rules supported by a conventional messaging system. One such action is transformation, which refers to transforming the data in a message. Transformation is commonly used to convert data from a format used by one database server to a format used by another database server. For example, a date column in table sales at database server Chicago may correspond to day and time columns in table sales at database server Pittsburgh. Propagating changes to the date column from database server Chicago to the day and time columns at database server Pittsburgh requires transforming the single value in the date column into two values, one for the day column and one for the time column. Conventional rule-based messaging systems provide no mechanism for specifying message transformation operations.
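The date-to-day-and-time transformation can be sketched in Python with the standard `datetime` module. The column names `date`, `day`, and `time`, the row layout, and the sample value are assumptions for illustration; the point is only that one source value becomes two destination values.

```python
from datetime import datetime
from typing import Any, Dict

Row = Dict[str, Any]


def transform_date_column(row: Row) -> Row:
    """Split a hypothetical 'date' column into separate 'day' and 'time' columns."""
    transformed = {column: value for column, value in row.items() if column != "date"}
    source_value: datetime = row["date"]
    transformed["day"] = source_value.date()   # calendar day only
    transformed["time"] = source_value.time()  # time of day only
    return transformed


chicago_row = {"id": 42, "date": datetime(2003, 5, 14, 9, 30)}
pittsburgh_row = transform_date_column(chicago_row)
```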
Another kind of action not supported by conventional rule-based systems is row migration. Row migration refers to a process for converting an update operation at the source database server into an insert or delete operation at a target database server. Row migration is needed when data in a table on the source database server is “partitioned” between target database servers; that is, one subset of the data in the table is replicated on one database server while another subset is replicated on another database server. Some DML changes to the table are propagated to one database server while other DML changes to the table are propagated to another. An example of partitioned data is table sales on database server Chicago, which receives only changes to rows of table sales at database server New York whose city value is ‘Chicago’.
To demonstrate the need for row migration, consider the following illustration. A row in table sales with column city equal to ‘Pittsburgh’ has been propagated to database server Pittsburgh but not to database server Chicago. As a result, there is a corresponding row in table sales on database server Pittsburgh but not on database server Chicago. If, on database server New York, the value in column city for the row is updated from ‘Pittsburgh’ to ‘Chicago’, then to propagate the change properly, the corresponding row at database server Pittsburgh must be deleted and a corresponding row must be inserted at database server Chicago. Conventional rule-based messaging systems do not currently support row migration operations.
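The decision that row migration requires can be sketched as a function that maps an update at database server New York to the operation a destination holding only one city's rows should apply. The function name and row layout are hypothetical; the sketch assumes, as in the illustration, that each destination's partition is defined by the value of column city.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def migrate_update(old: Row, new: Row, city: str) -> Optional[str]:
    """Map an update at the source to the operation for a destination that holds
    only rows whose city equals `city`; None means the destination is unaffected."""
    had_row = old["city"] == city
    has_row = new["city"] == city
    if had_row and has_row:
        return "update"  # row stays in this destination's partition
    if had_row:
        return "delete"  # row leaves this destination's partition
    if has_row:
        return "insert"  # row enters this destination's partition
    return None


old_row = {"id": 1, "city": "Pittsburgh"}
new_row = {"id": 1, "city": "Chicago"}
operations = {city: migrate_update(old_row, new_row, city) for city in ("Pittsburgh", "Chicago")}
```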
Finally, another drawback of rule-based messaging systems is that rules are relatively expensive to execute in terms of computer resources (e.g., CPU processing) compared with code written in other computer languages (e.g., machine code). For messaging systems that use many rules, this expense is compounded.
Based on the foregoing, it is clearly desirable to provide approaches that alleviate the complexity and burden of developing and maintaining rules in a rule-based messaging system, that allow rules to specify kinds of actions not supported by conventional rule-based systems, and that provide more efficient ways of executing rules.