Data management is easy if components of the data management system are in a static configuration and if there is a centralized monitor. Data management is not easy at all if there is no centralized monitor and if there is no enforcement of a static configuration of the components.
In the era of the Internet, application areas such as business-to-business and business-to-consumer electronic commerce are important for information systems, as well as for economics. Essential topics in this context are, among others, information retrieval (search engines of all kinds), information theory (cryptography and payment protocols), and semistructured data (XML). All these technologies try to facilitate the way in which distributed systems can co-operate across networks in general and the Internet in particular. It is particularly fruitful to address “transactions,” or ways to allow multiple users to manipulate data concurrently. It is instructive to describe systems which are called “composite systems.” In a composite system, a collection of different, autonomous information systems interact transactionally. As it turns out, existing solutions in this area are far from ideal and are based on assumptions that might no longer be valid in many present-day systems.
In a composite system, there is a hierarchy of invocation calls between different services across a variety of components. In a typical system, different and independent servers (from different organizations) invoke each other's services to accomplish an e-commerce transaction. For instance, buying a complex product involves retrieving the necessary parts to assemble it, as well as planning the assembly and arranging the shipment procedure. Each of these activities is a service offered at a distinct server in the distributed system. For instance, checking the stock for availability of each part is done in an inventory control system. There, lack of availability is translated into yet another invocation of a third-party server, namely a supply e-commerce interface. Yet another invocation arises because (in a typical system) customers are allowed to trace the status of their orders. Through a manufacturing control system, which is yet another server at another location, queries concerning the order status can be issued, and again translated into delegated calls somewhere else. Thus, in principle each component is implemented as an independent entity residing in a different location (over a LAN or a WAN as, e.g., the electronic commerce interface). These components invoke the services provided by other components (FIG. 1), forming an arbitrarily nested client-server hierarchy in which increasing levels of abstraction and functionality can be introduced (FIG. 2). As a matter of terminology, the most important aspects of each component are the application logic layer (a server) and a resource manager (usually a database) that is accessed by the former.
The challenge is to design and implement an inherently decentralized and advanced transactional mechanism. This mechanism should allow combining such components in any possible configuration, so that transactions can be executed across the resulting system guaranteeing correct (transactionally correct) results, even if the configuration is dynamically altered. It is crucial for these components to remain independent of each other, that is, there should not be (the need for) a centralized component controlling their behavior. Additionally, nested transactions should be supported, because of the inherent distributed nature, leaving room for alternatives on failure of a particular remote call. For instance, if a particular e-commerce interface is down, then another supplier can be tried.
To see why these characteristics are important, it suffices to look at the Internet: servers may be unreachable, new servers appear without notice, and the nature of the Internet itself already excludes the possibility of relying on a fixed configuration between the different systems that co-operate. For these same reasons, such a system should also be able to cope with failing calls in a flexible and elegant way: if a server cannot be reached at a given moment, it should be possible to try an alternative service. One of the golden rules in electronic commerce is to maximize customer satisfaction by minimizing the number of service denials. This suggests that remote failures be dealt with in the service itself, without making them visible to the customer invoking the service. Finally, there is no reason why different remote service invocations within the same task should not be executed in parallel (where it is possible to do so), thereby minimizing response time. However, as will become clear in the discussion that follows, hardly any of these desirable properties is feasible with existing solutions.
Flat transactions. The transactional model for (distributed) computing has been around for many years and it is considered a well-established and mature technology. The basis of classical transaction theory is the set of four ACID properties (atomicity, consistency, isolation and durability) that define a computation as being transactional. Within the scope of this work, the influence of distribution and autonomy on isolation and atomicity will be of particular interest.
Although there is nothing fundamentally wrong with ACID-ity, most of the transactional technology in use today was developed before the Internet grew into what it is today. Consequently, in today's transaction systems, the quality of being “distributed” appears as an “add-on” and not as an inherent feature. The term “flat transactions” refers to the fact that conventional transactions have no internal structure (no logical subparts) and that atomicity is implemented by aborting everything as soon as one operation fails. While this is fine for centralized systems, this model is not suitable for distributed architectures. In a distributed system, almost by definition, tasks have an internal structure where each remote call is a separate logical component. For instance, in case of remote failure, another node (or network route) could be used to solve the task. Thus, it seems inadequate to abort everything in a distributed computation simply because one server appears to be down at a given moment. Yet this is what flat transaction technology implies, thereby losing a big opportunity to improve services by exploiting properties of distributed architectures.
Flat transactions and distribution. The earliest applications of flat transactions were in the field of databases. Distributed transactions, along with distributed databases, were not seriously considered until the 1980's and early 1990's. Not surprisingly, this more or less coincides with the earlier stages of the global network. At that time, a lot of attention was devoted to various kinds of concurrency control techniques (such as strict two-phase locking and timestamps, to name a few) and how they could be reconciled with a distributed transactions system, i.e., in which a distributed or global transaction may have local transactions on multiple sites (although it was common to assume that each distributed transaction would have at most one local transaction on a given site).
One of the important conclusions of these research activities was that it suffices to have strict two-phase locking and two-phase commit on each node of a distributed database system to provide correct isolation and atomicity. Because strict two-phase locking already was, and still is, the basic technique that virtually every database system used for enforcing isolation, there has been no fundamental change in technology. Even today the flat transaction is pervasive, and systems have been enriched with the two-phase commit protocol to make them work in distributed environments. Thus, much of the work on distributed transactions and advanced transaction models in this context (focusing on techniques other than locking and how to avoid two-phase commit) turned out to be practically irrelevant.
A lot of work already exists concerning distributed commitment. An important theoretical fact is the impossibility of non-blocking consensus in asynchronous systems with communication or node failures. The prevailing protocol with acceptable message overhead has proven to be two-phase commit. Other protocols exist, such as three-phase commit, which tries to avoid blocking. However, it is more expensive in terms of messages exchanged and blocking is only avoided when no communication failures occur (which makes it impractical and expensive). In the two-phase commit protocol, distributed consensus is reached by two message rounds, under the supervision of a coordinator. In the first round, the coordinator asks each of the participating nodes whether it can agree with an eventual commit outcome. If the participant detects no local problem on behalf of the transaction, it votes yes, thereby giving up any local right to abort unilaterally (leaving the participant in the so-called in-doubt state). Otherwise, the vote will be no. The coordinator collects all votes, and only if all participants vote yes does it decide on global commit and, as the second round, send commit messages to everyone. If at least one participant did not reply with a yes vote (either because of timeout or because a “no” was received), then the coordinator decides on global abort and notifies, as the second round, any in-doubt participants. Of course, all this has to be done with the proper amount of logging (to survive crashes). When and how this logging is done, and how it is used during recovery, are the main differences between the many variants that have been proposed.
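The two message rounds described above can be illustrated with a minimal sketch. It is not taken from any particular product: the names Participant, Vote and two_phase_commit are hypothetical, failures are simulated with flags, and the logging needed to survive crashes is deliberately left out.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical participant; the flags simulate local problems and timeouts."""

    def __init__(self, name, can_commit=True, reachable=True):
        self.name = name
        self.can_commit = can_commit
        self.reachable = reachable
        self.state = "active"

    def prepare(self):
        # Round 1: vote. A yes vote gives up the right to abort unilaterally.
        if not self.reachable:
            raise TimeoutError(self.name)
        if self.can_commit:
            self.state = "in-doubt"
            return Vote.YES
        self.state = "aborted"
        return Vote.NO

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return 'commit' or 'abort' after two message rounds (logging omitted)."""
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())
        except TimeoutError:
            votes.append(Vote.NO)  # a timeout counts as a missing yes vote
    decision = "commit" if all(v is Vote.YES for v in votes) else "abort"
    # Round 2: send the outcome; on abort only in-doubt participants are notified.
    for p in participants:
        if decision == "commit":
            p.commit()
        elif p.state == "in-doubt":
            p.abort()
    return decision
```

In this sketch a participant that has voted yes stays in the in-doubt state until the second round reaches it, which is exactly the window in which a coordinator failure would leave it blocked.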
Delicate problems arise in case nodes fail, especially if the coordinator fails while some nodes are in-doubt. This is the so-called blocking time window of the protocol, which is preferably kept as small as possible. Nevertheless, this window exists, and it is something that one has to live with. Within this context, the resulting architecture for distributed transactions is shown in FIG. 4. A number of RDBMS (DB1; DB2; DB3 in the example), also called resource managers in the literature, are subject to the coordination of a central transaction monitor (TM in the illustration). This TM is responsible for creating the transaction identifier and coordinating the two-phase commit protocol. An RDBMS is usually not invoked directly: instead, a server process is invoked (such as server1; server2; server3 in the example), and this process accesses the data. The reason for this is load balancing: by adding more server processes on more CPUs, the system can better distribute the load. Server processes can invoke each other (re-use each other's logic) as well as directly access their local RDBMS on behalf of some transaction with transaction identifier T. The identifier T is part of all communications within the system: both inter-server calls and RDBMS calls are aware of which T is being executed. For isolation, each RDBMS inspects incoming requests and sets locks based on the transaction identifier. This is where two-phase locking is enforced. Because there is only one transaction identifier for every transaction, different intra-transaction accesses to the same data will be allowed by every RDBMS involved. For atomicity, the transaction monitor runs a two-phase commit protocol between all resources involved, which again uses the transaction identifier to keep track of which operations to commit. Note that the transaction monitor is the only entity that knows the resources that are part of one given transaction.
Such a prior-art architecture does not favor decentralization. One of the aims of the invention is to eliminate the central role the transaction monitor plays. In the above example, there was only one transaction monitor process involved. As long as this assumption holds, and each server knows what other servers do (this point will be clarified below in a discussion of recursive client-server relationships), no serious anomalies arise if a distributed flat transaction is used: in an ideal situation, with no failures and no concurrency, every transaction can be executed and will be correct. However, when large-scale distribution is considered, it is not realistic to assume a central coordinating entity that manages transactions: if multiple information systems interact transactionally, then more than one transaction monitor will be involved, an example of which is shown in FIG. 5. Because each transaction monitor works with its own policies for determining a transaction identifier, a global distributed transaction will have multiple and possibly different identities in each system. In FIG. 5, three organizations interact, each of them with their own transaction monitor (TMA; TMB; TMC in the example). Due to the different policies for identifiers, a mapping has to be performed when invocations cross organizational (and therefore transaction monitor) boundaries. In practice, one can consider two possibilities: the push model and the pull model, depending on where the mapping is maintained (on the caller side or on the callee side). In the particular case of the Internet, there have been some recent efforts to define a so-called transactional internet protocol (TIP) standard for doing this type of mapping. Nevertheless, irrespective of where or how it is done, information is lost in the process. For instance, FIG. 5 clearly shows that if a client invocation reaches serverC through the domains of two different transaction managers (TMA; TMB) then the two invocations of the same global transaction will be known to TMC as two different local transactions T2 and T4. If both calls need to access the same data, the resource manager will block one of them, thereby deadlocking the execution. The other option is that all transaction managers in the system use the same identifier for the work of the same client, but this is not usually how it works out.
This subtle but very limiting feature of current technology is due to the fact that existing systems are not required to recognize different parts of the same distributed transaction as one unit. Consequently, strict two-phase locking will treat them as different transactions and block one call accordingly. In distributed transaction processing, this problem is also known as the “diamond problem.” One might argue that diamond problems are probably very rare, since they only happen on common data access through different invocation hierarchies. However, by definition, these accesses are done on behalf of the same client and are therefore much more likely to happen in practice, simply because the different calls share the same context.
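The diamond problem can be reproduced with a toy lock table. The sketch below is an illustration under simplifying assumptions (exclusive locks only, a single data item "x"); LockManager and diamond are hypothetical names, and the identifiers T2 and T4 are the ones used in the discussion of FIG. 5.

```python
class LockManager:
    """Toy exclusive-lock table keyed by transaction identifier."""

    def __init__(self):
        self.owner = {}  # data item -> transaction id holding the lock

    def acquire(self, txn_id, item):
        """Return True if granted, False if the caller would have to block."""
        holder = self.owner.get(item)
        if holder is None or holder == txn_id:
            self.owner[item] = txn_id
            return True
        return False

    def release_all(self, txn_id):
        # Strict two-phase locking: locks are released only at transaction end.
        self.owner = {k: v for k, v in self.owner.items() if v != txn_id}


def diamond(ids_at_server_c):
    """Two calls of one global transaction reach serverC under the given local ids."""
    locks = LockManager()
    first, second = ids_at_server_c
    return locks.acquire(first, "x"), locks.acquire(second, "x")
```

With one shared identifier, diamond(("T", "T")) grants both requests. With the mapped identifiers, diamond(("T2", "T4")) grants the first and refuses the second, and because the first call's lock is only released when the global transaction ends, the second call can never proceed.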
Robustness. Another problem with flat transactions and distribution concerns intra-transaction protection against remote failures. Indeed, aborting everything as soon as one operation fails may make sense for centralized databases, but on Internet systems failures do not merely depend on local effects, but also on the availability of remote sites. For example, something that is quite harmless to other Internet processes, such as momentary congestion in a data link, may lead to a more serious problem where two-phase commits (with possible timeouts) are being used. This suggests that a more robust model be used such as, for instance, a nested transaction model. In nested transactions a remote failure does not restrict locally started transactions from completing because the failure can be detected at execution time and one of a number of provided alternatives can be tried. One known commercial transaction monitor that provides nested transactions is a product called “Encina.” Otherwise, nested transactions remain a theoretical curiosity.
Parallelism. The third and last problem with existing transactions is their restriction to serial execution within one transaction: multithreaded transactional applications are not allowed. This seems to be overly restrictive, especially when considering remote transactional calls on the Internet: if two entirely different sites are to be called, then there is no obvious reason why this should not be done in parallel. Although it would probably be possible to incorporate this feature into the flat model, it appears as a natural extension as soon as one moves into nested transaction models, as discussed below.
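As a minimal illustration of what parallel remote invocation within one task could look like, the following sketch issues two calls concurrently with the Python standard library. The function call_remote is a hypothetical placeholder for a remote transactional call, and no transactional guarantees are modelled here.

```python
from concurrent.futures import ThreadPoolExecutor


def call_remote(site):
    """Hypothetical placeholder for a remote transactional call."""
    return f"{site}: ok"


def call_in_parallel(sites):
    """Invoke all sites concurrently; replies come back in the order of `sites`."""
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        return list(pool.map(call_remote, sites))
```

The response time of the whole task is then bounded by the slowest call rather than by the sum of all calls.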
Nested transaction models. So far the discussion has concentrated on the classical flat transaction model. The reason for this is that virtually no existing product or system uses anything else. More advanced and more elegant models exist, although it must be kept in mind that most of these concepts have never been implemented. Relevant to the subject are the different paradigms of nested transactions. There are many known variants, and a brief review of each of them now follows.
General characteristics of nested transactions. The term “nested” refers to the fact that a transaction can be (recursively) decomposed into subtransactions, parts that form a logically related subtask. In this way, a parent transaction can have multiple children, each child being a subtransaction. A key point is that a successful subtransaction only becomes permanent (i.e., committed) if all its ancestors succeeded as well, whereas the inverse does not hold: if a child fails, the parent is free to try an alternative task, thereby rescuing the global work. The advantages of doing this are twofold: firstly, a failure of a subtransaction clearly delimits the scope of the failed part, allowing a clear definition of alternative policies. Secondly, subtransactions define clear boundaries of isolation among parts of the same overall task. This can be exploited by the system to allow parallelism inside one and the same global transaction.
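The following sketch illustrates these two properties under simplifying assumptions: a child's results reach the parent only if the child succeeds, and nothing becomes permanent before the top-level commit. All names (Transaction, SubtransactionFailed, supplier_a, supplier_b) are hypothetical, and isolation is not modelled.

```python
class SubtransactionFailed(Exception):
    pass


class Transaction:
    """Toy nested transaction: a child's effects are buffered in its parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self.writes = {}

    def run_child(self, task):
        child = Transaction(parent=self)
        task(child)                       # may raise SubtransactionFailed
        self.writes.update(child.writes)  # child succeeded: hand results upward

    def try_alternatives(self, tasks):
        for task in tasks:
            try:
                self.run_child(task)
                return
            except SubtransactionFailed:
                continue  # only the failed child's work is discarded
        raise SubtransactionFailed("all alternatives failed")

    def commit(self, store):
        # Only the top-level commit makes anything permanent.
        if self.parent is None:
            store.update(self.writes)


def supplier_a(txn):
    txn.writes["order"] = "placed at A"
    raise SubtransactionFailed("supplier A unreachable")


def supplier_b(txn):
    txn.writes["order"] = "placed at B"
```

Calling try_alternatives([supplier_a, supplier_b]) on a root transaction discards the partial work of the first supplier and keeps the second, so the failure of one remote call does not abort the global work.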
Closed nested transactions. In the closed nested paradigm, locks are acquired as in two-phase locking, but extra policies determine the behavior of subtransactions with respect to each other. More precisely, as soon as a subtransaction finishes, its locks are passed on to its parent. A child of that very same parent will inherit these locks whenever it needs access to the same data. Without this feature, children of the same parent could block each other, resulting in very impractical systems. It is this very characteristic that so far has made closed nested transactions unfit for practical use: no existing RDBMS supports lock inheritance. Indeed, implementing lock inheritance is difficult and expensive, even in a centralized system. The fact that distribution comes into play (that is, that systems nowadays are “distributed”) makes it even more complicated. Practical evidence for this fact can be seen in Encina, the only existing product using nested transactions: upon configuring a server, it is necessary to choose a mapping mode that determines how different subtransactions are mapped to the underlying database transactions, thereby determining whether lock inheritance can be simulated or not. Indeed, there are essentially two policies:
1. Two subtransactions of a common parent transaction are mapped to different database transactions.
2. Two subtransactions of a common parent transaction are mapped to the same underlying database transaction.
In the first case conflicting subtransactions will block each other, which is the equivalent of no lock inheritance. In the second case, there is no isolation among parallel subtransactions. Furthermore, this mapping is implemented as a setup choice and cannot be changed dynamically based on the client's needs.
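For comparison, lock inheritance as described for the closed nested paradigm can be sketched as follows. This is a toy model with exclusive locks only; NestedLockManager and its methods are hypothetical names and do not describe how Encina or any RDBMS works internally.

```python
class NestedLockManager:
    """Toy lock table for closed nested transactions (exclusive locks only)."""

    def __init__(self):
        self.holder = {}  # data item -> transaction currently holding the lock
        self.parent = {}  # transaction -> its parent (None for the root)

    def begin(self, txn, parent=None):
        self.parent[txn] = parent

    def _ancestors(self, txn):
        # Yields txn itself, then its parent, and so on up to the root.
        while txn is not None:
            yield txn
            txn = self.parent[txn]

    def acquire(self, txn, item):
        # Granted if the item is free or held by txn itself or one of its ancestors.
        holder = self.holder.get(item)
        if holder is None or holder in self._ancestors(txn):
            self.holder[item] = txn
            return True
        return False

    def finish(self, txn):
        # Lock inheritance: a finished subtransaction passes its locks upward.
        for item, holder in self.holder.items():
            if holder == txn:
                self.holder[item] = self.parent[txn]
```

In this model a running sibling is refused access to data locked by another child (isolation between siblings), while a sibling that starts after the first child has finished obtains the lock from the common parent. Neither of the two mapping policies listed above provides both behaviors at once.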
Open nested transactions. Open nested transactions differ from the closed variant in that the locks of a subtransaction are released as soon as that subtransaction is finished (either released entirely or replaced by a semantic lock, depending on the variant). If locks are released entirely, then there is hardly any guarantee about isolation or about atomicity of the global transaction. When openness is introduced, practical systems (based on a commercial RDBMS) will have to use compensating tasks. These are tasks that reverse the effects of a given task, after that task has released its locks and has committed. This is necessary because in current databases the only way to instruct an RDBMS to release locks is by committing the transaction. In order to make a compensation correct (so that it really reverses all updates correctly), certain restrictions must be imposed. Therefore, in most cases, some kind of higher-level semantic lock has to be maintained until compensation may no longer happen (when the transaction has been terminated at all sites). As a simple example, consider the following: a bank has an open nested system in charge of executing transfers between different accounts. Suppose that a general policy rule states that no account should be allowed to have a negative balance. Transferring money from one account (A) of bank BankA to a different and empty account (B, in another bank BankB) consists of two steps, followed by a compensating step if the second one fails:
1. The amount Am to be transferred is added to account B. This is implemented as an open subtransaction of the transfer operation, and is immediately committed in BankB. In this way, the new balance is exposed to concurrent activities.
2. Next, the same amount is taken from account A. However, let us assume that, due to a communication failure, this step fails.
3. To cancel the entire operation, the amount Am is withdrawn again from account B.
In isolated circumstances this system works fine, but not if different and concurrent activities are going on. Indeed, it is easy to see that if the owner of account B withdraws money between steps 1 and 3, the final balance of his account might end up negative. Therefore, in this case, a lock should prevent any such withdrawals as long as step 3 may still be necessary. To date, no existing implementation of open nested transactions is readily available.
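The anomaly in the bank example, and the effect of holding a semantic lock, can be traced with a short sketch. The functions withdraw and failed_transfer are hypothetical, the amounts are arbitrary, and the semantic lock is modelled simply by delaying the concurrent activity until after the compensation.

```python
def withdraw(accounts, name, amount):
    """Withdraw only if the 'no negative balance' policy holds at this moment."""
    if accounts[name] >= amount:
        accounts[name] -= amount
        return True
    return False


def failed_transfer(accounts, amount, concurrent, semantic_lock=False):
    """Run steps 1 to 3 of the example; return the final balance of account B.

    `concurrent` models another activity on the accounts. Without the
    semantic lock it runs between steps 1 and 3; with the lock it is
    blocked until compensation can no longer happen.
    """
    accounts["B"] += amount      # step 1: credit B, committed immediately
    if not semantic_lock:
        concurrent(accounts)     # concurrent activity sees the exposed balance
    # step 2: debiting A fails (communication failure), so A is unchanged
    accounts["B"] -= amount      # step 3: compensation withdraws the credit
    if semantic_lock:
        concurrent(accounts)     # the blocked activity proceeds only now
    return accounts["B"]
```

For example, with an empty account B, a transfer of 50 and an owner who tries to withdraw 30, the unlocked run ends with a balance of -30 although every individual withdrawal respected the policy when it was made, whereas the run with the semantic lock ends with a balance of 0.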
Multilevel transactions. This is a variant of open nested transactions, where the transaction structures are perfectly balanced trees, all of the same depth. This allows the execution to be decomposed into layers or levels. The prime emphasis was not so much on distribution and autonomy, but on elegance and composition of primitive operations into more complex ones. The principles of multilevel transactions can be stated in three rules:
1. abstraction hierarchy: a hierarchy of objects exists, along with their operations.
2. layered abstraction: objects of layer N are completely implemented by using operations of layer N−1.
3. discipline: there are no shortcuts from layer N to layers lower than N−1.
Just as in open nested transactions, multilevel transactions rely on the existence of a compensation for each operation on any layer. Moreover, the compensations on layer N−1 are scheduled by layer N or higher, which introduces a recovery dependency across layers. These facts underline the dependence on a central system, or on a clearly structured and “trusted” (in the sense of a reliable client layer) federation rather than autonomous and arbitrary distribution. Although originally proposed as a model for federated databases as well, the layered approach and the recovery dependency make this paradigm less favorable for the more general case of composite systems.
Existing industrial efforts for Internet transaction processing. There are a few existing approaches concerning transaction management for Internet architectures. These include the following.
Enterprise Java Beans. This is the Java vision for distributed transaction processing applications. Enterprise Java Beans (EJB) is a standard, meaning that it consists of specifications rather than implementations. The main objective is to provide a portable way of writing transactional applications. By taking all server-specific issues out of the application (such as transaction management, pooling of resources, swapping of inactive components), it is possible to create portable applications (so-called Beans, the Java terminology for a software component). The key idea is that all these components have to adhere to a standardized way of interacting with the server environment. In practice, this means that a component has a set of predefined methods that are called by the server in case of important events. For instance, before swapping out an inactive component, this component is notified by calling its method ejbPassivate(), whose implementation should discard any volatile data and synchronize the component's database state. The whole concept of this technology is thus oriented towards component-based server applications, and the contract between a component and the server can be very complex. As such, it is orthogonal to the objectives discussed here: although EJB mainly targets applications with transactional aspects, the issue of transaction management itself is left open. In that approach, some of the problems with distributed transactions are recognized, but no attempts are made to solve them. Finally, with Enterprise Java Beans, nested transactions are not currently supported.
CORBA Object Transaction Service. As part of the global CORBA standard, the OTS specification deals with transactions in CORBA environments. The objective is to standardize the way in which distributed objects can interact with a transaction manager, and how different transaction managers can communicate with each other. However, it is not specified how transaction management can be done. Rather, the interfaces between application and transaction manager, between transaction manager and resources (databases) and between different transaction managers are the main scope of this standard. Nested transactions are optional, and the interfaces exist. However, the internal aspect of how this transaction management should or could be done is left open. Only one ORB is known that incorporates nested transactions: Orbix' OTM, whose functionality is based on the above-mentioned Encina. It should be mentioned that the OTS standard does not properly address communication failures; although efforts to address such failures have been made, those efforts can be shown to be insufficient.
Transaction Internet Protocol (TIP). The transactional internet protocol (TIP) is another industrial standardization effort dealing with standardizing two-phase commit over TCP/IP networks. As such it specifies how different transaction monitor instances could co-ordinate a transaction's two-phase commit outcome by using a character stream connection. The effort is implicitly oriented towards flat transactions (which is reflected in the specification's protocols and interfaces) and, in most cases, towards point-to-point interactions rather than multiple accesses through different network paths. As such, it is not sufficient for the type of composite systems addressed herein.
Returning again to some of the problems requiring solution, it is a well-known fact that distributed agreement (of which two-phase commit is an example) can always lead to blocking if both node failures and communication failures are possible. In those cases, parts of a distributed system are left in an indecisive (in-doubt) state. For data sources, this implies that locks have to be kept indefinitely, leading to serious availability problems.
It would be desirable to have a system with exactly (open) nested transactions, decentralization, communication failure tolerance, and avoidance of diamond cases. It would be extremely desirable to have a system that provides completely autonomous components that interact transactionally without any centralized coordination. It would be desirable to have a system in which components can be combined in any configuration and can be dynamically added or removed without compromising correctness. It would be desirable for such a system not to require a very large infrastructure. Finally, it would be desirable for such a system to have performance at least comparable to the performance of prior-art systems.