Certain terms used in the “Background of the Invention” are defined in the section “Definitions.”
Computer Applications
Much of our daily lives is augmented by computers. The many services upon which we depend (our banking, investing, communications, air and rail travel, online shopping, credit-card and debit-card purchases, mail and package delivery, and electric-power distribution) are all managed by computer applications.
In its simplest form, a computer application is implemented as a computer program running in a computer. A computer program is basically a set of computer-encoded instructions. It often is called an executable because it can be executed by a computer. A computer program running in a computer is called a process, and each process has a unique identifier known to the computer. Many copies of the same (or different) computer program can be running in a computer as separately distinguishable processes. A computer program can utilize multiple processes.
An application typically includes multiple interacting processes.
Application Database
An application often depends upon a database of information that the application maintains to record its current state. Frequently, the information in the database is fundamental to the operation of the application, to the decisions it makes, and to its delivery of services to the end users.
The database may be stored in persistent storage such as a disk for durability, it may be stored in high-speed memory for performance, or it may use a combination of these storage techniques. The database may be resident in the same computer as the application program, it may be resident in another computer, it may be implemented as an independent system, or it may be distributed among many systems.
A database generally includes one or more files or tables. Each file or table typically represents an entity set such as “employees” or “credit cards.” A file is comprised of records, each depicting an entity-set member such as an employee. A table is comprised of rows that define members of an entity set. A record is comprised of fields that describe entity-set attributes, such as salary. A row is comprised of columns that depict attributes of the entity set. In this specification, “files” are equivalent to “tables;” “records” are equivalent to “rows;” and “fields” are equivalent to “columns.”
Requests
End users generate requests to be processed by the computer application. End users may be people, other computer applications, other computer systems, or electronic devices such as electric power meters. In this specification, the term “end user” means any entity that can influence an application and/or can request or use the services that the application provides.
An example of a request from an end user is a request for a bank-account balance. Another example is an alert that a circuit breaker in a power substation has just tripped. In some cases, a computer application may on its own internally generate events for interfacing with itself or other applications (and thus be its own end user).
Request Processing
The application receives a request from an end user. As part of the processing of this request, the application may make certain modifications to its database.
The application can read the contents of its database. As part of the application's processing of the request, it may read certain information from its database to make decisions. Based on the request received from its incoming end user and the data in its database, the application delivers certain services to its outgoing end users.
Services
An application program may deliver a service in response to a specific input from an end user, such as providing an account balance in response to an online banking query. Another example of a service is the generation of a report upon a request from an end user.
Alternatively, the application program may deliver a service spontaneously, either on a timed basis or when certain conditions occur. For instance, a report may be generated periodically.
The end users providing the input to the application may or may not be the same end users as those that receive its services.
Transactions
The services provided by the application typically are processed as transactions. Each transaction has a beginning point (for example, when the transaction is started) and an end point (for example, when the transaction completes, either successfully or unsuccessfully). A successful transaction is referred to as one that commits (completes successfully) or is committed. Its effects remain after the transaction ends. An unsuccessful transaction is referred to as one that aborts or has been aborted. Its effects are removed, and the application environment is reset to its original state.
The transaction thereby groups the associated operations, functions, data changes, etc., into a logical set of processing functions and changes that are either all applied (committed) or all removed (aborted) depending on the ultimate status of that transaction.
Hence transactions typically follow the ACID properties—atomicity, consistency, isolation, and durability.
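The commit and abort behavior described above can be illustrated with a minimal sketch. It assumes an in-memory SQLite database and a hypothetical "accounts" table; the function names transfer, balances, and make_demo_db are illustrative only and do not come from this specification.

```python
import sqlite3

def make_demo_db():
    """Create a hypothetical in-memory database with two accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single transaction.

    Returns True if the transaction committed, False if it aborted.
    """
    try:
        # The connection context manager commits when the block completes
        # and rolls back when an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # forces an abort
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current account balances as a dictionary."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

In this sketch, a transfer of 30 from account A to account B commits and both updates remain, while a transfer of 500 aborts and leaves both balances exactly as they were before the transaction started.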
RAS—Reliability, Availability, and Scalability
The purpose of the variety of processing architectures in use today is to enhance the attributes known as RAS—Reliability, Availability, and Scalability. By reliability, we mean data integrity. The data in databases must remain correct and consistent. Any transaction applied to the database typically must leave it in a correct, consistent state.
Availability means that the system is always ready for use by the end users. A typical server has an availability of four 9s (99.99%). This means that it will be down approximately 53 minutes per year. System availability can be enhanced significantly by running a pair of servers in an active/active configuration (described later). Typical availabilities for active/active systems are about six 9s (99.9999%), which equates to about 32 seconds per year of downtime.
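The relationship between a count of 9s and yearly downtime can be checked with a short calculation. The sketch below assumes a 365-day year; the function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year

def availability_from_nines(nines):
    """Convert a count of 9s into an availability fraction (4 -> 0.9999)."""
    return 1 - 10 ** (-nines)

def downtime_seconds_per_year(nines):
    """Expected downtime per year, in seconds, for the given count of 9s."""
    return SECONDS_PER_YEAR * 10 ** (-nines)

def downtime_minutes_per_year(nines):
    """Expected downtime per year, in minutes, for the given count of 9s."""
    return downtime_seconds_per_year(nines) / 60
```

Under these assumptions, four 9s works out to roughly 52.6 minutes of downtime per year and six 9s to roughly 31.5 seconds per year, consistent with the approximate figures given above.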
Scalability is the capacity to add resources to handle additional transaction loads. When the loads decrease, the additional processing resources are typically released.
Active/Active Architecture Systems
Background for active/active architecture systems (“Active/Active Systems”) is described in Volume 2 of the book series “Breaking the Availability Barrier” (Breaking the Availability Barrier II: Achieving Century Uptimes with Active/Active Systems, AuthorHouse; 2007), and in U.S. Pat. No. 6,662,196 (Holenstein et al.) and U.S. Pat. No. 7,103,586 (Holenstein et al.). An active/active system, shown in FIG. 1, is comprised of two or more independent systems in a redundant application network that are cooperating in a common application. A transaction can be sent to any system in the network to be properly processed. The systems are independently processing different transactions. Changes made to the database of one system by a transaction are replicated to the databases of the other systems in the application network to keep the databases synchronized.
All capacity is available for use. Only a portion of the users are affected should a node in the system fail. Their transactions can be simply rerouted to a surviving node (system). Thus, recovery from a failure is measured in subseconds or seconds.
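The routing, replication, and rerouting behavior described above can be sketched as follows. This is a simplified illustration, not the method of the cited references: the Node and ActiveActiveNetwork classes, the modulo-based assignment of users to nodes, and the immediate in-line replication are all assumptions made for brevity.

```python
class Node:
    """One independent system with its own copy of the database."""
    def __init__(self, name):
        self.name = name
        self.db = {}
        self.up = True

class ActiveActiveNetwork:
    """Hypothetical application network of cooperating nodes."""
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]

    def process(self, user_id, key, value):
        """Apply a transaction on one node and replicate it to the others.

        Returns the name of the node that processed the transaction.
        """
        live = [n for n in self.nodes if n.up]
        if not live:
            raise RuntimeError("no surviving nodes")
        # Assumption: each user has a home node chosen by user_id.
        home = self.nodes[user_id % len(self.nodes)]
        # If the home node has failed, reroute to a surviving node.
        target = home if home.up else live[user_id % len(live)]
        target.db[key] = value
        # Replicate the change so that the databases stay synchronized.
        for peer in live:
            if peer is not target:
                peer.db[key] = value
        return target.name
```

With two nodes, users are split between them while both are up. If one node fails, only the users assigned to that node are rerouted, and the surviving node already holds the replicated data.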
Validation Architecture Systems
The validation architecture system, shown in FIG. 2 and described further in U.S. Pat. No. 9,734,190 (Holenstein et al.) and U.S. Pat. No. 9,922,074 (Hoffmann et al.), also incorporates two systems. However, in this case, each system is processing the same transaction. A Transaction Distributor sends the request to process a transaction to both systems. Each system calculates an indicium of some sort representing the result of its processing. For instance, the indicium could be a unique hash of the changes made by the system to its database.
The indicia calculated by the two systems are compared by each system. If they match, the transaction is committed. If they don't match, the transaction is aborted. In this context, “match” may be an identical match, but it can also encompass forms of fuzzy or intelligent inexact matching. Fuzzy matching may be needed, for example, when the two systems use different rounding algorithms on a calculation or have different floating point implementations. The inexact match could then apply a tolerance, such as accepting the match if the difference is within one thousandth of a percent. Fuzzy matching could also involve utilizing only a subset of the columns affected in the transaction.
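The exact and fuzzy forms of matching described above can be sketched as follows. This is a minimal illustration under stated assumptions: the changes made by a transaction are represented as a dictionary of column names to values, the indicium is a SHA-256 hash of a canonical JSON encoding, and fuzzy matching compares the underlying values directly (a hash cannot be compared within a tolerance). All function names are hypothetical.

```python
import hashlib
import json
import math

def indicium(changes):
    """Hash a canonical encoding of the changes one system made."""
    canonical = json.dumps(changes, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def exact_match(changes_a, changes_b):
    """Identical match: the two indicia must be equal."""
    return indicium(changes_a) == indicium(changes_b)

def fuzzy_match(changes_a, changes_b, rel_tol=1e-5, columns=None):
    """Inexact match with a numeric tolerance and an optional column subset.

    rel_tol=1e-5 corresponds to one thousandth of a percent.
    """
    keys = set(columns) if columns is not None else set(changes_a) | set(changes_b)
    for key in keys:
        if key not in changes_a or key not in changes_b:
            return False
        a, b = changes_a[key], changes_b[key]
        both_numeric = isinstance(a, (int, float)) and isinstance(b, (int, float))
        if both_numeric:
            if not math.isclose(a, b, rel_tol=rel_tol):
                return False
        elif a != b:
            return False
    return True

def decide(matched):
    """Commit when the results match; otherwise abort."""
    return "commit" if matched else "abort"
```

For example, two systems that compute a balance of 100.125 and 100.1250001 would fail the exact comparison but pass the fuzzy comparison at this tolerance, whereas 100.125 and 100.2 would fail both unless the comparison is restricted to columns that agree.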
The benefit of a validation architecture is that it detects all single-system errors, and many multiple-system errors, such as hardware/software failures or malware.
The architecture shown in FIG. 2 is a Dual Server Reliability (DSR) configuration. A Triple Server Reliability (TSR) configuration is shown in FIG. 3. All systems process the same transaction, and their indicia are compared. If all indicia match, the transaction is committed. If only two indicia match, the transaction is committed on those two systems; and the third system can be taken out of service or have corrective action taken. An error indication can be posted for manual resolution of the problem if necessary.
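The TSR comparison described above amounts to a majority vote over three indicia, which can be sketched as follows. The function name tsr_vote is illustrative. The handling of the case in which no two indicia agree is not stated above, so the sketch assumes that the transaction is aborted on all systems in that case.

```python
from collections import Counter

def tsr_vote(indicia):
    """Decide the outcome of a transaction from per-system indicia.

    indicia maps a system name to the indicium that system computed.
    Returns (commit_on, suspects): the systems that commit, and the
    systems that disagree and may need corrective action.
    """
    counts = Counter(indicia.values())
    winner, votes = counts.most_common(1)[0]
    if votes < 2:
        # Assumption: with no agreement at all, nothing is committed.
        return [], sorted(indicia)
    commit_on = sorted(s for s, i in indicia.items() if i == winner)
    suspects = sorted(s for s, i in indicia.items() if i != winner)
    return commit_on, suspects
```

When all three indicia match, every system commits and no system is suspect. When only two match, those two commit and the third is returned as a suspect that can be taken out of service or flagged for manual resolution.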
Comparing the Two Architectures
A comparison of active/active systems and validation architectures is shown in Table 1 and FIG. 4 and FIG. 5. As shown in FIG. 4, an active/active system has high data availability, but a corruption in the database may go undetected and will be replicated to the other databases in the application network, impacting data reliability. Replication may be synchronous or asynchronous.
In the case of a validation architecture, data availability is also high, but no single hardware failure, software error, malware, or operator error can affect the data integrity (reliability) of the system without the owner's knowledge, because the indicia of the two systems will no longer match. When a mismatch occurs, corrective action must be taken on the validation architecture system, such as taking one or more nodes of the system out of service to be repaired.
Comparing availability, a node failure in an asynchronous active/active system may allow the system to continue operating with just the surviving systems, though with lower capacity. In the case of a DSR validation architecture, however, the result of a node outage is either 0% capacity or 100% capacity, depending upon the decision to continue processing with one node or not.
As shown in FIG. 5, active/active systems are scalable—the more nodes in the system, the more capacity to process transactions. A validation architecture is not readily scalable. It has the capacity of a single node.
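The capacity behavior of the two architectures, including the effect of a node outage, can be summarized in a small model. The sketch below is illustrative only: the per-node throughput of 1000 transactions per second and the continue_unvalidated flag (the decision to keep processing on a single remaining node) are assumptions, not values from this specification.

```python
def active_active_capacity(nodes, failed=0, per_node_tps=1000.0):
    """Capacity grows with the number of surviving nodes."""
    return max(nodes - failed, 0) * per_node_tps

def validation_capacity(nodes, failed=0, per_node_tps=1000.0,
                        continue_unvalidated=False):
    """Every node processes the same transactions, so capacity is that
    of a single node as long as processing continues."""
    surviving = nodes - failed
    if surviving >= 2:
        return per_node_tps  # indicia can still be compared
    if surviving == 1 and continue_unvalidated:
        return per_node_tps  # processing continues without comparison
    return 0.0
```

In this model, a two-node active/active system that loses a node drops to half of its capacity, a DSR system that loses a node delivers either none or all of its single-node capacity depending on the decision taken, and a TSR system that loses a node continues processing with the two remaining nodes.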
Active/active systems are ideal for use in private data centers. The validation architecture is ideal in untrusted or unreliable environments such as public clouds.
TABLE 1
A Comparison of Active/Active and Validation Architectures
| Architecture | Reliability | Availability | Scalability | Usage |
| --- | --- | --- | --- | --- |
| Active/Active Architecture | 1. Single System Integrity. 2. Hypothetical: Hardware - six 9s; Software - five 9s | 1. Node outage - 50% of users affected. 2. User Switchover - five 9s | Multiple nodes processing different transactions | Ideal in corporate data centers where availability and scalability are critical |
| Validation Architecture | No single hardware, software, malware, operator error, etc. can affect integrity unknown to system owner | 1. DSR - Node outage is either 0% or 100% (continuous processing or do not continue). 2. TSR - continuous processing | 1. 100% of a single node. 2. No scalability for more nodes | 1. Ideal in untrusted or unreliable environments (no control of hardware). 2. For high value transactions (like in banking) |
What is Needed
What is needed is a system and method that combines the best features of Active/Active and DSR/TSR Validation Architectures into a mixed-mode architecture that optimizes application reliability, availability, and scalability.
As discussed above, Active/Active and Validation Architectures are both prior art methods. Combining the technologies in novel ways is needed to maximize RAS and is the basis of preferred embodiments of the present invention.