The present invention relates to the field of high performance fault tolerant database systems using off-the-shelf database servers. More particularly, this invention provides non-intrusive non-stop database services for computer applications employing modern database servers.
Data service is an essential service for electronic commerce and information service applications. Among the different data service technologies, such as flat files, indexed files, multiple linked files or databases, databases are the most preferred. Database servers are complex software systems providing efficient means for data storage, processing and retrieval for multiple concurrent users. Typically, a database server is implemented on top of an existing operating system, a lower-level software system providing management facilities for efficient use of hardware computing and communication components. The correct execution of a database server relies on the correct operation of storage systems, computer servers, networking equipment and operating systems. Therefore, a failure of any one component can affect the normal operation of the database service.
In general, the degree of fault tolerance of a database service depends on how quickly and reliably the database service can detect and recover from a failure or fault. The capability for fault tolerance can be evaluated in two ways: 1) the ability to detect a database service failure; and 2) the ability to mask a detected service failure in real time. A database service failure can be either a hardware failure or a software failure. In view of the tight binding among hardware, operating system and database server, it is often difficult to pinpoint where a failure started. Hardware related fault tolerance technologies use hardware redundancy to prevent component malfunctions. For example, redundant array of inexpensive disks (RAID) technology is well known and functions by replicating hardware disks and distributing replicated data over several disks, using disk mirroring and striping at the hardware level to increase the availability of data and to prevent data storage failures. Hardware related failures have become minimal due to the dramatic increase in the reliability of hardware components and the dramatic decrease in costs. Recent statistics show that software related failures account for the majority of database service failures. Hardware replication technologies can resolve hardware related problems, but cannot provide fault resilience against operating system and database system malfunctions. Commercial database applications usually do not interface with hardware directly, but indirectly through the operating system. In general, database malfunctions in commercial database applications are not detected by the hardware, which continues to operate irrespective of a database malfunction. There are a few hardware-aware multi-processor database systems available that can detect hardware and database malfunctions directly, but they are prohibitively costly. They require extensive changes to database server software.
They lack continuous operation capability because their components cannot be repaired while database accesses are allowed, and because it is impossible to pause and update one processor while letting users access the data through another processor. Examples of these systems include Compaq's Nonstop-SQL and Stratus Computer's proprietary database systems.
In a multi-user environment with multiple redundant database servers, race conditions are major problems that cause database lockups and inconsistent data contents across the database servers. The ability to run multiple independent redundant database servers concurrently without a race condition is referred to herein as instant transaction replication. There are several database fault tolerance approaches that attempt to resolve such race conditions.
One method provides non-concurrent transaction replication using high speed replicated servers. U.S. Pat. No. 5,812,751 to Ekrot, et al. discloses a primary server and a monitoring standby server connected to the same storage system. Once the primary server fails, the standby server automatically takes over the storage system and assumes the operations previously performed by the failed primary server. This method also requires extensive changes to the database server software. More importantly, this method makes the shared data storage system a single point of failure: any software malfunction that leaves any mark on the data storage contents will lead to an unrecoverable database system. In addition, as the user access frequency increases, multiple transactions can be lost in the event of a primary database server crash. This method also does not allow "on-line repair" or continuous operation of the database.
U.S. Pat. No. 5,745,753 to Mosher, Jr. discloses another non-concurrent approach, a remote data duplicate facility (RDF) that maintains virtual synchronization of the backup database with the local database. The RDF includes an extractor process executed by the local computer system, and a receiver process and a plurality of update processes executed by the remote system. The extractor process sends sequentially numbered audit trail records to the remote receiver process while the application performs online database restructuring on the local computer. The update processes of the remote system, based on the incoming audit trail records, perform the same operations on the backup database. The transaction manager stores a stop update audit record in the local audit trail when each online database restructuring successfully completes. Both local and remote processes use these sequentially numbered audit records to cross check whether any database operation is out of order. Acknowledge and redo commands pass back and forth between the RDF and the local databases. A crash of the local computer system can result in incomplete transaction logs for all replication servers, rendering the entire system unrecoverable.
A conventional software method for computer system fault tolerance is file system replication or disk mirroring. A file system is the basic building block of any operating system for managing storage, such as hard disks. A replicated file system uses a special storage device driver that maintains an identical appearance to user programs, so they "think" the storage device remains the same, while it duplicates each disk I/O request to both the local file system and a remote backup file system. This method is highly transparent to users since it does not require any modification to existing application programs. In the event of a primary system failure, the replicated file system can be used to restart all services. U.S. Pat. No. 5,764,903 to Yu discloses a virtual disk driver used between a primary server and a secondary server that are mirrored over a network. In that system, a disk write request hands control to the operating system, which in turn invokes the virtual disk driver. The virtual disk driver monitors both the primary and secondary disk write operations, and control does not return to the calling application until the disk write is committed to both the primary and secondary disks. This method can protect most desktop applications, such as word processors, spreadsheets and graphics editors, from most hardware faults and operating system errors. However, it cannot protect users from database server crashes, because the timing of the database server command executor is not maintained.
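The synchronous mirroring behavior described above can be illustrated with a minimal Python sketch. The `Disk` and `MirroredDisk` classes and the in-memory block store are hypothetical stand-ins chosen for illustration; they are not taken from the '903 patent. The sketch also shows the limitation noted in the text: the mirror faithfully copies whatever bytes it is given, including a half-finished update.

```python
class Disk:
    """Hypothetical in-memory stand-in for a block device."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, block_no, data):
        self.blocks[block_no] = data


class MirroredDisk:
    """Looks like a single device to the caller, writes to two."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def write(self, block_no, data):
        # Control returns to the caller only after both writes complete.
        self.primary.write(block_no, data)
        self.secondary.write(block_no, data)
        return True


primary = Disk("primary")
secondary = Disk("secondary")
mirror = MirroredDisk(primary, secondary)

# The driver has no notion of transactions: a partially updated
# index page is replicated exactly as written.
mirror.write(7, b"index page, half updated")
```

After the call, both disks hold the same block, which is the intended result for hardware faults and also the reason a corrupted database page ends up on the backup as well.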
U.S. Pat. No. 5,781,910 to Gostanian, et al. discloses distributed transaction database replicas. A single database manager employs an atomic commit protocol such as the two-phase commit (2PC) protocol. The basic idea behind 2PC is to determine a unique decision for all replicas with respect to either committing or aborting a transaction and then to execute that decision at all replicas. If a single replica is unable to commit, then the transaction must be aborted by all replicas. The '910 patent further modifies the 2PC protocol by designating one database server as the coordinator that is responsible for making and coordinating the commit or abort decision.
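The general two-phase commit idea can be sketched as follows. This is a simplified illustration of textbook 2PC, not the modified protocol of the '910 patent; the `Replica` class, its `can_commit` flag and the `two_phase_commit` function are hypothetical names, and timeouts, logging and coordinator failure are deliberately left out.

```python
class Replica:
    """Hypothetical replica that votes in phase 1 and obeys in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self, txn):
        # Phase 1: vote yes only if this replica is able to commit.
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self, txn):
        self.state = "committed"

    def abort(self, txn):
        self.state = "aborted"


def two_phase_commit(replicas, txn):
    """Coordinator: collect votes, then apply one decision everywhere."""
    votes = [r.prepare(txn) for r in replicas]
    decision = "commit" if all(votes) else "abort"
    # Phase 2: every replica executes the same decision.
    for r in replicas:
        if decision == "commit":
            r.commit(txn)
        else:
            r.abort(txn)
    return decision


good = [Replica("a"), Replica("b")]
mixed = [Replica("a"), Replica("b", can_commit=False)]
outcome_good = two_phase_commit(good, "txn-1")
outcome_mixed = two_phase_commit(mixed, "txn-2")
```

In this sketch a single refusing replica forces every replica to abort, which matches the rule stated above.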
Such prior art systems do not resolve a variety of database server fault recovery problems. For example, assume that there is a transaction containing a sequence of record inserts and record deletes on tables with at least one index. This is very common in modern database applications, since an indexed table allows faster search of its contents. Each insert or delete in an indexed table requires the table's indexes to be updated. The file replication method will duplicate every disk update to both the local and a remote file system. Assume further that the database server crashes after the database server's second phase commit but before the completion of the index updates. Some of the indexes will be corrupted, rendering all queries dependent on these indexes useless. A file replication system can only replicate the problem.
It would be extremely beneficial to the electronic commerce and information service community to provide a fault tolerant database server system and method that allow automatic detection of and recovery from all database software and hardware failures in real time using instant transaction replication. Such a system would satisfy a true "zero closing time" for electronic commerce on a global scale.
The present invention involves the use of a database gateway that replicates database transactions in real time and provides automatic error detection and recovery for database service software and hardware failures. This invention also features on-line repair of database servers, which ensures continuous operation of the database services (such as Microsoft SQL Server, Oracle, Sybase, DB2, Informix and others). The invention can also be used to provide instant database transaction replication to centralized and geographically dispersed homogeneous or heterogeneous database servers.
In one embodiment of the invention, multiple database clients connect to a redundant database server group via a database gateway that connects at least two database servers as a pair of mirror servers with the same database contents. Instant transaction replication guarantees that all servers are synchronized in data contents. In such a system, every server is a backup to any other server in the same group. Therefore the reliability of the overall system increases as the number of redundant database servers increases. The database gateway can also be protected from its own failure by using a slave database gateway that monitors and protects the master database gateway in real time.
The basic principle of the present invention is to replicate database queries instead of replicating storage. Since the majority of database queries are shorter than the storage updates they generate, the amount of replicated data is greatly reduced. In both high and low bandwidth environments, the invention continuously monitors and intercepts the database queries passing between the communication links and the database servers. It can redirect a communication link to a healthy server if the connected server or the communication link reaches a designated failure level. This redirection happens at the communication data packet level, resulting in a repaired client-to-server connection data link in real time. The communication link interception and repair are completely transparent to the user applications. Since the database gateway does not keep the actual data, the hardware requirements of the database gateway are less demanding than those of a database server. A database gateway can be built using 100% off-the-shelf commodity components.
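The query replication principle can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the `Gateway` class is hypothetical, in-memory SQLite databases from the standard `sqlite3` module stand in for independent database servers, a crash is simulated by closing a connection, and any database error is treated as a server failure. The packet-level interception, designated failure levels and read load balancing described in the text are not modeled.

```python
import sqlite3


class Gateway:
    """Hypothetical gateway that forwards each query to every healthy server."""

    def __init__(self, servers):
        self.servers = dict(servers)        # name -> connection
        self.healthy = list(self.servers)   # names still in service

    def execute(self, sql, params=()):
        result = None
        # Every healthy server receives the same query in the same order.
        for name in list(self.healthy):
            conn = self.servers[name]
            try:
                rows = conn.execute(sql, params).fetchall()
                conn.commit()
            except sqlite3.Error:
                # Assumption: any error means the server has failed.
                self.healthy.remove(name)
                continue
            if result is None:
                result = rows
        if result is None:
            raise RuntimeError("no healthy database server left")
        return result


a = sqlite3.connect(":memory:")
b = sqlite3.connect(":memory:")
gw = Gateway({"a": a, "b": b})

gw.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
gw.execute("INSERT INTO orders (item) VALUES (?)", ("widget",))

a.close()  # simulate a crash of server "a"

# The client keeps using the gateway; server "b" answers from here on.
gw.execute("INSERT INTO orders (item) VALUES (?)", ("gadget",))
rows = gw.execute("SELECT item FROM orders ORDER BY id")
```

Only the short SQL text travels to each server; the resulting table and index page updates are produced locally by each server, which is the bandwidth argument made above.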
One object of the present invention is to take advantage of the concurrent processing of multiple database servers and multiple independent networks to balance the communication load among the database clients and servers, and to provide instant database service failure detection and recovery for both hardware and software related malfunctions. The prior art does not address both kinds of malfunction with high efficiency and at reasonable cost.
It is another object of the present invention to provide performance enhancement by eliminating the unnecessary waiting times that the use of multiple redundant database servers incurs in the prior art. The performance of the fault tolerant database service of the present invention is comparable to that of a non-fault tolerant database server to which the clients connect directly, without the database gateway.
Another object of the present invention is to use algorithms that minimize the duplication of client data and reduce the overhead of achieving fault tolerance by processing in parallel. The invention is therefore suitable for both high and low bandwidth networking environments.
It is another object of the present invention to isolate database clients from direct access to the physical data. This allows security features, such as data access control and encryption, to be added to an existing database system without changing the running database environment. It also allows a working database system to be extended to include clients and database servers connected only through public networks, such as the Internet.