The subject of this application relates generally to the field of operating systems and, more particularly, to fault-tolerant computer systems and methods utilizing single or multiple processors.
As reliance on the Internet, and on computing resources in general, increases, it becomes imperative to provide uninterruptible computer services to computer users. One way to ensure uninterruptible service is to provide hardware replication to avoid problems associated with hardware failure.
A common hardware component used in providing computer services is the central processing unit (CPU). CPUs are continuously becoming more powerful relative to other parts of a computer system (such as memory). Currently, most CPUs spend much of their time waiting for memory and other interfaces. To provide more efficient utilization of processing resources, a technique called multithreading is quickly becoming more prevalent in the industry.
Multithreading enables multitasking within a single program. It allows multiple streams (or threads) of execution to take place concurrently within the same program. Each thread may process a different transaction. In order for a multithreaded program to be of any value, it must be run in a multitasking or multiprocessing environment, which allows multiple operations to take place at the same time. The real performance advantage of multithreading becomes apparent where one of the threads is held up waiting for data to arrive and the other threads can continue running. This efficiency alone can speed up today's database and web server systems three- to five-fold. In off-the-shelf operating system packages that offer multithreading (such as Windows NT, Windows 2000, Solaris, and the like), multiple threads may be created and executed within the same process. Multithreaded systems are frequently used as servers in a client-server environment to provide uninterrupted and responsive services.
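By way of illustration only, the following minimal Python sketch shows one thread held up waiting for data while a second thread in the same process continues to do useful work. The function name `run_overlapped`, the worker names, and the use of a sleep call to stand in for a wait on data are assumptions made for this example and do not come from any particular operating system package.

```python
import threading
import time


def run_overlapped(wait_seconds=0.2):
    """Run a blocked thread and a busy thread concurrently in one process."""
    results = {}

    def blocked_worker():
        # Hypothetical stand-in for a thread waiting for data to arrive.
        time.sleep(wait_seconds)
        results["blocked"] = "data arrived"

    def busy_worker():
        # Useful work that can proceed while the other thread is waiting.
        results["busy"] = sum(range(100_000))

    threads = [
        threading.Thread(target=blocked_worker),
        threading.Thread(target=busy_worker),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_overlapped())
```

In this sketch the busy thread does not have to wait for the blocked thread to finish its wait, which is the behavior the paragraph above attributes to multithreaded programs.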
Another technique related to multithreading which is becoming more prevalent is preemptive multitasking. Preemptive multitasking enables the sharing of the processing time amongst running programs. Each running program may be assigned a recurring slice of time from the CPU. Depending on the operating system, this time slice may be the same for all programs or it may be adjustable. For example, a modem or network program may be assigned continuous processing slices to be able to process the incoming data stream without loss of data.
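As a simplified illustration of recurring time slices, the following Python sketch simulates a round-robin scheduler in which every program receives the same slice. The function name `round_robin`, the program names, and the unit of time are hypothetical; an actual operating system scheduler is considerably more elaborate and may adjust slices per program.

```python
from collections import deque


def round_robin(work, time_slice):
    """Simulate equal, recurring time slices shared among running programs.

    work: mapping of program name to units of CPU time it still needs.
    Returns the order of execution as (name, units_run) pairs.
    """
    queue = deque(work.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(time_slice, remaining)
        timeline.append((name, ran))
        if remaining > ran:
            # The program is preempted and rejoins the back of the queue.
            queue.append((name, remaining - ran))
    return timeline


if __name__ == "__main__":
    for name, ran in round_robin({"editor": 3, "modem": 5}, time_slice=2):
        print(name, ran)
```

Each entry in the returned timeline corresponds to one slice; a program that needs more time than one slice is preempted and resumed later.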
With the advantages of preemptive multitasking systems comes a cost associated with predicting where a system has left off its operations when a fault occurs. To ensure continuous provision of service to a client, it is imperative that a secondary system take over the operations of a faulty system as quickly as possible. Generally, when hardware replication is used to provide system fault tolerance, two identical servers operate simultaneously in parallel with one another within a network. To provide for mirrored operation of a computing platform, the state of one mirrored computer needs to be copied to the other. Because the two computers execute the same software, they will produce exactly the same output if given the same inputs. The problem arises in the duplication of the inputs to the computers. Inputs such as network, keyboard, and mouse are easily duplicated, but in a system where the operating system is preemptive, the preemption points are difficult to mirror exactly. As a result, these fault-tolerant systems are incapable of working with the preemptive multitasking systems that are readily available off-the-shelf, and they forgo the benefits associated therewith.
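The difficulty with preemption points can be illustrated, under simplifying assumptions, by the following Python sketch. Two hypothetical tasks, labeled "A" and "B", update a shared state, and a list named `schedule` stands in for the order in which the operating system happens to preempt and resume them. The function name `run_replica` and the task operations are invented for this example only.

```python
def run_replica(inputs, schedule):
    """Apply the tasks' steps to shared state in the order given by `schedule`."""
    steps = {
        "A": lambda x: x + 1,  # hypothetical task A increments the shared state
        "B": lambda x: x * 2,  # hypothetical task B doubles the shared state
    }
    state = inputs
    for task in schedule:
        state = steps[task](state)
    return state


if __name__ == "__main__":
    # Same inputs and same preemption order: the replicas agree.
    print(run_replica(1, ["A", "A", "B"]) == run_replica(1, ["A", "A", "B"]))
    # Same inputs but a different preemption order: the replicas diverge.
    print(run_replica(1, ["A", "A", "B"]) == run_replica(1, ["A", "B", "A"]))
```

In the sketch, identical inputs are not sufficient to keep the two replicas in the same state; the order of preemption must match as well, and that order is what is difficult to duplicate on an off-the-shelf preemptive operating system.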
One solution is to avoid using a preemptive operating system altogether and forgo all benefits of such a system. Alternatively, one can use an operating system specifically designed for state mirroring, without utilizing the available off-the-shelf systems and all their benefits (such as cost savings, customer support, and the like). Accordingly, there are significant costs associated with the provision of fault-tolerant systems based on current designs, partly because these systems require the use of proprietary software and/or hardware.