Failsafe operation of information technology systems is of fundamental importance to most activities in modern society. Many precautionary systems have therefore been developed to handle failure situations.
Such systems may safeguard information via memory backup systems, or safeguard complete system functionality. The latter may comprise completely mirrored or redundant systems, in which all actions are executed in both primary and secondary system components (computer, processor, server, etc.).
One such failsafe system is disclosed in U.S. Pat. No. 6,526,487, where memory contents are synchronised. A primary computer system includes a memory and a delay buffer receiving write requests for data to be stored in the memory. The content of the delay buffer is transferred to a backup computer system. When the transfer is complete, the backup computer system acknowledges that the data has been received, whereupon the primary computer system proceeds by executing the write request. Hereby, the two systems are synchronised at all times and any failures occurring before the acknowledgement is received will result in the write request not being executed.
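The acknowledge-before-write behaviour described above can be illustrated with a minimal sketch. The class and method names (`BackupSystem`, `PrimarySystem`, `request_write`, `flush_delay_buffer`) are hypothetical and are not taken from the cited patent; the sketch also assumes, for simplicity, that each buffered write is transferred and acknowledged individually.

```python
class BackupSystem:
    """Hypothetical backup computer system that stores forwarded writes."""

    def __init__(self, reachable=True):
        self.memory = {}
        self.reachable = reachable

    def receive(self, address, value):
        # Returning True models the acknowledgement; False models a failed transfer.
        if not self.reachable:
            return False
        self.memory[address] = value
        return True


class PrimarySystem:
    """Hypothetical primary system with a memory and a delay buffer."""

    def __init__(self, backup):
        self.memory = {}
        self.delay_buffer = []
        self.backup = backup

    def request_write(self, address, value):
        # Write requests are held in the delay buffer, not executed at once.
        self.delay_buffer.append((address, value))

    def flush_delay_buffer(self):
        """Transfer buffered writes; execute each only after acknowledgement."""
        executed = 0
        while self.delay_buffer:
            address, value = self.delay_buffer[0]
            if not self.backup.receive(address, value):
                break  # no acknowledgement: the write request is not executed
            self.memory[address] = value
            self.delay_buffer.pop(0)
            executed += 1
        return executed
```

In this sketch a write that is never acknowledged stays in the delay buffer, so the primary memory never runs ahead of the backup memory.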
In the financial field, e.g. electronic exchange systems for stocks, bonds, derivatives, etc., failsafe high-speed in-memory servers are used. These systems are also referred to as replica server systems. Similar to the system disclosed above, a replica server system comprises a primary replica and a secondary replica, both being identical and in the same state. Should the primary replica fail, the secondary replica will take over immediately. Of course, the system may contain several secondary replicas to improve safety. Upon failure of the primary replica, one of the secondary replicas will then become the new primary replica, while the others remain secondary replicas.
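The takeover described above can be sketched as follows. The function name `fail_over` is hypothetical, and choosing the first secondary replica in the list as the new primary is an assumption made only for illustration; the text does not state how the new primary is selected.

```python
def fail_over(secondaries):
    """Return (new_primary, remaining_secondaries) after the primary has failed.

    Assumption: the first secondary replica in the list is promoted.
    """
    if not secondaries:
        raise RuntimeError("no secondary replica available to take over")
    new_primary, *remaining = secondaries
    return new_primary, remaining
```

For example, `fail_over(["replica-b", "replica-c"])` promotes `"replica-b"` and leaves `["replica-c"]` as the remaining secondary replica.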
One pronounced problem with replica server systems and other similar systems is lag time. In a replica server system, the primary replica receives input data, stores it in a buffer (normally in an I/O interface), writes it to persistent memory, e.g. a disc (by flushing the buffer), and transfers the input data to the secondary replica (or replicas). It then waits for the secondary replica to store the input data in a buffer, write the input data to its own persistent memory (disc) and acknowledge receipt of the input data, whereupon the primary replica can process the input data and output the processed data (via the I/O interface, thus also storing the output data in the buffer).
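The order of operations in this known procedure can be made explicit with a short sketch. The names `Replica`, `receive` and `handle_input` are hypothetical, the disc is modelled as a plain list, and the network transfer is modelled as a direct method call.

```python
class Replica:
    """Hypothetical replica with an I/O buffer and a persistent memory (disc)."""

    def __init__(self, name):
        self.name = name
        self.buffer = []
        self.disc = []

    def store(self, data):
        self.buffer.append(data)

    def flush(self):
        # Flushing the buffer writes its content to persistent memory.
        self.disc.extend(self.buffer)
        self.buffer.clear()

    def receive(self, data):
        # Secondary side: buffer the input, write it to disc, then acknowledge.
        self.store(data)
        self.flush()
        return True


def handle_input(primary, secondaries, data, process):
    primary.store(data)   # store the input data in the primary's buffer
    primary.flush()       # write it to the primary's disc
    for secondary in secondaries:
        if not secondary.receive(data):  # transfer and wait for acknowledgement
            raise RuntimeError(f"no acknowledgement from {secondary.name}")
    output = process(data)  # processing starts only after all acknowledgements
    primary.store(output)   # output passes through the I/O interface buffer
    return output
```

Every step before `process(data)` adds to the lag time, and two of those steps are disc writes.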
In particular, writing to a disc (or whichever persistent memory is used) is normally very time consuming for a system that is supposed to be able to handle thousands (or more) of parallel transactions or events. Basically, writing takes about 5.8 ms (flush time) for a normal disc drive (15 000 rpm and a seek time of 3.8 ms). There are also certain systems available (such as RAID and SAN systems, as well as RAM discs) that have enhanced write performance. These systems are, however, very expensive.
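The quoted flush time is consistent with a simple estimate, sketched below under the assumption that it is composed of the seek time plus the average rotational latency (half a revolution). The text does not spell out this breakdown, and the function name is hypothetical.

```python
def average_write_time_ms(rpm, seek_time_ms):
    """Seek time plus average rotational latency (assumed: half a revolution)."""
    revolution_ms = 60_000 / rpm  # duration of one full revolution in ms
    return seek_time_ms + revolution_ms / 2


flush_ms = average_write_time_ms(rpm=15_000, seek_time_ms=3.8)
print(f"{flush_ms:.1f} ms")  # 5.8 ms
```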
Another problem with replica server systems is capacity, i.e. the number of events or transactions that can be performed per unit of time.
In replica server systems handling financial transactions (e.g. electronic exchanges), a flush is performed for every newly entered input data item in both the primary replica and the secondary replica(s). In view of the above-noted flush time of 5.8 ms for each event, the limit for the system will be about 170 transactions per second (TPS). In order to enhance this rate, electronic exchanges of today may need to use the expensive systems for enhancing write performance. Such investments, however, require high liquidity on the exchange (a high number of transactions) in order to pay off.
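The capacity limit follows directly from the flush time, as the short calculation below shows. It assumes that flushes are strictly serial, one per event; the function name is hypothetical.

```python
def max_tps(flush_time_ms):
    """Upper bound on transactions per second with one serial flush per event."""
    return 1000 / flush_time_ms


print(round(max_tps(5.8)))  # 172, i.e. about 170 TPS
```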
The secondary replica also processes the input data to create duplicate or replica output results. When the primary replica and secondary replica are located at large distances, even different continents, network latency is also noticeable. Here, however, state of the art systems can provide fairly low latency times even for transcontinental communication. For instance, it is possible to obtain 100 MB broadband between New York and Chicago. This carries a network latency of 1 ms or less, which is a great improvement as compared with telephone or satellite communications that can carry round-trip times of 60-500 ms. Together with flush write time, the total lag time can thus be significant.
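A rough model of the total lag time is sketched below. It assumes that the flush on the primary replica and the flush on the secondary replica happen one after the other, and that one network round trip separates the transfer from the acknowledgement. The function name is hypothetical, and treating the quoted 1 ms latency as a round-trip figure is an assumption.

```python
def total_lag_ms(flush_ms, round_trip_ms, serial_flushes=2):
    """Lag before processing can start: serial disc flushes plus one round trip."""
    return serial_flushes * flush_ms + round_trip_ms


# Flush time of 5.8 ms with round-trip times of 1, 60 and 500 ms.
for rtt in (1.0, 60.0, 500.0):
    print(f"round trip {rtt:5.1f} ms -> lag {total_lag_ms(5.8, rtt):.1f} ms")
```

Under these assumptions the two disc flushes dominate the lag when network latency is low, while the round trip dominates on slow links.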
The reason for this procedure in financial systems is of course to be as failsafe as possible. Even so, there are other problems that can still seriously affect the uninterrupted operability of the system. Vulnerability to systematic errors is always present and could, for example, be caused by logical errors such as division by zero. Such errors could actually cause all replicas to fail, since it is not until after acknowledgement that such an error becomes evident. Since all information is written to disc (persistent memory) before it is processed, however, the known systems can be restored and resume their operation (after skipping the event causing the crash). Such restoration of course takes time, and meanwhile all activities (in the case of an electronic exchange, all trading) are closed down.
There is thus a need for a faster and even more reliable replica server system, in particular one that can operate in failsafe mode while handling large numbers of parallel transactions.
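Restoration from persistent memory can be sketched as a replay of the stored events in which the offending event is skipped. The function name `restore` and the detection of the crashing event by catching `ZeroDivisionError` during replay are assumptions made for illustration; the text only states that the event causing the crash is skipped.

```python
def restore(event_log, process, initial_state):
    """Replay persisted events, skipping any event that crashes processing."""
    state = initial_state
    skipped = []
    for event in event_log:
        try:
            state = process(state, event)
        except ZeroDivisionError:
            skipped.append(event)  # systematic error: skip this event
    return state, skipped


# Hypothetical processing step that fails on a zero-valued event.
state, skipped = restore([1, 0, 4], lambda s, e: s + 100 // e, initial_state=0)
print(state, skipped)  # 125 [0]
```

Because every replica processes the same input data, the same event would crash each of them, which is why the replay has to exclude it.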