Transmission Control Protocol (TCP) is a reliable transport protocol described in the literature, including Internet Engineering Task Force (IETF) Request for Comments (RFC) 793, “Transmission Control Protocol: DARPA Internet Program Protocol Specification,” September 1981, and IETF RFC 1122, “Requirements for Internet Hosts -- Communication Layers,” October 1989. An application program on a computer uses TCP to exchange data with another application program located on a remote computer. TCP uses Internet Protocol (IP) to send data in packets to its destination. IP delivers the packets to the correct destination along a path of its own choosing. IP may fail to deliver a small number of packets, or it may deliver a small number of them in an order different from that in which they were sent.
TCP assigns a sequence number to every byte of data that it sends. The receiving TCP can use the sequence numbers to reorder the received data and deliver it to the application program on the receiving computer in the same order in which it was sent. The receiving TCP also uses the sequence numbers to detect missing data and cause the sending TCP to retransmit it.
As a TCP receiver receives data, it sends acknowledgements back to the TCP sender. Each acknowledgement contains a sequence number, which indicates that the receiver has successfully received all sequenced data up to the acknowledged sequence number. The TCP receiver then delivers the received data to its application process. In response to receiving the acknowledgement, the TCP sender removes the acknowledged data from its retransmission list. The retransmission list is temporary storage for transmitted data on the TCP sender. When the application process asks TCP to send data, the sending TCP also places that data on its retransmission list. If the TCP sender does not receive an acknowledgement for sent data within the intervals specified in the relevant RFCs, it retransmits the data on the retransmission list.
In this way, even if IP loses some data, TCP can recover it.
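The sequence-number, acknowledgement, and retransmission-list behavior described above can be illustrated with a minimal Python sketch. It is a simplified model and not an implementation of RFC 793: the class and method names (Sender, Receiver, on_segment, on_ack, on_timeout) are hypothetical, segments are assumed never to overlap partially, and timers are replaced by an explicit on_timeout call.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Reorders segments by sequence number and returns cumulative acknowledgements."""
    next_expected: int = 0                                   # next in-order byte expected
    out_of_order: dict = field(default_factory=dict)         # seq -> payload held for reordering
    delivered: bytearray = field(default_factory=bytearray)  # bytes handed to the application

    def on_segment(self, seq: int, payload: bytes) -> int:
        """Accept one segment and return the cumulative acknowledgement number."""
        if seq >= self.next_expected:            # data already delivered is ignored
            self.out_of_order[seq] = payload
        # Deliver every held segment that is now contiguous with delivered data.
        while self.next_expected in self.out_of_order:
            chunk = self.out_of_order.pop(self.next_expected)
            self.delivered += chunk
            self.next_expected += len(chunk)
        return self.next_expected


@dataclass
class Sender:
    """Numbers outgoing bytes and keeps unacknowledged data for retransmission."""
    next_seq: int = 0
    retransmission_list: dict = field(default_factory=dict)  # seq -> payload awaiting ack

    def send(self, payload: bytes) -> tuple:
        """Assign a sequence number and store a copy on the retransmission list."""
        seq = self.next_seq
        self.retransmission_list[seq] = payload
        self.next_seq += len(payload)
        return seq, payload

    def on_ack(self, ack: int) -> None:
        """Drop every segment that lies entirely below the acknowledged number."""
        done = [s for s, p in self.retransmission_list.items() if s + len(p) <= ack]
        for seq in done:
            del self.retransmission_list[seq]

    def on_timeout(self) -> list:
        """Return the segments that must be sent again, in sequence order."""
        return sorted(self.retransmission_list.items())
```

In this model, if the first of two segments is lost, the receiver holds the second, keeps acknowledging the start of the gap, and delivers both in order once the sender's timeout causes the missing segment to be sent again.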
One way to achieve high reliability of computer application programs is to use the active-standby method. A single standby/active computer system can be built with two fully functional computers. One computer is termed the active computer and the other is termed the standby computer. The active computer operates normally, and the standby computer waits to take over operations if the active computer fails. The event in which the standby takes over is termed a switchover. On switchover, the standby computer becomes the active computer, and the old active computer no longer functions in the standby/active computer system. The old active computer may rejoin the standby/active computer system at a later time, such as after repair or reset, either taking back the active computer role or taking the standby computer role.
If the standby computer can switch over without causing disruption, the switchover is termed a hitless switchover. In this case, other computer systems that interact with the standby/active computer system view this redundant computer system as a single computer and do not detect the failure or the switchover event. As a result, the failure can be repaired without impacting the interaction of the standby/active computer system with other computers. For the switchover to be hitless, the standby computer must communicate with the active computer to track the progress of the active computer and save all essential data as it is created on the active computer.
At the point of failure, the standby computer may not have all data from the active computer, because communication between the active computer and the standby computer may have failed before all data could be sent from the active computer to the standby computer. In this case, the standby computer must recover the lost data for the switchover to be hitless.
Methods exist for the TCP process to achieve a hitless switchover. However, those methods ignore the application process that uses TCP. One such method sends incoming TCP data from the active TCP process to the standby TCP process before sending the acknowledgement to the remote TCP peer. The active TCP passes the received data to the active application. The active application then processes the data and possibly sends updated state to the standby application to synchronize it. The standby application also receives the same data from the standby TCP and updates its own state. The application process typically also processes inputs that come from sources other than the TCP connection. With the existing solution, it is complex and error prone to keep the processing of these other inputs and the TCP inputs synchronized between the active computer and the standby computer.
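The ordering used by the existing method for incoming data (replicate to the standby TCP first, acknowledge the remote peer second) can be sketched as follows. This is an illustrative model only: the names StandbyTcp, ActiveTcp, replicate_incoming, and the reachable flag are hypothetical, and only in-order segments are handled.

```python
class StandbyTcp:
    """Standby side: stores a copy of every byte the active TCP has accepted."""

    def __init__(self):
        self.received = bytearray()
        self.reachable = True      # hypothetical flag used to simulate a failed link

    def replicate_incoming(self, seq: int, payload: bytes) -> bool:
        """Store the payload; return False when the replication link is down."""
        if not self.reachable:
            return False
        self.received += payload
        return True


class ActiveTcp:
    """Active side: acknowledges data only after the standby holds a copy."""

    def __init__(self, standby: StandbyTcp, send_ack):
        self.standby = standby
        self.send_ack = send_ack   # callable that sends an ack number to the remote peer
        self.next_expected = 0
        self.app_buffer = bytearray()   # data passed on to the active application

    def on_segment(self, seq: int, payload: bytes) -> None:
        if seq != self.next_expected:
            return                 # sketch assumption: only in-order data is handled
        # Step 1: copy the data to the standby TCP before anything else.
        if not self.standby.replicate_incoming(seq, payload):
            return                 # no ack is sent, so the remote peer will retransmit
        # Step 2: accept the data locally and pass it to the active application.
        self.next_expected += len(payload)
        self.app_buffer += payload
        # Step 3: only now acknowledge the data to the remote TCP peer.
        self.send_ack(self.next_expected)
```

The property this ordering provides is that any byte the remote peer believes was delivered is already held by the standby TCP. It says nothing about how far the active application has progressed in processing that byte, which is the gap described above.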
Concerning TCP output data, in an existing solution, the active TCP sends outgoing data to the standby TCP before sending it to the remote TCP peer. This allows the standby TCP to retransmit the outgoing data if a switchover is followed by a failed transmission. However, it is complex and error prone for the standby application to know exactly what the active application has and has not sent at the time of a switchover. Some applications are able to retransmit outgoing data at the application level without causing disruption of the application. Others, such as File Transfer Protocol (FTP) or an echo server, cannot retransmit application data that has already been successfully transmitted without causing disruption of the application. Even when an application can afford to retransmit some data on switchover, it is difficult for it to know how much data can safely be retransmitted.
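The outgoing path of the existing solution can be sketched in the same style. The names StandbyOutput, ActiveOutput, replicate_outgoing, replicate_ack, and take_over are hypothetical, and relaying acknowledgements to the standby so that it can discard acknowledged data is an assumption of this sketch rather than something stated above.

```python
class StandbyOutput:
    """Standby side: holds copies of outgoing data not yet acknowledged."""

    def __init__(self):
        self.unacked = {}          # seq -> payload

    def replicate_outgoing(self, seq: int, payload: bytes) -> None:
        self.unacked[seq] = payload

    def replicate_ack(self, ack: int) -> None:
        # Assumption: the active TCP relays acks so the standby can discard data.
        done = [s for s, p in self.unacked.items() if s + len(p) <= ack]
        for seq in done:
            del self.unacked[seq]

    def take_over(self, transmit) -> None:
        """On switchover, retransmit everything still unacknowledged, in order."""
        for seq, payload in sorted(self.unacked.items()):
            transmit(seq, payload)


class ActiveOutput:
    """Active side: copies outgoing data to the standby before transmitting it."""

    def __init__(self, standby: StandbyOutput, transmit):
        self.standby = standby
        self.transmit = transmit   # callable that puts a segment on the wire
        self.next_seq = 0

    def send(self, payload: bytes) -> int:
        seq = self.next_seq
        self.standby.replicate_outgoing(seq, payload)   # standby first
        self.transmit(seq, payload)                     # remote peer second
        self.next_seq += len(payload)
        return seq

    def on_ack(self, ack: int) -> None:
        self.standby.replicate_ack(ack)
```

The standby TCP in this sketch can resend bytes at the transport level, but the standby application still has no record of which application-level messages those bytes represent, which is the difficulty described above.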
Therefore, it is desirable to provide a mechanism by which TCP and the application program interact in a manner that simplifies hitless switchover, allowing the standby application to track more accurately the status of the active application and of incoming and outgoing TCP transmissions.