1. Field of the Invention
This invention relates generally to computer networking, and more particularly to application and protocol independent asynchronous support of fault tolerant and adaptive communication using commit techniques.
2. Description of the Related Art
With the increased use of mobile and remote computing, distributed processing has become a central element in many computer processing systems. Distributed processing has many different forms depending on the nature of the data and the objectives of the application. For example, one emerging form of distributed processing is mobile computing, such as used in telematics.
Telematics refers to systems used for communications, instrumentation, control, and information technology in the field of transportation. Over the years, manufacturers of on-road vehicles, such as automobiles, vans, trucks, buses, and so on, have utilized computer technology to enhance the operations of existing features and functions in the vehicles as well as to provide new features and functions. For example, programmed controllers, custom-designed processors, embedded systems, and/or computer modules have been developed that support or even control various kinds of mechanical equipment in vehicles. Some of these controllers or computer modules control or support various engine functions, such as fuel injection, timing, and so on. Others enhance or support operation of transmission systems, suspension systems, braking systems, and so on. The sophistication of these enhancements has advanced as the processing power available for these purposes has increased. It is expected that in the future more aspects of the mechanical equipment in vehicles will be controlled or supported by processors or controllers in order to enhance performance, reliability, and safety, and to reduce emissions.
As can be appreciated, the distributed nature of telematic functions generally requires a digital distributed communication structure such as that provided in distributed computing environments. However, as with most communication, digital communication is subject to interruption or failure. As such, an ability to restart communication after interruption is important to most distributed applications.
FIG. 1 is a block diagram showing a typical prior art distributed environment 100. The distributed environment 100 includes a client application 102 executing on a client device, which is in communication with a server application 104 executing on a server device. Generally, the client application 102 and server application 104 communicate using a logical connection 112, which is a logical entity used by applications to exchange data between two endpoints, such as the client application 102 and the server application 104. Although the application programs 102 and 104 function as though the logical connection 112 were a physical entity, the logical connection 112 requires a communication channel to actually transmit data.
Software known as the communication stacks 106a and 106b is used to map logical connections to communication channels, which include the actual communication hardware 108a and 108b. The communication stacks 106a and 106b handle data routing, flow control, buffering, error correction, and other computing issues encountered in real-world communication. Entities, such as the client application 102 and the server application 104, use one or more logical connections 112 to communicate with other entities by sending and receiving data in a sequential fashion over a period of time. Due to the sequential nature of the communication, data communication typically is stateful, meaning that the completeness and order of the data transmitted should be preserved.
However, the flow of data over a connection can be interrupted for many reasons, such as the failure of the underlying communication hardware 108a and 108b or because the connection is rerouted to a more advantageous communication channel. During such interruptions data may be lost, corrupted, or reordered. In order to continue communication once the channel has been reestablished, a method should be in place that restarts data transfer. Unfortunately, many prior art systems restart lost connections from the beginning of the data transaction, thus re-transmitting the entire transaction.
For example, in FIG. 1, the server application 104 begins sending data elements 1 114a, 2 114b, 3 114c, and 4 114d to the client application 102. However, during the transmission the logical connection 112 between the server application 104 and the client application 102 is interrupted. As a result, the client application 102, in this example, receives only two data elements, 1 114a and 2 114b, of the four data elements transmitted. Once the connection is reestablished, the prior art distributed environment 100 will restart the entire transmission beginning again with data element 1 114a, thus re-transmitting the entire transaction.
If the connection state of the transaction is maintained when an interruption occurs, the transaction can be restarted at the point of the interruption. For example, in FIG. 1, the server application 104 can begin re-transmitting from data element 3 114c and continue the remainder of the transaction, instead of re-transmitting from the beginning of the transaction with data element 1 114a. However, prior art solutions to obtaining and maintaining connection state information for restarting after connection interruptions have been overly burdensome and inconsistent.
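The difference between the two restart strategies in the FIG. 1 example can be illustrated with a minimal sketch. It is not taken from any prior art system: the function names `transmit`, `restart_from_beginning`, and `resume_from_interruption` are hypothetical, the interruption is simulated by a `drop_after` parameter, and the sketch assumes that a full restart discards whatever the client had already received.

```python
from typing import List, Optional, Sequence, Tuple


def transmit(elements: Sequence[str], start: int, received: List[str],
             drop_after: Optional[int] = None) -> int:
    """Send elements[start:] in order, appending each one to `received`.

    If `drop_after` is given, the simulated connection is interrupted once
    that many elements have been sent in this attempt. Returns the number
    of elements sent in this attempt.
    """
    sent = 0
    for element in elements[start:]:
        if drop_after is not None and sent == drop_after:
            break  # simulated interruption of the logical connection
        received.append(element)
        sent += 1
    return sent


def restart_from_beginning(elements: Sequence[str],
                           drop_after: int) -> Tuple[List[str], int]:
    """No connection state is kept: the whole transaction is sent again."""
    received: List[str] = []
    total_sent = transmit(elements, 0, received, drop_after)
    received.clear()  # assumption: partial data is discarded on restart
    total_sent += transmit(elements, 0, received)
    return received, total_sent


def resume_from_interruption(elements: Sequence[str],
                             drop_after: int) -> Tuple[List[str], int]:
    """Connection state is kept: sending continues after the last element
    the client is known to have received."""
    received: List[str] = []
    total_sent = transmit(elements, 0, received, drop_after)
    total_sent += transmit(elements, len(received), received)
    return received, total_sent


if __name__ == "__main__":
    data = ["1", "2", "3", "4"]  # the four data elements of FIG. 1
    print(restart_from_beginning(data, drop_after=2))
    print(resume_from_interruption(data, drop_after=2))
```

In both cases the client ends up with all four data elements, but the full restart sends elements 1 and 2 twice. The sketch assumes the resume point is already known to the sender; obtaining and maintaining that state is the part that the approaches described next handle poorly.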
For example, some prior art systems have attempted to make fault tolerant logical connections transparent to the applications by implementing all the fault tolerant functionality in the communication stacks 106a and 106b. However, these systems are overly complex, burdensome, and inconsistent because the communication stacks 106a and 106b are required to perform buffering, keep track of how much data was sent to and received from each endpoint, and keep track of what data was lost. Since the communication stacks 106a and 106b cannot determine what data is actually important to a particular application, they must have the functionality to track all data and store all network information regarding the state of the connection.
To avoid such complexity, some prior art distributed systems shift the burden of fault tolerant communication entirely to the applications. Unfortunately, different applications can implement fault tolerant schemes in different ways, causing inconsistency and requiring greater care in developing distributed software for existing distributed applications. Moreover, different applications will require similar functions because they must be able to handle the same communication scenarios. This causes waste, in terms of duplicated effort, and additional implementation inconsistencies.
In view of the foregoing, there is a need for a method for supporting fault tolerant and adaptive communication that promotes consistency and reduced complexity. The method should allow reestablished connections to restart from the point where the interruption occurred, and should be independent of the communication protocol, the format of the data transmitted, and any application policies.