1. Field of Invention
This invention relates to the field of computer networking, and more particularly to a method for handling requests for information over a very high speed network where, due to the nature of the request, the time required to handle the request is longer than the timeout set up to handle lost messages on the network. The invention is especially adapted for a network of the collision detection variety, for example, ETHERNET.
2. Description of the Prior Art
The fast reliable communication of information between computer systems has become an essential part of the modern office and factory. Various groups of manufacturers and users of computer equipment have joined together to define a networking interface standard and a few network protocols. The networking interface standard divides the functions to be performed into seven levels from Level 1, which is the physical interface, through Level 7. The network protocols define a hardware and software interface which enables computer and non-computer equipment made by different manufacturers to communicate quickly and with a minimum of custom programming. One such network protocol is known as ETHERNET.
ETHERNET is a Level 2, multiple access network protocol which uses a collision detection scheme to permit multiple computer systems or other equipment connected to the network to communicate asynchronously. In order to transmit information over the network, the network protocol requires that the information be divided into one or more messages. Then, basically, a computer system tests the network prior to sending a message to determine whether another computer system is sending a message over the network. If the network is not busy, the computer system sends the message to a destination computer system. If the network is busy, the computer system waits a prescribed period of time and tests the network again. One problem occurs when another computer system tests the network simultaneously with the first computer system and also discovers that the network is not busy. The second computer system will also send its message. There will be a collision between these messages on the network and neither destination computer system will receive its message.
In order to make the network more reliable, the network protocol requires both the first and second computer systems to monitor the network after sending their messages and, if a collision is detected, to retransmit their messages after a prescribed delay. This delay begins at approximately 1 millisecond and increases exponentially with each subsequent collision. However, under certain circumstances, the collision will not be detected and, hence, the messages may still not reach their destination computers.
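By way of illustration only, the growth of the retransmission delay described above may be sketched as follows. The function name backoff_delay_ms, the doubling factor, and the absence of any random component are assumptions made for the sketch; the text states only that the delay begins at approximately 1 millisecond and increases exponentially.

```python
def backoff_delay_ms(collision_count, base_ms=1.0, factor=2.0):
    """Return the delay before the next retransmission, in milliseconds.

    collision_count is the number of collisions seen so far for this message
    (1 for the first collision). base_ms and factor are assumed values: the
    delay starts at base_ms and is multiplied by factor after each collision.
    """
    if collision_count < 1:
        raise ValueError("collision_count must be at least 1")
    return base_ms * factor ** (collision_count - 1)


if __name__ == "__main__":
    # Show how quickly the delay grows over the first few collisions.
    for n in range(1, 6):
        print(n, backoff_delay_ms(n))
```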
Although statistically this protocol provides a fast, easy-to-use and easy-to-configure network, the network is basically unreliable because a computer system cannot be guaranteed that a message which is sent will actually be received. Therefore, the next higher interface level, usually a Level 3 or Level 4 protocol, requires that, when it is important that a message reach the destination computer system, the destination computer system acknowledge the receipt of the message. Generally this requires that each sending computer system on the network receive an acknowledgement that the first message was received before it can continue to send the second message. According to this handshake, the sending computer system will, after waiting a prescribed period of time without receiving the acknowledgement, retransmit the first message, assuming that the message was lost. If the network is working properly and not too busy, the retransmitted message should make it to the destination computer system. It is important to recognize that when the Level 3 or 4 protocol sends a request, the Level 2 protocol may need to transmit the message several times before the Level 2 protocol believes the message was sent successfully.
The prescribed period of time for response over a network is often referred to as a timeout and is carefully defined at each level of network protocol. This prescribed period of time is a function of the network level, the size of the network and the speed of the network. Usually, the lower the network level, the shorter the prescribed period of time for a message to be sent or received. In some networks, the same timeout is used for both requests and responses, while others may use different times. In many networks, the timeouts are determined experimentally to reduce collisions while maximizing throughput.
One prior art high level protocol, the request/response protocol, has been developed which provides the low level request/acknowledge handshake by letting the response message transfer the results of the request as well as acknowledge the request message. Under this protocol, one computer system may obtain information from a serving computer system by requesting the information and waiting for an acknowledgement in the form of the requested information. If the requesting computer system fails to receive the response from the serving computer system within a prescribed period of time, it again requests the information, assuming that either the request or the response to the initial request was lost over the network. The serving computer system then performs the request again and generates and sends a new response. The prescribed period of time for this network level will be longer than the timeout of a lower level since this prescribed period of time must also take into account the time required to perform the request. Although this protocol results in a fast and efficient method for performing network requests, there are some problems.
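By way of illustration only, the request/response protocol described above may be sketched as follows. The names LossyServer, drop_plan and request_with_retry are hypothetical, and the loss of a message is scripted rather than random so that the behavior is repeatable; a return value of None stands in for the expiry of the prescribed period of time.

```python
class LossyServer:
    """Hypothetical serving system reached over an unreliable network.

    drop_plan scripts the fate of each attempt in order: "request" means the
    request is lost, "response" means the response is lost, None means both
    messages are delivered.
    """

    def __init__(self, drop_plan):
        self.drop_plan = list(drop_plan)
        self.times_served = 0

    def handle(self, request):
        fate = self.drop_plan.pop(0) if self.drop_plan else None
        if fate == "request":
            return None  # the request never reached the serving system
        self.times_served += 1  # the request is performed (possibly again)
        response = ("data-for", request)
        if fate == "response":
            return None  # performed, but the response was lost
        return response


def request_with_retry(server, request, max_attempts=5):
    """Resend the request until a response arrives or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        response = server.handle(request)  # None models a timeout
        if response is not None:
            return response, attempt
    raise TimeoutError("serving computer system presumed dead")


if __name__ == "__main__":
    server = LossyServer(["request", "response", None])
    print(request_with_retry(server, "read block 7"), server.times_served)
```

In this sketch the serving system performs the request once for every request that reaches it, which is harmless only when the request may safely be repeated.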
The first problem is that some requests over the network may not be repeated without causing problems. This may be best explained by examining a typical file operation. A file operation typically consists of four requests: a file open, a file read and/or file write, and then a file close. The file read and write requests may be repeated in the event either the request or the response to the request is lost on the network. This is true because the file read and file write requests have no state information saved in the serving computer system which might be corrupted by repeating the request. For example, if a file read request is made on an open file and the request is lost over the network, the requesting computer system merely retransmits the same file read request. If the response to the file read request is lost over the network, the requesting computer system again simply retransmits the same file read request. The serving computer system then re-reads the file accordingly and sends the requested information over the network, where it probably reaches the requesting computer system. These requests, which may be repeated without difficulty, are referred to as idempotent requests.
There is, however, a class of requests which may not be repeated. These requests typically have state information stored in the serving computer system which may be corrupted by repeating the request and are referred to as non-idempotent requests. These requests may not be repeated in the event a response is lost on the network. An example of a non-idempotent request is a file open request. The file open request determines the location and other information required by the operating system of the requesting computer system for the requested file and marks the file as being opened, typically by incrementing a reference counter. Then this information is sent to the requesting computer system so the requesting computer system can perform the desired requests on the file. If a request to open a file is lost on the network, there is no problem since the serving computer system has not opened the file and the requesting computer system may simply retransmit the request for the file to be opened. However, if the file has been opened and the response is lost over the network, then the requesting computer system may not repeat the request that the file be opened because the serving computer system would again increment the reference counter, causing the file to appear to have been opened twice. Similarly, the close request is non-idempotent. Therefore, the prior art request/response protocol must be modified to avoid this problem.
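By way of illustration only, the difference between an idempotent file read and a non-idempotent file open may be sketched as follows. The class FileServer and its method names are hypothetical, and the read is assumed to name an explicit offset and length so that it leaves no state behind in the serving computer system.

```python
class FileServer:
    """Hypothetical serving system holding one file and its reference counter."""

    def __init__(self, contents):
        self.contents = contents
        self.ref_count = 0  # state that a repeated open would corrupt

    def read(self, offset, length):
        # Idempotent: changes nothing, so a repeat returns the same bytes.
        return self.contents[offset:offset + length]

    def open(self):
        # Non-idempotent: every call increments the reference counter.
        self.ref_count += 1
        return {"size": len(self.contents)}


if __name__ == "__main__":
    server = FileServer(b"hello world")
    server.read(0, 5)
    server.read(0, 5)  # repeated read: no harm done
    server.open()
    server.open()      # repeated open after a lost response
    print(server.ref_count)  # the file now appears to have been opened twice
```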
This problem is typically resolved by adding an additional response to the request/response protocol referred to as the acknowledge message or "ack". Under the revised protocol the requesting computer system sends the serving computer system an ack after the requesting computer system has received the response to its request. As with the idempotent request, the requesting computer system begins by sending a request to the serving computer system. The serving computer system performs the request and sends a response to the requesting computer system. However, the serving computer system saves a copy of the response until the receipt of the response by the requesting computer system is acknowledged. The requesting computer system then receives the response and sends an ack to the serving computer system. After the ack has been received, the serving computer system discards the copy of the response.
In the event the requesting computer system fails to receive a response, as with an idempotent request, it retransmits the request. If the request was lost before reaching the serving computer system, the serving computer system receives the request for the first time and performs the request and sends a response. If the response from the serving computer system was lost, the serving computer system, after identifying the request as a repeat request, retransmits the response to the original request rather than repeating service of the request. If the ack from the requesting computer system is not received after a prescribed period of time after sending the response, the serving computer system, assuming the response was lost, retransmits the response. The requesting computer system, after identifying the re-response, sends another ack to the serving computer system. In this manner, non-idempotent requests may be performed reliably over the network.
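By way of illustration only, the saved response and the ack described above may be sketched as follows. The class AckingServer and the request_id used to recognize a repeat request are assumptions made for the sketch; the text does not state how a repeat request is identified.

```python
class AckingServer:
    """Hypothetical serving system that saves each response until it is acked."""

    def __init__(self):
        self.ref_count = 0
        self.saved = {}  # request_id -> response awaiting an ack

    def handle_open(self, request_id):
        if request_id in self.saved:
            # Repeat request: resend the saved response, do not reopen.
            return self.saved[request_id]
        self.ref_count += 1  # perform the non-idempotent request once
        response = {"request_id": request_id, "ref_count": self.ref_count}
        self.saved[request_id] = response
        return response

    def handle_ack(self, request_id):
        # The requesting system has the response; discard the saved copy.
        # A duplicate ack for an already discarded copy is ignored.
        self.saved.pop(request_id, None)


if __name__ == "__main__":
    server = AckingServer()
    first = server.handle_open(request_id=1)
    again = server.handle_open(request_id=1)  # response was lost, so retry
    server.handle_ack(request_id=1)
    print(first == again, server.ref_count, len(server.saved))
```

The sketch leaves out the timer by which the serving system retransmits a response for which no ack has arrived.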
Another problem with the prior art occurs when the request takes a longer period of time to be completed than the prescribed period of time for the requesting computer system to retransmit a request. In this case, the requesting computer system will retransmit and retransmit the original request until it incorrectly assumes that the serving computer system is dead or not on the network. One prior art solution to this problem is the "breath of life" message. This message is transmitted by the serving computer system in response to a repeated request while the serving computer system is still working on the previous request. When the requesting computer system receives the message, it knows that the serving computer system is operating on the network and working on the request. The requesting computer system will again wait the prescribed period of time for a response and if one is not received, retransmit the request and wait for either a response or the breath of life message.
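By way of illustration only, the breath of life exchange described above may be sketched as follows. The names SlowServer and run_request are hypothetical, and time is counted in whole timeout periods purely for simplicity. The sketch also counts how many times the request is placed on the network while the serving computer system is busy.

```python
BREATH_OF_LIFE = "breath-of-life"


class SlowServer:
    """Hypothetical serving system needing several timeout periods per request."""

    def __init__(self, periods_needed):
        if periods_needed < 1:
            raise ValueError("periods_needed must be at least 1")
        self.periods_left = periods_needed
        self.result = None

    def tick(self):
        # One timeout period of work on the outstanding request.
        if self.periods_left > 0:
            self.periods_left -= 1
            if self.periods_left == 0:
                self.result = "response"

    def on_repeat_request(self):
        # A repeat request while still working earns a breath of life message.
        return BREATH_OF_LIFE if self.result is None else self.result


def run_request(server):
    """Return (result, transmissions, breaths) for one slow request."""
    transmissions = 1  # the original request
    breaths = 0
    while True:
        server.tick()  # the requesting system waits one timeout period
        if server.result is not None:
            return server.result, transmissions, breaths
        transmissions += 1  # timeout expired: retransmit the request
        if server.on_repeat_request() == BREATH_OF_LIFE:
            breaths += 1  # the serving system is alive; keep waiting


if __name__ == "__main__":
    print(run_request(SlowServer(periods_needed=4)))
```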
The breath of life mechanism has two limitations. First, a network using the breath of life message will only recognize the serving computer system as being dead if there is an outstanding request. Second, the requesting computer system keeps retransmitting the request while the serving computer system is still processing it, generating unnecessary network traffic. This is a serious problem for a network of the multiple access collision detection type, where increasing traffic results in increasing collisions and hence more traffic until the network becomes jammed.
What is needed is a method for transferring idempotent and non-idempotent requests over a network which handles slow requests in an efficient and reliable manner.