Today, a large number of computer devices and systems exchange data across a variety of communications paths. Computer devices usually communicate by the electronic transfer of data across at least one of a variety of data buses or links. As used herein, the phrase "computer devices" refers to any of a wide variety of electronic apparatus, such as personal computers, servers, printers, terminals, processors, storage devices, and many other such entities. A computer system may comprise a number of such devices, often physically co-located. However, in some cases, computer systems are distributed, wherein not all of the devices are co-located.
Devices which are co-located are said to be "local" to each other, and often communicate over a local data bus, e.g., a SCSI data bus. A local data bus provides a physical and logical communication path among local devices, e.g., devices within the same office building. The local data bus will occasionally use a gateway to control the flow of data on the data bus. Whether two devices are local to each other depends on the distances over which the particular data bus under consideration can adequately transmit data. When a local data bus is insufficient to support communication between devices, the devices are said to be "remote" to each other. Remote devices often communicate over a remote or "long haul" data link. Two examples of commonly used long haul links include a telephone line and a fiber optic line. The term "link" as used herein refers to the communication path between two long haul devices, exclusive of the long haul devices themselves. The long haul devices which drive data across a long haul link may transmit data over large distances, i.e., several miles and beyond. A long haul device typically acts as a gateway between a local data bus and long haul data link, controlling the flow of data from devices connected to the local data bus to the long haul link and vice versa. Commonly used long haul devices include modems and bridges.
Bridges, in conjunction with the data link, transmit and receive data in either a simplex or duplex communication mode, depending on the capabilities of the bridges and link. A simplex communication path allows data transmission in either direction, but in only one direction at a time. Alternatively, a duplex communication path allows data transmission in both directions simultaneously. In the case where the data link is fiber optic, it is typically implemented as a simplex communication path. In many situations, it is not cost effective to install a duplex fiber optic communication path because of the relatively high cost of fiber optic multiplexers which provide a necessary interface to the bridge.
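By way of illustration only, the distinction between the two communication modes, as the terms are used herein, can be sketched as follows. The class names, method names, and direction labels are hypothetical and are not part of any bridge or link described above.

```python
class SimplexLink:
    """Hypothetical model: carries traffic in one direction at a time."""

    def __init__(self):
        self.active_direction = None  # None means the link is idle

    def begin(self, direction):
        """Try to start transmitting in `direction`, e.g. "A->B" or "B->A"."""
        if self.active_direction not in (None, direction):
            return False  # the opposite direction is already in progress
        self.active_direction = direction
        return True

    def end(self):
        """Finish the current transmission and free the link."""
        self.active_direction = None


class DuplexLink:
    """Hypothetical model: both directions may be active simultaneously."""

    def __init__(self):
        self.active_directions = set()

    def begin(self, direction):
        self.active_directions.add(direction)
        return True

    def end(self, direction):
        self.active_directions.discard(direction)
```

In this sketch, a second call to `begin` in the opposite direction is refused on the simplex link until `end` is called, whereas the duplex link accepts both.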
FIG. 1 depicts a typical distributed computer system configuration 100 using a simplex communication path, comprised of bridges 125, 135 as long haul gateway devices and a fiber optic long haul data link 130. A local data bus 115 interconnects multiple local devices, including the bridge 125, and can be referred to as a data bus "segment" with respect to the larger computer system 100. Data link 130 interconnects the bridges 125, 135 to accomplish interconnection of the distributed computer devices within system 100. Computer devices 110, 140 can be generically referred to as hosts or initiators when required to transmit data to another device. Computer devices 120, 150 are generally referred to as target devices, because they are the intended recipients of an initiator's transmission. For the purposes of this discussion, the computer devices are considered "peer" devices. Peer devices have equal status regarding data transmission within the system, such that no peer device has an inherent ability to assert its communication requests over the communication requests of another peer device.
Communication between devices which are remote to each other is typically straightforward. For example, initiator 110 of FIG. 1 transmits data across the local data bus segment 115 to bridge 125. The bridge transmits the data across data link 130 to bridge 135. Finally, bridge 135 transmits the data across data bus segment 145 to target device 150. In order to accomplish this data transmission, the initiator 110 must first "take control" of the local bridge and then take control of the remote bridge. To take control of a bridge, a device gets the bridge to dedicate itself to the transmission requested by that device. Once control of both bridges is secured, the initiator 110 and target 150 have secured the communication path and may exchange data.
One characteristic of a simplex communication path is that multiple devices may be competing for the path at one time, even though the simplex communication path is only capable of accommodating transmission in one direction at a time. Therefore, contention for the bridges and data link may result. In most cases, this is not a problem as long as, for example, initiator 110 requests bridges 125 and 135 before initiator 140 requests bridge 135. In such a case, initiator 110 gets control of bridges 125 and 135 before initiator 140 gets control of bridge 135. However, if the first bridge is controlled by one device and the second bridge has been taken over by a different device, a "deadlock" occurs. In a deadlock situation, neither device can successfully transmit over the simplex communication path because both bridges are trying to transmit to each other at the same time.
A specific example of how a deadlock can occur in a computer system can be described with reference to FIG. 1. For the purposes of this example, it may be assumed that interlocking mechanisms 126 and 127 (which are discussed later) are not part of system 100. In this example, initiator 110 attempts to write data to target 150 at about the same time initiator 140 attempts to write data to target 120. Initiator 110 transmits a write command to target 150 and, in doing so, initiator 110 "arbitrates" for bus 115 and wins the arbitration, since at the time there is no other contention for bridge 125 or bus 115. The process wherein a device attempts to get control of the communication path, by taking control of the bridge pair and link, is referred to as "arbitration". The long haul data link port of bridge 125 becomes idle, i.e., the bridge "disconnects", as bridge 125 prepares to communicate the write command to target 150, via bridge 135. Herein the term "disconnect" refers to when a bridge or other device ceases the transmission of messages from its ports (at least temporarily), although it may continue to receive messages. When ready, bridge 125 becomes active again and propagates the write command of initiator 110 to bridge 135, which in turn transmits it to target 150. Upon receipt of the write command sent by initiator 110, target 150 disconnects, as it prepares to respond to and get data from initiator 110. During this time, initiator 140 issues a write command to target 120 and then disconnects. Bridge 135 receives the write command and propagates it through bridge 125 to target 120. Target 120, upon receipt of the write command, disconnects and prepares to respond to and get data from initiator 140.
At this point, there are two write commands outstanding in the system, one in each direction, and each initiator 110, 140 is disconnected from its respective bus 115, 145. Target 120 and target 150 reconnect and take control of bridge 125 and bridge 135, respectively, in an attempt to get data from initiator 140 and initiator 110, respectively. Each target then attempts to take control of the second bridge needed to establish the full communication path to its respective initiator. However, neither bridge is available to the target remote to it, since the target local to it is already controlling it. Typically, this deadlock situation remains indefinitely until the system is reinitialized.
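The final state of the foregoing example can be illustrated with a minimal sketch, in which each target holds its local bridge and waits for the remote one. The `Bridge` class, the `is_deadlocked` function, and the `waiting` table are hypothetical constructs introduced only for this illustration. They model bridge ownership alone and omit the buses, the link, and the disconnect and reconnect steps.

```python
class Bridge:
    """Hypothetical bridge that dedicates itself to one device at a time."""

    def __init__(self, number):
        self.number = number
        self.owner = None  # device currently controlling this bridge

    def take_control(self, device):
        """Grant control if the bridge is free or already held by `device`."""
        if self.owner is None:
            self.owner = device
        return self.owner == device


def is_deadlocked(bridge_a, bridge_b, waiting):
    """True when each bridge is held by a different device and each holder
    is waiting for the bridge held by the other."""
    return (
        bridge_a.owner is not None
        and bridge_b.owner is not None
        and bridge_a.owner != bridge_b.owner
        and waiting.get(bridge_a.owner) is bridge_b
        and waiting.get(bridge_b.owner) is bridge_a
    )


bridge_125, bridge_135 = Bridge(125), Bridge(135)
waiting = {}  # device -> bridge it is still waiting to control

# Each target reconnects and takes control of its local bridge.
bridge_125.take_control("target 120")
bridge_135.take_control("target 150")

# Each target then requests the remote bridge and is refused.
for target, remote in (("target 120", bridge_135), ("target 150", bridge_125)):
    if not remote.take_control(target):
        waiting[target] = remote

deadlocked = is_deadlocked(bridge_125, bridge_135, waiting)
print(deadlocked)  # prints True
```

Because neither holder releases its bridge in this sketch, the condition persists, which corresponds to the deadlock remaining until the system is reinitialized.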
Many systems are implemented to avoid a deadlock situation. Deadlock avoidance is typically accomplished by using either a fully or partially interlocking system. Interlocking systems rely on synchronization among the various devices in the system, such that a device attempting remote communications is required to determine that both bridges are available before it takes control of either bridge. This process involves a typical "handshaking" scheme, whereby devices seeking to communicate exchange acknowledgment messages signaling their availability. Customarily, interlocking mechanisms 126, 127 are embedded in the bridges, as shown in FIG. 1. Unless both bridges are available, an initiating device, e.g., initiator 110, will not make its transmission. Instead, the initiator 110 will stay idle until it can acquire and control both bridges. Interlocking mechanisms are well known in the art of data transmission systems and devices and will not be discussed in detail herein.
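The all-or-nothing behavior of an interlocking system can be sketched as follows. This is an illustration under simplifying assumptions, not the design of mechanisms 126, 127: the names `Bridge`, `interlocked_acquire`, and `release` are hypothetical, and the check and the acquisition run in a single thread, so they cannot be interleaved. In an actual distributed system, the handshaking exchange would have to provide that guarantee.

```python
class Bridge:
    """Hypothetical bridge that is either free or owned by one device."""

    def __init__(self, number):
        self.number = number
        self.owner = None

    def is_available(self):
        return self.owner is None


def interlocked_acquire(device, local_bridge, remote_bridge):
    """Take both bridges only if both are free; otherwise take neither."""
    if local_bridge.is_available() and remote_bridge.is_available():
        local_bridge.owner = device
        remote_bridge.owner = device
        return True
    return False  # the device stays idle and must retry later


def release(device, *bridges):
    """Give up every listed bridge that `device` currently owns."""
    for bridge in bridges:
        if bridge.owner == device:
            bridge.owner = None


bridge_125, bridge_135 = Bridge(125), Bridge(135)

# Initiator 110 finds both bridges free and takes both.
first = interlocked_acquire("initiator 110", bridge_125, bridge_135)

# Initiator 140 finds the bridges busy and takes neither, so no deadlock.
second = interlocked_acquire("initiator 140", bridge_135, bridge_125)

# Once initiator 110 is done, initiator 140 can secure the path.
release("initiator 110", bridge_125, bridge_135)
third = interlocked_acquire("initiator 140", bridge_135, bridge_125)
```

Since a device never holds one bridge while waiting for the other, the hold-and-wait state that produces a deadlock cannot arise in this sketch.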
While the problem of a deadlock is described herein in terms of a very simple two-bridge link, multiple-bridge systems are more the norm. In multiple-bridge systems, the problems are fundamentally the same as those for two-bridge systems, although the likelihood of contention is greater. Implementation of typical interlocking mechanisms requires that all devices seeking to transmit over the link continually monitor the link to ensure that the bridges are available for the desired transmission. The continual monitoring by all devices results in a great expense to the system in terms of taking time away from other processing activities. The expense of deadlock avoidance increases as the length of transmission increases, becoming particularly problematic at distances of five hundred feet and beyond. This proves to be inefficient because, typically, a deadlock will only occur about once in every 10,000 data transmissions. Therefore, the large majority of the time spent monitoring is of no benefit. Given the absence of relatively low-cost deadlock recovery mechanisms in such computer systems, the expensive use of resources to prevent deadlocking by implementing interlocking schemes has been necessary.
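The inefficiency noted above can be made concrete with a simple cost comparison. Only the figure of about one deadlock per 10,000 transmissions comes from the discussion above. The cost model, the function names, and the cost units are assumptions made for illustration, and the model assumes monitoring and recovery costs can be expressed in the same arbitrary unit.

```python
# From the text: a deadlock occurs about once in every 10,000 transmissions.
TRANSMISSIONS_PER_DEADLOCK = 10_000


def avoidance_cost(transmissions, monitor_cost):
    """Total overhead when every transmission pays the monitoring cost."""
    return transmissions * monitor_cost


def recovery_cost(transmissions, recover_cost):
    """Expected overhead when only the deadlocked transmissions pay a cost."""
    return transmissions / TRANSMISSIONS_PER_DEADLOCK * recover_cost


def break_even_recover_cost(monitor_cost):
    """Recovery cost per deadlock at which both approaches cost the same."""
    return monitor_cost * TRANSMISSIONS_PER_DEADLOCK


# Hypothetical figures: 1 unit of monitoring per transmission,
# 500 units to recover from a single deadlock.
monitoring_total = avoidance_cost(10_000, 1)   # 10,000 units
recovery_total = recovery_cost(10_000, 500)    # 500 units
```

Under this assumed model, a recovery mechanism would cost less than monitoring whenever a single recovery costs less than 10,000 times the per-transmission monitoring cost.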
There is a need to allow computer devices to communicate across data links without expending significant resources for providing link monitoring. There is also a need to implement a solution within reasonable cost constraints given the basic communication bridge devices available today, rather than implementing a costly duplex fiber optic solution.