The invention relates to an information processing system comprising a plurality of information processing nodes, wherein the nodes are functionally connected to form a network. The invention also relates to a method of enabling operation of an information processing system having a network with a plurality of information processing nodes.
It is known to use a broadcast protocol to verify the presence of active nodes in a network. Broadcast protocols work by sending a message to all nodes and requesting a response. If a particular node needs to know which nodes are active in the network, the node broadcasts a message to all other nodes and waits for the nodes to respond to the broadcast. However, broadcast protocols are not reliable for several reasons.
First, there is no guarantee that all relevant nodes have received the message. Broadcast protocols typically rely on a best-effort-delivery assumption. Such best-effort-delivery procedures generally are not set up to ensure or verify that all nodes have received the message, but merely that an attempt has been made to notify all relevant nodes.
Second, broadcast protocols typically broadcast the message to all nodes substantially simultaneously. As a result, the nodes receive the message substantially simultaneously, and the nodes generally respond substantially simultaneously to the broadcast message. Such near-simultaneous responses can overload the network, causing an avalanche-type failure. There are ways to prevent this type of overload, e.g., by having each node respond after a time chosen randomly for each node. However, such delay schemes introduce a further drawback: the protocol does not know how long to wait until all active nodes have had a chance to respond.
Third, broadcast protocol schemes do not scale well. Each node that makes an inventory of the relevant nodes has to keep a list of those nodes. This implies that, in theory, each node is to be provided with a memory of undetermined size to accommodate the list.
There is a need for a method to identify active, inactive and/or new nodes on the network without using a broadcast protocol with its numerous deficiencies. It is therefore an object of the invention to provide a network and a method for enabling operation or configuration of a network without having to rely too heavily on the conventional broadcast protocols mentioned above.
To this end, the invention provides a linked-node network including a network protocol that is implemented on each node of the network. The protocol is designed to form the network into a logically linked configuration of nodes such as a logical ring, chain or equivalent. The protocol includes at least one node address; a polling timer; a node counter; polling, healing and dummy tokens/messages; and routines for sending and receiving tokens, for monitoring network integrity (testing for time-outs without token return), for adding or removing nodes, for healing or repairing breaks in the network when active nodes go inactive, for defragmenting a network, for fragmenting a network from a super network and for facilitating node resource sharing. The protocol can operate under natural node-timer staggering, but controlled, even node-timer staggering is preferred.
Token-ring networks are known in the art. In networking, a token is a special series of bits that travels around a token-ring network. As the token circulates, an individual computer attached to the pre-configured network can capture it. The token enables the computer that owns the token to send a message across the network. There is only one token for each network. Accordingly, two or more computers are prevented from transmitting messages at the same time. The token thus serves as a protocol for defining the master of the bus interconnecting the computers. In the invention, the token is used for configuring, monitoring and/or re-configuring the network.
The invention further provides a super linked-node network that includes a plurality of lower-level linked-node networks. A lower-level network is also referred to as a sub-network below. The sub-networks are linked in a network-by-network fashion to form either a logical ring or a logical chain configuration for implementing the super network. Each sub-network includes a network-polling protocol as mentioned above. The protocol is implemented on both the super network and the individual sub-networks. The sub-networks making up the super network can be linked together through one or more linker nodes in each sub-network. The invention can also support organizing nodes into sub-links, each sub-link representing a working group of nodes. The sub-links are linked into a super network through one or more linker nodes.
The invention also provides a network protocol to form a linked-node network. The protocol includes a successor node address, a node counter and a polling message or token. The protocol further includes token sending/receiving routines, a polling timer (PT) and routines for monitoring the PT for expiration. When the PT expires after the polling token has returned, the polling token is re-propagated. On the other hand, if the polling token has not returned when the timer expires, the absence is interpreted as a break in the network. A TIME-OUT condition is posted and the protocol initiates network healing, which relinks the network and suspends PT monitoring to avoid TIME-OUT conditions in other nodes. The network healing routines are designed to relink the network by replacing successor addresses that reference inactive nodes with successor addresses that reference active nodes. As a result, the network's active nodes are reunited or relinked.
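By way of illustration only, the decision taken at PT expiration can be sketched as follows. The names `Action` and `on_timer_expired` are hypothetical and do not appear in the protocol itself; the sketch assumes the node simply records whether the polling token came back before the timer ran out.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for the two outcomes described above."""
    REPROPAGATE = "re-propagate the polling token"
    HEAL = "post TIME-OUT and start network healing"


def on_timer_expired(token_returned: bool) -> Action:
    """Decide what a node does when its polling timer (PT) expires."""
    if token_returned:
        # The token completed its circuit before the PT ran out.
        return Action.REPROPAGATE
    # No token by the deadline: interpret the absence as a break.
    return Action.HEAL
```

For example, `on_timer_expired(False)` yields `Action.HEAL`, which corresponds to posting the TIME-OUT condition.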
The invention provides a method implemented on each node for linking nodes together to form a linked-node network. The method includes providing each node with at least an address for a successor node, a node count, a polling message or token, and the necessary software routines. Next, the method causes the polling token to be propagated node-by-node over the network from predecessor nodes to successor nodes and resets the PT or saves the current PT value at each node concurrently with token propagation. The method then monitors for PT expiration, i.e., for a TIME-OUT condition to occur. If the polling token returns (makes a full circuit of the network or successfully completes a cycle) before TIME-OUT (PT expiration), the PT is reset and the polling token is propagated forward. Normally, the duration of the PT is the network transit time. Waiting for TIME-OUT ensures that the nodes in the network are evenly time-staggered. This is so because the time it takes for a token to make a complete round-trip is independent of the node from which it started and to which it has to return. The protocol could simply reset the PT and immediately re-propagate the polling token, but this would result in a natural staggering of polling timers. Natural staggering generally results in a less stable network than even staggering.
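A minimal sketch of one polling cycle follows, given for illustration only. It assumes that the logical ring is modelled as a dictionary mapping each node address to its successor address and that each node increments the node counter once per visit; the function name `circulate` is hypothetical. Timers are left out so that only the successor-to-successor propagation is shown.

```python
def circulate(successor: dict[str, str], start: str) -> int:
    """Pass the token from successor to successor until it returns to `start`.

    Returns the node count accumulated during the circuit.
    """
    count = 0
    current = start
    # A well-formed ring returns within len(successor) hops; the bound
    # guards against looping forever on a malformed link table.
    while count <= len(successor):
        count += 1                      # this node counts itself
        current = successor[current]    # forward the token
        if current == start:
            return count
    raise RuntimeError("token did not return: ring is not closed")
```

With the ring `{"A": "B", "B": "C", "C": "A"}`, the token started at any node returns after visiting three nodes, consistent with the round-trip being independent of the starting node.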
If TIME-OUT occurs, a break in the network has occurred. That is, one or more active nodes have gone inactive during a polling cycle. Upon a TIME-OUT, the method activates a healing protocol to relink or reunite the network. Network polling in this manner is readily generalized to linking networks together to form super networks where numerous polling tokens circulate over different parts of the super network, preferably in a hierarchical format.
The invention also provides a method for adding new nodes to a network. The method includes installing the network protocol of the present invention on a new node, connecting the new node to the network and broadcasting a new member message onto the network. The new member broadcast is received by a current token holder (only one per network). The current token holder updates its successor address to the new member's address, and the new member sets its successor address to the current token holder's previous successor address. The method can also include propagating a dummy token. This causes other network nodes to suspend PT monitoring and await the propagation of the polling token after addition of the new node has been completed.
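The successor-address update for adding a node can be sketched as follows, for illustration only. The sketch assumes the ring is held as a dictionary of successor addresses; the name `add_node` is hypothetical, and the new member broadcast and the dummy token are not modelled.

```python
def add_node(successor: dict[str, str], token_holder: str, new_member: str) -> None:
    """Splice `new_member` into the ring directly after `token_holder`."""
    old_successor = successor[token_holder]
    # The token holder now points at the new member ...
    successor[token_holder] = new_member
    # ... and the new member inherits the holder's previous successor.
    successor[new_member] = old_successor
```

Starting from the two-node ring `{"A": "B", "B": "A"}` with "A" holding the token, adding "N" gives A, N, B and back to A.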
Optionally, the method includes steps to resolve conflicts and assign parenting rights to only one node in the event the new member message is received by more than one current token holder. This can occur in super networks. In the case of super networks, the new member receives several responses to its broadcast message and has the option to join the network of any of the responding nodes. Although the new member can choose to join any network randomly, the protocol preferably provides selection based on information contained in the responses. The responses generally contain the node count and information about the network associated with the respective responding node. For example, the information clarifies whether a responding node is the current token holder in a super network, in a sub-network, or in a working group or sub-link. The new member generally chooses the sub-network with the fewest nodes (smallest node count) or, if it is a specialized node, a working group sub-link. Alternatively or conjunctively, the method can use backoff timers or the like to prevent multiple current token holders from adopting the new node.
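One possible selection rule is sketched below for illustration. The `Response` record, its `kind` labels ("super", "sub", "group") and the function `choose_network` are assumptions made for this sketch, as is the tie-breaking choice of the smallest working group for a specialized node.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical content of a token holder's reply to a new member."""
    holder: str        # address of the responding token holder
    kind: str          # "super", "sub" or "group" (assumed labels)
    node_count: int    # node count reported for that network


def choose_network(responses: list[Response], specialized: bool = False) -> Response:
    """Pick the network a new member joins; raises ValueError if no responses."""
    if specialized:
        groups = [r for r in responses if r.kind == "group"]
        if groups:
            # Assumption: among several working groups, take the smallest.
            return min(groups, key=lambda r: r.node_count)
    subs = [r for r in responses if r.kind == "sub"]
    # Prefer sub-networks; fall back to any responder if none replied.
    return min(subs or responses, key=lambda r: r.node_count)
```

A general-purpose node thus joins the sub-network with the smallest node count, while a specialized node joins a working group when one has responded.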
The invention also provides a method for removing nodes from the network if a node knows that it will go inactive. The method comprises the following steps. The departing node sets a flag in the polling token and places its address and the address of its successor in the token. Each node that receives the polling token is instructed to check its successor address against the departing node's address and to release any resource sharing that may exist with the departing node. When a match is found, the predecessor of the departing node updates its successor address with the address of the successor of the departing node. The predecessor of the departing node then turns the remove-node flag off and continues polling. Again, the protocol would also cause network node re-staggering to occur because the residence time at each node would no longer be about 1/n of the transit time, n being the number of active nodes on the network (for a detailed explanation of time staggering, see below).
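For illustration only, the check performed by each node on receipt of a flagged polling token can be sketched as follows. The `PollingToken` fields and the function `handle_token` are hypothetical names; the release of shared resources is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PollingToken:
    """Hypothetical token layout carrying the remove-node information."""
    remove_flag: bool = False
    departing: Optional[str] = None            # address of the departing node
    departing_successor: Optional[str] = None  # address of its successor


def handle_token(node: str, successor: dict[str, str], token: PollingToken) -> None:
    """Relink around the departing node if `node` is its predecessor."""
    if token.remove_flag and successor[node] == token.departing:
        # Bypass the departing node and clear the flag.
        successor[node] = token.departing_successor
        token.remove_flag = False
```

In a ring A, B, C where "B" departs, the token passes "C" unchanged; when it reaches "A", the successor of "A" becomes "C" and the flag is turned off.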
The invention further provides a method for curing, healing or repairing breaks in the network when a node TIME-OUT condition occurs during a polling cycle, circuit or operation. The method includes propagating a break message over the network from the node at which the TIME-OUT occurred. Each node receiving the message deactivates or suspends PT monitoring, sets a break timer, forwards the message and sends a response to its predecessor, whose address is contained in the break message. When the predecessor receives the response, it deactivates its break timer. The break message propagates until it is received by the node immediately before the break. When this node sends the break message, no response occurs and its break timer reaches a TIME-OUT. The TIME-OUT causes this node, located just upstream of the break, to update its successor address with the address of the node at which the original TIME-OUT occurred. This address is contained in the break message. This node then sets the node counter in the polling token to zero and restarts polling.
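The relinking step can be sketched as follows, for illustration only. The sketch assumes that the ring is a dictionary of successor addresses, that a set `active` stands in for "the successor answered before the break timer expired", and that the node posting the TIME-OUT is the first active node downstream of the break. The name `heal` is hypothetical.

```python
def heal(successor: dict[str, str], active: set[str], timeout_node: str) -> str:
    """Forward the break message from `timeout_node` and relink the ring.

    Returns the address of the node just upstream of the break.
    """
    current = timeout_node
    for _ in range(len(successor)):
        nxt = successor[current]
        if nxt not in active:
            # No response: the break timer at `current` expires, so it
            # adopts the TIME-OUT node as its new successor.
            successor[current] = timeout_node
            return current
        current = nxt  # successor responded; it forwards the message
    raise RuntimeError("break message returned without finding a break")
```

In a ring A, B, C, D where "C" has gone inactive and the TIME-OUT is posted at "D", the break message travels D, A, B; node "B" receives no response from "C" and sets its successor to "D".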
The invention further provides a method for defragmenting a network when two or more non-neighbor nodes have dropped off the network during a polling cycle. This dropping off causes the original network to fragment into two or more sub-links, each being formed via the healing method set forth above. Assume that fragmentation has occurred, and that a first current token holder broadcasts an "I have a token" message over the network. If another current token holder receives the message, it knows that the network is fragmented. In response, the second current token holder sends a message back to the first current token holder with the address of the successor of the second current token holder. The current token holders exchange successor addresses, and the second current token holder propagates a dummy token forward via the successor node of the first current token holder to tell the nodes to suspend PT monitoring until they receive a new polling token.
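The effect of exchanging successor addresses can be sketched as follows, for illustration only. The ring fragments are assumed to be stored in one dictionary of successor addresses; `merge_fragments` and `ring_members` are hypothetical helper names, and the "I have a token" broadcast and the dummy token are not modelled.

```python
def merge_fragments(successor: dict[str, str], first_holder: str, second_holder: str) -> None:
    """Swap the two token holders' successor addresses, joining their rings."""
    successor[first_holder], successor[second_holder] = (
        successor[second_holder],
        successor[first_holder],
    )


def ring_members(successor: dict[str, str], start: str) -> list[str]:
    """List the nodes reached by following successors from `start` (closed ring assumed)."""
    members = [start]
    current = successor[start]
    while current != start:
        members.append(current)
        current = successor[current]
    return members
```

Given the fragments A, B and C, D with token holders "A" and "C", the swap produces the single ring A, D, C, B.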
The invention further provides a method for adjusting the timer values at each node to create an evenly staggered network. The method starts with a transit time set by the network system designer and adjusts the timers so that the polling token resides at each node for a time of about 1/n of the network transit time, or more accurately, 1/n of the network transit time minus the inter-node transit time, wherein n is the number of active nodes. Because the protocol is designed to operate in the background, this protocol feature ensures that the timers at each node have sufficiently long ranges so that token processing is able to occur and that other network processing, such as add/remove node processes, can be initiated. The method also includes start-up routines that start the timers off at some estimated timer setting. The setting is then dynamically adjusted using an accumulated time and an incremental time so that, as token propagation continues, the timers are adjusted to an equilibrium value which gives rise to a token residence time of about 1/n of the network transit time minus the inter-node transit time. Even timer staggering over the nodes makes for a more stable network environment.
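The equilibrium residence time stated above can be computed as in the following sketch, given for illustration only. The function name `residence_time` and its parameters are hypothetical, and the dynamic adjustment using accumulated and incremental time is not modelled.

```python
def residence_time(transit_time: float, n_active: int, inter_node_time: float = 0.0) -> float:
    """Target token residence time per node for an evenly staggered network.

    Computes transit_time / n_active - inter_node_time, all in the same unit.
    """
    if n_active <= 0:
        raise ValueError("n_active must be a positive number of nodes")
    return transit_time / n_active - inter_node_time
```

For a designer-chosen transit time of 10 seconds, 5 active nodes and an inter-node transit time of 0.5 seconds, the token would reside at each node for about 1.5 seconds.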
The invention provides a network with a digital processing system, a memory system and an I/O system, having the protocol of the invention encoded thereon.