In a computer network, numerous nodes communicate with each other in accordance with a communication protocol that provides logical connections between any pair of participating nodes. The nodes may be connected through a fiber, through a wireless network or through some other medium.
A network may have a fixed capacity regardless of its size and the power of its constituent components. Only when the consumer nodes of the network are underpowered can the possibility of network resource contention be ignored. Network components that provide network resources usually grow at the same pace, and with the same technological advancements, as network nodes. Accordingly, networks are usually designed with flexible functional extensibility for future growth. Usage assumptions that hold at the first deployment of a design may not hold in the future, even though a newer design is functionally compatible with older designs.
In the case where network components are not overpowered to guarantee sufficient network resources for extreme or skewed usage, proper measures need to be taken at network nodes to avoid placing too much traffic onto the network. Networking components and protocols are typically designed to handle a predetermined load; when the load exceeds a certain capacity, the efficiency of the network decreases. Congestion reduces efficiency, which in turn places more load on the network, causing congestion to increase in a self-aggravating manner.
For example, a network may be implemented with the policy that no packet is dropped by any network component. Typically, in such a network architecture, a link-based flow control mechanism called backpressure is implemented to handle resource contention. In the case of resource contention, backpressure control information generated by a resource-constrained component (which may be a node or a network router) is sent along the communication path, toward the source of the traffic flow, to stop the immediate upstream router from sending more traffic to that component. When the resource contention at the receiving component has eased, the component informs the immediate upstream router that it may resume transfer, and the source of the traffic flow may commence transmitting again. During the period when the source of the traffic flow is not sending, the link has zero utilization. Furthermore, this condition can be propagated back up the path if the backpressure problem persists. If such a network is implemented with logical connections sharing (virtual or physical) links, link congestion can pause unrelated logical connections, which leads to performance degradation.
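The pause/resume behavior described above can be sketched as follows. This is a minimal illustrative model, not a specific protocol; the names `Router`, `capacity`, `paused`, and the threshold logic are hypothetical.

```python
from collections import deque

class Router:
    """Toy model of a receiving component on a backpressured link.

    When its buffer fills, it signals the immediate upstream sender
    to pause; when contention eases, it signals it to resume.
    """

    def __init__(self, capacity):
        self.capacity = capacity   # available buffer slots
        self.queue = deque()
        self.paused = False        # True once PAUSE has been signaled upstream

    def receive(self, packet):
        # Under the no-drop policy, the upstream must not send while paused.
        assert not self.paused, "upstream violated backpressure"
        self.queue.append(packet)
        if len(self.queue) >= self.capacity:
            self.paused = True     # signal PAUSE upstream

    def drain(self, n):
        # Forwarding packets frees buffer space; when contention eases,
        # signal the immediate upstream router to resume transfer.
        for _ in range(min(n, len(self.queue))):
            self.queue.popleft()
        if self.paused and len(self.queue) < self.capacity:
            self.paused = False    # signal RESUME upstream
```

Note that while `paused` is set, the sketch models the zero-utilization interval described above: the upstream link carries no traffic until the resume signal is issued.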
Another example is a network implemented with the policy that any network component may drop packets if it does not have sufficient resources to handle the traffic. Dropping packets requires recovery on the sender side, and the resulting retransmissions further increase the load on the already congested network.
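The self-aggravating cost of this policy can be illustrated with a simple sender-side sketch. The function name, the uniform drop model, and the timeout-free retry loop are illustrative assumptions, not a description of any real transport.

```python
import random

def send_with_retransmit(packets, drop_prob, rng):
    """Count total transmissions needed to deliver every packet
    when each transmission is independently dropped with
    probability drop_prob and the sender retransmits on loss.
    """
    transmissions = 0
    for _ in packets:
        while True:
            transmissions += 1
            if rng.random() >= drop_prob:  # packet survived
                break                      # else: loss detected, retransmit
    return transmissions
```

With a drop probability of 0.5, roughly twice as many transmissions as packets are required on average, i.e. the offered load on the congested network approximately doubles.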
In order to transfer packets most efficiently with the least complexity, some networks have been designed with overpowered network components and underpowered nodes. This approach can avoid the need for any congestion control or traffic control mechanism. However, when the network architecture is extended in size or more powerful network components are introduced, this assumption no longer holds. In such a situation, the introduction of enhanced network components may create new hotspots, aggravate existing hotspots, or shift the hotspots of the network. It also increases performance variation among network components.
Another network architecture provides only best-effort services, wherein the traffic flowing within the network is not monitored or managed. Instead, an end node allows clients' data traffic onto the network as long as the node has enough resources to process the transfer on the sending side of a logical connection. This network architecture assumes that the servicing network is able to handle the traffic unless a physical connectivity problem exists somewhere on the communication path corresponding to the transfer. No measures are taken to detect or prevent network congestion, or to alleviate congestion problems.
It is also possible to design a static, single-node-centric, non-distributed network in order to alleviate network congestion problems. Specifically, the designer of this type of network devises a policy for each participating node limiting the amount of load the node puts onto the network. The policy is based on the assumption that other nodes are utilizing the network in a similar manner. Typically, the policy must be biased toward the most pessimistic assumption in order to avoid problematic scenarios, and the amount of biasing is usually based on an educated guess as to the most severe type of network congestion. However, because these extreme cases dictate the boundary conditions yet are rare in occurrence, the architecture can be over-constrained and under-performing in the common case. Moreover, assumptions made on such a simplistic model are usually wrong in one way or another, because the load experienced by the network usually depends on more than the behavior of a single node. In many cases, the viability of such a policy relies on the assumption that network traffic is evenly distributed; however, that assumption excludes precisely the most problematic scenarios a congestion control algorithm should solve.
A distributed traffic control solution can also be used to control network traffic. In such a network, participating nodes exchange traffic information using either in-band or out-of-band communication mechanisms. By exchanging such traffic information, each participating node gains a view of the current network usage and can then determine how the overall condition affects its usage policy.
A distributed peer-to-peer network model allows peers to simply exchange network usage information and lets each node decide individually what to do with it. Typically, a participating node applies well-defined policies when deciding how much load to put onto the network based on the collected network usage information. For example, each node collects network usage information from the other nodes regarding the current outstanding traffic. A node may continue to put load on the network if the total amount of outstanding traffic from all participating nodes is less than a certain predetermined threshold.
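The threshold policy just described can be sketched as a small decision function. The constant, the function name, and the report format are illustrative assumptions.

```python
PREDETERMINED_THRESHOLD = 100  # illustrative units of outstanding traffic

def may_send(local_outstanding, peer_reports):
    """Decide whether this node may add more load to the network.

    peer_reports maps a peer identifier to that peer's reported
    outstanding traffic, as gathered through the peer-to-peer
    exchange of network usage information.
    """
    total = local_outstanding + sum(peer_reports.values())
    return total < PREDETERMINED_THRESHOLD
```

Each node evaluates this predicate independently; no master coordinates the decision, which is what distinguishes this model from the master-slave model described next.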
In a distributed master-slave network model, the master node collects network usage information from the slave nodes and uses such information to decide the amount of network resources a particular slave node may utilize.
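One possible master-side decision, sketched below, shares a fixed capacity among slaves in proportion to their reported demand. The text does not prescribe a specific allocation policy; proportional sharing, the function name, and the capacity units are assumptions made for illustration.

```python
def allocate(total_capacity, demands):
    """Master-side allocation of network resources to slave nodes.

    demands maps a slave identifier to its reported desired usage.
    Each slave receives its proportional share of total_capacity,
    capped at its own demand.
    """
    total_demand = sum(demands.values())
    if total_demand == 0:
        return {slave: 0 for slave in demands}
    return {
        slave: min(d, total_capacity * d // total_demand)
        for slave, d in demands.items()
    }
```

When aggregate demand exceeds capacity, every slave is throttled proportionally; when it does not, every slave receives exactly what it requested.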
The policies that the nodes utilize are typically based on a certain computational model as a function of the network configuration, such as the topology of the network, the types of participating components, etc. For example, the nodal logic has to be aware of how different logical connections utilize the networking components. The logic may have to be aware of a bottleneck connection between a group of tightly coupled processors and an external fabric, and of how restraining that bottleneck connection affects all outgoing traffic. An accurate model mirrors how the hardware is connected together. However, for a sophisticated network, the computation and the resulting combinatorics may be too complicated to model accurately.
A good computational model must provide a close approximation of the real platform: if it is over-simplistic, its behavior will not mirror the real platform. As such, a simplistic model is usually biased toward more restrictive behavior to ensure it can operate safely without accurately mirroring the real platform. However, in a complex network environment with hundreds of nodes connected in non-trivial ways, models that are both simple and accurate are very hard to obtain.
What is needed is a system and method that does not require a model to be built before deploying algorithms. A model devised for handling network resource usage may be too simple and may fail in boundary and extreme cases, and much time must be spent designing a model that is both accurate and low in operational overhead. Also needed is a cooperative distributed algorithm that is extensible.