The present invention relates to switching apparatus in general and in particular to applications of switching apparatus as shared counters, shared pools, shared stacks and the like in multi-processor environments.
As multi-processing breaks away from its traditional number crunching role, there is a growing need for highly distributed and parallel coordination structures which provide fast responses under both sparse and intense activity levels. Typical applications include radar tracking systems, traffic flow controllers, communication exchange facilities, barrier synchronization, index distribution, shared program counters, concurrent data structures, dynamic load balancing and the like.
Up to the present time, shared counters, pools and stacks have been used to solve a variety of coordination and synchronization problems in a multi-processor environment. In its purest form, a counter is an object which holds an integer value and provides a fetch_and_increment operation, incrementing the counter and returning its previous value. Pools (also called piles, global pools, and producer/consumer buffers) are concurrent data-types which support two operations: enqueue(e), which adds the element e to the pool, and dequeue(*), which deletes and returns some element e from the pool. Stacks are pools with LIFO ordering.
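By way of illustration only, the fetch_and_increment operation described above can be sketched in Python as follows. The class name SharedCounter and the use of a single lock are assumptions made for this sketch and do not appear in the cited art; a single lock is the simplest correct realization and is not offered as a scalable one.

```python
import threading


class SharedCounter:
    """Hypothetical lock-based shared counter (illustrative only)."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def fetch_and_increment(self):
        # Atomically return the previous value and advance the counter by one.
        with self._lock:
            previous = self._value
            self._value += 1
            return previous
```

Because every caller must acquire the same lock, all requests are serialized at one location, which is the kind of sequential bottleneck that the more distributed structures discussed in this background seek to avoid.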
The prior art teaches several approaches for implementing counters, the most common of which are surveyed in a paper entitled "Scalable Concurrent Counting" in the Proceedings of the 3rd Annual ACM Symposium on Parallel Algorithms and Architectures, July 1992, San Diego, Calif., which is incorporated herein by reference as if set forth fully herein.
One such approach, called "counting networks", employs one-input, two-output computing elements called "balancers". Intuitively, a balancer can be regarded as a toggle processor having a first operative state in which a token is routed to a first output wire and a second operative state in which a token is routed to a second output wire. Each passage of a token through a balancer switches the operative state of the balancer such that, for a stream of tokens, a balancer repeatedly and alternately passes one token to its first output wire and one token to its second output wire, thereby effectively balancing the number of tokens that are output on its two output wires.
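The toggle behavior of a balancer described above can be sketched, under stated assumptions, as follows. The class name Balancer, the method name traverse, and the numbering of the output wires as 0 and 1 are hypothetical choices made for this sketch; the toggle is protected by a lock so that concurrent tokens observe alternating states.

```python
import threading


class Balancer:
    """Hypothetical one-input, two-output toggle element (illustrative only)."""

    def __init__(self):
        self._state = 0  # 0: route to first output wire, 1: route to second
        self._lock = threading.Lock()

    def traverse(self):
        # Route the token according to the current state, then flip the state.
        with self._lock:
            wire = self._state
            self._state ^= 1
            return wire
```

For a stream of six tokens arriving one after another, this sketch routes them to wires 0, 1, 0, 1, 0, 1, so that each output wire receives three tokens.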
Balancers are typically interconnected to form a balancing binary tree having a width w equal to the total number of output wires of the tree. Balancing binary trees can be readily adapted to count the total number of tokens traversing therethrough by adding a "local counter" to each output wire i so that tokens outputting on wire i are consecutively assigned numbers i, i+w, i+2w, and so on (for example, i, i+4, i+8 for a tree of width 4). However, it is well known that balancing binary trees of the above construction suffer from the disadvantage that the root of the tree is prone to become a "hot-spot", causing a sequential bottleneck of tokens.
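A balancing binary tree with local counters, as described above, can be sketched as follows. All names (Balancer, BalancingTree, TreeCounter, route) are hypothetical. The sketch further assumes that the width is a power of two, that output wires are numbered from 0, and that the outputs of the two subtrees are interleaved so that the k-th token of a sequential stream exits on wire k mod w. The local counter on wire i issues i and then advances by the tree width w (4 in the example numbers above).

```python
import threading


class Balancer:
    """Hypothetical toggle element: alternates tokens between outputs 0 and 1."""

    def __init__(self):
        self._state = 0
        self._lock = threading.Lock()

    def traverse(self):
        with self._lock:
            wire = self._state
            self._state ^= 1
            return wire


class BalancingTree:
    """Hypothetical balancing binary tree of a given power-of-two width."""

    def __init__(self, width):
        if width < 1 or width & (width - 1):
            raise ValueError("width must be a power of two")
        self.width = width
        if width > 1:
            self._root = Balancer()
            self._children = (BalancingTree(width // 2),
                              BalancingTree(width // 2))

    def route(self):
        # Return the index of the output wire on which the token leaves.
        if self.width == 1:
            return 0
        bit = self._root.traverse()
        # Interleave subtree outputs: subtree 0 feeds even wires, subtree 1 odd.
        return bit + 2 * self._children[bit].route()


class TreeCounter:
    """Hypothetical counter built from a balancing tree plus local counters."""

    def __init__(self, width):
        self._tree = BalancingTree(width)
        self._width = width
        self._next = list(range(width))  # wire i starts at value i
        self._locks = [threading.Lock() for _ in range(width)]

    def fetch_and_increment(self):
        wire = self._tree.route()
        with self._locks[wire]:
            value = self._next[wire]
            self._next[wire] += self._width  # i, i+w, i+2w, ...
            return value
```

In this sketch every token must pass through the single root balancer before reaching any subtree, which illustrates why the root is prone to become a hot-spot even though the local counters at the output wires are accessed independently.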
The literature offers a variety of pool implementations. On the one hand, there are queue-lock based solutions as described in a paper entitled "The Performance of Spin Lock Alternatives for Shared Memory Multi-processors" by Anderson, IEEE Transactions on Parallel and Distributed Systems, 1(1):6-16, January 1990, and a paper entitled "Synchronization without Contention" by J. M. Mellor-Crummey and M. L. Scott, Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems, April 1991. These solutions offer good performance under sparse access patterns but scale poorly, since they offer little or no potential for parallelism in high load situations. On the other hand, there are simple and effective randomized work-pile techniques, for example, as described in a paper entitled "A Simple Load Balancing Scheme for Task Allocation in Parallel Machines" by Rudolph et al., Proceedings of the 3rd ACM Symposium on Parallel Algorithms and Architectures, pages 237-245, June 1993, that offer good expected response time under high loads but very poor performance as access patterns become sparse. Furthermore, no randomized technique exists for implementing shared stacks.
There is thus a widely recognized need for, and it would be highly advantageous to have, apparatus overcoming the above-mentioned disadvantages of shared counters, shared pools, shared stacks and the like in multi-processor environments.