Much work has been done on developing switching systems for asynchronous transfer mode (ATM) or fast packet networks with particular application in the distribution of video and other high speed information streams. In particular, a prior art publication describes a head end circuit for a point-to-point switch which provides broadcast capability. By broadcasting is meant the capability of producing multiple copies of a single packet of data, assigning a virtual address to each of those copies, and outputting those copies through a point-to-point switch for distribution to multiple desired locations. In the proposed scheme, packets are first received at packet processors where they are assigned a fanout (number of copies) and a broadcast address. The packets then pass through a concentrator network which places the packets on consecutive outputs so as to ensure non-blocking operation in the subsequent networks of the switch. Next, the packets pass through a running adder network which computes the sum of the fanouts for all packets entering the network and places this running total in a field of each of the packets. Following the running adder, the packets enter a set of dummy address encoders (DACs) which perform two functions. First, the DACs determine which packets can be processed without exceeding the capacity of the network by assigning outputs to each of the packets in turn. When an output is computed which exceeds the capacity of the network, the packet is discarded. Second, the DACs return an acknowledgement to the sending input for each packet that is not discarded, indicating its transmittal. Discarded packets are retransmitted at a later time. For the packets that are not discarded, the assigned outputs are inserted into fields in the packet and the packets are then sent to a copy network.
It is of particular note that in the prior art proposal, the copy network includes a number of inputs (n) and outputs (n) which are equivalent to the number (n) of switch inputs and outputs. The copy network creates and sends copies of each packet received, in the number called for, to their assigned outputs. The copy network also labels each copy with a copy number so that as each copy reaches the next stage of broadcast translator circuits (BTCs) its broadcast channel number (BCN) and copy number are translated into an output address that the point-to-point switch uses to guide the packet to its ultimate destination. Thus, at least theoretically, this prior art proposal suggests a scheme for adding broadcasting to a point-to-point switching fabric for use in an ATM network.
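The translation step performed by a BTC can be pictured as a table lookup keyed on the broadcast channel number and the copy number. The following is only a minimal sketch of that idea; the table layout, the function name translate, and the example entries are hypothetical and are not taken from the prior art publication.

```python
from typing import Dict, Tuple

# Hypothetical translation table for one BTC:
# (broadcast channel number, copy number) -> switch output address.
TranslationTable = Dict[Tuple[int, int], int]


def translate(table: TranslationTable, bcn: int, copy_number: int) -> int:
    """Return the output address for one copy of a broadcast packet."""
    return table[(bcn, copy_number)]


# Illustrative entries only: broadcast channel 7 has a fanout of three.
example_table: TranslationTable = {(7, 0): 12, (7, 1): 40, (7, 2): 201}

# Each copy of the packet is translated to its own output address.
addresses = [translate(example_table, 7, copy) for copy in range(3)]
```

In this picture, a BTC can translate a given copy only if its table holds the entry for that channel and copy number.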
One of the problems encountered in implementing this prior art proposal for adding broadcasting is the inordinate size of the memory required for each BTC, and for the head end as a whole. In the prior art proposal, there is the potential for a packet having any copy number to appear at any of the outputs of the copy network and, thus, to be routed to any of the BTCs. Each BTC must therefore have translation information for all copies in all broadcast connections. For example, in a 256 port system with bit serial data paths and 128 signal pins per chip, 62 chips are enough for the concentrator, adder, DACs, and copy network. However, assuming 65K broadcast connections and 16 bits of translation information per copy, each BTC requires roughly 250 megabits of memory. As there are 256 BTCs, that translates to 64 gigabits of total memory required for implementing the prior art proposal. The required memory thus makes the prior art proposal incapable of being physically implemented in view of existing chip and VLSI technology.
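The memory estimate can be checked with a few lines of arithmetic. The sketch below assumes that "65K" means 2^16 connections, that each connection may need one entry for each of 256 copy numbers in every BTC, and that binary prefixes are intended. The constant names are illustrative only.

```python
PORTS = 256            # switch ports, which is also the number of BTCs
CONNECTIONS = 2 ** 16  # "65K" broadcast connections (assumed to mean 65,536)
BITS_PER_ENTRY = 16    # translation information stored per copy

# Assumption: any copy number (up to PORTS copies per connection) can reach
# any BTC, so every BTC holds an entry for every (connection, copy) pair.
entries_per_btc = CONNECTIONS * PORTS
bits_per_btc = entries_per_btc * BITS_PER_ENTRY
total_bits = bits_per_btc * PORTS

megabits_per_btc = bits_per_btc / 2 ** 20  # binary megabits per BTC
gigabits_total = total_bits / 2 ** 30      # binary gigabits for all BTCs

print(megabits_per_btc, gigabits_total)
```

Under these assumptions each BTC needs 2^28 bits (256 binary megabits, on the order of the 250 megabits quoted) and the 256 BTCs together need 2^36 bits, or 64 gigabits.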
Still another problem encountered in implementing the prior art proposal is its poor worst-case data throughput. Because of the way that copying is managed, a single packet with a large fanout (for example, n) can prevent most other packets that enter during a given cycle from passing through the network. For example, suppose the first input of the network receives a packet with a fanout of one and the second input receives a packet with a fanout of n, where n is the number of inputs to the broadcast switch. In this example, only the first packet with a fanout of one passes through the DACs and all other packets are blocked. This is because the DACs sense that the second packet, the packet with a fanout of n, exceeds the capacity of the copy network in that the first packet with a fanout of one leaves only n-1 copy network outputs available. Thus, the DACs discard the second packet and all subsequent packets in this cycle. In this worst-case scenario, the DACs pass just a single packet with a fanout of one in that particular cycle. While this is admittedly a worst-case situation, it is not so far-fetched that it can be ignored and, in some circumstances, it would significantly reduce the throughput capability of the switch.
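The blocking behavior in this example can be reproduced with a short simulation of the running adder and the admission decision made by the DACs. This is a simplified sketch: the function name admitted_packets and the choice of n = 8 are illustrative assumptions, and acknowledgement and retransmission are not modeled.

```python
from itertools import accumulate
from typing import List


def admitted_packets(fanouts: List[int], capacity: int) -> List[bool]:
    """Mark each packet as admitted (True) or discarded (False).

    The running sum plays the role of the running adder network. Because
    every fanout is at least one, the running total never decreases, so once
    it exceeds the capacity every later packet in the cycle is discarded too.
    """
    return [total <= capacity for total in accumulate(fanouts)]


n = 8  # illustrative switch size
# First input: fanout of one. Second input: fanout of n. Others: fanout of one.
worst_case = admitted_packets([1, n, 1, 1, 1, 1, 1, 1], capacity=n)
copies_passed = sum(f for f, ok in zip([1, n, 1, 1, 1, 1, 1, 1], worst_case) if ok)
```

In this run only the first packet is admitted, so a single copy passes through the copy network in that cycle.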
In addressing these various limitations in implementing a practical application of the prior art proposal, the inventor herein has succeeded in designing and developing various improvements to this prior art proposal which render it capable of practical realization while minimizing or eliminating the various shortcomings mentioned above. A first modification includes the concept of expanding the capacity of the copy network. This can be achieved in either of two ways. The first is to provide two (or more) parallel switch planes (head ends) or just parallel copy networks, and the second is to provide a copy network having an increased number of outputs while maintaining the same number of inputs as there are ports for the switch itself. With the parallel switch plane or copy network version, packets having a fanout greater than one are copied to both networks, with each copy having half the fanout of the original. Network operation would otherwise proceed exactly as is described above. However, with each packet having a fanout of at most n/2, a packet having a large fanout can cause itself and other packets to be discarded only if the packets on the prior inputs have a total fanout of more than n/2. (This presumes that each of the two parallel copy networks has n inputs/outputs.) Consequently, the pair of copy networks is guaranteed to output a minimum of n packets. Hence, even in the worst-case data traffic pattern which produces the blocking described above, a minimum of n packets is passed by the DACs through the copy network to the BTCs for translation. This improvement eliminates the worst-case behavior of the prior art proposal, in which throughput can be limited to one single-copy packet per cycle.
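The effect of splitting fanouts across two parallel copy networks can be illustrated with the same worst-case traffic pattern. The sketch below rests on several assumptions that the text does not specify: an odd fanout gives its larger half to plane 0, so a packet with a fanout of one goes to plane 0 only. Each plane makes its admission decision independently, and n = 8 is chosen purely for illustration.

```python
from itertools import accumulate
from typing import List, Tuple


def split_fanout(fanout: int) -> Tuple[int, int]:
    """Split a fanout across two planes (assumed rule: plane 0 gets the larger half)."""
    return (fanout + 1) // 2, fanout // 2


def copies_delivered(fanouts: List[int], capacity: int) -> int:
    """Count the copies one plane passes before its running total exceeds capacity."""
    delivered = 0
    for fanout, total in zip(fanouts, accumulate(fanouts)):
        if total > capacity:
            break  # this packet and all later packets are discarded
        delivered += fanout
    return delivered


n = 8  # illustrative switch size; each plane has n copy network outputs
offered = [1, n, 1, 1, 1, 1, 1, 1]

# Single copy network: only the first packet gets through.
single_plane = copies_delivered(offered, n)

# Two parallel copy networks, each seeing half of every fanout.
plane0, plane1 = zip(*(split_fanout(f) for f in offered))
two_planes = copies_delivered(list(plane0), n) + copies_delivered(list(plane1), n)
```

For this pattern the single network delivers one copy, while the two planes together deliver twelve, which is at least n.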
The next set of improvements is directed to minimizing the overly large memory requirements for the BTCs. A first one of these improvements is to recognize that with typical data requirements for ATM, namely a cell size of 424 bits, serial data paths, and a clock rate of nominally 100 MHz, each BTC can be shared sequentially by a large number of outputs with no observable delay in processing. For example, with 32 copy network outputs accessing a BTC sequentially, there are over 130 nanoseconds available for each memory access, which is sufficiently long even for high density memory chips. To implement this improvement, multiple copy network outputs may simply be connected to multiple inputs of a single BTC, and some sequencing means may be provided for converting the data packets from parallel to serial and vice versa.
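The timing figure follows directly from the cell size and the clock rate. The short calculation below is a sketch that assumes one bit is transferred per clock period on each serial data path. The constant names are illustrative.

```python
CELL_BITS = 424        # ATM cell size in bits
CLOCK_MHZ = 100        # nominal clock rate
OUTPUTS_PER_BTC = 32   # copy network outputs sharing one BTC

# Assumption: one bit per clock period on a bit serial data path.
bit_time_ns = 1000 / CLOCK_MHZ           # 10 ns per bit
cell_time_ns = CELL_BITS * bit_time_ns   # time for one cell to arrive

# The BTC must serve each of its outputs once per cell time.
access_time_ns = cell_time_ns / OUTPUTS_PER_BTC

print(cell_time_ns, access_time_ns)
```

A cell takes 4240 ns to arrive, so 32 outputs sharing one BTC leave 132.5 ns for each memory access.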
Another improvement which results in reduced memory for the BTCs is to recognize that the throughput capability of the network limits the possible combinations of packets having a fanout greater than a critical fanout of f₁. Connections with fanout no larger than f₁ (the small fanout connections) require less translation data than those with fanout larger than f₁ (the large fanout connections). In addition, a smaller number of large fanout connections can be provided, since the number of switch outputs places a natural limit on the possible number of large fanout connections. Therefore, each BTC may have a memory partitioned into two parts with one each for small fanout connections and large fanout connections. The memory space thus required for each BTC is significantly reduced over that which would be required to store all possibilities for all fanout connections. With no other considerations, a critical fanout of f₁ = √n provides the greatest memory saving with this improvement.
Still another improvement which results in decreased memory requirements for the BTCs involves aligning the fanout addresses with the DACs. This technique aligns each packet's copies along certain copy network outputs, so that each BTC is required to translate only specific (and fewer than all) copy numbers. The DACs are used to generate these aligned addresses so that the packets which are accepted for transmission are tagged with the correct copy network outputs on which the specific copies are to appear. By aligning the fanout of each packet in this manner, each BTC can be assured of receiving packets for translation with only a certain set of copy numbers. This improvement works particularly well with the previously described concept of BTC sharing of copy network outputs, as that improvement provides for multiple copy network outputs to be connected to each BTC. Therefore, by spacing the copy network outputs which are connected to each BTC, fanout aligned addressing will route only certain copy numbers to each BTC for translation. While this scheme does result in what might first be viewed as a wasting of copy network capability, the savings from memory reduction in the BTCs more than make up for the increased copy network requirements, resulting in an overall significant savings in chip count.
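The way alignment restricts the copy numbers seen by each BTC can be illustrated with a small simulation. The alignment rule is not spelled out here, so this sketch adopts two loudly labeled assumptions: each packet's first copy is placed at the next output that is a multiple of the smallest power of two not less than its fanout, and BTC b serves the outputs whose index is congruent to b modulo the number of BTCs. All names and sizes are hypothetical.

```python
from typing import Dict, List, Set


def aligned_start(next_free: int, fanout: int) -> int:
    """Round next_free up to a multiple of the smallest power of two >= fanout.

    This alignment rule is an assumption made for illustration.
    """
    alignment = 1 << (fanout - 1).bit_length()
    return -(-next_free // alignment) * alignment


def copy_numbers_per_btc(
    fanouts: List[int], num_outputs: int, num_btcs: int
) -> Dict[int, Set[int]]:
    """Record which copy numbers reach each BTC under the assumed alignment."""
    seen: Dict[int, Set[int]] = {b: set() for b in range(num_btcs)}
    next_free = 0
    for fanout in fanouts:
        start = aligned_start(next_free, fanout)
        if start + fanout > num_outputs:
            break  # capacity exceeded: this and later packets are discarded
        for copy_number in range(fanout):
            output = start + copy_number
            # Assumed wiring: outputs spaced num_btcs apart share one BTC.
            seen[output % num_btcs].add(copy_number)
        next_free = start + fanout
    return seen


seen = copy_numbers_per_btc([3, 1, 5, 2, 8], num_outputs=32, num_btcs=4)
```

In this run no BTC sees all eight copy numbers; BTC 0, for example, receives only copy numbers 0 and 4. Some outputs are skipped to reach the aligned starting positions, which corresponds to the unused copy network capacity mentioned above.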
While the principal advantages and features of the subject invention have been explained above, a more thorough understanding may be gained by referring to the drawings and description of the preferred embodiment which follow.