1. Field of the Invention
The present invention relates to a technology for scheduling the transmission of multiple traffic flows that pass through multiple relay devices, each of which connects a plurality of buses together, in a semiconductor integrated circuit with distributed buses.
2. Description of the Related Art
FIG. 1(A) illustrates an example of centralized bus control. In a conventional integrated circuit that performs such centralized bus control, a number of bus masters BMs and a memory MEM are usually connected together with a single bus, and accesses to the memory by the respective bus masters are arbitrated by an arbiter. However, as the functionality of integrated circuits has improved and the number of cores in an integrated circuit has increased, the scale of the circuit has become even larger and the traffic flows through the bus have become even more complicated. As a result, it has become increasingly difficult to design an integrated circuit based on such centralized bus control.
Meanwhile, semiconductor integrated circuits with distributed buses have been developed one after another lately by introducing interconnection technologies from parallel computers and network control technologies such as ATM (asynchronous transfer mode). FIG. 1(B) illustrates an example of distributed bus control. In a semiconductor integrated circuit with distributed buses, a number of relay devices R are connected together with multiple buses. Recently, people have been working on a so-called “Network on Chip (NoC)” in which the traffic in a large-scale integrated circuit is transmitted through a number of buses by adopting distributed buses such as the ones shown in FIG. 1(B).
FIG. 2 illustrates generally a basic configuration for a relay device for use in the NoC, parallel computers, ATM network, and so on. In such a relay device, traffic data is divided into a number of small units such as packets or cells, each of which is transmitted to its destination node. The data that has been sent to the relay device is temporarily retained in buffers.
Also, in order to transmit a number of different packets in parallel with each other through each input port, a virtual channel (which is sometimes called a “VC”), in which multiple buffers are connected in parallel with each other, is provided for each input port. That is to say, each virtual channel substantively consists of multiple buffer memories for a relay device. In this case, a number of buffers may actually be physically arranged for and with respect to each input port. Alternatively, a virtual channel may also be provided even by managing the data on a single buffer memory as if there were multiple buffers there.
In addition, a crossbar switch is further arranged in order to determine an exclusive connection between each input port and its associated output port. The exclusive connection between an input port and its associated output port via the crossbar switch is also determined by an arbiter.
By having the arbiter switch the crossbar in this manner, the relay device relays the data that is retained in the buffers to a destination.
Next, it will be described how to change the connection between an input port of a relay device and its associated output port. Each input port of a relay device and its associated output port are connected exclusively with each other via the crossbar switch. In this description, the “exclusive connection” refers to a situation where, when multiple input ports and multiple output ports need to be connected at the same time, no output port is connected to more than one input port.
FIG. 3A illustrates how a connection request (transmission request) with respect to a particular output port is issued by an input port in a relay device. In this example, two virtual channels are provided for each input port. Virtual channels #0 and #1 of input port #0 request sending a packet to output ports #0 and #2, respectively. Virtual channels #0 and #1 of input port #1 request sending a packet to output ports #0 and #1, respectively. Virtual channels #0 and #1 of input port #2 request sending a packet to output ports #2 and #3, respectively. And virtual channels #0 and #1 of input port #3 request sending a packet to output ports #0 and #2, respectively.
The arbiter chooses a combination in which input and output ports are connected exclusively together from a number of connection requests from multiple input channels to the same output channel and turns the crossbar switch in accordance with its choice. As for the connection requests shown in FIG. 3A, the exclusive input and output port combinations chosen by the arbiter may be a combination of input port #0 and output port #2, a combination of input port #1 and output port #1, a combination of input port #2 and output port #3, and a combination of input port #3 and output port #0 as shown in FIG. 3B.
The greater the number of input and output port combinations that can be connected together simultaneously, the greater the number of packets that can be sent simultaneously through such exclusive connections between the input and output ports via the crossbar switch.
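As a minimal sketch in Python, the connection requests of FIG. 3A can be written down and a simple first-come arbiter compared with the combination of FIG. 3B. The names REQUESTS, FIG_3B, is_exclusive and greedy_arbiter are hypothetical, and the greedy policy is an assumption chosen only for illustration; it is not an algorithm taken from any cited document.

```python
# Requests from FIG. 3A: REQUESTS[input_port] lists the output port
# requested by virtual channel #0 and virtual channel #1, in that order.
REQUESTS = {0: [0, 2], 1: [0, 1], 2: [2, 3], 3: [0, 2]}

# Combination chosen by the arbiter in FIG. 3B: input port -> output port.
FIG_3B = {0: 2, 1: 1, 2: 3, 3: 0}


def is_exclusive(connections):
    """Return True if no output port is connected to more than one input port."""
    outputs = list(connections.values())
    return len(outputs) == len(set(outputs))


def greedy_arbiter(requests):
    """Illustrative policy (an assumption): visit input ports in ascending
    order and grant each one its first requested output that is still free."""
    granted = {}
    taken = set()
    for in_port in sorted(requests):
        for out_port in requests[in_port]:
            if out_port not in taken:
                granted[in_port] = out_port
                taken.add(out_port)
                break
    return granted


if __name__ == "__main__":
    print("FIG. 3B exclusive:", is_exclusive(FIG_3B))
    print("FIG. 3B connections:", len(FIG_3B))
    print("greedy connections:", len(greedy_arbiter(REQUESTS)))
```

Under these assumptions the greedy policy connects only three input ports (input port #3 finds both of its requested output ports taken), whereas the combination of FIG. 3B connects all four, so one more packet can be sent in the same cycle.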
For that reason, parallel computers and ATM generally adopt a “wavefront allocator” method, which searches all possible input and output port combinations for the best combination available, or a “parallel iterative matching” method, in which partial optimum solutions are obtained independently on the input port side and on the output port side and the process is iterated to improve the quality of the match (see “Principles and Practices of Interconnection Networks”, W. Dally and B. Towles, Morgan Kaufmann Publishers (hereinafter referred to as “Non-Patent Document No. 1”), for example).
Meanwhile, a so-called “age-based” method has been proposed in U.S. Pat. No. 6,674,720. According to that method, if multiple virtual channels request connection to the same output port, a value called “age” is defined based on the length of the time that passed since a packet was transmitted and the number of hops that the packet has made in order to maintain the order in which a number of packets have been sent and to minimize an increase in time delay between the packets or their difference. And according to the “age-based” method, a packet with the maximum (or minimum) age is supposed to be sent first.
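The selection rule of the age-based method can be sketched as follows in Python. The text above states only that the age is based on the elapsed time and the number of hops; the additive formula, the hop_weight parameter, and the names Packet, age and pick_oldest are assumptions made for illustration and are not taken from U.S. Pat. No. 6,674,720.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    name: str
    sent_at: int  # cycle at which the packet was transmitted by its source
    hops: int     # number of relay devices the packet has passed so far


def age(packet, now, hop_weight=1):
    """Assumed age formula (illustration only): elapsed time plus weighted hops."""
    return (now - packet.sent_at) + hop_weight * packet.hops


def pick_oldest(contenders, now):
    """Among packets requesting the same output port, choose the maximum age."""
    return max(contenders, key=lambda p: age(p, now))


if __name__ == "__main__":
    contenders = [Packet("a", sent_at=10, hops=1), Packet("b", sent_at=4, hops=2)]
    print(pick_oldest(contenders, now=20).name)
```

With these hypothetical values, packet "b" was transmitted earlier and has made more hops, so it has the larger age and is sent first.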
On the other hand, in an NoC, a number of relay devices need to be arranged on an integrated circuit, and therefore, the number or the size of virtual channels that can be processed by each relay device is smaller than that of a parallel computer or ATM network. In the NoC, the size of one virtual channel is typically only as large as one packet.
For that reason, in the NoC, the limited number of virtual channels needs to be used as efficiently as possible and with a short time delay. For that purpose, it is important to control the transmission schedule so that the number of connections between input and output ports is maximized not only in each relay device but also in the relay device on the receiving end.
On top of that, various constraints are imposed on those relay devices on the NoC in terms of the scale of the integrated circuit, the permissible time delay, and power dissipation. For that reason, it is not a good idea to apply, as it is, an algorithm such as the wavefront allocator, which searches a huge number of combinations for the best one, or an algorithm such as parallel iterative matching, which requires iterative processing, to each of those relay devices on the NoC. If an ordinary relay device scheme that is currently used in parallel computers or ATM were applied as it is to a relay device on the NoC, then the circuit size, processing time, and power dissipation of an arbiter would also increase so much as to cause a decline in the performance of the NoC or a significant increase in processing time.
Hereinafter, this problem will be described in further detail.
FIG. 4 illustrates a specific example of the problem to be overcome by the present invention.
The relay device 401 shown in FIG. 4 is connected to four relay devices A, B, C and D, from which packets are sent out, through four input ports and receives those packets that have been sent from them. The relay device 401 is also connected to four other relay devices E, F, G and H, to which those packets should be sent, through four output ports, and forwards those packets to them.
Each of those input ports of the relay device 401 has two virtual channels so that each input port can issue transmission requests to at most two output ports.
However, if multiple relay devices on the transmitting end attempt to send packets to the same destination consecutively (Step 1) and if those packets are simply relayed right in their order of transmission as in the Age-Based method, then every virtual channel VC at each input port will be occupied with those packets to be sent to the same destination (Step 2). In that case, as multiple virtual channels VC attempt to get the same output port, some input port can get that output port successfully but another input port will fail to get it. And the latter input port cannot send the packets even if there is another output port available, thus deteriorating the transfer performance of the relay device (Step 3). Furthermore, once such a queue has been formed at the relay device 401, another queue will be formed at the relay devices on the transmitting end, too. In such a situation, even if there are packets that should be sent to different destinations from those of the packets in the queue, the former packets cannot be sent earlier than the latter packets in the relay device 401 (Step 4).
For example, in FIG. 4, suppose a situation where as the relay device 401 has received consecutively those packets to be sent to a few particular destinations from the relay devices on the transmitting end, packets to be sent to output port #0 are stored on every virtual channel of input ports #0 and #1 of the relay device 401 and packets to be sent to output port #2 are stored on every virtual channel of input ports #2 and #3 of the relay device 401. In that case, if every virtual channel issues a packet transmission request with respect to its output port, virtual channel #0 of input port #0 may get output port #0 and virtual channel #0 of input port #2 may get output port #2. Then, even though output ports #1 and #3 are still available, input ports #1 and #3 have no packets to be sent to those output ports, and therefore, will have no choice but to join the queue.
Also, even if any packet to be sent to output port #1 or #3 is stored on the relay device B or D on the transmitting end, that packet cannot be sent earlier than those packets that form the queue at the virtual channels of input ports #1 and #3 in the relay device 401.
If every virtual channel of each input port is occupied with particular packets in this manner, the transfer performance of the relay device will decline.
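The loss in this example can be checked with a short Python sketch that computes the largest number of exclusive connections any arbiter could make for a given set of requests. The function max_connections and the dictionary FIG_4 are hypothetical names, and the augmenting-path search is a standard bipartite matching technique used here only for illustration, not an arbiter described in this document.

```python
# Requests in the FIG. 4 example: every virtual channel of input ports #0 and #1
# holds a packet for output port #0, and every virtual channel of input ports
# #2 and #3 holds a packet for output port #2.
FIG_4 = {0: [0, 0], 1: [0, 0], 2: [2, 2], 3: [2, 2]}


def max_connections(requests):
    """Largest number of exclusive input/output connections that the given
    requests allow, found by an augmenting-path bipartite matching."""
    owner = {}  # output port -> input port currently holding it

    def try_assign(in_port, visited):
        for out_port in requests[in_port]:
            if out_port in visited:
                continue
            visited.add(out_port)
            # Take a free output, or move its current holder to another output.
            if out_port not in owner or try_assign(owner[out_port], visited):
                owner[out_port] = in_port
                return True
        return False

    return sum(try_assign(in_port, set()) for in_port in requests)


if __name__ == "__main__":
    used = max_connections(FIG_4)
    print("connections possible:", used)
    print("idle output ports:", 4 - used)
```

For the FIG. 4 requests, at most two of the four output ports can be used however the arbiter chooses, because only output ports #0 and #2 are requested at all. For the more varied requests of FIG. 3A, the same search finds four connections.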
In parallel computers and ATM, however, the constraints on the number or size of virtual channels and on the time delay are less strict. That is why even if packets to be sent to the same destination have been received consecutively, such an unwanted situation where every virtual channel in the relay device is occupied with those packets to be sent to the same destination is less likely to arise. Furthermore, even if every virtual channel is occupied with those packets to be sent to the same destination, the permissible time delay of a parallel computer or ATM is still longer than the duration of such an occupied state, thus affecting the transfer performance to a lesser degree.
In the NoC, on the other hand, since relay devices are implemented on a semiconductor circuit, strict constraints are imposed on the number or size of virtual channels and on the time delay, and the number of virtual channels available in the relay device often runs short. As a result, the overall transfer performance of the NoC is seriously affected in such a situation.