1. Technical Field
The present application relates to a technology for arranging a transmission schedule for a plurality of traffic flows that run through multiple routers, which are connected together through distributed buses, in a semiconductor integrated circuit having such buses. More particularly, the present application relates to a technique for controlling the transmission of packets, which are the constituent units of multiple different traffic flows and which are stored in a distributed manner across a plurality of buffers.
2. Description of the Related Art
Portion (A) of FIG. 1 illustrates an example of centralized bus control. In a traditional integrated circuit that performs such centralized bus control, a number of bus masters and a memory are connected together with a single bus, and accesses to the memory by the respective bus masters are arbitrated by an arbiter. However, as integrated circuits have gained more functionality and more cores in recent years, circuit scale has grown even larger and the traffic flows running through the bus have become even more complicated. As a result, it has become increasingly difficult to design an integrated circuit based on such centralized bus control.
Meanwhile, semiconductor integrated circuits with distributed buses have been developed one after another in recent years by introducing parallel computer interconnection technologies and network control technologies such as ATM (asynchronous transfer mode). Portion (B) of FIG. 1 illustrates an example of distributed bus control. In a semiconductor integrated circuit with distributed buses, a number of routers are connected together with multiple buses. Recently, research has been conducted on a so-called “Network on Chip (NoC)”, in which the traffic flows in a large-scale integrated circuit are transmitted through a number of buses by adopting distributed buses such as the ones shown in portion (B) of FIG. 1.
FIG. 2 illustrates the general basic configuration of a router for use in a NoC, a parallel computer, an ATM network, and so on. In such a router, traffic data is divided into small units such as packets or cells, each of which is transmitted to its destination node. Data that has been sent to the router is temporarily retained in buffers.
Also, in order to transmit a number of different packets in parallel through each input port, virtual channels (sometimes called “VCs”), which are multiple buffers connected in parallel with each other, are provided for each input port. That is to say, the virtual channels are substantively multiple buffer memories in the router. In this case, multiple buffers may actually be physically arranged for each input port. Alternatively, virtual channels may also be provided by managing the data in a single buffer memory as if there were multiple buffers in it.
In addition, a crossbar switch is arranged to establish an exclusive connection between each input port and its associated output port. The exclusive connection between an input port and its associated output port via the crossbar switch is determined by an arbiter.
By having the arbiter switch the crossbar switch in this manner, the router relays the data retained in the buffers to a destination.
Next, it will be described how the connection between an input port of a router and its associated output port is changed. Each input port of a router and its associated output port are connected exclusively with each other via the crossbar switch. In this description, an “exclusive connection” refers to a situation in which, when multiple input ports and multiple output ports need to be connected at the same time, no more than one input port is connected to any one output port.
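The exclusive-connection rule can be expressed as a simple check on one crossbar configuration. The following is a minimal sketch, assuming a hypothetical representation (not part of the original description) in which a dictionary maps each connected input port number to the output port number it drives; the function name `is_exclusive` is likewise hypothetical.

```python
from typing import Dict


def is_exclusive(connections: Dict[int, int]) -> bool:
    """Check the exclusive-connection rule for one crossbar configuration.

    `connections` is a hypothetical representation that maps each connected
    input port to the output port it drives. Because it is a dict, an input
    port can drive at most one output port by construction; the check below
    verifies that no output port is driven by more than one input port.
    """
    outputs = list(connections.values())
    return len(outputs) == len(set(outputs))


print(is_exclusive({0: 1, 1: 0}))  # True: each output port has one input port
print(is_exclusive({0: 1, 1: 1}))  # False: output port 1 would have two inputs
```

An arbiter that grants connections would only ever produce configurations for which this check holds.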
Next, the data structure of data to be transmitted by a NoC router will be described.
FIG. 3 illustrates an exemplary transmission format for a packet 300 and how the packet 300 may be divided into multiple flits.
Unlike a router generally used in parallel computers and ATM networks, a NoC router transmits a packet generated at a transmission node after the packet has been divided into multiple units called “flits”, each small enough to be sent through a bus in one cycle.
The packet 300 includes a header field 301, a data field 302, and a control code field 303.
The header field 301 describes, for example, the receiving end's (i.e., reception node's) address, the transmitting end's (i.e., transmission node's) address, and information about the deadline by which the transmitted packet should arrive at the reception node (which will be referred to herein as “time information”). The time information may be described in any form as long as it allows the amount of time that has passed since a packet was transmitted, or the deadline by which the packet should arrive at its destination, to be compared with that of another packet.
In the data field 302, on the other hand, video data or audio data may be described, for example. In the control code field 303, a predetermined end code of the packet 300 may be described, for example.
The processing of relaying the packet 300 and the processing of receiving the packet 300 at the receiving end are performed based on the reception node's address and the transmission node's address among the data stored in the header field 301.
Each node on the transmitting end transmits the packet 300 after dividing it into smaller data units called “flits”. One flit is data that can be transmitted through the bus in one cycle, and its size is determined by the bus width. Among the flits obtained by dividing one packet 300, the flit transmitted first is called a “header flit” 304, to which are added flag information indicating that this flit is located at the beginning of a packet and information about the reception node's address of the packet.
It should be noted that the address information indicating the location of the reception node is not stored in any of the flits that follow the header flit 304, because those flits are supposed to be sent to the same destination as the header flit 304. Once the destination has been determined from the header flit 304 and the output buffer to which the flits of that traffic are output has been determined, the flits that follow the header flit 304 are transmitted to the destination indicated by the header flit 304 through the same output buffer as the header flit 304.
On the other hand, the last flit of one packet is called a “tail flit” 306, to which flag information is added indicating that this is the last of the flits forming the packet. The flits other than the header flit 304 and the tail flit 306 are mainly used to transmit data and are called “data flits” 305.
On detecting the end code described in the control code field 303, the node on the receiving end restores the transmitted flits into the original packet based on that end code.
For example, one packet may have a size of 128 bytes, and one flit may have a size of 64 bits (8 bytes). In that case, one packet is transmitted after having been divided into 128 / 8 = 16 flits. Note, however, that these sizes are just an example, because the packet size and the flit size may vary according to the application or the bus width. Optionally, the flit length may be set to a length that can describe control data including the reception node's address and the transmission node's address, for example.
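The division into header, data, and tail flits, and the 128-byte / 64-bit arithmetic above, can be illustrated with a minimal sketch. The `Flit` type and the `to_flits` and `reassemble` functions are hypothetical; as a simplification, the sketch uses the tail flag to detect the end of a packet instead of parsing the end code in the control code field 303.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Flit:
    is_head: bool        # flag: first flit of the packet (header flit)
    is_tail: bool        # flag: last flit of the packet (tail flit)
    dest: Optional[int]  # reception node address, carried only by the header flit
    payload: bytes


def to_flits(packet: bytes, dest: int, flit_bytes: int) -> List[Flit]:
    """Split a packet into flits of `flit_bytes` bytes (one bus cycle each)."""
    chunks = [packet[i:i + flit_bytes] for i in range(0, len(packet), flit_bytes)]
    last = len(chunks) - 1
    return [
        Flit(is_head=(i == 0), is_tail=(i == last),
             dest=dest if i == 0 else None, payload=chunk)
        for i, chunk in enumerate(chunks)
    ]


def reassemble(flits: List[Flit]) -> bytes:
    """Restore the packet; here the tail flag stands in for the end code."""
    if not (flits and flits[0].is_head and flits[-1].is_tail):
        raise ValueError("incomplete packet")
    return b"".join(f.payload for f in flits)


packet = bytes(range(128))                            # a 128-byte packet
flits = to_flits(packet, dest=5, flit_bytes=64 // 8)  # 64-bit (8-byte) flits
print(len(flits))                                     # 16
```

Only the first flit carries the destination, which is why the flits that follow must stay on the path chosen for the header flit.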
FIG. 4A is a flowchart showing a series of processing operations to be performed by each router in a NoC in order to transmit a packet that has been received by the router to either an adjacent router or a reception node.
FIG. 4B, in turn, illustrates a configuration of the virtual channels used to transmit the flits as shown in FIG. 4A. In the example illustrated in FIG. 4B, each router has two input ports and two output ports, and each input port is provided with two virtual channels (VCs) to store flits.
One of the virtual channels of each input port is connected to the output port by switching a crossbar switch SW, so that the flits in that virtual channel are transmitted through the output port. In FIG. 4B, only the virtual channels VC1 and VC2 of Input Port 0 are illustrated; the virtual channels of Input Port 1 are omitted for simplicity.
FIG. 4C shows the cycle-by-cycle state transitions of the respective flits from when a packet is received at a router until the packet is transmitted to either the next router or a reception node.
In order to relay a packet that has been divided into multiple flits to its destination, a router in a NoC carries out, on the flits received, all or part of routing computation (RC) processing, virtual channel allocation (VA) processing, switch allocation (SA) processing, and switch traversal (ST) processing (see, for example, W. Dally and B. Towles, “Principles and Practices of Interconnection Networks”, Morgan Kaufmann Publishers).
Hereinafter, the basic operation of a router in a NoC will first be described with reference to the flowchart shown in FIG. 4A and the block diagram shown in FIG. 4B. In the following description, the upstream router will be referred to as “Router A” and the downstream router as “Router B”, as shown in FIG. 4B.
First, in Step 401 shown in FIG. 4A, Router A determines whether or not there is any virtual channel VC with flits. If the answer is YES, the processing performed by Router A advances to the next processing step 402. Otherwise, this processing step 401 will be performed over and over again until the decision is made that there is a virtual channel with flits.
If there is any virtual channel VC with flits, Router A determines, in the next processing step 402, whether or not the first one of the flits of that virtual channel VC is a header flit. If the answer is YES, the processing performed by Router A advances to the next processing step 403. Otherwise, the processing performed by Router A jumps to a processing step 405.
In the example illustrated in FIG. 4B, Router A has one virtual channel with a header flit. Thus, the processing performed by Router A advances to the next processing step 403.
If the first flit of the virtual channel has turned out to be a header flit, Router A carries out routing computation (RC) processing in this processing step 403 by reference to the destination information that is described in that header flit. By performing the routing computation processing, Router A selects one output port that leads to the destination of the packet.
In the example illustrated in FIG. 4B, Router A selects Output Port 0, which is connected to Router B that leads to the destination, by performing the routing computation processing.
After the output port has been selected by the routing computation processing, the processing advances to the next processing step 404, in which it is determined in which virtual channel of the adjacent Router B the packet to be transmitted from Router A needs to be stored.
In a NoC, a packet is relayed after having been divided into multiple flits. Also, the basic information that is required to perform the routing control is described in only the header flit. That is why if flits of two or more different packets were mixed in the same virtual channel, those flits could not be delivered to the correct destination or the flits that have been delivered to the destination could not be restored into the original packet in some cases.
To avoid such a situation, each router in the NoC reserves each virtual channel for a single packet from the time the header flit of that packet passes through the virtual channel until the tail flit of that packet has passed through it, and prohibits flits of any other packet from passing through the occupied virtual channel.
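The per-packet reservation of virtual channels, together with the virtual channel allocation and its retry when no channel is free, can be sketched as follows. This is a minimal model under stated assumptions: `VirtualChannelPool` is a hypothetical class representing the virtual channels of the downstream Router B as tracked by Router A, and packets are identified by hypothetical integer IDs.

```python
from typing import List, Optional


class VirtualChannelPool:
    """Downstream VCs reserved per packet (hypothetical model)."""

    def __init__(self, num_vcs: int) -> None:
        # owner[i] is the ID of the packet occupying VC i, or None if unused.
        self.owner: List[Optional[int]] = [None] * num_vcs

    def allocate(self, packet_id: int) -> Optional[int]:
        """VA step: grant an unused VC to a header flit, or None if all are busy."""
        for vc, holder in enumerate(self.owner):
            if holder is None:
                self.owner[vc] = packet_id
                return vc
        return None  # the caller retries VA in a later cycle

    def accepts(self, vc: int, packet_id: int) -> bool:
        """Only flits of the occupying packet may pass through the VC."""
        return self.owner[vc] == packet_id

    def release(self, vc: int) -> None:
        """Called once the tail flit has passed through the VC."""
        self.owner[vc] = None


pool = VirtualChannelPool(num_vcs=2)
print(pool.allocate(packet_id=7))  # 0
print(pool.allocate(packet_id=8))  # 1
print(pool.allocate(packet_id=9))  # None: no unused VC, so VA must be retried
pool.release(0)                    # the tail flit of packet 7 has left VC 0
print(pool.allocate(packet_id=9))  # 0
```

Because a channel is released only after the tail flit has passed, flits of two packets never share a virtual channel.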
For example, Router A shown in FIG. 4B allocates one unused virtual channel in Router B, to which the flits are going to be transmitted, to the virtual channel VC1 that stores the header flit, thereby performing the virtual channel allocation (VA) processing.
When the virtual channel allocation (VA) processing is done, the processing by Router A advances to the next processing step 405.
However, if there are no unused virtual channels in the adjacent Router B, then Router A retries this processing step 404 over and over again until any of the virtual channels of Router B becomes available to allow Router A to complete the virtual channel allocation (VA) processing.
In the example illustrated in FIG. 4B, Router A selects the virtual channel VC1 of Router B as a buffer to store the flits and allocates the virtual channel VC1 to its own virtual channel VC1 that stores the header flit.
When the routing computation (RC) processing and the allocation of a virtual channel to store the flits in the adjacent router (i.e., the VA processing) are finished, the processing by Router A advances to the next processing step 405, in which Router A waits to transmit the stored flits.
In the processing step 405, in order to transmit the flits from the virtual channel, Router A sets the crossbar switch so as to allocate its own virtual channel VC1 to its output port (this is the switch allocation (SA) processing).
If multiple virtual channels are requesting to transmit flits through a single output port, then switch allocation (SA) processing is carried out in order to determine, on an output port basis, which virtual channel is allowed to transmit flits through a given output port.
Optionally, in this switch allocation processing, a router of the NoC may adjust how long a virtual channel storing a given packet may be connected to an output port by reference to various kinds of information, including the type of the packet (such as a delay-guaranteed type or a best-effort type), its priority, its time of transmission, and its deadline for arrival. In this way, a schedule can be arranged for transmitting packets from multiple different transmission nodes.
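A switch allocator that decides, on an output port basis, which requesting virtual channel may transmit can be sketched as below. The priority policy shown (delay-guaranteed traffic first, then the earliest deadline) is only one hypothetical example of using the information listed above; the `Request` type and `switch_allocate` function are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    vc: int           # requesting virtual channel
    out_port: int     # output port chosen by routing computation
    guaranteed: bool  # True for delay-guaranteed, False for best-effort
    deadline: int     # arrival deadline taken from the header's time information


def switch_allocate(requests: List[Request]) -> Dict[int, int]:
    """Grant each output port to at most one VC (one possible priority policy)."""
    by_port: Dict[int, List[Request]] = {}
    for r in requests:
        by_port.setdefault(r.out_port, []).append(r)
    grants: Dict[int, int] = {}
    for port, reqs in by_port.items():
        # False sorts before True, so delay-guaranteed requests win first;
        # ties are broken by the earliest deadline.
        winner = min(reqs, key=lambda r: (not r.guaranteed, r.deadline))
        grants[port] = winner.vc
    return grants


reqs = [
    Request(vc=0, out_port=0, guaranteed=False, deadline=10),
    Request(vc=1, out_port=0, guaranteed=True, deadline=50),
    Request(vc=2, out_port=1, guaranteed=False, deadline=20),
]
print(switch_allocate(reqs))  # {0: 1, 1: 2}
```

Since each request names a single output port, every virtual channel receives at most one grant, and the losing virtual channels simply request again in a later cycle.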
When, as a result of the switch allocation (SA) processing in step 405, an output port has been allocated and connected to the waiting virtual channel, the processing by Router A advances to the next processing step 406, in which Router A transmits the flits in the connected virtual channel through the selected output port (this is the switch traversal (ST) processing).
By performing this series of processing steps 401 through 406 on each virtual channel in this manner, the router transmits the flits received to the destination.
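The flow of steps 401 through 406 for one virtual channel can be traced with a minimal sketch. It assumes, for illustration only, that switch allocation is granted immediately and that the routing function is supplied by the caller; the `relay_vc` function, the tuple flit representation, and the list of free downstream virtual channels are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

# (kind, destination): kind is "head", "data" or "tail"; only the header
# flit carries a destination.
FlitT = Tuple[str, Optional[int]]


def relay_vc(vc: Deque[FlitT],
             route: Callable[[int], int],
             free_downstream_vcs: List[int]) -> List[str]:
    """Walk one input VC through steps 401-406 of FIG. 4A, returning a log."""
    log: List[str] = []
    out_port: Optional[int] = None
    down_vc: Optional[int] = None
    while vc:                            # step 401: the VC holds flits
        kind, dest = vc[0]
        if kind == "head":               # step 402: header flit at the top?
            out_port = route(dest)       # step 403: routing computation (RC)
            if not free_downstream_vcs:  # step 404: VC allocation (VA)
                log.append("VA retry")   # no unused VC downstream; retry later
                return log
            down_vc = free_downstream_vcs.pop(0)
            log.append(f"RC -> port {out_port}, VA -> VC {down_vc}")
        # step 405: switch allocation (SA), assumed to be granted at once here
        vc.popleft()                     # step 406: switch traversal (ST)
        log.append(f"ST {kind} via port {out_port} to VC {down_vc}")
    return log


vc1 = deque([("head", 3), ("data", None), ("data", None), ("tail", None)])
for line in relay_vc(vc1, route=lambda dest: 0, free_downstream_vcs=[1]):
    print(line)
```

Data and tail flits skip steps 403 and 404 and reuse the output port and downstream virtual channel chosen for the header flit, mirroring the "NO" branch of step 402.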
Hereinafter, it will be described with reference to FIG. 4C exactly how Router A shown in FIG. 4B relays a single packet. In the following example, the processing of the flits at the respective points in time is described on the supposition that each packet consists of four flits.
First, when the header flit arrives at Router A at time 1, Router A carries out routing computation (RC) processing by reference to the destination information included in the header flit, thereby selecting the output port through which the flits will be transmitted next.
Next, at time 2, Router A carries out virtual channel allocation (VA) processing, thereby determining to which virtual channel of Router B (the router connected to the output port selected through the RC processing) the virtual channel storing the header flit should be connected.
Meanwhile, at time 2, Data Flit 1 arrives at Router A. However, since the header flit is still at the top of the virtual channel, no processing is carried out on Data Flit 1.
Next, at time 3, Router A carries out switch allocation (SA) processing, thereby determining which output port is allocated to the virtual channel that has the header flit at its top.
Meanwhile, at time 3, Data Flit 2 also arrives at Router A. However, since the header flit is still at the top of the virtual channel, no processing is carried out on Data Flits 1 and 2.
Next, at time 4, Router A transmits the header flit (switch traversal (ST) processing). Even after the header flit has been transmitted, the switch allocation is maintained so that the same virtual channel-output port pair remains connected.
Meanwhile, at time 4, the Tail Flit also arrives at Router A. However, since Data Flit 1 is now at the top of the virtual channel, no processing is carried out on Data Flit 2 and the Tail Flit.
Next, at time 5, Router A transmits Data Flit 1 (switch traversal (ST) processing), again keeping the same virtual channel-output port pair connected. Data Flit 2 and the Tail Flit remain queued behind it in the virtual channel.
Next, at time 6, Router A transmits Data Flit 2 (switch traversal (ST) processing), still keeping the same virtual channel-output port pair connected.
Finally, at time 7, Router A transmits the Tail Flit (switch traversal (ST) processing). In this manner, one packet is relayed completely.
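The timing of FIG. 4C can be reproduced with a short calculation. This sketch assumes one cycle per stage, no contention for virtual channels or output ports, and (as a hypothetical simplification) that a data or tail flit can traverse the switch no earlier than the cycle after it arrives; the `schedule` function is hypothetical.

```python
from typing import Dict, List


def schedule(arrivals: List[int]) -> List[Dict[str, int]]:
    """Per-flit stage times for one packet, one cycle per stage, no contention."""
    head = arrivals[0]
    rows = [{"RC": head, "VA": head + 1, "SA": head + 2, "ST": head + 3}]
    last_st = head + 3
    for t in arrivals[1:]:
        # Data and tail flits reuse the header's route, VC and switch grant,
        # so they only need ST: one cycle after the previous flit, and not
        # before the cycle following their own arrival (assumption).
        last_st = max(last_st + 1, t + 1)
        rows.append({"ST": last_st})
    return rows


# Header ST at 4, Data Flit 1 at 5, Data Flit 2 at 6, Tail Flit at 7
for name, row in zip(["Header", "Data 1", "Data 2", "Tail"], schedule([1, 2, 3, 4])):
    print(name, row)
```

With arrivals at times 1 through 4, the header flit spends three cycles in RC, VA and SA, after which the remaining flits leave back to back, matching the seven cycles described above.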
The processing sequence shown in FIG. 4C for relaying a single packet is an example in which each processing step finishes in one cycle. However, if the virtual channel allocation (VA) processing finds that no virtual channel of Router B, to which the flits are to be output, is available, the VA processing cannot be completed until a virtual channel becomes available. As a result, the transmission of the header flit and the flits that follow it has to wait.
The same can be said about the switch allocation (SA) processing. That is to say, if multiple virtual channels request to transmit flits through the same output port, the transmission schedule needs to be arranged so that the output port is allocated to those virtual channels one at a time. As a result, the transmission of those flits has to wait in such a situation.
Generally speaking, a router for use in a parallel computer or an ATM network can secure a larger number of transmission buffers (or virtual channels) for a packet of the same size than a router for use in a NoC can. That is why delays in the virtual channel allocation (VA) processing due to a shortage of virtual channels affect the former type of router less than the latter. Meanwhile, for the former type of router, it is far more important to optimize the transmission schedule so that the flits in the respective transmission buffers (virtual channels) are transmitted as efficiently as possible. For that reason, routers for use in parallel computers and ATM networks have adopted techniques such as a “wavefront allocator”, which searches for the best possible combination of a transmission buffer (virtual channel) and an output port, and “parallel iterative matching”, which iteratively chooses the best combinations on the input port and output port sides of a router (see, for example, W. Dally and B. Towles, “Principles and Practices of Interconnection Networks”, Morgan Kaufmann Publishers).
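Parallel iterative matching can be sketched in its textbook form: in each iteration, every unmatched output port grants one unmatched requesting input port at random, and every input port that received grants accepts one of them at random. This is a minimal sketch under the assumption that requests are given per input port; the `pim` function and the request format are hypothetical, and the wavefront allocator is not shown.

```python
import random
from typing import Dict, List, Set


def pim(requests: Dict[int, Set[int]], iterations: int,
        rng: random.Random) -> Dict[int, int]:
    """Parallel iterative matching; returns {input_port: output_port}."""
    match: Dict[int, int] = {}
    for _ in range(iterations):
        taken = set(match.values())
        # Grant: every unmatched output picks one unmatched requesting input.
        grants: Dict[int, List[int]] = {}
        free_outputs = sorted({o for outs in requests.values() for o in outs} - taken)
        for o in free_outputs:
            cands = [i for i in sorted(requests) if i not in match and o in requests[i]]
            if cands:
                grants.setdefault(rng.choice(cands), []).append(o)
        if not grants:
            break  # no further pair can be added: the matching is maximal
        # Accept: every input that received grants accepts one of them.
        for i, outs in grants.items():
            match[i] = rng.choice(outs)
    return match


requests = {0: {0, 1}, 1: {0}, 2: {1, 2}}  # input port -> requested output ports
print(pim(requests, iterations=3, rng=random.Random(1)))
```

Each iteration adds at least one pair while any unmatched input still requests an unmatched output, so running as many iterations as there are input ports always yields a maximal (though not necessarily maximum) matching.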
In some situations, on the other hand, multiple virtual channels may request to be connected to the same output port at the same time. To cope with such a situation, a so-called “age-based” method has been proposed in, for example, U.S. Pat. No. 6,674,720. According to that method, a value called “age” is defined based on the time that has passed since a packet was transmitted and the number of hops the packet has made, in order to maintain the order in which packets were transmitted and to minimize both the increase in delay and the differences in delay among the packets. The packet with the maximum (or minimum) age is then transmitted first.
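Age-based arbitration among virtual channels competing for one output port can be sketched as follows. The exact age definition of U.S. Pat. No. 6,674,720 is not reproduced here; the weighted sum of elapsed time and hop count, the `hop_weight` parameter, and the `Candidate`, `age` and `age_based_pick` names are all hypothetical, and the sketch uses the "maximum age first" variant.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    vc: int       # virtual channel requesting the output port
    elapsed: int  # cycles since the packet was transmitted
    hops: int     # routers traversed so far


def age(c: Candidate, hop_weight: int = 1) -> int:
    # Hypothetical age metric: elapsed time plus a weighted hop count.
    return c.elapsed + hop_weight * c.hops


def age_based_pick(cands: List[Candidate], hop_weight: int = 1) -> int:
    """Return the VC whose packet has the maximum age (oldest first)."""
    return max(cands, key=lambda c: age(c, hop_weight)).vc


cands = [Candidate(0, 12, 1), Candidate(1, 9, 5), Candidate(2, 3, 2)]
print(age_based_pick(cands))  # 1: ages are 13, 14 and 5
```

How elapsed time and hop count are weighted changes the winner; with `hop_weight=0` the same candidates would select virtual channel 0.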