1. Technical Field
The present disclosure relates to a router in a semiconductor system including a bus, a method for controlling the router, and a program.
2. Description of the Related Art
There are various methods for controlling data transfer in a data transfer system including a bus. FIGS. 53A and 53B each show an exemplary known transfer control method for a conventional semiconductor system. FIG. 53A shows an example of conventional centralized bus control. In the centralized bus control shown here, a plurality of bus masters and a memory are connected to each other by a single bus 910, and accesses to the memory by the respective bus masters are arbitrated by an arbiter 912. With such a configuration, data can be transferred while interference between the traffic flows of the bus masters and the memory is avoided. However, as the functionality of integrated circuits has improved and the number of cores per integrated circuit has increased, circuit scale has grown and the traffic flows through the transmission paths have become more complicated. As a result, it has become increasingly difficult to design an integrated circuit using such centralized bus control.
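The centralized arbitration of FIG. 53A can be illustrated by a minimal sketch in which a single arbiter grants the shared bus 910 to at most one requesting bus master per cycle. The round-robin grant order and the names `RoundRobinArbiter` and `grant` are assumptions for illustration; the description above does not specify the arbitration policy of the arbiter 912.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Hypothetical central arbiter for a single shared bus.

    Assumption: grants rotate round-robin among requesting masters;
    the source only states that one arbiter arbitrates all accesses.
    """

    def __init__(self, num_masters: int) -> None:
        self.num_masters = num_masters
        # Start just "before" master 0 so master 0 is checked first.
        self.last_grant = num_masters - 1

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the one master allowed to use the bus this cycle, or None."""
        for offset in range(1, self.num_masters + 1):
            candidate = (self.last_grant + offset) % self.num_masters
            if requests[candidate]:
                self.last_grant = candidate
                return candidate
        return None  # no master is requesting the bus
```

Because every access passes through one arbiter and one bus, all masters are serialized, which is why this scheme becomes hard to scale as the number of cores grows.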
Meanwhile, semiconductor integrated circuits with distributed buses have recently been developed in succession by introducing, for example, interconnection technologies from parallel computers and/or ATM (Asynchronous Transfer Mode) network control technologies. FIG. 53B shows an example of distributed bus control. In a semiconductor integrated circuit with distributed buses, a plurality of routers (R) are interconnected in a mesh by multiple buses. Recently, research has been conducted on the so-called “Network on Chip (NoC)”, in which traffic flows in a large-scale integrated circuit are transferred through a plurality of distributed buses using distributed bus control as shown in FIG. 53B.
FIG. 54 shows an exemplary configuration of a router 920 for use in an NoC, a parallel computer, an ATM network, and so on. The data to be transferred (i.e., traffic) is divided into a plurality of small units such as packets or cells, each of which is transmitted to a destination node by way of multiple routers. The router 920 shown in FIG. 54 includes a plurality of input ports #0 through #3, a plurality of input buffers 922, a plurality of output ports #0 through #3, a crossbar switch 924 for connecting the input buffers to the respective output ports, and an arbiter 912 for switching the connections of the crossbar switch. Data input to the router 920 through the input ports #0 through #3 is temporarily stored in the input buffers 922.
The input buffers 922 each include a plurality of buffer queues. In the example shown in FIG. 54, each input buffer 922 has two virtual channels (VC0, VC1) as buffer queues. The arbiter 912 performs a routing process of analyzing the received data and determining the output port to be used. The arbiter 912 also associates each output port with a buffer queue of the input buffer in the router that is the transfer destination of the data, and performs a scheduling process for transmitting the data from the buffer queues. The router 920 may also be configured to include, downstream of the crossbar switch 924, output buffers each having a plurality of buffer queues. In such a configuration, the arbiter 912 associates the buffer queues of the output buffers with the output ports. These processes are performed either based on information indicating the priority level defined for the data, or by a round-robin method that allocates data to the buffer queues in the order in which process requests are received. The data is routed based on the scheduling result and is transferred to the router or bus master that is the transfer destination. Each router 920 switches the connections of the crossbar switch 924 in accordance with the scheduling result and thus routes the data stored in the input buffers 922 to their respective destinations.
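The routing and scheduling processes of the arbiter 912 can be sketched as follows for a router with four ports and two virtual channels per input buffer, as in FIG. 54. This is a simplified model under stated assumptions: the routing table mapping a destination node to an output port, the per-output-port round-robin pointer, and all names (`Router`, `Packet`, `step`) are hypothetical, and flow control toward the downstream router is omitted.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Set, Tuple


@dataclass
class Packet:
    dest: int      # destination node id
    payload: str


class Router:
    """Hypothetical input-buffered router: num_ports ports, num_vcs VCs per input."""

    def __init__(self, routing_table: Dict[int, int],
                 num_ports: int = 4, num_vcs: int = 2) -> None:
        self.routing_table = routing_table  # assumed: destination -> output port
        self.num_ports = num_ports
        self.num_vcs = num_vcs
        # input_buffers[port][vc] is one buffer queue (virtual channel).
        self.input_buffers: List[List[Deque[Packet]]] = [
            [deque() for _ in range(num_vcs)] for _ in range(num_ports)
        ]
        # One round-robin pointer per output port over all (port, vc) slots.
        self.rr_pointer = [0] * num_ports

    def receive(self, port: int, vc: int, packet: Packet) -> None:
        self.input_buffers[port][vc].append(packet)

    def route(self, packet: Packet) -> int:
        """Routing process: choose the output port for a packet."""
        return self.routing_table[packet.dest]

    def step(self) -> List[Tuple[int, Packet]]:
        """One cycle of scheduling and crossbar traversal.

        Each output port forwards at most one packet, and each input port
        feeds at most one output port per cycle (crossbar constraint).
        """
        slots = self.num_ports * self.num_vcs
        used_inputs: Set[int] = set()
        sent: List[Tuple[int, Packet]] = []
        for out_port in range(self.num_ports):
            for offset in range(slots):
                slot = (self.rr_pointer[out_port] + offset) % slots
                in_port, vc = divmod(slot, self.num_vcs)
                queue = self.input_buffers[in_port][vc]
                if in_port in used_inputs or not queue:
                    continue
                if self.route(queue[0]) != out_port:
                    continue
                sent.append((out_port, queue.popleft()))
                used_inputs.add(in_port)
                # Start the next search after the slot that was just served.
                self.rr_pointer[out_port] = (slot + 1) % slots
                break
        return sent
```

In this sketch two packets waiting in different input ports for the same output port are served in alternate cycles, while packets bound for different output ports can cross the crossbar in the same cycle.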
In a router, the transfer of traffic may be delayed due to, for example, waiting at each buffer or processing delay at the crossbar switch. In applications in which such delay must be minimized (e.g., an application that provides notification of emergency information), the delay needs to be reduced. As the network grows larger, the number of routers increases, and because the delays at the routers along a path accumulate, the delay problem becomes more serious.
The delay problem in routers is particularly conspicuous for traffic transferred without breaks (bursty traffic), such as video data. FIG. 55 shows an example of bursty traffic. The horizontal axis represents time, and the vertical axis represents the amount of data transferred. In the example shown here, traffic is generated continuously for a certain duration and then stops; traffic is then again generated continuously for a certain duration. Such bursty traffic is likely to occupy the buffer queues in a router and thus strongly affects other traffic. For this reason, congestion, and hence transfer delay, is likely to occur while bursty traffic is being transferred.
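An on/off source of the kind shown in FIG. 55 can be modeled as a trace of the amount of data per time slot. The parameters (`burst_len`, `idle_len`, `rate`, `periods`) and function names are hypothetical. The sketch only illustrates that the peak rate during a burst exceeds the long-run average, so a router whose output rate is closer to the average than to the peak has to absorb the excess of each burst in its buffer queues.

```python
from typing import List


def bursty_traffic(burst_len: int, idle_len: int, rate: int, periods: int) -> List[int]:
    """Data amount per time slot for a hypothetical on/off (bursty) source.

    Each period is `burst_len` slots at `rate` units per slot,
    followed by `idle_len` silent slots.
    """
    pattern = [rate] * burst_len + [0] * idle_len
    return pattern * periods


def peak_to_mean(trace: List[int]) -> float:
    """Ratio of the peak slot rate to the average rate over the trace."""
    total = sum(trace)
    if not trace or total == 0:
        raise ValueError("trace must contain some traffic")
    return max(trace) / (total / len(trace))
```

For example, `bursty_traffic(3, 2, 4, 2)` yields `[4, 4, 4, 0, 0, 4, 4, 4, 0, 0]`, whose peak rate is 5/3 times its average rate.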
As a specific example, assume that packetized bursty traffic is stored in each of the buffer queues by round-robin allocation. If throughput decreases in another router on the path to the destination node due to interference between the bursty traffic and other traffic, none of the buffer queues storing the bursty traffic can transfer data smoothly. As a result, the throughput of every buffer queue decreases, and the transfer performance of the entire system degrades.
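The situation described above can be reproduced in a small sketch: the packets of one burst are spread over four buffer queues in request order, and a stall on the downstream path then blocks every queue at once. The function names and the assumption that all packets belong to a single bursty flow are illustrative only.

```python
from collections import deque
from typing import Deque, List


def allocate_round_robin(packets: List[str], num_queues: int) -> List[Deque[str]]:
    """Spread packets over buffer queues in the order they arrive (round-robin)."""
    queues: List[Deque[str]] = [deque() for _ in range(num_queues)]
    for i, pkt in enumerate(packets):
        queues[i % num_queues].append(pkt)
    return queues


def drain(queues: List[Deque[str]], downstream_stalled: bool) -> int:
    """Forward one head packet per queue this cycle; return how many were sent.

    Assumption: every packet belongs to the same bursty flow, so a stall
    on its downstream path blocks every queue that holds it.
    """
    sent = 0
    for q in queues:
        if q and not downstream_stalled:
            q.popleft()
            sent += 1
    return sent
```

With eight burst packets and four queues, every queue holds two packets of the burst; while the downstream path is stalled, `drain` forwards nothing and no queue is left free for other traffic arriving at the router.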
As a measure against this problem, Japanese Laid-Open Patent Publication No. 2002-344509 discloses a method of allocating traffic having high priority levels to predetermined buffer queues at design time. Hiroki MATSUTANI, Michihiro KOIBUCHI, Hideharu AMANO, and Tsutomo YOSHINAGA, “Evaluations of Prediction Router for Low-Latency On-Chip Networks”, Technical Report of the Institute of Electronics, Information and Communication Engineers 2009-ARC-181, pp. 1-6, January 2009 (hereinafter referred to as “Non-patent Document 1”), and John KIM, “Low-Cost Router Microarchitecture for On-Chip Networks”, MICRO '09, Dec. 12-16, 2009 (hereinafter referred to as “Non-patent Document 2”) each disclose measures against delay caused both by waiting at the buffer queues and by processing delay at the crossbar switch. Non-patent Document 1 discloses various methods of reducing the delay, in which the routing process is performed in parallel or part of the process is skipped, so that the transfer process in the router is simplified or accelerated. Non-patent Document 2 discloses a method of omitting the transfer process in the router to reduce the transfer delay.
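Allocating high-priority traffic to predetermined buffer queues at design time, as described for Japanese Laid-Open Patent Publication No. 2002-344509, might look like the following sketch. It is an illustrative reading only: the choice of a single reserved queue, the boolean priority flag, the class name `StaticPriorityAllocator`, and the round-robin allocation of the remaining traffic are assumptions, not details taken from that publication.

```python
from typing import List


class StaticPriorityAllocator:
    """Hypothetical allocator with one queue reserved at design time.

    High-priority traffic always goes to the reserved queue; all other
    traffic is spread round-robin over the remaining (shared) queues.
    """

    def __init__(self, num_queues: int, reserved_queue: int = 0) -> None:
        if num_queues < 2:
            raise ValueError("need at least one reserved and one shared queue")
        self.reserved_queue = reserved_queue
        self.shared: List[int] = [q for q in range(num_queues) if q != reserved_queue]
        self.next_shared = 0

    def allocate(self, high_priority: bool) -> int:
        """Return the index of the buffer queue that should store the packet."""
        if high_priority:
            return self.reserved_queue
        queue = self.shared[self.next_shared]
        self.next_shared = (self.next_shared + 1) % len(self.shared)
        return queue
```

Under this reading, a burst of low-priority packets can fill only the shared queues, so high-priority packets always find their reserved queue; the cost is that the reserved queue sits idle when no high-priority traffic is present.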
FIG. 56 shows a schematic configuration of a router 940 disclosed in Non-patent Document 2. In the router 940, some of the data input to an input port passes through a bypass line 930 and is output from an output port without passing through an input buffer 922 or the crossbar switch 924. With this configuration, the transfer process in the router 940 is omitted for that data. Therefore, data passing through the bypass line 930 is transferred more quickly than data handled by the usual process.
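The benefit of the bypass line 930 accumulates over the routers on a path. The sketch below uses a contention-free latency model with assumed per-router cycle counts (four cycles for the usual process, one cycle for the bypass line, one cycle per link); these figures and names are illustrative and are not taken from Non-patent Document 2.

```python
def path_latency(hops: int, router_cycles: int, link_cycles: int = 1) -> int:
    """End-to-end latency in cycles through `hops` routers, assuming no contention."""
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return hops * (router_cycles + link_cycles)


# Illustrative per-router delays (assumed, not from the cited documents).
NORMAL_ROUTER_CYCLES = 4  # buffer write, routing, allocation, crossbar traversal
BYPASS_ROUTER_CYCLES = 1  # data passes only through the bypass line
```

For a path through eight routers this model gives 40 cycles for the usual process and 16 cycles with the bypass line; the difference grows linearly with the number of routers, which matches the observation that delay becomes more serious as the network grows.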
In this manner, the transfer delay in a router can be reduced by omitting at least part of the routing process, allocating some of the traffic to a buffer queue with priority, or performing the routing process in parallel. As a result, some of the data can be transferred with priority. In this specification, reducing the transfer delay by any of these approaches will be referred to as “bypassing”. Also in this specification, traffic that is bypassed may be referred to as “bypass traffic”, and traffic that is not bypassed may be referred to as “non-bypass traffic”. A buffer queue in an input buffer or an output buffer may be referred to as a “data storage section”.