Systems on silicon show a continuous increase in complexity due to the ever-increasing need to implement new features and improve existing functions. This is enabled by the increasing density with which components can be integrated on an integrated circuit. At the same time, the clock speed at which circuits are operated tends to increase too. The higher clock speed in combination with the increased density of components has reduced the area which can operate synchronously within the same clock domain. This has created the need for a modular approach. According to such an approach, the processing system comprises a plurality of relatively independent, complex modules. In conventional processing systems, the system modules usually communicate with each other via a bus. As the number of modules increases, however, this way of communication is no longer practical for the following reasons. On the one hand, the large number of modules places too high a load on the bus. On the other hand, the bus constitutes a communication bottleneck, as it enables only one device at a time to send data to the bus.
A communication network forms an effective way to overcome these disadvantages. Networks on chip (NoC) have received considerable attention recently as a solution to the interconnect problem in highly-complex chips. The reason is twofold. First, NoCs help resolve the electrical problems in new deep-submicron technologies, as they structure and manage global wires. At the same time they share wires, lowering their number and increasing their utilization. NoCs can also be energy efficient and reliable and are scalable compared to buses. Second, NoCs also decouple computation from communication, which is essential in managing the design of billion-transistor chips. NoCs achieve this decoupling because they are traditionally designed using protocol stacks, which provide well-defined interfaces separating communication service usage from service implementation.
Introducing networks as on-chip interconnects radically changes the communication when compared to direct interconnects, such as buses or switches. This is because of the multi-hop nature of a network, where communicating modules are not directly connected, but are separated by one or more network nodes. This is in contrast with the prevalent existing interconnects (i.e., buses), where modules are directly connected. The implications of this change reside in the arbitration (which must change from centralized to distributed) and in the communication properties (e.g., ordering or flow control), which must be handled either by an intellectual property (IP) block or by the network.
Most of these topics have already been the subject of research in the field of local and wide area networks (computer networks) and in the field of interconnect networks for parallel machines. Both are very much related to on-chip networks, and many of the results in those fields are also applicable on chip. However, the premises of NoCs are different from those of off-chip networks, and, therefore, most of the network design choices must be reevaluated. On-chip networks have different properties (e.g., tighter link synchronization) and constraints (e.g., higher memory cost) leading to different design choices, which ultimately affect the network services.
NoCs differ from off-chip networks mainly in their constraints and synchronization. Typically, resource constraints are tighter on chip than off chip. Storage (i.e., memory) and computation resources are relatively more expensive, whereas the number of point-to-point links is larger on chip than off chip. Storage is expensive, because general-purpose on-chip memory, such as RAM, occupies a large area. Having the memory distributed in the network components in relatively small sizes is even worse, as the overhead area in the memory then becomes dominant.
Off-chip networks typically use packet switching and offer best-effort services. Contention can occur at each network node, making latency guarantees very hard to offer. Throughput guarantees can still be offered using schemes such as rate-based switching or deadline-based packet switching, but with high buffering costs. An alternative way of providing such time-related guarantees is to use time-division multiple access (TDMA) circuits, where every circuit is dedicated to a network connection. Circuits provide guarantees at a relatively low memory and computation cost. Network resource utilization is increased when the network architecture allows any left-over guaranteed bandwidth to be used by best-effort communication.
A network on chip (NoC) typically consists of a plurality of routers and network interfaces. Routers serve as network nodes and are used to transport data from a source network interface to a destination network interface by routing data on a correct path to the destination on a static basis (i.e., the route is predetermined and does not change), or on a dynamic basis (i.e., the route can change depending, e.g., on the NoC load, to avoid hot spots). Routers can also implement time guarantees (e.g., rate-based, deadline-based, or using pipelined circuits in a TDMA fashion). More details on a router architecture can be found in "A router architecture for networks on silicon" by Edwin Rijpkema, Kees Goossens, and Paul Wielage, in PROGRESS, October 2001.
Each network interface is connected to an IP (intellectual property) block, which may represent any kind of data processing unit or may be a memory, a bridge, etc. In particular, the network interfaces constitute a communication interface between the IP blocks and the network. The interface is usually compatible with the existing bus interfaces. Accordingly, the network interfaces are designed to handle data sequentialisation (fitting the offered command, flags, address, and data on a fixed-width, e.g., 32-bit, signal group) and packetization (adding the packet headers and trailers needed internally by the network). The network interfaces may also implement packet scheduling, which can include timing guarantees and admission control.
On-chip systems often require timing guarantees for their interconnect communication. Therefore, a class of communication is provided in which throughput, latency and jitter are guaranteed, based on a notion of global time (i.e., a notion of synchronicity between the network components, namely the routers and network interfaces), wherein the basic time unit is called a slot or time slot. All network components usually comprise a slot table of equal size for each output port of the network component, in which time slots are reserved for different connections, and the slot tables advance in synchronization (i.e., all are in the same slot at the same time). The connections are used to identify different traffic classes and associate properties to them.
A cost-effective way of providing time-related guarantees (i.e., throughput, latency and jitter) is to use pipelined circuits in a TDMA (Time Division Multiple Access) fashion, which is advantageous as it requires less buffer space compared to rate-based and deadline-based schemes on systems on chip (SoC) which have tight synchronization.
At each slot, a data item is moved from one network component to the next one, i.e. between routers or between a router and a network interface. Therefore, when a slot is reserved at an output port, the next slot must be reserved on the following output port along the path between a master and a slave module, and so on.
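By way of illustration only, the pipelined reservation described above can be sketched as follows. The function name `slots_along_path` and the wrap-around of the slot index at the end of the slot table (modulo the table size) are assumptions made for this sketch and are not taken from the text.

```python
def slots_along_path(start_slot: int, path_length: int, table_size: int) -> list[int]:
    """Return the slot index to reserve on each successive link of a path.

    Assumption: slot tables are cyclic, so the index wraps around
    modulo the table size.
    """
    return [(start_slot + hop) % table_size for hop in range(path_length)]


# A path of three links whose first link uses slot 6 of an 8-slot table.
print(slots_along_path(6, 3, 8))  # [6, 7, 0]
```

In this sketch, a data item leaving on slot 6 of the first link occupies slot 7 of the second link and slot 0 of the third link.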
When multiple connections with timing guarantees are set up, the slot allocation must be performed such that there are no clashes (i.e., there is no slot allocated to more than one connection). The task of finding an optimum slot allocation for a given network topology (i.e., a given number of routers and network interfaces) and a given set of connections between IP blocks is a highly computationally intensive (NP-complete) problem, as finding an optimal solution requires exhaustive computation.
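The clash-free condition can be illustrated with a minimal sketch. The first-fit search used here is an assumption chosen for brevity and is not asserted to be the allocation method of the invention; the names `find_free_start` and `reserve`, the link labels, and the cyclic slot indexing are likewise hypothetical.

```python
from typing import Optional

# One slot table per link: each entry holds the owning connection or None.
SlotTables = dict[str, list[Optional[str]]]


def find_free_start(tables: SlotTables, path: list[str]) -> Optional[int]:
    """Return the first start slot that is free on every link of the path.

    Link number `hop` of the path must be free at slot (start + hop),
    taken modulo the table size (assumed cyclic tables of equal size).
    """
    size = len(tables[path[0]])
    for start in range(size):
        if all(tables[link][(start + hop) % size] is None
               for hop, link in enumerate(path)):
            return start
    return None


def reserve(tables: SlotTables, path: list[str], connection: str) -> bool:
    """Reserve one slot per link for the connection; False if none is free."""
    start = find_free_start(tables, path)
    if start is None:
        return False
    size = len(tables[path[0]])
    for hop, link in enumerate(path):
        tables[link][(start + hop) % size] = connection
    return True


tables: SlotTables = {"L1": [None] * 4, "L2": [None] * 4}
reserve(tables, ["L1", "L2"], "A")
reserve(tables, ["L2"], "B")
reserve(tables, ["L1", "L2"], "C")
print(tables["L1"])  # ['A', 'C', None, None]
print(tables["L2"])  # ['B', 'A', 'C', None]
```

No slot on either link is held by more than one connection. A first-fit search of this kind finds a valid allocation when one is reachable in the given order, but it does not guarantee an optimal one.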
It is therefore an object of the invention to provide an improved slot allocation in a network on chip environment.
This object is achieved by an integrated circuit according to claim 1 and a method for time slot allocation according to claim 16 as well as a data processing system according to claim 17.
Therefore, an integrated circuit comprising a plurality of processing modules and a network arranged for coupling said modules is provided. Said integrated circuit further comprises a plurality of network interfaces each being coupled between one of said processing modules and said network. Said network comprises a plurality of routers coupled via network links to adjacent routers. Said processing modules communicate with each other over connections using connection paths through the network, wherein each of said connection paths employs at least one network link for a required number of time slots. At least one time slot allocating unit is provided for computing a link weight factor for at least one network link in said connection path as a function of at least one connection requirement for said at least one network link, for computing a connection path weight factor for at least one connection path as a function of the computed link weight factor of at least one network link in said connection path, and for allocating time slots to said network links according to the computed connection path weight factors.
Accordingly, a time slot allocation based on the actual connection requirement can be implemented.
According to an aspect of the invention said connection requirements comprise bandwidth, latency, jitter, priority and/or slot allocation requirements of the connection path. The time slot allocation can be implemented and optimized according to one of the specific connection requirements.
According to an aspect of the invention, said at least one time slot allocating unit is adapted to allocate time slots to said network links in decreasing order of connection path weight factor. Therefore, those connections using network links that require more time slots are considered first during the time slot allocation, as these connections have more constraints and, if left to the end, have fewer chances of finding free slots. By contrast, shorter channels going through less utilized links have more freedom in finding slots, and can thus be left toward the end of the slot allocation.
According to an aspect of the invention, said at least one time slot allocating unit is adapted to compute said connection path weight factor based on said computed link weight factors, the length of said connection path, and the bandwidth, latency, jitter, and/or the number of time slots required for said connection path. Therefore, the length of the connection path and the required amount of time slots may also be considered while computing the connection path weight factor.
According to a further aspect of the invention, said at least one time slot allocating unit is adapted to compute the connection path weight factor based on said computed link weight factors, the length of said connection path, and the bandwidth, latency, jitter, and/or the number of time slots required for said connection path weighted by a first, second and third weight factor, respectively. The contribution of the length of the connection path and the required bandwidth, latency, jitter, and/or time slots as well as the link weight factors may be varied by adapting the respective weight factors.
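One possible reading of this weighting is a linear combination, sketched below for illustration. The linear form, the function name `path_weight`, and the parameter names `a`, `b` and `c` for the first, second and third weight factors are assumptions of this sketch. Only the number of time slots is used as the connection requirement here.

```python
def path_weight(link_weights: list[float], slots_required: int,
                a: float = 1.0, b: float = 1.0, c: float = 1.0) -> float:
    """Combine three contributions into one connection path weight.

    a scales the sum of the link weights along the path,
    b scales the path length (number of links),
    c scales the number of time slots required by the connection.
    """
    return a * sum(link_weights) + b * len(link_weights) + c * slots_required


# Two links with weights 5 and 3, two slots required, all factors equal to 1.
print(path_weight([5, 3], 2))  # 12.0
# Setting b and c to zero leaves only the link-weight contribution.
print(path_weight([5, 3], 2, a=1.0, b=0.0, c=0.0))  # 8.0
```

Varying `a`, `b` and `c` changes how strongly each contribution influences the order in which connections are considered.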
According to an aspect of the invention, at least one time slot allocation unit is arranged in at least one of said plurality of network interfaces and comprises a first time slot table with entries specifying the connections to which time slots are allocated. Said routers can also comprise second time slot tables with entries representing reservations of time slots without specifying connections. The entries of the slot tables in the routers can be smaller, as the connection with which a time slot is associated does not need to be stored in these slot tables. Although the information per slot is smaller, the slot tables in a router will probably end up being larger as a whole, because a router has multiple ports (the slot table size may be #ports*#slots*information_per_slot).
According to an aspect of the invention, at least one time slot allocation unit is arranged in at least one of said plurality of network interfaces and comprises a first time slot table with entries specifying the connections to which time slots are allocated, and said routers comprise second time slot tables with entries comprising information for routing data in said network. Accordingly, as the routing information is stored in the routers, packet headers can be omitted, leading to a higher throughput.
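The size relation #ports*#slots*information_per_slot can be illustrated with a short calculation. The port counts, the number of slots and the bit widths below are invented for this example and are not taken from the text.

```python
def slot_table_bits(ports: int, slots: int, bits_per_slot: int) -> int:
    """Total slot table storage as #ports * #slots * information_per_slot."""
    return ports * slots * bits_per_slot


# Hypothetical router: 5 ports, 16 slots, 1 bit per slot (reserved or not).
router_bits = slot_table_bits(ports=5, slots=16, bits_per_slot=1)
# Hypothetical network interface: 1 port, 16 slots, 4-bit connection identifier.
ni_bits = slot_table_bits(ports=1, slots=16, bits_per_slot=4)
print(router_bits, ni_bits)  # 80 64
```

With these assumed figures, the router stores less information per slot yet needs more storage in total, because the table is replicated for each port.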
The invention also relates to a method for time slot allocation in an integrated circuit having a plurality of processing modules, a network arranged for coupling said modules and a plurality of network interfaces each being coupled between one of said processing modules and said network. Said network comprises a plurality of routers coupled via network links to adjacent routers. The communication between processing modules is performed over connections using connection paths through the network, wherein each of said connection paths employs at least one network link for a required number of time slots. A link weight factor for at least one network link in said connection path is computed as a function of connection requirements for said network links. A connection path weight factor for at least one connection path is computed as a function of the computed link weight factors of at least one network link in a connection path and connection requirements of said connection path. Time slots are allocated to said links according to the computed connection path weight factors.
The invention also relates to a data processing system comprising a plurality of processing modules and a network arranged for coupling said modules. Said data processing system further comprises a plurality of network interfaces each being coupled between one of said processing modules and said network. Said network comprises a plurality of routers coupled via network links to adjacent routers. Said processing modules communicate with each other over connections using connection paths through the network, wherein each of said connection paths employs at least one network link for a required number of time slots. At least one time slot allocating unit is provided for computing a link weight factor for at least one network link in said connection path as a function of connection requirements for said at least one network link, for computing a connection path weight factor for at least one connection path as a function of the computed link weight factors of at least one network link in said connection path and connection requirements of said connection path, and for allocating time slots to said network links according to the computed connection path weight factors.
Accordingly, the time slot allocation may also be performed in a multi-chip network or a system or network with several separate integrated circuits.
The invention is based on the idea of performing the slot allocation by computing a link weight as a function of the bandwidth, latency, jitter, and/or number of slots requested by each channel (i.e., each connection path) using the link, and by computing a channel weight as the sum of the link weights of the links used by the channel.
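A minimal sketch of this idea is given below for illustration. It assumes that the link weight is simply the sum of the slots requested by all channels using the link, which is only one possible choice of function. The channel names, link labels and function names are hypothetical.

```python
from collections import defaultdict

# Each channel maps to (list of links on its path, number of slots requested).
Channels = dict[str, tuple[list[str], int]]


def link_weights(channels: Channels) -> dict[str, int]:
    """Link weight: total slots requested by all channels using the link."""
    weights: dict[str, int] = defaultdict(int)
    for path, slots in channels.values():
        for link in path:
            weights[link] += slots
    return dict(weights)


def channel_weights(channels: Channels) -> dict[str, int]:
    """Channel weight: sum of the weights of the links the channel uses."""
    lw = link_weights(channels)
    return {name: sum(lw[link] for link in path)
            for name, (path, _slots) in channels.items()}


def allocation_order(channels: Channels) -> list[str]:
    """Channels sorted by decreasing channel weight."""
    cw = channel_weights(channels)
    return sorted(channels, key=lambda name: cw[name], reverse=True)


channels: Channels = {
    "A": (["L1", "L2"], 2),
    "B": (["L2"], 1),
    "C": (["L3"], 4),
}
print(link_weights(channels))      # {'L1': 2, 'L2': 3, 'L3': 4}
print(channel_weights(channels))   # {'A': 5, 'B': 3, 'C': 4}
print(allocation_order(channels))  # ['A', 'C', 'B']
```

In this example, channel A crosses two loaded links and is therefore considered first, while channel B, which uses a single moderately loaded link, is left to the end.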
Other aspects of the invention are defined in the dependent claims.