1. Technical Field
Methods and example implementations described herein are generally directed to interconnect architecture, and more specifically, to network on chip systems interconnect architecture.
2. Related Art
The number of components on a chip is rapidly growing due to increasing levels of integration, system complexity and shrinking transistor geometry. Complex System-on-Chips (SoCs) may involve a variety of components, e.g., processor cores, DSPs, hardware accelerators, memory and I/O, while Chip Multi-Processors (CMPs) may involve a large number of homogeneous processor cores, memory and I/O subsystems. In both systems, the on-chip interconnect plays a role in providing high-performance communication between the various components. Due to scalability limitations of traditional buses and crossbar based interconnects, Network-on-Chip (NoC) has emerged as a paradigm to interconnect a large number of components on the chip. A NoC is a global shared communication infrastructure made up of several routing nodes interconnected with each other using point-to-point physical links.
Messages are injected by the source and are routed from the source node to the destination over multiple intermediate nodes and physical links. The destination node then ejects the message and provides the message to the destination. For the remainder of this application, the terms ‘components’, ‘blocks’, ‘hosts’ or ‘cores’ will be used interchangeably to refer to the various system components which are interconnected using a NoC. The terms ‘routers’ and ‘nodes’ will also be used interchangeably. Without loss of generality, the system with multiple interconnected components will itself be referred to as a ‘multi-core system’.
There are several possible topologies in which the routers can connect to one another to create the system network. Bi-directional rings (as shown in FIG. 1(a)), 2-D (two dimensional) mesh (as shown in FIG. 1(b)) and 2-D torus (as shown in FIG. 1(c)) are examples of topologies in the related art. Mesh and torus can also be extended to 2.5-D (two and a half dimensional) or 3-D (three dimensional) organizations.
Packets are message transport units for intercommunication between various components. Routing involves identifying a path composed of a set of routers and physical links of the network over which packets are sent from a source to a destination. Components are connected to one or multiple ports of one or multiple routers; with each such port having a unique ID. Packets carry the destination's router and port ID for use by the intermediate routers to route the packet to the destination component.
Examples of routing techniques include deterministic routing, which involves choosing the same path from A to B for every packet. This form of routing is independent of the state of the network and does not load balance across path diversities, which might exist in the underlying network. However, deterministic routing may be implemented in hardware, maintains packet ordering, and may be rendered free of network level deadlocks. Shortest path routing may minimize latency, as such routing reduces the number of hops from the source to the destination. For this reason, the shortest path may also be the lowest power path for communication between the two components.
Dimension-order routing is a form of deterministic shortest path routing in 2-D, 2.5-D, and 3-D mesh networks. In this routing scheme, a message is routed along each coordinate in a particular sequence until it reaches the final destination. For example, in a 3-D mesh network, a message may first be routed along the X dimension until it reaches a router whose X-coordinate is equal to the X-coordinate of the destination router. Next, the message takes a turn and is routed along the Y dimension, and finally takes another turn and moves along the Z dimension until it reaches the final destination router. Dimension-order routing is often minimal turn and shortest path routing.
FIG. 2 pictorially illustrates an example of XY routing in a two dimensional mesh. More specifically, FIG. 2 illustrates XY routing from node ‘34’ to node ‘00’. In the example of FIG. 2, each component is connected to only one port of one router. A packet is first routed over the x-axis until the packet reaches node ‘04’, where the x-coordinate of the node is the same as the x-coordinate of the destination node. The packet is next routed over the y-axis until the packet reaches the destination node.
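The XY routing described for FIG. 2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `xy_route` and the representation of a node as an `(x, y)` tuple are assumptions made for the example, with node ‘34’ read as x=3, y=4, consistent with the description above.

```python
def xy_route(src, dst):
    """Return the routers visited from src to dst, routing along X first, then Y.

    Nodes are (x, y) tuples; this encoding is an assumption for illustration.
    """
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    # Phase 1: travel along the x-axis until the x-coordinate matches.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # Phase 2: take a single turn and travel along the y-axis.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


# Node '34' to node '00': the turn happens at node '04'.
path = xy_route((3, 4), (0, 0))
```

Because the path depends only on the source and destination coordinates, every packet between the same pair of nodes takes the same route, which is what makes the scheme deterministic.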
In a heterogeneous mesh topology in which one or more routers or one or more links are absent, dimension-order routing may not be feasible between certain source and destination nodes, and alternative paths may have to be taken. The alternative paths may not be shortest path or minimum turn.
Source routing and routing using tables are other routing options used in NoC. Adaptive routing can dynamically change the path taken between two points on the network based on the state of the network. This form of routing may be complex to analyze and implement.
A NoC interconnect may contain multiple physical networks. Over each physical network, there may exist multiple virtual networks, wherein different message types are transmitted over different virtual networks. In this case, at each physical link or channel, there are multiple virtual channels; each virtual channel may have dedicated buffers at both end points. In any given clock cycle, only one virtual channel can transmit data on the physical channel.
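The constraint that only one virtual channel can transmit on the physical channel in a given clock cycle can be illustrated with a small arbiter model. The sketch below assumes a round-robin policy, which is chosen here only for illustration and is not stated in the text; the name `round_robin_schedule` and the list-of-queues representation are likewise hypothetical.

```python
def round_robin_schedule(vc_queues, cycles):
    """Grant the physical channel to at most one virtual channel per cycle.

    vc_queues: one list of pending flits per virtual channel.
    Returns, per cycle, a (vc_index, flit) tuple or None if the link is idle.
    Round-robin arbitration is an assumption made for this sketch.
    """
    queues = [list(q) for q in vc_queues]  # copy so the caller's data is untouched
    num_vcs = len(queues)
    last_granted = -1
    timeline = []
    for _ in range(cycles):
        granted = None
        # Search the virtual channels, starting just after the last winner.
        for offset in range(1, num_vcs + 1):
            vc = (last_granted + offset) % num_vcs
            if queues[vc]:
                granted = (vc, queues[vc].pop(0))
                last_granted = vc
                break
        timeline.append(granted)
    return timeline


timeline = round_robin_schedule([["a0", "a1"], ["b0"], []], cycles=4)
```

In this run the two non-empty virtual channels alternate on the link, and the final cycle is idle because no flits remain.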
NoC interconnects often employ wormhole routing, wherein a large message or packet is broken into small pieces known as flits (also referred to as flow control digits). The first flit is the head flit, which holds information about the packet's route and key message level information along with payload data, and sets up the routing behavior for all subsequent flits associated with the message. Optionally, one or more body flits follow the head flit, containing the remaining payload of data. The final flit is the tail flit, which in addition to containing the last payload also performs some bookkeeping to close the connection for the message. In wormhole flow control, virtual channels are often implemented.
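As a rough illustration of how a message is divided into head, body and tail flits, consider the following Python sketch. The `Flit` class, the `packetize` function, the 4-byte flit payload and the choice to always emit a separate tail flit (even for a very short message) are all assumptions made for the example, not details taken from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Flit:
    kind: str                                  # "head", "body", or "tail"
    payload: bytes
    route: Optional[Tuple[int, int]] = None    # only the head flit carries routing info


def packetize(message, dest, flit_bytes=4):
    """Split a message into a head flit, optional body flits, and a tail flit."""
    chunks = [message[i:i + flit_bytes] for i in range(0, len(message), flit_bytes)]
    # Assumption: every packet has at least a head and a tail flit.
    while len(chunks) < 2:
        chunks.append(b"")
    flits = [Flit("head", chunks[0], route=dest)]
    flits += [Flit("body", chunk) for chunk in chunks[1:-1]]
    flits.append(Flit("tail", chunks[-1]))
    return flits


flits = packetize(b"0123456789", dest=(0, 0))
kinds = [f.kind for f in flits]
```

Only the head flit carries the destination; the body and tail flits follow the route that the head flit has set up.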
The physical channels are time sliced into a number of independent logical channels called virtual channels (VCs). VCs provide multiple independent paths to route packets; however, they are time-multiplexed on the physical channels. A virtual channel holds the state needed to coordinate the handling of the flits of a packet over a channel. At a minimum, this state identifies the output channel of the current node for the next hop of the route and the state of the virtual channel (idle, waiting for resources, or active). The virtual channel may also include pointers to the flits of the packet that are buffered on the current node and the number of flit buffers available on the next node.
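The per-virtual-channel state listed above can be pictured as a small record. The following Python sketch is one possible model; the class and field names (`VirtualChannel`, `output_port`, `downstream_free_buffers`) and the `forward` method are hypothetical and serve only to make the description concrete.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class VCState(Enum):
    IDLE = "idle"
    WAITING = "waiting for resources"
    ACTIVE = "active"


@dataclass
class VirtualChannel:
    state: VCState = VCState.IDLE
    output_port: Optional[str] = None          # output channel for the next hop
    buffered_flits: deque = field(default_factory=deque)  # flits held on this node
    downstream_free_buffers: int = 0           # flit buffers available on the next node

    def can_forward(self):
        """A flit may move only if the VC is active and the next node has room."""
        return (self.state is VCState.ACTIVE
                and bool(self.buffered_flits)
                and self.downstream_free_buffers > 0)

    def forward(self):
        """Send one flit downstream, or return None if nothing can move."""
        if not self.can_forward():
            return None
        self.downstream_free_buffers -= 1
        return self.buffered_flits.popleft()


vc = VirtualChannel(state=VCState.ACTIVE, output_port="east",
                    buffered_flits=deque(["head", "tail"]),
                    downstream_free_buffers=1)
first = vc.forward()    # succeeds: one downstream buffer was free
second = vc.forward()   # blocked: no downstream buffer remains
```

The second call is blocked even though a flit is waiting, which is the kind of back pressure that can propagate from one router to the next.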
The term “wormhole” plays on the way messages are transmitted over the channels: the routing information in the head flit is short enough that a router can decode it and select the output port toward the next router before the full message arrives. This allows the router to quickly set up the route upon arrival of the head flit and then opt out from the rest of the conversation. Since a message is transmitted flit by flit, the message may occupy several flit buffers along its path at different routers, creating a worm-like image.
A standard n×m mesh NoC can connect n×m cores. The maximum latency of an n×m mesh NoC is n+m−1 hops, which occurs when the hosts at the two far end corners inter-communicate. To minimize the latency, n and m must be chosen to be as close as possible, creating a more square-like topology. In this case, as the network scales in size, the maximum latency is on the order of N^(1/2), where N is the total number of nodes in the NoC. Using a torus topology, latency can be further reduced.
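The effect of the mesh aspect ratio on worst-case latency can be checked numerically using the n+m−1 hop figure given above. In the sketch below, the function names `max_latency_hops` and `best_mesh_shape` are hypothetical, and the search simply enumerates every rectangular shape that holds the requested number of cores exactly.

```python
def max_latency_hops(n, m):
    """Worst-case hop count of an n x m mesh, using the n + m - 1 figure above."""
    return n + m - 1


def best_mesh_shape(cores):
    """Return the (n, m) factorization of `cores` with the lowest worst-case latency."""
    shapes = [(n, cores // n) for n in range(1, cores + 1) if cores % n == 0]
    return min(shapes, key=lambda shape: max_latency_hops(*shape))


square = max_latency_hops(8, 8)     # 15 hops for 64 cores
skinny = max_latency_hops(2, 32)    # 33 hops for the same 64 cores
best = best_mesh_shape(64)          # the square 8 x 8 arrangement
```

For 64 cores, the square arrangement more than halves the worst-case hop count relative to a 2×32 arrangement.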
Deadlock occurs in a system NoC interconnect when messages are unable to make forward progress to their destination because the messages are waiting on one another to free up resources (e.g. at buffers and/or channels). Deadlocks due to blocked buffers can quickly spread over the entire network, which may paralyze further operation of the system. Deadlocks can broadly be classified into network level deadlocks and protocol level deadlocks.
Deadlock is possible within a network if there are cyclic dependencies between the channels in the network. FIG. 3 illustrates an example of network level deadlock. In the example of FIG. 3, starting at a state with all buffers empty, the blocks initiate the message transfers A→C, B→D, C→A and D→B simultaneously. Each block takes hold of its outgoing channel and transmits the message toward its destination. In the example of FIG. 3, each channel can hold only one message at a time. From this point on, each channel waits on the next channel to move the message further. There is a cycle in the channel or message dependency graph, and the network becomes deadlocked. Such network level, or low-level, deadlocks can be avoided by construction using deadlock free routing, or by virtualization of paths using multiple virtual channels and keeping them from back pressuring each other.
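Whether a set of channel dependencies can deadlock may be checked by searching the channel dependency graph for a cycle. The Python sketch below uses a standard depth-first search for this. The channel names and the assumption that the four blocks of FIG. 3 are connected in a ring A→B→C→D→A are illustrative only, since the figure itself is not reproduced here.

```python
def has_cycle(graph):
    """Detect a cycle in a directed graph given as {node: [successors]}."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph.get(node, ()):
            if color.get(nxt, WHITE) == GREY:
                return True        # back edge: nxt is already on the current path
            if color.get(nxt, WHITE) == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


# Assumed ring layout: each channel "XY" goes from block X to block Y.
# The message holding one channel waits for the next channel in the ring.
ring_dependencies = {"AB": ["BC"], "BC": ["CD"], "CD": ["DA"], "DA": ["AB"]}
# Removing one dependency (e.g. by deadlock free routing) breaks the cycle.
broken_dependencies = {"AB": ["BC"], "BC": ["CD"], "CD": [], "DA": ["AB"]}

ring_deadlocks = has_cycle(ring_dependencies)
broken_deadlocks = has_cycle(broken_dependencies)
```

A cycle in this graph indicates that deadlock is possible; an acyclic graph indicates that some message can always make forward progress.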
Network end points may not be ideal sinks, i.e., they may not consume all incoming packets until some of the currently outstanding packets are processed. If a new packet needs to be transmitted during the processing of an outstanding packet, a dependency may be created between the NoC ejection and injection channels of the host. The dependency may become cyclic based upon the message sequence, position of components and routes taken by various messages. If the deadlock is caused by dependencies external to the network layer, it is called a high-level, protocol level, or application level deadlock. In related art systems, most high level tasks involve a message flow between multiple hosts and ports on the NoC in a specific sequence. Software applications running on large multi-core systems often generate complex inter-communication messages between the various hosts and ports. Such a multi-point sequence of intercommunication may introduce complex dependencies resulting in protocol level deadlock in the system interconnect.
The underlying cause of deadlock remains some form of channel, buffer and message dependency cycle introduced by the inter-dependent messages between one or more ports of one or more hosts. Independent messages from one end point to another on the network do not cause protocol level deadlocks; however, depending on the routing of such messages on the network, network level deadlocks are still possible in the system.
FIGS. 4(a), 4(b) and FIGS. 5(a) to 5(c) illustrate an example of protocol level deadlock. Consider an example of a three central processing unit (CPU) system connected to memory and cache controller through a crossbar. The cache controller's interface to the interconnect has a single First-In-First-Out (FIFO) buffer which can hold a maximum of three messages. Internally, the cache controller can process up to two requests simultaneously (and therefore process up to two outstanding miss requests to the memory).
At FIG. 4(a), all three CPUs send read requests to the cache controller.
At FIG. 4(b), read requests are queued in an input buffer to the cache controller from the crossbar.
At FIG. 5(a), the cache controller accepts two requests ‘1’ and ‘2’ from the input buffer while the third request ‘3’ remains in the input buffer. Requests ‘1’ and ‘2’ have a read miss in the cache, which in turn issues miss refill requests ‘m1’ and ‘m2’ to the memory.
At FIG. 5(b), the memory returns refill data ‘d1’, ‘d2’. This data gets queued behind ‘3’ in the cache controller's input buffer.
At FIG. 5(c), the cache controller waits for refill data for the outstanding requests before accepting new request ‘3’. However, the refill data is blocked behind request ‘3’. The system is therefore deadlocked.
In this system, deadlock avoidance can be achieved by provisioning additional buffer space in the system, or using multiple physical or virtual networks for different message types. In general, deadlock is avoided by manually 1) interpreting the intercommunication message sequence and dependencies, 2) then allocating sufficient buffers and virtual and/or physical channels and 3) assigning various messages in the sequence the appropriate channel.
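The cache controller scenario of FIGS. 4 and 5, and the effect of moving refill data onto a separate channel, can be reproduced with a small simulation. The Python sketch below is a simplified model under stated assumptions: the function name `run_cache_controller` is hypothetical, every request is assumed to miss, and the memory is assumed to return refill data as soon as a request is accepted.

```python
from collections import deque


def run_cache_controller(separate_response_fifo, max_steps=20):
    """Return the requests that complete; fewer than three means deadlock.

    Simplifying assumptions: every request misses, and the memory returns
    refill data ('d1', 'd2', ...) as soon as the request is accepted.
    """
    request_fifo = deque(["1", "2", "3"])       # FIG. 4(b): three queued reads
    # With one shared FIFO, refill data queues behind pending requests.
    # The shared FIFO never holds more than three entries in this run.
    response_fifo = deque() if separate_response_fifo else request_fifo
    outstanding = set()                          # at most two requests in flight
    completed = []
    for _ in range(max_steps):
        head_is_request = bool(request_fifo) and not request_fifo[0].startswith("d")
        head_is_data = bool(response_fifo) and response_fifo[0].startswith("d")
        if head_is_request and len(outstanding) < 2:
            req = request_fifo.popleft()         # FIG. 5(a): accept a request
            outstanding.add(req)
            response_fifo.append("d" + req)      # FIG. 5(b): refill data returns
        elif head_is_data:
            data = response_fifo.popleft()       # refill reaches the controller
            outstanding.remove(data[1:])
            completed.append(data[1:])
        else:
            break                                # FIG. 5(c): nothing can progress
    return completed


shared = run_cache_controller(separate_response_fifo=False)
separate = run_cache_controller(separate_response_fifo=True)
```

With the single shared buffer, request ‘3’ sits at the head of the FIFO and no request ever completes. With a separate buffer for refill data, which stands in here for a separate virtual or physical network per message type, all three requests complete.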
In large scale networks such as the internet, deadlocks are of lesser concern. Mechanisms such as congestion detection, timeouts, packet drops, acknowledgment and retransmission provide deadlock resolution. However, such complex mechanisms have substantial limitations (e.g., design cost) in terms of power, area and speed when implemented on interconnection networks where the primary demands are low latency and high performance. In such systems, deadlock avoidance becomes a critical architectural requirement.