1. Technical Field
Methods and example embodiments described herein are generally directed to interconnect architecture, and more specifically, to network on chip system interconnect architecture.
2. Related Art
The number of components on a chip is rapidly growing due to increasing levels of integration, system complexity and shrinking transistor geometry. Complex System-on-Chips (SoCs) may involve a variety of components e.g., processor cores, DSPs, hardware accelerators, memory and I/O, while Chip Multi-Processors (CMPs) may involve a large number of homogenous processor cores, memory and I/O subsystems. In both systems the on-chip interconnect plays a key role in providing high-performance communication between the various components. Due to scalability limitations of traditional buses and crossbar based interconnects, Network-on-Chip (NoC) has emerged as a paradigm to interconnect a large number of components on the chip. NoC is a global shared communication infrastructure made up of several routing nodes interconnected with each other using point-to-point physical links. Messages are injected by the source and are routed from the source router to the destination over multiple intermediate routers and physical links. The destination router then ejects the message and provides it to the destination. For the remainder of the document, terms ‘components’, ‘blocks’ or ‘cores’ will be used interchangeably to refer to the various system components which are interconnected using a NoC. Without loss of generalization, the system with multiple interconnected components will itself be referred to as ‘multi-core system’.
There are several possible topologies in which the routers can connect to one another to create the system network. Bi-directional rings (as shown in FIG. 1(a)) and 2-D mesh (as shown in FIG. 1(b)) are examples of topologies in the related art.
Packets are message transport units for intercommunication between various components. Routing involves identifying a path which is a set of routers and physical links of the network over which packets are sent from a source to a destination. Components are connected to one or multiple ports of one or multiple routers; with each such port having a unique ID. Packets carry the destination's router and port ID for use by the intermediate routers to route the packet to the destination component.
Examples of routing techniques include deterministic routing, which involves choosing the same path from A to B for every packet. This form of routing is independent of the state of the network and does not load balance across path diversities which might exist in the underlying network. However, deterministic routing is simple to implement in hardware, maintains packet ordering and easy to make free of network level deadlocks. Shortest path routing minimizes the latency as it reduces the number of hops from the source to destination. For this reason, the shortest path is also the lowest power path for communication between the two components. Dimension-order routing is a form of deterministic shortest path routing in 2D mesh networks.
FIG. 2 illustrates an example of XY routing in a two dimensional mesh. More specifically, FIG. 2 illustrates XY routing from node ‘34’ to node ‘00’. In the example of FIG. 2, each component is connected to only one port of one router. A packet is first routed in the X dimension till it reaches node ‘04’ where the x dimension is same as destination. The packet is next routed in the Y dimension until it reaches the destination.
Source routing and routing using tables are other routing options used in NoC. Adaptive routing can dynamically change the path taken between two points on the network based on the state of the network. This form of routing may be complex to analyze and implement and is therefore rarely used in practice.
Software applications running on large multi-core systems can generate complex inter-communication messages between the various blocks. Such complex, concurrent, multi-hop communication between the blocks can result in deadlock situations on the interconnect. Deadlock occurs in a network when messages are unable to make progress to their destination because they are waiting on one another to free up resources (e.g. at buffers and channels). Deadlocks due to blocked buffers can quickly spread over the entire network, which may paralyze further operation of the system. Deadlocks can broadly be classified into network level deadlocks and protocol level deadlocks.
Deadlock is possible within a network if there are cyclic dependencies between the channels in the network. FIG. 3 illustrates an example of network level deadlock. In the example of FIG. 3, starting at a state with all buffers empty, the blocks initiate the message transfer of A→C, B→D, C→A and D→B simultaneously. Each block takes hold of its outgoing channel and transmits the message toward its destination. In the example of FIG. 3, each channel can hold only one message at a time. From this point on, each channel waits on the next channel to move the message further. There is a cyclic dependency in the wait graph and the network becomes deadlocked. Such network layer deadlock or low-level deadlocks can be avoided by construction using deadlock free routing or virtualization of paths.
Network end points may not be ideal sinks, i.e. they may not consume all incoming packets until some of the currently outstanding packets are processed. If a new packet needs to be transmitted during the processing of an outstanding packet, a dependency may be created between the NoC ejection and injection channels of the module. The dependency may become cyclic based upon the message sequence, position of components and routes taken by various messages. If the deadlock is caused by dependencies external to the network layer, this is called a high-level, protocol or an application level deadlock. In related art systems, most high level tasks involve a message flow between multiple modules on the NoC in a specific sequence. Such a multi-point sequence of intercommunication may introduce complex dependencies resulting in protocol level deadlock. The underlying cause of deadlock remains the channel dependency cycle introduced by the inter-dependent messages between multiple components. Independent messages from one end point to another on the network will not cause protocol level deadlocks; however, depending on the routing of such messages on the network, network level deadlocks are still possible in the system.
FIGS. 4(a), 4(b) and FIGS. 5(a) to 5(c) illustrate an example of protocol level deadlock. Consider an example of a three central processing unit (CPU) system connected to memory and cache controller through a crossbar. The cache controller's interface to the interconnect has a single First-In-First-Out (FIFO) buffer which can hold a maximum of three messages. Internally, the cache controller can process up to two requests simultaneously (and therefore process up to two outstanding miss requests to the memory).
At FIG. 4(a), all three CPUs send read requests to the cache controller.
At FIG. 4(b), read requests are queued in an input buffer to the cache controller from the crossbar.
At FIG. 5(a), the cache controller accepts two requests ‘1’ and ‘2’ from input buffer while the third request ‘3’ remains in the input buffer. ‘1’ and ‘2’ have a read miss in the cache, which in turn issues miss refill requests ‘m1’, ‘m2’ to the memory
At FIG. 5(b), the memory returns refill data ‘d1’, ‘d2’. This data gets queued behind ‘3’ in the cache controller's input buffer.
At FIG. 5(c), the cache controller waits for refill data for the outstanding requests before accepting new request ‘3’. However the refill data is blocked behind this request ‘3’. The system is therefore deadlocked.
In this system, deadlock avoidance can be achieved by provisioning additional buffer space in the system, or using multiple physical or virtual networks for different message types. In general, deadlock is avoided by manually 1) interpreting the intercommunication message sequence and dependencies, 2) then allocating sufficient buffers and virtual and/or physical channels and 3) assigning various messages in the sequence the appropriate channel.
In large scale networks such as the internet, deadlocks are of a lesser concern. Mechanisms such as congestion detection, timeouts, packet drops, acknowledgment and retransmission provide deadlock resolution. However such complex mechanisms are too expensive in terms of power, area and speed to implement on interconnection networks where the primary demands are low latency and high performance. In such systems, deadlock avoidance becomes a critical architectural requirement.