Multiprocessor system-on-chip (MPSOC) and chip multiprocessor (CMP) infrastructures use bus structures for on-chip communication. However, traditional bus-based communication schemes lack scalability and predictability, and cannot keep up with the increasing demands of future systems on chip (SOCs). To meet the challenges of next-generation system designs, a network-on-chip (NOC) infrastructure, which is structured and scalable, has been proposed.
A conventional NOC infrastructure consists of multiple interconnects, each comprising a compute unit (CU), a homogeneous node, and a network interface (NI). The NI at the homogeneous node transforms data packet(s) from the original format generated by the CU into fixed-length NOC flow-control digits (flits) suitable for transmission in the NOC. The NOC flits associated with a data packet consist of a header (or head) flit, a tail flit, and a number of body flits in between. The NOC flits are routed from a source node of one interconnect toward a target node of another interconnect in a hop-by-hop manner. For example, when a source CU sends data packet(s) to a target CU, the source CU first sends the data packet(s) to the NI associated with the source CU, which transforms the data packet(s) into NOC flits. The NOC flits are transferred to a source node associated with the source CU, which subsequently routes the NOC flits toward a target node of another interconnect associated with the target CU. The NOC flits travel hop by hop via links, which couple all the homogeneous nodes together within the NOC, from the source homogeneous node through any intervening homogeneous nodes until the NOC flits reach the target node. Upon receiving the NOC flits, the target node converts the NOC flits back to data packet(s) of the original format generated by the source CU, and the converted data packet(s) are sent to the target CU.
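The packet-to-flit transformation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the format used by any particular NOC: the 4-byte flit payload, the `Flit` record, the integer target id, and the choice to carry the packet length in the head flit are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

FLIT_PAYLOAD_BYTES = 4  # assumed fixed flit payload size; the text gives no value


@dataclass
class Flit:
    kind: str                     # "head", "body", or "tail"
    payload: bytes                # always FLIT_PAYLOAD_BYTES long
    target: Optional[int] = None  # hypothetical target node id, set on the head flit only


def packetize(packet: bytes, target: int) -> List[Flit]:
    """Split a packet into a head flit, zero or more body flits, and a tail flit."""
    size = FLIT_PAYLOAD_BYTES
    chunks = [packet[i:i + size] for i in range(0, len(packet), size)] or [b""]
    # Assumption: the head flit carries the packet length so padding can be removed later.
    flits = [Flit("head", len(packet).to_bytes(size, "big"), target)]
    for i, chunk in enumerate(chunks):
        kind = "tail" if i == len(chunks) - 1 else "body"
        # Pad the last chunk so that every flit has the same fixed length.
        flits.append(Flit(kind, chunk.ljust(size, b"\x00")))
    return flits


def depacketize(flits: List[Flit]) -> bytes:
    """Rebuild the original packet from a complete head/body/tail flit sequence."""
    if len(flits) < 2 or flits[0].kind != "head" or flits[-1].kind != "tail":
        raise ValueError("malformed flit sequence")
    length = int.from_bytes(flits[0].payload, "big")
    data = b"".join(f.payload for f in flits[1:])
    return data[:length]  # drop the padding added by packetize
```

With these assumptions, an 11-byte packet becomes one head flit, two body flits, and one tail flit, and `depacketize` recovers the original bytes at the target.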
When at least one CU of the NOC is executing an application, the CU generates data packet(s) that include processed data as a result of executing the application. For example, if the CU is executing an image rendering application, the CU may generate data packet(s) that include a rendered image. When data packet(s) including such processed data flow within the NOC in the form of NOC flits, the NOC may experience heavy data traffic. One of ordinary skill in the art will recognize that the more hops the flits take to reach their intended destination, and the more CUs there are that generate additional data packet(s) for transfer, the heavier the data traffic and the greater the power dissipation the NOC will experience.
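The relationship between hop count, number of senders, and traffic can be illustrated with a rough model. The sketch below assumes a 2-dimensional mesh with minimal routing, so that the hop count equals the Manhattan distance between nodes, and uses the total number of flit link traversals as a crude proxy for traffic and data-movement power. The function names, the 4-byte flit payload, and the proxy itself are illustrative assumptions, not taken from the text.

```python
import math
from typing import Iterable, Tuple

Coord = Tuple[int, int]


def hop_count(src: Coord, dst: Coord) -> int:
    """Hops between two nodes, assuming a 2D mesh and minimal routing."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def flit_count(packet_bytes: int, flit_payload_bytes: int = 4) -> int:
    """One head flit plus the payload flits (at least a tail flit)."""
    return 1 + max(1, math.ceil(packet_bytes / flit_payload_bytes))


def link_traversals(transfers: Iterable[Tuple[Coord, Coord, int]],
                    flit_payload_bytes: int = 4) -> int:
    """Total flit-hops over all (source, target, packet_bytes) transfers."""
    return sum(
        flit_count(size, flit_payload_bytes) * hop_count(src, dst)
        for src, dst, size in transfers
    )
```

Under this model, both a longer path for one packet and an additional CU injecting packets increase the total, which matches the qualitative observation above.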
In a conventional 2-dimensional NOC infrastructure, a node has five input ports and five output ports, one for each of the north, south, east, and west directions and one for its associated CU. Each port is coupled to a port on the neighboring node via a set of physical interconnect wires or channels. The node's function is to route NOC flits received at each input port to an appropriate output port and thus toward a target node. To realize this function, the node is equipped with an input buffer for each input port, a crossbar switch to direct NOC flit traffic to the desired output port, and the control logic necessary to route the NOC flits. The node may include a plurality of input queues to receive NOC flits from neighboring nodes. The node may also include a local input queue to receive NOC flits from its associated CU. An arbiter and router serve as control logic to route the NOC flits from any of the aforementioned queues toward the target node. For each sequence of NOC flits, the head flit specifies the intended target node, and after examining the head flit, the arbiter and router determine the output direction in which to route all the subsequent (body and tail) flits of that sequence according to routing algorithms as known in the art. Specifically, the arbiter and router communicate with the crossbar switch, which directs NOC flit traffic to the desired output port for transmission of NOC flits to other nodes of the NOC. If the node is coupled to a memory segment, the NOC flits may further be written into the memory segment.
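The node behavior described above can be sketched as follows. The text leaves the routing algorithm open ("as known in the art"); this sketch swaps in dimension-order (XY) routing as one well-known example and uses fixed-priority arbitration. The `Node` class, the tuple flit representation, the coordinate convention (x grows eastward, y grows northward), and the one-flit-per-output-per-cycle limit are all illustrative assumptions.

```python
from collections import deque
from typing import Tuple

PORTS = ("north", "south", "east", "west", "local")
Coord = Tuple[int, int]


def xy_route(here: Coord, target: Coord) -> str:
    """Dimension-order routing: correct x first, then y, then deliver locally."""
    if target[0] > here[0]:
        return "east"
    if target[0] < here[0]:
        return "west"
    if target[1] > here[1]:
        return "north"
    if target[1] < here[1]:
        return "south"
    return "local"


class Node:
    """Flits are (kind, value) tuples; a head flit's value is the target coordinate."""

    def __init__(self, coord: Coord):
        self.coord = coord
        self.inputs = {p: deque() for p in PORTS}  # one input queue per port
        self.outputs = {p: [] for p in PORTS}      # flits sent through the crossbar
        self.route = {p: None for p in PORTS}      # output chosen for each input's packet
        self.owner = {p: None for p in PORTS}      # input currently holding each output

    def step(self) -> None:
        """One cycle: each output port forwards at most one flit."""
        granted = set()
        for port in PORTS:  # fixed-priority arbitration (an assumption)
            queue = self.inputs[port]
            if not queue:
                continue
            kind, value = queue[0]
            if kind == "head" and self.route[port] is None:
                out = xy_route(self.coord, value)
                if self.owner[out] is not None:
                    continue  # output held by another packet; wait
                self.route[port] = out
                self.owner[out] = port
            out = self.route[port]
            if out is None or out in granted:
                continue
            granted.add(out)
            self.outputs[out].append(queue.popleft())
            if kind == "tail":  # the tail flit releases the output port
                self.route[port] = None
                self.owner[out] = None
```

In this sketch the head flit alone determines the output port, and the body and tail flits of the same sequence follow it, so flits from two packets that compete for the same output are never interleaved.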
All nodes in the conventional NOC are homogeneous in terms of having the same components within the nodes. In contrast, the CUs in the conventional NOC are heterogeneous in terms of having different processors within the CUs. For instance, a CU of one interconnect may have one type of processor, and another CU of another interconnect may have a different type of processor.
As mentioned above, one of ordinary skill in the art will recognize that the more hops the flits take to reach their intended destination, and the more CUs there are that generate additional data packet(s) for transfer, the heavier the data traffic due to the processed data flow, and thus the greater the power dissipation, that the NOC will experience. To address the power dissipation associated with heavy traffic, current solutions include low voltage signaling and data compression. In particular, if an application is executed across a plurality of CUs, any CU that is not needed to run the application can be individually turned on or off with low voltage signaling, thereby saving power in the overall NOC. In addition, nodes themselves may have fixed functions, such as encryption or compression, that operate on the processed data originating from CUs. For example, nodes may be able to compress the processed data originating from CUs to reduce the processed data flow experienced by the NOC. However, as the NOC grows in complexity with an increasing number of CUs, there will still be substantial power dissipation associated with data movement. What is needed is an improved mechanism to increase the performance of data movement and to reduce the power dissipation associated with data movement.