1. Field of the Invention
The present invention relates to mechanisms for transferring data within a computing system. More specifically, the present invention relates to a method and an apparatus for routing data across an n-dimensional grid network.
2. Related Art
Dramatic increases in microprocessor clock speeds have not been matched by corresponding increases in chip-to-chip communication speeds. Consequently, inter-chip communication delays are rapidly becoming a major bottleneck to overall computer system performance. For example, it is now common for accesses to off-chip memory to require hundreds of processor clock cycles. This means that microprocessors are often stalled waiting for memory operations to complete.
The performance-limiting effects of these long inter-chip communication delays can be mitigated somewhat by providing large on-chip caches and by providing mechanisms to support out-of-order execution, so that useful work can be accomplished during accesses to off-chip memory. However, as memory accesses begin to take hundreds of processor cycles to complete, even large on-chip caches and out-of-order execution cannot keep a processor busy on typical workloads.
System developers are beginning to consider different interconnection topologies to provide fast chip-to-chip communication within computer systems. One promising topology is a two-dimensional grid network, wherein chips are coupled to four of their nearest neighbors (North, East, South, and West). The close proximity between chips in a two-dimensional grid facilitates high-throughput and low-latency communication between adjacent chips. Moreover, a two-dimensional grid network can be easily implemented with existing packaging technologies, which are well suited to planar layouts. Unfortunately, existing routing mechanisms for two-dimensional grid networks can be quite complicated because they must deal with collisions and load-balancing issues while also avoiding deadlock conditions.
Memory systems are a major limitation on performance in modern computer systems. Although processor speeds have increased dramatically, memory access latency has not been reduced commensurately. As a result, processors spend a growing proportion of their time waiting for responses from their memory systems. In a modern computing system, a processor may lose hundreds of potential machine cycles waiting for a single piece of data from memory.
Hence, what is needed is a method and an apparatus that facilitates fast chip-to-chip communications within a computer system without the problems described above.