1. Technical Field of the Invention
The technical field of this invention is data communication among a plurality of data processing components, specifically involving structures that simultaneously route multiple data transfers in multiple dimensions.
2. Discussion of Prior Art
In the past, when the semiconductor manufacturing process was in the early stages of development, the number of transistors available to system designers was low and the cost of implementing data processing functions in silicon was high. During the 1980's, most data processing systems featured no more than one processor, memory and a few peripherals.
Typical data communication methods of early data processing systems were based on parallel bus structures connecting one processor with memory and a few peripherals. The parallel bus represented a 1-dimensional data communication structure in which all components were arranged in a line along the bus signals. A typical processor sequentially accessed memory and peripheral data by issuing read and write bus cycles to bus locations encoded in the bus address according to a linear (1-dimensional) memory map. Even though buses could only support one data transfer at a time, their data transfer bandwidth was sufficient due to the relatively low performance requirements of early systems.
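The linear memory map described above can be illustrated with a minimal sketch. The component names, base addresses and sizes are hypothetical and chosen only for illustration; they are not taken from any particular system.

```python
# Hypothetical linear (1-dimensional) memory map: (base address, size, component).
# All names and addresses are illustrative assumptions.
MEMORY_MAP = [
    (0x0000, 0x8000, "memory"),
    (0x8000, 0x0100, "uart"),
    (0x8100, 0x0100, "timer"),
]


def decode(address):
    """Return the component selected by a bus address, or None if unmapped."""
    for base, size, name in MEMORY_MAP:
        if base <= address < base + size:
            return name
    return None


def run_bus_cycles(addresses):
    """Decode bus cycles strictly one after another, since a bus
    supports only one data transfer at a time."""
    return [decode(address) for address in addresses]
```

Every component occupies one contiguous range of a single address line, which is the sense in which the map is 1-dimensional.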
As the semiconductor manufacturing process improved, the number of transistors available to designs increased exponentially and the cost of implementing data processing functions in silicon plummeted. Consequently, system designers were no longer limited by how many processors or peripherals they could put in a design. During the 1990's, many data processing systems featured dozens of processors and peripherals, and in some cases the distinction between master processors and slave peripherals was erased as increasing numbers of peripherals were implemented with processors. Some systems were built as data processing farms containing 2-dimensional arrays of interconnected data processing components.
While the physical and functional structures of many high performance systems of the early 2000's assumed forms of 2-dimensional arrays with hundreds of data processing components, the 1-dimensional bus structures of the 1980's continued to dominate the data communication methods used to interconnect these components. This created several problems.
Attaching additional components to a bus increases the length of the shared bus lines and the electrical loading that the components must drive to send data. These side effects slow down the bus speed, increase power consumption and complicate routing of signals between data processing components.
Adding master components to a bus also requires additional bus arbitration logic to arbitrate between simultaneous bus requests from multiple masters. Linking multiple buses with bus bridges requires additional bus arbitration logic. In addition to slowing down the data transfers, bus arbiters and bridges also introduce significant latency to all transfers, resulting in a system-wide loss of data transfer determinism. Not knowing the precise time needed for transfers to complete, system designers resorted to padding the available bus bandwidth with dead cycles to ensure on-time data delivery. While improving transfer determinism, this clumsy method wastes valuable bus cycles that could instead be used to improve system performance or to reduce cost.
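The cost of arbitration and dead-cycle padding can be approximated with a rough model. The sketch below assumes round-robin arbitration, a fixed burst length and a fixed arbitration overhead per grant; these are simplifying assumptions made here for illustration, and the function names are hypothetical.

```python
def worst_case_wait(num_masters, burst_cycles, arbitration_cycles):
    """Cycles a master may wait for the bus when every other master is
    granted first (round-robin arbitration is an assumed policy)."""
    return (num_masters - 1) * (burst_cycles + arbitration_cycles)


def padded_utilization(num_masters, burst_cycles, arbitration_cycles):
    """Share of a worst-case transfer window that moves one master's own
    data, if that window is reserved for every transfer to guarantee
    on-time delivery."""
    window = (worst_case_wait(num_masters, burst_cycles, arbitration_cycles)
              + arbitration_cycles + burst_cycles)
    return burst_cycles / window
```

Under these assumptions, four masters with 8-cycle bursts and 2 cycles of arbitration give a worst-case wait of 30 cycles, and only one fifth of each reserved 40-cycle window carries the requesting master's data.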
Parallel bus communication could be compared to a city transportation system with one bus vehicle making frequent stops on a 2-way street.
Other data communication methods have been recently introduced for connecting arrays of components. Crossbars, transfer controllers and shared memory arrays are used to connect multiple components through central switching/storage structures. While in most cases faster than buses, crossbars have limited fanout and scalability, and suffer from high pin counts and high power consumption. Crossbars, transfer controllers and shared memory often increase system component count by several large devices, making them prohibitively expensive for many designs.
Centralized communications can be compared to a city transportation system where cars travel on a road system resembling spokes of a wheel, with the ends of the spokes being the on and off ramps, and the hub being the single intersection point where cars can change their direction of travel.
Linear tunnels and rings connect adjacent data processing components with short point-to-point data links, stringing the components along a single line. While solving the electrical loading problem of a bus, linear tunnels do not improve data transfer latency over buses. The inherently poor latency performance of linear tunnels stems from the heavy sharing of individual links during most transfers. Tunnel latency is further degraded by the inability of some tunnels to support peer-to-peer transfers, requiring every transfer to pass through one master, thus rendering them relatively useless for directly connecting, say, one hundred components.
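The heavy sharing of individual links can be made concrete by counting how many source/destination pairs must cross each link of a chain. The following is a minimal sketch under stated assumptions: nodes are numbered along the line, every ordered pair of nodes exchanges one transfer, and the master is placed at node 0 when peer-to-peer transfers are unavailable. The function name is hypothetical.

```python
from itertools import permutations


def link_load(num_nodes, via_master=False):
    """Count how many ordered source/destination pairs cross each link.

    Nodes are numbered 0..num_nodes-1 and link k joins node k to node k+1.
    With via_master=True every transfer is relayed through node 0 (an
    assumed master position), modelling tunnels without peer-to-peer support.
    """
    load = [0] * (num_nodes - 1)
    for src, dst in permutations(range(num_nodes), 2):
        legs = [(src, 0), (0, dst)] if via_master else [(src, dst)]
        for a, b in legs:
            for k in range(min(a, b), max(a, b)):
                load[k] += 1
    return load
```

For four nodes the result is [6, 8, 6] with peer-to-peer transfers and [18, 12, 6] when everything passes through the master. For one hundred nodes the busiest link carries 5000 of the 9900 possible pairs even with peer-to-peer support.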
Linear tunnel communication can be compared to a city transportation system consisting of one cab company with one dispatcher scheduling taxi rides on a single 2-way street.
Datapipes, like linear tunnels, are 1-dimensional data communication structures with left and right ports. Each left and right port can have multiple data communication links, some of which can be bent by 90 degrees to emulate other data communication structures like orthogonal mesh and tree topologies. While datapipe links, like buses or tunnels, can be oriented in any direction, the datapipe routing logic is limited to routing data in the two directions of one dimension: left and right.
Datapipes cannot be directly accessed by slave peripherals to send or receive data. Instead, datapipes work like programmable processors using op-code driven receiver and transmitter masters to drive data transfers to and from local I/O memories, according to instructions fetched from those memories. Thus, system data processing components, being unable to communicate directly, have to first deposit instructions and data in I/O memory before the datapipes can transmit any data.
To receive data, components have to wait until their local receiver interprets the arriving op-codes to deposit incoming data in local I/O memory, before being able to read it. Receivers, transmitters and I/O memory represent additional elements that have to be added to each connected data processing component, thus significantly increasing system gate count and cost.
In addition, the 1-dimensional routing method used by the datapipes requires that each datapipe routing node must have detailed knowledge of relative locations of all other datapipe nodes in the left and right routing directions. This routing method requires a complex initialization process and tedious configuration changes to all datapipe routing nodes every time a component is added, removed or transferred to another location. Internally, datapipes use 1-dimensional addressing to route variable-sized data packets in left and right directions of one dimension, according to stored direction id codes for the left and right directions.
During initialization, sets of unique data routing directions are individually assigned at each node for every possible packet destination, as the datapipe routing logic does not use uniform routing methods to transport data between nodes. Accordingly, each datapipe routing node has no inherent knowledge of how data packets are routed by other datapipe nodes in the system.
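The configuration burden of per-node, per-destination routing directions can be sketched as follows. The table layout, node names and function names are assumptions made for illustration only; the sketch models a chain in which every node stores an explicit left or right direction for every other node.

```python
def build_tables(chain):
    """Build one left/right direction table per node of a 1-dimensional chain.

    chain is an ordered list of node names; each node stores an explicit
    direction for every possible destination.
    """
    position = {name: index for index, name in enumerate(chain)}
    return {
        node: {
            dest: "left" if position[dest] < position[node] else "right"
            for dest in chain if dest != node
        }
        for node in chain
    }


def entries_changed(old_chain, new_chain):
    """Count table entries that must be written after a reconfiguration."""
    old, new = build_tables(old_chain), build_tables(new_chain)
    changed = 0
    for node, table in new.items():
        for dest, direction in table.items():
            if old.get(node, {}).get(dest) != direction:
                changed += 1
    return changed
```

In this model, appending a fourth node to a three-node chain requires six table writes: one new entry at each existing node plus three entries at the new node. Moving a node to another location changes entries at several nodes at once.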
Datapipe's non-deterministic routing methods, combined with the variable size of data packets, increase worst-case transfer latency and render all data transfers non-deterministic under heavy data loading conditions. At best, non-deterministic data transfers result in wasted data processing cycles as components are forced to wait for data. At worst, non-deterministic transfers can produce data gridlock conditions resulting in catastrophic system shutdowns.
Datapipe communication can be compared to a city transportation system with a grid of one-way streets and a policeman at every intersection giving motorists directions to the next intersection on their path.
Buses, central switches, linear tunnels and datapipes represent centralized and 1-dimensional data communication structures that are not distributed and uniform in multiple dimensions. They are therefore inherently inefficient for connecting large 2-dimensional arrays of components on chips or boards. They are even less suitable for providing data communications for 3-dimensional computing structures such as linear arrays of boards. Data communication structures that are not multi-dimensional create data communication choke points and result in high power dissipation, congested routing and non-deterministic data transfer latencies.
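The penalty of a 1-dimensional arrangement can be estimated by comparing hop counts. The sketch below assumes nearest-neighbour links and shortest-path routing measured by Manhattan distance; it is a generic illustration, not a model of any specific structure discussed above, and the function name is hypothetical.

```python
from itertools import product


def hop_stats(shape):
    """Return (worst, average) hop count between distinct nodes of a grid
    with nearest-neighbour links, using Manhattan distance.

    shape is a tuple of dimension sizes, e.g. (100,) for a chain or
    (10, 10) for a 2-dimensional array.
    """
    nodes = list(product(*(range(size) for size in shape)))
    distances = [
        sum(abs(a - b) for a, b in zip(p, q))
        for p in nodes for q in nodes if p != q
    ]
    return max(distances), sum(distances) / len(distances)
```

For one hundred components, a chain described by `hop_stats((100,))` has a worst case of 99 hops, while a 10 by 10 array described by `hop_stats((10, 10))` has a worst case of 18 hops.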
Comparing system data communication to a city transportation system, small towns with fewer than a dozen households can be sufficiently served by one vehicle on a single street, or by multiple vehicles on several one-way streets with policemen scheduling car routes at every intersection. However, when a city grows beyond a certain point, traffic scheduling delays and gridlock conditions are inevitably bound to bring any centralized or 1-dimensional transportation system to a halt.
With shrinking semiconductor process features steadily increasing the number of data processing components inside designs, data communication structures that are not scalable and uniform in multiple dimensions are increasingly difficult to schedule and verify to the point where system verification is becoming a major design bottleneck.