Processor arrays that contain a number of separate but interconnected processor elements are known. One such processor array is the picoArray™ architecture produced by the applicant of the present application and described in International publication WO 02/50624. In the picoArray™ architecture, the processor elements are connected together by a proprietary bus that includes switch matrices.
The software description of a digital signal processing (DSP) system comprises a number of processes that communicate with point-to-point or point-to-multipoint signals. Each signal has a fixed bandwidth, known as its slot rate, which is a power of two in the range 2 to 1024 and is expressed in picoArray™ system clock cycles. Thus, a slot rate of four means that slots must be allocated on the bus between the sending processor element and the receiving processor element(s) once every four system clock cycles.
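By way of illustration only, the relationship between a slot rate and the clock cycles on which slots recur can be sketched as follows. The function names and the `offset` parameter (the phase of the first slot within the period) are assumptions made for this sketch and are not part of the picoArray™ tools.

```python
from typing import List

MAX_SLOT_RATE = 1024  # largest slot rate mentioned in the text


def is_valid_slot_rate(rate: int) -> bool:
    """Return True if rate is a power of two in the range 2 to 1024."""
    return 2 <= rate <= MAX_SLOT_RATE and (rate & (rate - 1)) == 0


def slot_cycles(rate: int, offset: int = 0,
                period: int = MAX_SLOT_RATE) -> List[int]:
    """List the cycles within one period on which a signal owns a slot.

    `offset` is a hypothetical phase chosen by the router; the text only
    requires that a slot recurs once every `rate` cycles.
    """
    if not is_valid_slot_rate(rate):
        raise ValueError("slot rate must be a power of two in 2..1024")
    if not 0 <= offset < rate:
        raise ValueError("offset must lie in the range 0..rate-1")
    return list(range(offset, period, rate))
```

For example, `slot_cycles(4)` begins 0, 4, 8, 12 and contains 256 cycles, i.e. one quarter of a 1024-cycle period, whereas a slot rate of 1024 occupies a single cycle per period.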
A partitioning procedure can be used to allocate groups of processes to each of the processor arrays in the system. A placement procedure can be used to allocate each process to a specific processor element within its allocated processor array. A switching or routing procedure determines the multiplexing of the signals on to the physical connections of the bus in the processor array.
The placement and switching procedure takes a user's abstract design, which consists of processes and signals, and places each process onto a processor element on a picoArray™ and routes all of the signals using the switching matrix of the picoArray™. This procedure must be carried out in a way that maximises the number of processor elements that can be used within a given picoArray™ and that minimises the length of the routing needed for the signals.
The placement and the routing steps are generally performed separately. For example, a candidate placement is created and then the signals are routed using that placement.
The output of the placement and switching procedure is a “load file” which contains configuration data for a single picoArray™.
The present application is concerned with the procedure for routing the signals. Therefore, in the following, it is assumed that the placement procedure has been carried out, i.e. the mapping of the processes to the processor elements has been completed.
The proprietary bus used in picoArrays™ is a time division multiplexed (TDM) structure in which communication timing is determined at “compile time”. In other words, there is no dynamic arbitration.
The bus comprises a set of “switches” placed throughout the processor array, and these switches are either in-line with the processor elements (see FIG. 1(a)), or offset (see FIG. 1(b)).
In-line switches are easier to use for placement and routing algorithms since the regularity makes it easier to compute distances between processor elements. With offset switches, each row of processor elements is connected to two rows of switches, and therefore it is possible to communicate between adjacent rows by only traversing one switch, whereas in-line switches require the traversal of two switches.
However, for offset switches, each processor element is connected to two bus connections and only one of these can be used to provide this single switch transfer. If that direction becomes blocked (perhaps by another signal) then the other direction must be used, and this requires the traversal of three switches. For in-line switches, the two possible directions both require the traversal of two switches.
Thus it is easier to predict “bus costs” before the routing is actually performed if in-line switches are used.
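The difference in predictability can be illustrated with a minimal sketch that encodes only the switch counts stated above for communication between adjacent rows. The layout labels and function names are hypothetical and chosen for this illustration.

```python
from typing import Tuple


def adjacent_row_switches(layout: str, preferred_blocked: bool = False) -> int:
    """Switches traversed between adjacent rows, per the counts in the text."""
    if layout == "inline":
        # Both possible directions cost two switches.
        return 2
    if layout == "offset":
        # One switch in the favourable direction, three if it is blocked.
        return 3 if preferred_blocked else 1
    raise ValueError("layout must be 'inline' or 'offset'")


def switch_count_range(layout: str) -> Tuple[int, int]:
    """Best-case and worst-case switch counts for an adjacent-row signal."""
    counts = [adjacent_row_switches(layout, blocked)
              for blocked in (False, True)]
    return min(counts), max(counts)
```

Here `switch_count_range("inline")` gives `(2, 2)`, while `switch_count_range("offset")` gives `(1, 3)`: the in-line cost is known before routing, whereas the offset cost depends on whether the favourable direction is still free.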
The routing procedure requires a tool that can determine, from the signals that need to be routed, the contents of the routing tables within each of the switches that make up the picoBus structure. Each routing table consists of a set of entries that indicate the routing for each clock cycle. The set of entries is repeated every N clock cycles. In addition, it is possible for some of the entries to be repeated at a lower frequency to provide communications at lower rates, while reducing the size of the routing tables that are required.
In currently available picoArrays™, N is 1024. This is implemented as a table of 124+(4×8) entries, i.e. 156 entries in total. The main part of the table, which comprises 124 entries, is repeated once every 128 clock cycles. The 8 blocks of 4 entries are each repeated once every 1024 clock cycles and are known as the “hierarchical” entries.
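One possible way of mapping a clock cycle to a routing table entry is sketched below. It is an assumption of this sketch, not stated above, that the four hierarchical entries occupy the last four cycles of each 128-cycle frame and that the blocks are used in frame order; the constant and function names are likewise hypothetical.

```python
MAIN_ENTRIES = 124      # entries repeated every 128 cycles
HIER_BLOCKS = 8         # number of hierarchical blocks
HIER_BLOCK_SIZE = 4     # entries per hierarchical block
FRAME = MAIN_ENTRIES + HIER_BLOCK_SIZE                      # 128 cycles
PERIOD = FRAME * HIER_BLOCKS                                # 1024 cycles (N)
TABLE_SIZE = MAIN_ENTRIES + HIER_BLOCKS * HIER_BLOCK_SIZE   # 156 entries


def table_index(cycle: int) -> int:
    """Map a clock cycle to an index into the 156-entry routing table.

    Assumed layout: indices 0..123 are the main entries and indices
    124..155 are the eight hierarchical blocks stored back to back.
    """
    frame_no, pos = divmod(cycle % PERIOD, FRAME)
    if pos < MAIN_ENTRIES:
        # Main entry: the same index is reused in every 128-cycle frame.
        return pos
    # Hierarchical entry: the block is selected by the frame number.
    return MAIN_ENTRIES + frame_no * HIER_BLOCK_SIZE + (pos - MAIN_ENTRIES)
```

Under this assumed layout, each main entry is visited eight times in 1024 cycles and each hierarchical entry exactly once, so a 156-entry table covers a 1024-cycle period.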
Furthermore, as indicated above, the routing of signals has to handle two cases that are supported by the bus protocol, namely point-to-point communications and point-to-multipoint communications.