This invention relates to a processor array, and in particular to a large processor array which requires multi-bit, bidirectional, high bandwidth communication to one processor at a time, to all the processors at the same time, or to a sub-set of the processors at the same time. This communication might be needed for data transfer, such as loading a program into a processor or reading back status or result information from a processor, or for control of the processor array, such as the synchronous starting, stopping or single-stepping of the individual processors.
GB-A-2370380 describes a large processor array, in which each processor (array element) needs to store the instructions which make up an operating program, and then needs to be controllable so that it runs the operating program as desired. Since the array elements pass data from one to another, it is essential that the processors are at least approximately synchronised. Therefore, they must be started (i.e. commence running their programs) at the same time. Likewise, if they are to be stopped at some time and then re-started, they need to be stopped at the same time.
Due to the large number of array elements, and the relatively large size of their instruction stores, data stores, register files and so on, it is advantageous to be able to load the program for each array element quickly.
Due to the size of the processor array, it is difficult to minimise the amount of clock skew between array elements and, in fact, a certain amount of clock skew is advantageous from the point of view of supplying power to the array elements. Accordingly, it is necessary only for the array elements to be synchronised to within about one clock cycle of each other.
For synchronous control of an array of processors, the simplest solution would be to wire the control signals to all array elements in a parallel fan-out. This becomes unwieldy once the array is larger than a certain size. Once the signals have to travel so far that they take longer than one clock cycle to reach the most distant array elements, it becomes difficult to pipeline the control signals efficiently and to balance the end-point arrival times over all operating conditions. This imposes an upper limit on the clock speed that can be used, and hence on the bandwidth of communications. Additionally, this approach is not well suited to communicating with just one processor at a time in one mode and then with all processors at once in another mode.
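The clock-speed limit described above can be illustrated with a rough calculation. The following is a minimal sketch only: the function names, the linear wire-delay model and the figures used are assumptions chosen for illustration and do not come from the arrangement described.

```python
# Illustrative model of a parallel fan-out (all values are hypothetical).
# Assumption: signal flight time grows linearly with wire length.

def flight_time_ps(distance_mm, delay_ps_per_mm):
    """Time for a control signal to travel distance_mm along a wire."""
    return distance_mm * delay_ps_per_mm


def max_unpipelined_clock_mhz(longest_distance_mm, delay_ps_per_mm):
    """Highest clock rate at which the most distant array element is still
    reached within one clock period, i.e. without pipelining the signal."""
    period_ps = flight_time_ps(longest_distance_mm, delay_ps_per_mm)
    return 1e6 / period_ps  # a period of 1 ps corresponds to 1e6 MHz


def arrival_spread_ps(distances_mm, delay_ps_per_mm):
    """Difference between the latest and earliest end-point arrival times."""
    times = [flight_time_ps(d, delay_ps_per_mm) for d in distances_mm]
    return max(times) - min(times)


if __name__ == "__main__":
    distances = [1, 5, 20]   # hypothetical wire lengths in mm
    delay = 100              # hypothetical wire delay in ps per mm
    print(max_unpipelined_clock_mhz(max(distances), delay))
    print(arrival_spread_ps(distances, delay))
```

Under these assumed figures, the longest wire alone caps the unpipelined clock rate, and the spread in arrival times is what would have to be balanced across all end-points.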
For high bandwidth communications to multiple end-points, packet-switched or circuit-switched networks are a good solution. However, this approach has the disadvantage of not generally being synchronous at all the end-points: the latency to end-points further away is longer than to end-points that are close. It also requires the nodes of the network to be quite intelligent and hence complex.
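The lack of synchronism at the end-points can be seen in a simple model. The sketch below assumes, purely for illustration, a two-dimensional mesh network with a controller at one corner, a fixed latency per hop and shortest-path routing; none of these details is taken from any particular network, and the function names are hypothetical.

```python
# Illustrative model of a switched network (topology and latency are assumed).

def arrival_cycles(rows, cols, hop_latency=1, source=(0, 0)):
    """Cycle at which a command sent from `source` reaches each node of a
    rows x cols mesh, assuming shortest-path routing and a fixed hop latency."""
    src_row, src_col = source
    return [
        [(abs(r - src_row) + abs(c - src_col)) * hop_latency for c in range(cols)]
        for r in range(rows)
    ]


def start_skew(rows, cols, hop_latency=1):
    """Number of cycles between the first and last node receiving a command
    issued from the corner node (0, 0)."""
    cycles = [t for row in arrival_cycles(rows, cols, hop_latency) for t in row]
    return max(cycles) - min(cycles)


if __name__ == "__main__":
    for row in arrival_cycles(2, 3):
        print(row)
    print(start_skew(4, 4, hop_latency=2))
```

In this model a "start" command reaches near nodes several cycles before distant ones, so the processors would not commence running at the same time unless the nodes took additional steps to compensate.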
It is also necessary to consider the issue of scalability. A design that works well in one processor array may have to be completely redesigned for a slightly larger array, and may be relatively inefficient for a smaller array.