The present invention concerns processor (e.g. data processor) systems with enhanced inter-communication and memory arrangements.
Current implementations of SIMD processors have local memory for each processing element (PE). This memory is normally private to each PE. In order to share data, the PEs either have to access data in a common shared memory or use some sort of inter-PE communication mechanism. Having two different types of memory complicates the programming model. Having to move data to shared memory, or between PEs, is a performance overhead.
The object of the present invention is to overcome these and other problems.
Prior Art
Various classes of architecture, including SIMD architectures with shared memory, are already known. Two types in particular are worth referring to:
Distributed memory systems: In this case each PE has its own associated memory. The PEs are connected by some network and may exchange data between their respective memories when required. In contrast to shared memory machines (see below), the user must be aware of the location of the data in the local memories and must move or distribute these data explicitly when needed. Our previous architecture, like most SIMD architectures, is of this form.
Shared memory systems: Shared memory systems have multiple PEs, all of which share the same address space. This means that the knowledge of where data is stored is of no concern to the user as there is only one memory accessed by all PEs on an equal basis. Single-CPU vector processors can also be regarded as an example of this.
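The difference between the two memory organizations described above can be illustrated with a minimal Python sketch. It is a toy model only, not an implementation of any particular machine; the class names, method names and sizes are all hypothetical and chosen purely for illustration.

```python
# Toy model only: all class and method names here are hypothetical.

class DistributedMemoryMachine:
    """Each PE owns a private memory; data moves only by explicit transfer."""

    def __init__(self, num_pes, words_per_pe):
        self.local = [[0] * words_per_pe for _ in range(num_pes)]

    def load(self, pe, offset):
        # A PE reads only from its own local memory.
        return self.local[pe][offset]

    def store(self, pe, offset, value):
        self.local[pe][offset] = value

    def transfer(self, src_pe, src_offset, dst_pe, dst_offset):
        # Explicit inter-PE move: the user must know where the data lives.
        self.local[dst_pe][dst_offset] = self.local[src_pe][src_offset]


class SharedMemoryMachine:
    """All PEs address one flat memory on an equal basis."""

    def __init__(self, num_words):
        self.memory = [0] * num_words

    def load(self, pe, address):
        # The requesting PE does not affect which word is read.
        return self.memory[address]

    def store(self, pe, address, value):
        self.memory[address] = value


def memory_model_demo():
    dist = DistributedMemoryMachine(num_pes=4, words_per_pe=8)
    dist.store(pe=0, offset=3, value=42)
    before = dist.load(pe=2, offset=3)   # PE 2 cannot yet see PE 0's value
    dist.transfer(src_pe=0, src_offset=3, dst_pe=2, dst_offset=3)
    after = dist.load(pe=2, offset=3)    # visible only after the explicit move

    shared = SharedMemoryMachine(num_words=32)
    shared.store(pe=0, address=3, value=42)
    seen = shared.load(pe=2, address=3)  # visible to every PE immediately
    return before, after, seen
```

In the distributed model, the value written by one PE reaches another only after the explicit transfer call, whereas in the shared model any PE reads it directly by address.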
The following papers describe routed inter-ALU networks, which are interconnects for distributing instructions to distributed ALUs and data to and from register files:
“Efficient Interconnects for Clustered Microarchitectures”; Joan-Manuel Parcerisa, Julio Sahuquillo, Antonio Gonzalez, and Jose Duato
“Routed Inter-ALU Networks for ILP Scalability and Performance”; Karthikeyan Sankaralingam, Vincent Ajay Singh, Stephen W. Keckler, and Doug Burger, Computer Architecture and Technology Laboratory, Department of Computer Sciences, Department of Electrical and Computer Engineering, The University of Texas at Austin
“Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture”; Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Doug Burger, Stephen W. Keckler, Charles R. Moore, Computer Architecture and Technology Laboratory, Department of Computer Sciences, The University of Texas at Austin.
Such networks connect a number of function units together via a distributed register file. Operations between function units and register files use the network to steer data between sources and destinations, as directed by the instruction. The network thus ties together the function units of a clustered ALU processor and connects those function units to the register files.
The approach described here differs from this: Applicant's ClearConnect Bus (“CCB”) network neither distributes instructions nor connects function units to register files. In addition, the source and destination addresses are driven by the ALU rather than statically by an instruction stream.
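The distinction between a destination fixed statically by the instruction stream and an address driven by the ALU can be sketched as follows. This is a simplified illustration under stated assumptions, not a model of the cited networks or of the CCB; the type `RoutedInstruction`, its fields, and both helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutedInstruction:
    # Hypothetical instruction format: the destination is a static field.
    opcode: str
    src_unit: int
    dest_unit: int


def static_destination(instr):
    # The destination is read straight from the instruction, so it is the
    # same every time that instruction is issued.
    return instr.dest_unit


def alu_driven_address(base, register_value):
    # The address is computed at run time from data held by the ALU.
    return base + register_value


def addressing_demo():
    instr = RoutedInstruction("add", src_unit=0, dest_unit=3)
    static = [static_destination(instr) for _ in range(4)]

    registers = [5, 0, 9, 2]  # one run-time value per PE (made-up data)
    dynamic = [alu_driven_address(100, r) for r in registers]
    return static, dynamic
```

With a static destination every issue of the instruction targets the same unit, whereas with an ALU-driven address each PE may target a different location depending on its own data.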
Certain problems with previous implementations of shared memory SIMD can be identified. Such implementations:
require complex and non-scalable cross-bar or multi-port memory systems;
use central arbitration of accesses to memory, adding delay and complexity;
often limit the types of access allowed: e.g. all PEs access a fixed offset.
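The access restriction mentioned in the last item above can be illustrated with a short sketch. It assumes one possible reading of "fixed offset", namely that every PE reads the same offset within its own region of memory; the function names and the memory layout are hypothetical.

```python
def fixed_offset_read(memory, words_per_pe, offset):
    # Restricted access: every PE reads the same offset within its own
    # region, so a single offset describes the whole access.
    num_pes = len(memory) // words_per_pe
    return [memory[pe * words_per_pe + offset] for pe in range(num_pes)]


def per_pe_address_read(memory, addresses):
    # Unrestricted access: each PE supplies its own address, which may
    # point anywhere in memory and may coincide with another PE's address.
    return [memory[address] for address in addresses]
```

For example, with a 16-word memory split across four PEs, `fixed_offset_read(memory, 4, 1)` reads words 1, 5, 9 and 13, while `per_pe_address_read(memory, [3, 3, 10, 0])` lets two PEs read the same word and the others read arbitrary locations.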
Reference may be made to some of our earlier patents and patent applications for further background and information concerning certain aspects or features of the present invention:
UK patents: 2348974 (load/store), 2348984 (multitasking), 2348973 (scoreboarding).
UK patent applications: 0321186.9 (ClearConnect), 0400893.4 (multitasking), 0409815.3 (unified SIMD).