Conventional computing systems are based on von Neumann or Harvard architectures. Both rely on a Central Processing Unit (CPU) to direct every process occurring within the system. For the better part of the last four decades, computing, from personal to data-centre-sized systems, has evolved around a CPU-centric motherboard, with everything else intended for data processing or storage operating on the periphery of the CPU. This centralisation has made CPU task-sharing (or multitasking) a necessity. That, in turn, has introduced management complexity into modern operating systems and programming models, and has led to significant software overheads and limitations in per-task execution granularity and task-completion determinism being accepted as trade-offs. These latter characteristics have also created a schism between computing hardware designed for general use and hardware designed for data acquisition, control automation or other applications in need of real-time, predictable handling of computing loads. Because CPU-centric computer architectures are heavily focussed on the CPU directing all processes within the computer, the system's processing speed is limited, in part, by the speed of the CPU. Furthermore, such CPU-centric computer architectures adopt an instruction-driven computing approach, following the traditional instruction fetch-decode-execute cycle, which introduces significant power/energy overheads.
To address these challenges, more execution cores have been added to CPUs for better handling of multitasking. In addition, non-CPU hardware has been introduced for data handling and processing. This hardware has taken the form of graphics processors, expansion-bus interfaces and memory controllers integrated into modern CPUs, making them highly complicated Systems-on-Chip (SoCs), while always remaining aligned with the general architectural principle of a CPU-centric motherboard system.
Even though this kind of integration has made low-power and mobile computing possible, it has not provided a realistic path forward for data centre and industrial-scale computing. Contrary to mobile computing and, in part, in order to complement it through (cloud) abstraction, such computing has had to grow and adapt to handle extremely diverse, complex and constantly evolving sets of computing loads in the most power-efficient way.
At the same time, owing to the same technological advances that make complex, billion-transistor CPUs possible, other, non-CPU computing components have grown in performance and efficiency too. These take the form of Field Programmable Gate Arrays (FPGAs) and their non-reprogrammable, Hardware Description Language (HDL) based counterparts, the Application Specific Integrated Circuits (ASICs). Today's FPGAs come with unprecedented logic densities, clock speeds and power efficiencies, while retaining the benefit of being re-programmable at hardware level (at the interconnect level) in the field. The importance of these characteristics is best evidenced by the industry-scale paradigm shift towards synthesisable hardware and by the wide availability and commercial adoption of HDL/Intellectual Property cores (IP cores) that address general-purpose or specific (graphics, DSP, etc.) processing problems using FPGAs and ASICs.
FPGAs/ASICs, combined with mature HDL/IP cores, have reached a level at which they currently outperform CPUs in power and utilisation efficiency, and sometimes even in raw computing performance. However, due to the CPU/motherboard-centric model of current computer architecture, FPGAs/ASICs remain on the periphery of computing hardware, limited mostly to offering acceleration and off-loading services to the CPU.
High-speed backplane/board-to-board interconnects (such as ISA, EISA, AGP, PCI and, more recently, PCIe) are designed for interconnecting computing-capable peripherals with CPUs in a CPU-based computer architecture. Present implementations of FPGA/ASIC-based computing hardware use PCIe interconnects to interface with the CPU and with other critical, motherboard-hosted computing resources such as memory and network interface controllers. However, these interconnects impose physical limitations on scaling (as the number of interconnection points is limited by the size of the motherboard), and lack mechanisms for supporting interaction between components as peers of equal importance (such as uniform access to memory), introducing further constraints on resource accessibility and combined resource utilisation.
In such an arrangement, memory subsystems are not only used for facilitating data storage and retrieval, but also act as intermediate interfaces for heterogeneous computing hardware operating on a given set of data. Typically, a piece of data processed by a CPU will need to reside temporarily in memory and be read back by the next processing element, even when it is known that the data will be passed to another computing element (e.g. a GPU or FPGA/ASIC) for further downstream processing. Ideally, but not always, the CPU hands over memory control to the next processing element so that it can access the memory directly (via direct memory access (DMA)). However, memory controller integration in modern (SoC) CPUs is blurring this line, and the extensive use of caching raises further challenges around data coherence, synchronisation and the avoidance of performance penalties during such a transaction.
Attaching FPGAs/ASICs as peripheral equipment to CPU-centric computing systems hinders their performance and limits their importance in modern computing. FPGAs/ASICs are designed, at component and even internal circuitry level, to operate extremely efficiently in parallel while scaling in numbers, but the attachment points available to CPU-based systems, typically through expansion boards in PCIe slots, are very limited. This forces the use of extensive hardware multiplexing, which does not permit efficient parallelisation or scaling, in order to orchestrate operation:
many FPGA/ASIC-carrying expansion boards are multiplexed over the limited PCIe lanes available to the CPU (CPU motherboard level multiplexing);
many FPGAs/ASICs, at expansion board level, are multiplexed over the PCIe lanes assigned to each expansion board.
Such systems are complex to build and program, and are power-inefficient to run.
The present invention aims to reduce the restrictions imposed by CPU-centric computing.