Computer systems are often used to perform complex numerical calculations on large datasets. Applications that process such datasets can be very time-consuming because of the large amount of data that must be processed and the complexity of the operations that must be performed.
One approach to increase the speed of a computer system for specialist computing applications is to use additional or specialist hardware accelerators. These hardware accelerators increase the computing power available and concomitantly reduce the time required to perform the calculations.
A suitable system for performing such calculations is a stream processing accelerator having a dedicated local memory. The accelerator may, for example, be located on an add-in card connected to the computer via a bus such as Peripheral Component Interconnect Express (PCI-E), or it may be connected to the computer over a network.
The bulk of the numerical calculations can then be handled by the specialist accelerator. Stream processing accelerators can be implemented using, for example, Field-Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs) and/or structured ASICs. Stream processors implemented as FPGAs generally provide much more computational power than a CPU and so are able to perform suitable calculations more quickly than a CPU. In certain cases, such an arrangement may increase the performance of highly parallel applications by an order of magnitude or more.
It is also possible to scale such an arrangement up to a large number of CPUs working with a large number of stream processors. At large scale, however, the workloads must be distributed and managed between the CPUs and the stream processors.
Load balancers are known in network systems. However, traditional load balancing systems sit directly in the path of the workflow from clients to servers. They receive work requests from clients and distribute each request to the least-utilized server that is able to handle it.
This approach adds a significant amount of latency to the computational process, because each item of work must be received by the load balancer, decoded, and then handed to an appropriate worker resource before processing can begin.
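By way of illustration only, the receive, decode and hand-off steps of such an in-path load balancer can be sketched as follows. The `Server` class, the `kind:payload` request format and the utilization measure (active jobs divided by a fixed capacity) are hypothetical assumptions introduced for this sketch; they are not taken from any particular load balancer. The sketch shows that every request, from every client, passes serially through `dispatch()` before any worker resource can begin processing it.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical worker resource (e.g. a CPU or a stream processor).
    name: str
    capabilities: set  # kinds of work this server can handle (assumed model)
    active_jobs: int = 0
    capacity: int = 4  # assumed fixed job capacity, used only for utilization

    def utilization(self) -> float:
        return self.active_jobs / self.capacity


def dispatch(raw_request: bytes, servers: list) -> Server:
    """Hypothetical in-path balancer: every client request arrives here first."""
    # Step 1 (receive) has already happened: the raw bytes are in hand.
    # Step 2 (decode): parse the assumed "kind:payload" format to learn
    # what sort of work is being requested.
    kind = raw_request.decode().split(":", 1)[0]
    # Step 3 (select): choose the least-utilized server able to do this work.
    eligible = [s for s in servers if kind in s.capabilities]
    if not eligible:
        raise LookupError(f"no server can handle {kind!r}")
    target = min(eligible, key=Server.utilization)
    # Step 4 (hand off): record the assignment; forwarding is omitted here.
    target.active_jobs += 1
    return target


servers = [Server("cpu0", {"fft"}), Server("fpga0", {"fft", "conv"}, capacity=8)]
for req in (b"fft:block0", b"fft:block1", b"conv:block2"):
    print(req.decode(), "->", dispatch(req, servers).name)
```

Because `dispatch()` runs on the request path, any additional policy logic placed in it (for example, per-client quality-of-service rules) adds directly to the latency of every request.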
A traditional load balancer must, of necessity, perform only simple operations, since it must sustain the aggregate processing rate of all of the servers; even so, it risks becoming a performance bottleneck. If the traditional load balancer is to apply policies that enforce any relative quality of service between clients, this adds further latency to each decision.
A further complication is that conventional computational arrangements do not allow multiple clients to share a single resource, because in such arrangements each client has its own unique and specific data workload.
Therefore, known arrangements for managing and sharing load on a network system have to date been unsuitable and/or sub-optimal for use in high-speed computational networks. This disclosure addresses these issues.