Field
The present application relates to distributed computing, and in particular, to systems, methods and apparatus configured to enable distributed computing.
Background
The architecture of a typical modern computer is based on the von Neumann architecture. The von Neumann architecture is a basic design for a stored-program digital computer that includes a processor that is separated from a memory, which is used to store computer program instructions and data. Even though typical modern computer architectures are more complex than the original von Neumann architecture, typical modern computers retain the separation between the processor and the bulk memory.
Specifically, along with a number of other components, the processor and memory are provided on a printed circuit board, referred to as a motherboard (or “main board”, “system board” or “logic board”). On a typical motherboard, the processor and memory communicate via a printed circuit data bus. Throughput, or data transfer rate, on the data bus is much lower than the rate at which a modern processor can operate. The difference between the data bus throughput and the processor speed significantly limits the effective processing speed of the computer when the processor is instructed to process large amounts of data stored in the memory. The processor is forced to wait for data to be transferred to or from the memory, leaving the processor under-utilized.
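By way of illustration only, the under-utilization described above can be estimated with a simple model. The following sketch assumes that every byte to be processed must first cross the data bus and that transfers overlap perfectly with computation, so that the slower of the two stages sets the elapsed time. The function name and the example rates are hypothetical and are not taken from any particular hardware.

```python
def processor_utilization(num_bytes, processor_rate, bus_rate):
    """Estimate the fraction of elapsed time the processor spends computing.

    Hypothetical model: all rates are in bytes per second, every byte
    crosses the data bus once, and transfers overlap fully with
    computation, so the slower stage determines the elapsed time.
    """
    compute_time = num_bytes / processor_rate   # time the processor needs
    transfer_time = num_bytes / bus_rate        # time the data bus needs
    elapsed = max(compute_time, transfer_time)  # slower stage dominates
    return compute_time / elapsed


# Assumed example: a processor able to consume 100 GB/s, fed by a 25 GB/s bus.
utilization = processor_utilization(1e9, 100e9, 25e9)
print(f"processor utilization: {utilization:.0%}")
```

Under these assumed rates the processor is busy for roughly one quarter of the elapsed time and waits on the data bus for the remainder. A faster processor paired with the same bus lowers the figure further.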
The performance limitation caused by separating the processor and the memory on a motherboard is referred to as the von Neumann bottleneck. The severity of the bottleneck tends to increase over time because processor speeds and memory sizes tend to increase at a faster rate than the throughput of the data bus connecting new processors to bigger memories. Previous attempts to alleviate the problem have been only partially successful. For example, previous hardware solutions that address the von Neumann bottleneck include providing a cache between the processor and the bulk memory, and/or providing separate caches with separate access paths for data and instructions, and previous software solutions include branch predictor algorithms. However, none of the previous solutions fully addresses the problem.
Additionally, the performance degradations caused by the von Neumann bottleneck are exacerbated in a distributed computing environment in which computer processing of data is carried out by a number of processors operating simultaneously on smaller portions of a larger task. In a conventional distributed computing environment, such as a data center, multiple computers, each with a respective processor and memory, are coupled to one another. Typically, in an effort to reduce overhead, multiple motherboards are connected to one another within one cabinet. Each motherboard is subject to performance degradation caused by the von Neumann bottleneck even if some of the measures discussed above have been taken to alleviate the full impact of the problem. As such, delays in the processing of data caused by the respective data buses on the various motherboards are compounded, because a service request passed between processors is subject to a delay on each motherboard on which the service request is processed. Accordingly, there remains a challenge in alleviating delays so that memory access bottlenecks are not compounded within distributed computing systems.
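By way of illustration only, the compounding of delays across motherboards can be sketched as a sum over the motherboards that handle a service request. The sketch below assumes that the request is processed on each motherboard in sequence and that each motherboard contributes its own compute time and its own data bus wait. The function name and the timing values are hypothetical.

```python
def service_request_latency(hops):
    """Return (total_latency, total_bus_wait) for one service request.

    Hypothetical model: `hops` is a list of (compute_us, bus_wait_us)
    pairs, one pair per motherboard that processes the request, with
    the motherboards visited one after another.
    """
    total_compute = sum(compute_us for compute_us, _ in hops)
    total_bus_wait = sum(bus_wait_us for _, bus_wait_us in hops)
    return total_compute + total_bus_wait, total_bus_wait


# Assumed example: three motherboards, each with 2 us of compute
# and 6 us of waiting on its own data bus.
total_us, bus_wait_us = service_request_latency([(2, 6), (2, 6), (2, 6)])
print(f"total: {total_us} us, of which bus wait: {bus_wait_us} us")
```

In this assumed example the data bus wait accumulated over three motherboards accounts for 18 of the 24 microseconds, so the per-motherboard bottleneck is paid once for every motherboard the request touches.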