The following relates to the parallel computing arts, multi-core and multi-CPU computer arts, simulation arts, and so forth.
The heart of a computer is its central processing unit (CPU), which carries out the instructions of a computer program at the machine code level. The CPU executes machine code instructions in a synchronous manner in accordance with instruction cycle time intervals. Given a single CPU, program execution speed is dictated by the instruction cycle time interval, which can be reduced by designing the CPU with faster transistors, e.g. by reducing transistor channel length, adopting transistor designs with faster switching times, et cetera.
To further improve program execution speed, parallel processing can be employed. In parallel computing, two or more CPUs operate in parallel to execute the computer program. In theory, the execution speed could scale linearly with the number of CPUs in the multi-CPU system, i.e. speed could double for a two-CPU system, triple for a three-CPU system, and so forth. In practice, however, the speed benefit attained by parallel processing is usually much lower. Part of this shortfall is due to delays in transmitting signals between processors. These transmission delays can be reduced by monolithically integrating the CPUs on a single substrate (e.g. on a single silicon wafer in the case of typical silicon-based CPUs). When multiple CPUs are implemented on a single chip, the CPUs are sometimes referred to as “cores”, and the single-chip multi-CPU processor is referred to as a multi-core processor. The software can also be designed to minimize inter-CPU communication events.
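The gap between theoretical linear scaling and realized speedup can be made concrete using Amdahl's law, a standard result (not part of this disclosure) relating speedup to the fraction of a program that must execute serially. The following Python sketch is illustrative only; the function name and example parameter values are assumptions chosen for the illustration.

```python
def amdahl_speedup(serial_fraction, n_cpus):
    """Theoretical speedup under Amdahl's law of a program in which
    serial_fraction of the work cannot be parallelized, when run
    on n_cpus CPUs operating in parallel."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)

# Even a modest 10% serial fraction caps the speedup well below linear:
two_cpu = amdahl_speedup(0.10, 2)    # ideal linear scaling would give 2.0
four_cpu = amdahl_speedup(0.10, 4)   # ideal linear scaling would give 4.0
```

For a 10% serial fraction, the two-CPU speedup is about 1.8 rather than 2, and the four-CPU speedup about 3.1 rather than 4, consistent with the observation above that practical gains fall short of the theoretical maximum.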
Even with such improvements, the speed gain attained by multi-CPU (i.e. parallel) processing is usually still far less than the theoretical gain due to inefficiencies in the software design. To attain the maximum benefit from a multi-CPU design, every CPU should be executing useful instructions constantly during program runtime. This goal is not reached if one (first) CPU has to stop its processing (sometimes referred to as being in a “locked” state) while it waits for another (second) CPU to complete some task whose results are required by the first CPU in order to continue program execution.
One computing application that illustrates these issues is transportation network simulation. In a known approach, trips for agents (e.g. vehicles or travelers) are planned in so-called “micro-simulations”, and a main simulation process combines the trip plans to simulate the overall transportation network. This simulation paradigm is readily adapted to a parallel computing environment by having one CPU handle the main simulation process while delegating the trip planning tasks to other CPUs. However, a bottleneck will arise anytime the main process requires a trip plan that has not yet been generated by a CPU executing in parallel: the CPU executing the main process is locked until it receives the trip plan from the other CPU. Moreover, the transportation network simulation may take into account trip-altering events such as accidents, vehicle breakdowns, personal delays, or the like (provided as real-time real-world inputs, or simulated using a pseudo-random perturbation process). In that case, some trip plans required by the main process will be invalidated immediately upon occurrence of a trip-altering event, and new trip plans that accommodate the event will need to be generated, again locking the CPU executing the main process.
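The bottleneck described above can be sketched in miniature with Python's standard thread pool. This is not the disclosed technique; the function names, agent count, and use of `concurrent.futures` are illustrative assumptions standing in for a main simulation process and parallel trip-planner CPUs.

```python
import concurrent.futures
import time

def plan_trip(agent_id):
    """Micro-simulation stand-in: compute a trip plan for one agent."""
    time.sleep(0.01)  # placeholder for the planning computation
    return f"plan-for-agent-{agent_id}"

completed_plans = []
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    # Delegate trip planning for each agent to the worker pool.
    plan_futures = {agent: pool.submit(plan_trip, agent) for agent in range(8)}

    # Main simulation loop: if a required plan is not yet ready, the
    # call to result() blocks -- the "locked" state described above.
    for agent in range(8):
        plan = plan_futures[agent].result()  # blocks until the worker finishes
        completed_plans.append(plan)         # ...then advance the simulation
```

A trip-altering event would invalidate some entries of `plan_futures`, forcing the main loop to resubmit planning tasks and block again on their results.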
This can be generalized to any parallel computing situation in which the main process encounters a decision node at which two or more different paths may be followed. Depending upon the decision, different tasks will be called for. As a result, there will be a delay before the CPUs operating in parallel can provide the CPU executing the main process with the task results called for by the decision, resulting in a bottleneck at the main process. Even more generally, this can arise anytime the process being executed by one CPU encounters a decision node and the subsequent paths use results of tasks being performed in parallel by other CPUs.
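The decision-node case can likewise be sketched in a few lines of Python. Again this is an illustrative assumption, not the disclosed technique: the two tasks and the boolean condition stand in for the two or more paths that may follow a decision node.

```python
import concurrent.futures
import time

def task_a():
    time.sleep(0.01)  # placeholder work for the first path
    return "result-A"

def task_b():
    time.sleep(0.01)  # placeholder work for the second path
    return "result-B"

def main_process(take_path_a, pool):
    # Decision node: which task's result is needed is not known until
    # the condition is evaluated at runtime, so the task can only be
    # submitted to a parallel CPU at this point...
    chosen = pool.submit(task_a if take_path_a else task_b)
    # ...and the main process is locked here until it completes,
    # even if other worker CPUs sit idle in the meantime.
    return chosen.result()

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    outcome = main_process(True, pool)
```

The delay between evaluating the decision and receiving the chosen task's result is precisely the bottleneck at the main process described above.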
Disclosed herein are improved parallel computing techniques that overcome the aforementioned disadvantages and others.