As computer and computer system architectures continue to evolve, the number of processing cores, and of threads within each core, is increasing geometrically. This geometric increase is expected to continue, even for simple, relatively inexpensive computer systems. For server systems, system sizes measured in the number of processors are increasing at an even faster rate.
Although this rapid increase in the number of cores and threads enhances the performance of computer systems, it also makes it difficult to apply the increasing parallelism to a single application. This limitation exists even for high-end processing tasks that naturally lend themselves to parallel processing, such as weather prediction. One of the major reasons for this limitation is that the number of communication paths between processors, cores, and threads increases disproportionately as the task is divided into smaller and smaller pieces. Conceptually, the problem can be pictured by letting the size of a processing task be represented by the volume of a 3D cube. Each time this volume is divided into smaller cubes, the total surface area of the cubes, which represents the data that must be communicated between the processors working on the sub-cubes, increases. Every time the number of processors goes up by a factor of eight, the total amount of information to be communicated among them doubles.
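The factor-of-eight relationship in the cube analogy can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the task is a cube that is cut into equal sub-cubes with one processor per sub-cube; the function name `total_surface_area` is hypothetical and chosen only for illustration. Following the analogy above, it counts the whole surface of every sub-cube, including faces on the outside of the original cube.

```python
def total_surface_area(side: float, splits_per_edge: int) -> float:
    """Total surface area of all sub-cubes when a cube of edge length
    `side` is cut into splits_per_edge**3 equal sub-cubes."""
    sub_side = side / splits_per_edge      # edge length of one sub-cube
    num_cubes = splits_per_edge ** 3       # one processor per sub-cube (assumption)
    return num_cubes * 6 * sub_side ** 2   # each cube has 6 square faces


# Halving the edge length multiplies the processor count by 8
# and the total surface area by 2.
for k in range(4):
    n = 2 ** k
    print(f"processors={n ** 3:4d}  total surface area={total_surface_area(1.0, n)}")
```

In general, cutting each edge into n parts gives n^3 sub-cubes with a total surface area n times that of the original cube, so the communicated data in this model grows as the cube root of the processor count while the work per processor shrinks in direct proportion to it.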
One reason increasing parallelism causes these problems is that most systems communicate by sending messages between processors rather than by sharing memory. This approach results in high latencies and high software overheads, although it may simplify some complex system architecture, operating system, and compiler issues. Unfortunately, as the level of parallelism increases, the processors in the system reach the point where all they are doing is managing message traffic rather than performing useful work.
There is therefore a need for a system and method that can reduce software overhead and eliminate, or at least reduce, performance bottlenecks, thereby improving system performance and architectural scalability at relatively low cost.