Computing machinery is undergoing rapid evolution. Early electronic computers were generally entirely sequential processing machines, executing a stream of instructions, one-by-one, that together compose a computer program. For many years, electronic computers generally included a single main processor which was capable of rapidly executing a relatively small set of simple instructions, including memory-fetch, memory-store, arithmetic, and logical instructions. A computational task was addressed by programming a solution to the task as a set of instructions and then executing the program on a single-processor computer system.
Relatively early in the evolution of electronic computers, various ancillary and support tasks began to be moved, away from the main processor, to specialized auxiliary processing components. As one example, separate I/O controllers were developed for off-loading much of the repetitive and computational-bandwidth-consuming tasks associated with exchanging information between main memory and various external devices, including mass-storage devices, communications devices, display devices, and user-input devices. This incorporation of multiple processing elements into single-main-processor computer system was the beginning of a trend towards increasing parallelism in computing.
Parallel computation is currently a dominant trend in the design of modern computational machinery. At one extreme, individual processor cores often provide for concurrent, parallel execution of multiple instruction streams, and provide for assembly-line-like, concurrent execution of multiple instructions. Most computers, including personal computers, now incorporate at least two, and often many more, processor cores within each single integrated circuit. Each processor core can relatively independently execute multiple instruction streams. Electronic computer systems may contain multiple multi-core processors, and may be aggregated together into vast distributed computing networks comprising tens to thousands to hundreds of thousands of discrete computer systems that intercommunicate with one another and that each executes one or more separable portions of a large, distributed computational task.
As computers have evolved towards parallel and massively parallel computational systems, many of the most difficult and vexing problems associated with parallel computing have been found to be associated with decomposing large computational tasks into relatively independent subtasks, each of which can be carried out by a different processing entity. When problems are not properly decomposed, or when problems cannot be decomposed, for parallel execution, then employing parallel computer machinery often provides little or no benefit, and, in worst cases, may actually result in slower execution than can be obtained by a traditional software implementation executed on a single-processor computer system. When multiple computational entities contend for shared resources, or depend on computational results generated concurrently by other processing entities, enormous computational and communications resources may be expended to manage the parallel operation of the multiple computational entities. Often, the communications and computational overheads may far outweigh the benefits of a parallel-computing approach carried out on multiple processors or other computational entities. Furthermore, there may be significant financial costs involved with parallel computing, and also significant costs in power consumption and in heat dissipation.
Thus, although parallel computation appears to be the logical approach to efficient computing of many computational tasks, judging from biological systems and the evolutionary trends already encountered in the short time span of the evolution of electronic computers, parallel computing is also associated with many complexities, costs, and disadvantages. While many problems may theoretically benefit from a parallel-computing approach, the techniques and hardware for parallel computing that are currently available often cannot provide cost-effective solutions for many computational problems, particularly for complex computations that need to be carried out in real time within devices constrained by size constraints, heat-dissipation constraints, power-consumption constraints, and cost constraints. For this reason, computer scientists, electrical engineers, researchers and developers in many computationally oriented fields, manufacturers and vendors of electronic devices and electronic computers, and, ultimately, users of electronic devices and electronic computers all recognize the need for continued development of new approaches to efficient implementation of parallel computation engines for solving practical problems. In particular, computer scientists, electrical engineers, researchers and developers in many computationally oriented fields, manufacturers and vendors of electronic devices and electronic computers, and others seek efficient, low-power, and cost-effective subsystems that can be used within, or associated with, the parallel computation engines, including efficient, low-power, and cost-effective memory subsystems.