This document relates to distributing parallelism for parallel processing architectures.
A multicore processor is a microprocessor with multiple processor cores on a single chip. Two trends in silicon technology have made this type of microprocessor increasingly attractive. First, transistor counts continue to grow exponentially, as predicted by Moore's law, with a billion transistors within reach in the next few years. It has become increasingly difficult to find new and effective ways to use these transistors to improve performance, and stamping out multiple cores is a simple, cost-effective, and efficient way to take advantage of them. Second, long wires are becoming increasingly expensive. Multicore processors limit the growth of wire length because they naturally keep most wires within the length or width of a single core, independent of the total number of cores or transistors on the chip.
A billion-transistor chip with tens or hundreds of cores offers a large potential for performance gain, but the actual gain will vary from application to application, as will the effort required to attain it. Except for a few massively parallel, multithreaded applications such as web servers, parallelizing an application to take advantage of multiple cores is usually difficult.
An alternative to parallel programming is to automatically extract parallelism from a single-threaded program and exploit that parallelism on multiple cores. One convenient form of parallelism that can be exploited in this manner is instruction-level parallelism (ILP), which can readily be found, in varying amounts, in a typical single-threaded program. A compiler that detects such ILP in ordinary programs can exploit it on the multiple functional units of a single-core processor.
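To make the notion of ILP concrete, the following is a toy sketch, not the method of this document, of how a compiler might detect it in straight-line code. It groups instructions by dependence depth (an as-soon-as-possible grouping): instructions in the same level do not depend on one another and could, in principle, issue together. The instruction format and the function name `schedule_levels` are hypothetical. The sketch also assumes single assignment (each destination is written once, so only read-after-write dependences matter), unit latency, and unlimited functional units.

```python
from collections import defaultdict

# Hypothetical three-address instruction: (dest, op, sources).
# Assumes single assignment, so only read-after-write (true)
# dependences constrain execution order.
Instr = tuple[str, str, tuple[str, ...]]


def schedule_levels(instrs: list[Instr]) -> list[list[str]]:
    """Group instructions into dependence levels.

    Instructions in the same level have no true dependences on each
    other and could issue in the same cycle, assuming unit latency and
    enough functional units.
    """
    level_of: dict[str, int] = {}  # dest name -> level that produces it
    levels: defaultdict[int, list[str]] = defaultdict(list)
    for dest, _op, srcs in instrs:
        # Sources not produced in this block (program inputs) are ready
        # at the start, so an instruction with no producers lands in level 0.
        lvl = max((level_of[s] + 1 for s in srcs if s in level_of), default=0)
        level_of[dest] = lvl
        levels[lvl].append(dest)
    # Every level below the deepest is populated, because an instruction
    # at level k always has a producer at level k - 1.
    return [levels[i] for i in range(len(levels))]


if __name__ == "__main__":
    prog: list[Instr] = [
        ("a", "add", ("x", "y")),
        ("b", "mul", ("x", "x")),
        ("c", "add", ("a", "b")),
        ("d", "sub", ("y", "z")),
        ("e", "mul", ("c", "d")),
    ]
    levels = schedule_levels(prog)
    print(levels)                    # [['a', 'b', 'd'], ['c'], ['e']]
    print(len(prog) / len(levels))   # average ILP: 5 instructions in 3 levels
```

In this example, `a`, `b`, and `d` are mutually independent, so five instructions fit into three steps instead of five. That surplus of independent work is what can be spread across multiple functional units or, as discussed above, multiple cores.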