Moore's Law says that the number of transistors we can fit on a silicon wafer doubles roughly every two years. No exponential lasts forever, but we can reasonably expect that this trend will continue to hold over the next decade. Moore's Law means that future computers will be much more powerful, much less expensive, far more numerous, and interconnected.
Moore's Law is continuing, as can be appreciated with reference to FIG. 1, which provides trends in transistor counts in processors capable of executing the x86 instruction set. However, another trend is about to end. Many people know only a simplified version of Moore's Law: “Processors get twice as fast (measured in clock rate) every year or two.” This simplified version has held for the last twenty years, but it is about to stop holding. Adding more transistors to a single-threaded processor no longer produces a faster processor. Increased system performance must now come from multiple processor cores on a single chip. In the past, existing sequential programs ran faster on new computers because sequential performance scaled, but that will no longer be true.
Future systems will look increasingly unlike current systems. We won't have faster and faster processors in the future, just more and more of them. This hardware revolution is already starting, with chip designs of two to eight cores appearing commercially. Most embedded processors already use multi-core designs. Desktop and server processors have lagged behind, due in part to the difficulty of general-purpose concurrent programming.
It is likely that in the not too distant future chip manufacturers will ship massively parallel, homogeneous, many-core computer chips. These will appear, for example, in traditional PCs, entertainment PCs, and cheap supercomputers. Each processor die may hold several, tens, or even hundreds of processor cores.
In a multicore system, processors may store and read data from any number of cache levels. For example, a first cache may be accessed and modified by only a single processor, while a second cache may be associated with a small group of processors, a third cache with a wider group of processors, and so on. A problem with such a configuration is that cache access becomes dramatically more expensive, in terms of processor clock cycles, as caches are farther away from the accessing processor. A search for desired data in a “level one” cache can be conducted relatively quickly, while a search of a “level two” cache requires much more time, and a “level three” search may require an enormous amount of time compared to a level one or level two search. Therefore, tailoring the amount of time spent on memory access is a problem that will increasingly emerge in the computing industry.
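The growth in access cost across cache levels can be illustrated with a minimal sketch. The cycle counts below, the name `access_cost`, and the assumption that levels are searched strictly in order are all hypothetical, chosen only to show how a hit in a farther cache also pays for the nearer searches that missed; real latencies vary by architecture.

```python
# Hypothetical per-level search costs in processor clock cycles.
# These figures are illustrative assumptions, not measurements.
CACHE_LEVELS = [
    ("L1", 4),    # private to a single processor
    ("L2", 12),   # shared by a small group of processors
    ("L3", 40),   # shared by a wider group of processors
]
MEMORY_COST = 200  # assumed cost of falling through to main memory


def access_cost(hit_level):
    """Return the cycles spent when the data is found at `hit_level`.

    Levels are searched in order (an assumption of this sketch), so a
    hit in a farther cache also pays for every nearer cache searched
    first. Passing None means every cache missed and main memory was
    accessed.
    """
    total = 0
    for name, cost in CACHE_LEVELS:
        total += cost
        if name == hit_level:
            return total
    if hit_level is None:
        return total + MEMORY_COST
    raise ValueError(f"unknown cache level: {hit_level}")


if __name__ == "__main__":
    for level in ("L1", "L2", "L3", None):
        label = level if level is not None else "memory"
        print(f"hit in {label}: {access_cost(level)} cycles")
```

Under these assumed figures, a level three hit costs 56 cycles against 4 for a level one hit, which is the kind of gap that makes the time spent on memory access worth tailoring.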