Microprocessors have evolved since their original creation, now boasting billions of transistors with picoseconds of switching speed. Over time they have gained enormous complexity and have been optimized for a particular set of applications such as speeding up existing programs run on common computing devices such as personal computers (PCs) and their more-powerful cousins, commercial file servers.
One particularly intense activity, employing large arrays of computers is the performance of searches power the World Wide Web (the “Web”) of the well-known Internet. Searching for particular content on the Web is an activity typically requested by a client computer and user, and delivered to a search service, such as the well known Google website. Google, and others like it, implement an algorithm known as a “search engine,” which efficiently sorts through the available list of interconnected web sites and their underlying content from Internet-connected sources all over the Web/world. Within fractions of a second of performing a search request, the search engine typically returns to the requesting client a voluminous (in many cases) listing of applicable sites and textual references.
As the volume of available content on the web and number of interconnected requesting clients have grown exponentially in recent years, search engines, such as Google, have risen to the challenge of organizing the billions of web pages (trillions of words) that make up the Internet. Such search engines have maintained (and attempted to improve) the responsiveness and quality of their search results. To do this they have created massive datacenters and filled them with hundreds of thousands of PC-based servers. They exploit the massive parallelism of their search algorithms by dividing up web pages amongst their servers so that each server is responsible for searching only a small piece of the Internet.
Search Engines are now building new datacenters because their old datacenters have reached the limits of their electricity usage (megawatts). These datacenters are very expensive and their construction would be unnecessary if the computer architecture used in their servers were more power efficient.
The ability to divide a search task into many pieces is somewhat unique to certain types of tasks, which require massive parallelism (e.g. simulation of brain circuits is also massively parallel). These algorithms often differ from the programs for which existing microprocessors have been optimized, namely those that involve serial dependencies of program steps, delivered through a pipeline. An example of a serial dependency would be where a+b=c, and c is needed to calculate f (e.g. e+c=f)—one must know c before calculating f. In the case of massively parallel algorithms, conversely, c's and f's are determined without regard to each other.
In various large scale data processing systems, there is an increasing need for tools to reduce power consumption. This is particularly a problem in large scale parallel processing systems, including high traffic web search data centers. The power consumption of some large scale web search data centers has approached or exceeded the power limits of their existing facilities. Accordingly, some of these web search data centers have been moved to facilities that can access more electrical power.
As newer processors manufactured by companies such as Intel of Santa Clara, Calif. have provided faster processor clock rates, and multiple processor cores, speed has increased, but with the concomitant increase in power-consumption, heat, and cost. With the cost of energy on the rise, simply increasing a processor's speed and/or adding additional server capacity may not be the best solution to increased volume in search activities and others involving mainly parallel operations.
However, the ability to divide the search task into many pieces is somewhat unique to certain types of tasks, which require massive parallelism (e.g. simulation of brain circuits is also massively parallel). These algorithms often differ from the programs for which existing microprocessors have been optimized, namely those that involve serial dependencies of program steps, delivered through a pipeline. An example of a serial dependency would be where a+b=c, and c is needed to calculate f (e.g. e+c=f)—one must know c before calculating f. In the case of massively parallel algorithms, conversely, c's and f's are determined without regard to each other.
It is highly desirable to provide improved computer architectures and methods for providing and using such architectures that provide sufficient speed performance in large scale parallel processing data centers, while maintaining or reducing their power consumption. Such architectures should allow conventional software and operating systems to be employed where possible so that serial tasks are still available, but those tasks involving parallel operations can be achieved at significantly increased performance thereby reducing the burden to employ more numerous, and more powerful processors to expand capacity.