1. Field of the Invention
This invention relates generally to servers and more particularly to a processor architecture and method for serving data to client computers over a network.
2. Description of the Related Art
With the networking explosion that accompanied the introduction of the Internet, there has been an increasing number of server applications that have multiple threads for serving multiple clients. Electronic commerce has created a need for large enterprises to serve potentially millions of customers. To support this overwhelming demand, serving applications have memory characteristics that differ from those of desktop applications. In particular, serving applications must accommodate a large number of clients, and therefore require large main memory bandwidth and exhibit relatively poor cache behavior.
In addition, conventional processors focus on instruction level parallelism to increase performance. Therefore, the processors tend to be very large and the pipeline is very complex. Consequently, due to the complexity of the pipeline for processors, such as INTEL processors, only one core is on the die. Accordingly, when there is a cache miss to main memory or some other long latency event, such as a branch misprediction, there is usually a stall that causes the pipeline to sit idle. As a result, serving applications, which have large memory footprints, poor cache locality, and poor branch predictability, tend to have very little instruction level parallelism per thread. Thus, running server workloads on conventional processors results in poor hardware utilization and unnecessary power dissipation, since conventional processors focus on instruction level parallelism.
Additionally, the performance of processors based on instruction level parallelism, as a function of die size, power and complexity, is reaching a saturation point. FIG. 1 is a graph depicting the relationship between the performance and the power/size of conventional processors based upon instruction level parallelism. As illustrated by line 100 of FIG. 1, an increase in the power and size of conventional processors does not provide a corresponding linear increase in performance, due to the constraints of the instruction level parallelism (ILP) architecture. Conventional ILP processors include well-known processors from the PENTIUM™, ITANIUM™, POWER™, and ULTRASPARC™ families, among others.
In view of the foregoing, there is a need for a processor having an architecture better suited for serving applications, in which the architecture is configured to exploit the multi-threaded characteristics of serving applications.