The present invention relates generally to high performance computer systems and more particularly to the automated design of such systems.
A vast number of devices and appliances ranging from mobile phones, printers, and cars have embedded computer systems. The number of embedded computer systems in these devices far exceeds the number of general purpose computer systems such as personal computers or servers. In the future, the sheer number of these embedded computer systems will greatly exceed the number of general purpose computer systems.
The design process for embedded computer systems is different from that for general purpose computer systems. There is greater freedom in designing embedded computer systems because there is often little need to adhere to standards in order to run a large body of existing software. Since embedded computer systems are used in very specific settings, they may be tuned to a much greater degree for specific applications. On the other hand, although there is greater freedom to customize and the benefits of customization are large, the revenue stream from a particular embedded computer system design is typically not sufficient to support a custom design.
In the past, there have been a number of attempts at automating the design of embedded computer systems. In one, a template-based processor design space was automatically searched to identify a set of best solutions. In another, a framework for the design of retargetable, application-specific, very long instruction word (VLIW) processors was developed. This framework provided the tools to trade off architecture organization and compiler complexity. A hierarchical approach was proposed for the design of systems consisting of processor cores and instruction/data caches where a minimal area system that satisfied the performance characteristics of a set of applications was synthesized.
Also in the past, there has been research focusing on the development of memory hierarchy performance models. Cache models generally assume a fixed trace and predict the performance of this trace on a range of possible cache configurations. In a typical application of the model, a few trace parameters are derived from the trace and are used to estimate misses on a range of cache configurations. For instance, models for fully associative caches employ an exponential or power function model for the change in work set over time. These models have been extended to account for a range of line sizes. Other models have been developed for direct-mapped caches, instruction caches, multi-level memory hierarchies, and multiprocessor caches.
One analytic cache model estimates the miss rate of set-associative caches using a small set of trace parameters derived from the address trace. In general, the accuracy decreases as the line size increase.
In designing embedded computer systems, the general design space consists of a processor and associated Level-1 instruction, Level-1 data, Level-2 unified caches, and main memory. The number and type of functional units in the processor may be varied to suit the application. The size of each of the register files may also be varied. Other aspects of the processor such as whether it supports speculation or predication may also be changed. For each of the caches, the cache size, the associativity, the line size, and the number of ports can be varied. Given a subset of this design space for an application and its associated data sets, a design objective is to determine a set of cost-performance optimal processors and systems. A given design is cost-performance optimal if there is no other design with higher performance and lower cost.
Because of the multi-dimensional design space, the total number of possible designs can be very large. Even with a few of the processor parameters, this easily leads to a set of 40 or more processor designs. Similarly, there may be 20 or more possible cache designs for each of the three cache types. This means that computer design systems currently require extensive and time-consuming calculations to design current computer systems.
The present invention provides an automatic and retargetable computer design system using a combination of simulation and performance prediction to investigate a plurality of target computer systems. A high-level specification and a predetermined application are used by the computer design system to provide inputs into a computer evaluator. The computer evaluator has reference computer system dependent and independent systems for producing a reference representation and dynamic behavior information, respectively, of the application. The reference representation and information are mated to produce further information to drive a simulator. The simulator provides performance information of a reference computer system. The reference system is the basis from which a desired target system is obtained. The performance information is provided to another computer evaluator, which has a target computer system dependent system for producing a target representation of the application for the plurality of target computer systems. The performance information and the target representation are used by a computer system predictor for quickly estimating the performance information of the plurality of target computer systems in a simulation efficient manner.
The present invention further provides an automatic and retargetable computer design system using simulation to investigate a plurality of reference computer systems. A high-level specification and a predetermined application are used by the computer design system to provide inputs into a computer evaluator. The computer evaluator has reference computer system dependent and independent systems for producing a representation and dynamic behavior information, respectively, of the application. The representation and information are mated to produce a further representation of the application to drive a simulator. The simulator provides performance information of the plurality of reference computer systems in a simulation efficient manner.
The present invention further provides an automatic and retargetable computer design system using performance prediction to investigate a plurality of target computer systems. A high-level specification and a predetermined application are used by the computer design system to provide inputs into a computer evaluator along with performance information of a reference computer system. The computer design system has a target computer system dependent system for producing a plurality of target representations of the application for a plurality of target computer systems. The performance information and the target representations are used by a computer predictor in the computer evaluator for quickly estimating the performance characteristics of the plurality of target computer systems in a simulation efficient manner.
The present invention further provides an automatic and retargetable computer design system using a combination of trace-driven simulation and performance prediction to investigate a plurality of target processors having different cache systems. A high-level specification and a predetermined application are used by the computer design system to provide inputs into a cache evaluator. The cache evaluator has a reference processor dependent assembler for producing relocatable object files and a linker for using the relocatable object files to produce a reference processor dependent executable file. The cache evaluator further has a reference processor independent system for producing an event trace of the application. The file and the trace are trace mated in a trace generator to produce an address trace to drive a cache simulator. The cache simulator provides performance characteristics of the reference computer system. The reference processor performance characteristics are provided to another cache evaluator, which has a target processor dependent assembler and linker for producing the executable files for the plurality of target processor systems. The reference processor performance characteristics and the executable files are used by a cache predictor for quickly estimating the performance characteristics, in terms of the number of misses or stall cycles, of the plurality of target processor systems in a simulation efficient manner uses.
The present invention further provides an automatic and retargetable computer design system using trace-driven simulation to investigate a plurality of reference processors having different memory systems. A high-level specification and a predetermined application are used by the computer design system to provide inputs into a cache evaluator. The cache evaluator has reference processor dependent assembler for producing relocatable object files and a linker for using the relocatable object files to produce a reference processor dependent executable file. The cache evaluator further has a reference processor independent system for producing an event trace of the application. The file and the trace are trace mated in a trace generator to produce an address trace to drive a cache simulator. The cache simulator provides the performance characteristics, in terms of the number of misses or stall cycles, of the plurality of reference processor systems in a simulation efficient manner uses.
The present invention further provides an automatic and retargetable computer design system using performance prediction to investigate a plurality of target processors having different memory systems. A high-level specification and a predetermined application are used by the computer design system to provide inputs into a cache evaluator along with the performance characteristics of a reference cache system. The cache evaluator has an assembler for producing relocatable object files and a linker for using the relocatable object files to produce an executable file. The performance characteristics and the executable files are used by a cache predictor for quickly estimating the performance characteristics, in terms of the number of misses or stall cycles, of the plurality of target processor systems in a simulation efficient manner.
The present invention provides an automated design process to derive the performance of the cache for a specific target processor, application, and cache. A separate design system is subsequently responsible for selecting the particular cache used at each level, based on the performance results provided by this process for a set of cache configurations.
The present invention further provides a computer design system that enables the evaluation of the various memory hierarchy components on a particular processor and application of interest.
The present invention still further provides a computer design system for general purpose systems. Often, architectural or compiler techniques are evaluated solely at the processor level without quantifying their impact on memory hierarchy performance. For instance, code specialization techniques, such as inlining or trace scheduling may improve processor performance, but at the expense of instruction cache performance. The present invention can also be used in these situations to quantify the impact on memory hierarchy performance in a simulation efficient manner.
The above and additional advantages of the present invention will become apparent to those skilled in the art from a reading of the following detailed description when taken in conjunction with the accompanying drawings.