Computing systems perform tasks by executing machine-readable instructions having a specific format known to that computing system. Such a format, or instruction set architecture, defines the types of instructions that the computing system is capable of performing, and defines an expected output structure for results of that execution. This can include, for example, the set of commands used by the computing system, the memory addressing schemes used, data bit definitions, and various other features.
It can be important, when creating software to be executed on a particular computing system, to determine the expected performance of that software on the computing system. For example, if a particular software system must complete its execution in a prespecified amount of time, it may be important to know if such execution is reasonable. Furthermore, by assessing the amount of time it takes, either on average or in a specific instance, for different instructions or instruction types to execute, a designer of either the hardware or software system can isolate and improve upon performance issues in the hardware or software.
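The per-instruction-type timing assessment described above can be illustrated with a minimal sketch. The trace records, instruction names, and timings below are hypothetical values chosen for illustration; they do not come from any particular tool or system.

```python
from collections import defaultdict

# Hypothetical trace records: (instruction type, elapsed nanoseconds).
# These values are illustrative assumptions, not measured data.
TRACE = [
    ("load", 4.0),
    ("add", 1.0),
    ("load", 6.0),
    ("mul", 3.0),
    ("add", 1.0),
]


def average_time_by_type(trace):
    """Return {instruction_type: mean elapsed time} for a list of records."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for kind, elapsed in trace:
        totals[kind] += elapsed
        counts[kind] += 1
    return {kind: totals[kind] / counts[kind] for kind in totals}


def slowest_type(trace):
    """Return the instruction type with the highest mean elapsed time."""
    averages = average_time_by_type(trace)
    return max(averages, key=averages.get)


if __name__ == "__main__":
    print(average_time_by_type(TRACE))
    print(slowest_type(TRACE))
```

In this sketch, the instruction type with the highest average time is the first candidate a designer might examine when isolating a performance issue.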
Furthermore, increasingly, computing systems are being used on a time-shared basis, in particular for enterprise-class, large-scale computing tasks. For example, many cloud computing systems operate on a time-shared basis in which a particular compute load and data bandwidth are provided to a particular user or client. In these cases especially, it can be important to know the expected performance of a particular workload, to know what computing resources are required for its execution, and how best to apportion resources to different workloads. In cases where computing resources are leased to third parties for use, it is important for the lessor of computing resources to know the performance of specific physical computing resources for different types of workloads, so that the lessor knows what is being leased and its value relative to the workload (e.g., whether the resources being leased have a competitive advantage over unshared, privately-managed systems of the would-be lessee, or competitor leased systems).
Software systems currently exist which are designed to provide such performance assessments. For example, VTune software from Intel Corporation of Santa Clara, Calif., provides an analysis of the particular hardware instructions that are executed, and their frequency of execution and average execution time. VTune also provides an ability to perform an instruction trace of the hardware instructions that are performed based on execution of a particular software package. Additionally, for hardware performance, computing systems exist which are capable of emitting statistics regarding the native machine instructions executed and time of execution, as well as many other metrics such as number of instructions retired in a given amount of time, cache hit rates and other caching statistics, memory usage, and other metrics. However, such native hardware and software systems are not without disadvantages.
One example where performance tuning systems are inadequate is the case of virtualized systems. Virtualized systems generally refer to systems which have some type of virtualization layer executing on the physical hardware resources, and which can be used to allow for greater flexibility in execution of software packages on those same physical hardware resources. This can allow such systems to execute software packages written for a different instruction set architecture incapable of direct execution on the available hardware resources (i.e., a “non-native instruction set architecture”), or simply written to be executed at a higher abstraction layer as compared to being written for direct execution on the available hardware resources. In such systems, the virtualization layer, which can take the form of a hypervisor or other translation system, defines the manner of executing the hosted software package on the physical computing resources. This virtualization layer can be written simply as a translator, or in some cases can virtualize a different, non-native instruction set architecture, including memory model, addressing, instruction set, register set, and other typical architectural features. This may be the case in current cloud-based computing systems which can host workloads written for specialized processors on commodity hardware systems.
In such cases, existing performance assessment packages do not provide adequate focus on performance assessment, because it is not possible for such systems to distinguish between instructions performed by the virtualization layer as “overhead” or housekeeping tasks, and those instructions performed by the virtualization layer that are directly associated with a particular hosted instruction. Furthermore, because at different times the virtualization layer may translate hosted (e.g., non-native) instructions differently, there may be no direct correspondence between one or more hosted instructions and one or more native instructions that are directly executed on the computing system. Furthermore, a virtualization layer may cause the virtualized or hosted instructions to be retired out of order, particularly if no data dependencies between those instructions exist. Additionally, in time-sharing situations, it may be the case that a particular resource is in use by a different virtualized software system, or may be available in hardware but not used by a particular virtualization layer, or may trigger an interrupt required to be handled by either a host system or the virtualization layer, thereby changing execution performance of the hosted software system between execution runs. Finally, it could also be the case that incorporation of performance assessment features into virtualization software itself could adversely affect performance of the virtualization software by introducing unnecessary overhead into the translation and emulation process.
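The attribution problem described above can be illustrated with a toy sketch. Everything here is a hypothetical assumption for illustration: the hosted instruction names (HADD, HLOAD), the native opcodes, the cycle counts, and the idea that each native event carries a source tag. A native-level profiler would see only the opcode and cycle count, not the tag.

```python
from collections import Counter

# Hypothetical native events: (native opcode, cycles, source).
# "source" is the hosted instruction being emulated, or None for
# virtualization-layer overhead. A native profiler does not see this field.
EVENTS = [
    ("mov", 1, "HADD"),   # first translation of HADD: two native instructions
    ("add", 1, "HADD"),
    ("mov", 2, None),     # housekeeping by the virtualization layer
    ("add", 1, "HADD"),   # later translation of HADD: one native instruction
    ("mov", 1, "HLOAD"),
    ("cmp", 1, None),     # more overhead
]


def native_view(events):
    """Cycles per native opcode: what a native-level profiler can report."""
    totals = Counter()
    for opcode, cycles, _source in events:
        totals[opcode] += cycles
    return dict(totals)


def hosted_view(events):
    """Cycles per hosted instruction, with overhead separated out.

    This requires the source tag, which native tools do not have.
    """
    totals = Counter()
    for _opcode, cycles, source in events:
        totals[source if source is not None else "overhead"] += cycles
    return dict(totals)


if __name__ == "__main__":
    print(native_view(EVENTS))
    print(hosted_view(EVENTS))
```

In this toy data, the native view reports cycles for `mov`, `add`, and `cmp`, but cannot say how many of those cycles belong to HADD, to HLOAD, or to overhead, and the two translations of HADD are indistinguishable from unrelated native instructions.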
Accordingly, in view of the varying ways in which execution of virtualized or hosted software can occur on a hosted and time-shared system, existing systems lack features capable of assessing instruction-level performance of the hosted software, and in particular the efficiency of execution of instructions in a hosted, non-native (but virtualized) system.
For these and other reasons, improvements are desirable.