The present invention relates to the field of performance measurement using designated logic within embedded systems.
Embedded systems are devices used to control, monitor or assist the operation of equipment, machinery or larger systems. The term “embedded” reflects the fact that these components are an integral part of the overall system. In other words, the system incorporates its own embedded controller rather than relying upon a general-purpose computer to control the operation of a system.
All embedded systems are computer or control systems. Some of them, however, are very simple devices as compared with a general-purpose personal computer. The simplest devices each consist of a single microprocessor or microcontroller, which may be packaged with other chips in a hybrid or application specific integrated circuit (ASIC). The ASIC input comes from a detector or sensor and the ASIC output goes to a switch or actuator which may, for example, start or stop the operation of a machine or perform some other operation.
An ASIC is a chip that is custom designed for a specific application rather than a general-purpose chip such as a microprocessor. The use of ASICs improves performance over general-purpose CPUs, because ASICs are “hardwired” to perform one or more specific tasks and do not incur the overhead of fetching and interpreting stored instructions. An ASIC chip performs an electronic operation as fast as possible, provided, of course, that the circuit is efficiently designed.
The very simplest embedded systems are each capable of performing only a single function or set of functions to meet a single predetermined purpose. In the more complex systems, the function of the embedded system is determined by an application program, which enables the embedded system to perform tasks for a specific application. The ability to have different operating programs allows the same system to be used for a variety of different purposes. In some cases, a microprocessor may be designed in such a way that application software for a particular purpose may be added to the basic software in a second process, after which the software may not be further changed. This particular application software is sometimes referred to as firmware.
Typically, an embedded system is housed on a single microprocessor board with the programs stored in ROM. Virtually all appliances that have a digital interface, such as watches, microwaves, VCRs, cars, etc., utilize embedded systems. Some embedded systems include an operating system, but many of them are so specialized that the entire logic may be implemented by a single program.
Firmware designers often need to fine-tune their software to a given target platform (e.g., a target embedded system). Such fine-tuning often involves modifying the software in order to achieve desired results.
One technique for fine-tuning the software involves the use of a benchmarking scheme for measuring performance in an embedded system. For example, the Dhrystone benchmarking software program (hereafter “Dhrystone”), which was first developed by R. P. Weicker in 1984, is a benchmark test used to test the performance of embedded systems. Dhrystone is compact, widely available in the public domain, and easy to use. Dhrystone compares the performance of the processor under benchmark to that of a reference machine. However, significant weaknesses exist with the use of Dhrystone. The results from Dhrystone tend to reflect the performance of the C compiler and libraries more than the performance of the processor itself.
The Dhrystone code is very compact (e.g., being of the order of around 100 high-level language statements and occupying just 1-1.5 kB of compiled code). Because the code is so small, memory access beyond the cache is not exercised. Thus, Dhrystone simply tests the performance of the integer core. However, most processor cores include embedded cache memories, and the overall memory hierarchy and the way that the memory is managed heavily affect system design and performance. Benchmarking tools, such as Dhrystone, do not measure such improvements to memory management and system performance. The present invention recognizes the need and desire for a mechanism to assist in measuring actual microprocessor performance of an embedded system and not merely determine whether benchmarks for the embedded system have been achieved for certain criteria.
In one embodiment, the invention is a method for measuring the performance of an embedded system. The method includes initiating the embedded system. The method also includes loading instructions at the embedded system for reading a performance measurement mask. The method also includes executing the instructions and thereby causing the performance measurement mask to be read. The method further includes analyzing the performance measurement mask configuration to determine performance metrics to be measured. Moreover, the method includes performing a plurality of performance monitoring tasks on the embedded system according to the performance metrics to be measured. The performance metrics may be one or more of the following exemplary metrics: overall execution time for a particular routine, number of instruction cycles executed in the particular routine, number of cache hits in the given routine, total number of memory reads in the given routine, total number of memory accesses (reads and writes) in the given routine, number of control bus read cycles in the given routine, number of control bus cycles (reads and writes) in the given routine, number of non-cacheable read cycles in the given routine, and total number of non-cacheable access cycles (reads and writes) in the given routine. Preferably, the performance metrics are recorded according to the status of control flags in a mask included within the embedded system. Based on these metrics, designers may fine-tune software for the embedded system.
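By way of illustration only, the step of analyzing the performance measurement mask may be modeled in software as follows. This is a minimal sketch in Python rather than firmware code. The bit positions, the name Metric, and the function metrics_to_measure are hypothetical and are not specified by the method described above; only the nine metric categories are taken from the description.

```python
from enum import IntFlag
from typing import List


class Metric(IntFlag):
    """One control flag per metric. Bit positions are assumed for illustration."""
    EXECUTION_TIME = 1 << 0
    INSTRUCTION_CYCLES = 1 << 1
    CACHE_HITS = 1 << 2
    MEMORY_READS = 1 << 3
    MEMORY_ACCESSES = 1 << 4
    CONTROL_BUS_READS = 1 << 5
    CONTROL_BUS_CYCLES = 1 << 6
    NONCACHEABLE_READS = 1 << 7
    NONCACHEABLE_ACCESSES = 1 << 8


def metrics_to_measure(mask: int) -> List[Metric]:
    """Return the metrics whose control flag is set in the mask value."""
    return [metric for metric in Metric if mask & metric]


# A mask with bits 0 and 2 set selects execution time and cache hits.
selected = metrics_to_measure(0b000000101)
```

In this model, a mask value of zero selects no metrics, and a mask with all nine low-order bits set selects every metric listed above.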
In another embodiment, the invention is an embedded system that includes a microprocessor and performance measuring logic coupled to the microprocessor and configured to record selected performance metrics. The performance metrics may be one or more of the following: overall execution time of a particular routine, number of instruction cycles executed in the particular routine, number of cache hits in the given routine, total number of memory reads in the given routine, total number of memory accesses (reads and writes) in the given routine, number of control bus read cycles in the given routine, number of control bus cycles (reads and writes) in the given routine, number of non-cacheable read cycles in the given routine, and total number of non-cacheable access cycles (reads and writes) in the given routine. In general, a counter is configured to record statistics for each of the performance metrics, and the counters may be controlled using a programmable mask, which is included in a memory coupled to the microprocessor.
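The relationship between the counters and the programmable mask may be sketched, purely as a software model, in the following Python example. The class CounterBank, its method record, and the choice of nine counters indexed by bit position are assumptions made for illustration; the actual performance measuring logic is hardware coupled to the microprocessor and its interface is not defined here.

```python
class CounterBank:
    """Software model of one counter per metric, gated by a programmable mask."""

    def __init__(self, num_metrics: int, mask: int = 0) -> None:
        self.mask = mask                  # one control flag per counter
        self.counts = [0] * num_metrics   # one counter per metric

    def record(self, metric_bit: int, amount: int = 1) -> None:
        """Count an event only if the control flag for that metric is set."""
        if self.mask & (1 << metric_bit):
            self.counts[metric_bit] += amount


# Enable only the counter at bit 2 (an assumed position, e.g. cache hits).
bank = CounterBank(num_metrics=9, mask=0b000000100)
bank.record(2)   # counted: the flag for bit 2 is set
bank.record(2)   # counted again
bank.record(3)   # ignored: the flag for bit 3 is clear
```

Under these assumptions, clearing a flag in the mask leaves the corresponding counter untouched, so only the metrics of interest are accumulated for a given routine.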