During application code development, the development team goes through the following repetitive development cycle hundreds, if not thousands, of times:
1. Building code: compiling and linking a version of the application code
2. Loading code: loading the code into the real hardware system or a software model
3. Debugging/profiling code: chasing correctness or performance problems
4. Making changes: editing source code or changing the linker directives
The load and change portions of this cycle are generally viewed as non-productive time: the developer is either waiting for code to download from the host to the target system, or searching for the files that need changes and editing them with a text editor.
Any trip through the loop can introduce or eliminate bugs. When bugs are introduced, the development context changes to debugging. When enough bugs have been eliminated, the context may change to profiling. There are different classes of debugging and profiling, some more advanced than others, and profiling can target code performance, code size, or power. As application code development proceeds, the developer bounces between these concentric rings of development context.
Special emphasis must be placed on giving the developer the system control, data transfers, or instrumentation applicable to the current debug or profiling context. This requires packaging system control and instrumentation as readily accessible system solutions, so that developers can easily reach tools whose capabilities target specific development problems. The presentation must expose the complete capability of the toolset while making it straightforward to select the right capability for the task at hand.
The need for emulation has increased significantly with the introduction of cache-based architectures. On flat memory model architectures, such as the Texas Instruments TMS320C6200 family of devices, the performance to be expected on the target system could be modeled accurately with a simulator: actual system performance, including interrupts and Direct Memory Access (DMA), was within 10-15% of simulated performance. This margin was acceptable for most applications of interest.
With the introduction of cache-based architectures, and the inability to model cache events and their impact on system performance accurately, developers today find simulated performance to be anywhere from 50-100% away from actual target system performance. This inaccuracy erodes confidence in the capabilities of the device and leads to fictitious performance de-rating factors between cache and flat memory performance. While some of the discrepancy between simulated and actual performance is due to inadequate modeling of the cache, accurately modeling system-related interactions such as interrupts or DMA remains a fundamental problem. Hence simulators have tended to play catch-up with the target system. Unfortunately, the period over which the simulator for a given target system matures is the same period in which a developer is trying to get to market.
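To make the quoted margins concrete, the deviation between a simulated and a measured cycle count can be computed as below. This is a minimal sketch that assumes the deviation is taken relative to the cycle count measured on the target; the function names percent_deviation and within_margin are hypothetical illustrations, not part of any TI tool.

```c
#include <stdbool.h>

/* Deviation of a simulated cycle count from the cycle count measured on the
 * target, as a percentage of the measured count (assumed definition). */
double percent_deviation(unsigned long long simulated_cycles,
                         unsigned long long measured_cycles)
{
    /* Guard against division by zero; callers should pass a nonzero
     * measured count for a meaningful result. */
    if (measured_cycles == 0)
        return 0.0;
    double sim = (double)simulated_cycles;
    double meas = (double)measured_cycles;
    double diff = (meas > sim) ? meas - sim : sim - meas;
    return 100.0 * diff / meas;
}

/* True when the simulated figure is within margin_pct percent of the
 * measured figure, e.g. margin_pct = 15.0 for a 10-15% style budget. */
bool within_margin(unsigned long long simulated_cycles,
                   unsigned long long measured_cycles, double margin_pct)
{
    return percent_deviation(simulated_cycles, measured_cycles) <= margin_pct;
}
```

For example, a simulator that predicts 1,000 cycles for a kernel that actually takes 2,000 cycles on the target is 50% off under this definition (100% if measured against the simulated count instead), far outside a 15% margin.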
Visibility into what the target system is doing is key to extracting performance on cache-based architectures, and the way to get this visibility for profiling system performance is through emulation. Visibility is also key for those writing behavioral simulators, who need to check the behavior of the target system against what is expected, and it helps software developers reduce the cache-related stalls that impact performance. Visibility into the target system is invaluable for system debug and for developing applications in a timely manner. Without it, software developers can do little but speculate about the probable causes of lost performance. Not knowing what is happening in the system leads to a trial-and-error approach to the performance improvements that come from optimal code and data placement in memory. The lack of proper tools for cache visualization makes it impossible to answer the question "Is this the optimal software implementation for this target system?" Knowing whether a given software module ever missed real-time in an actual system is of utmost importance to system developers bringing up complex systems. Such questions can only be answered accurately by the constant, non-intrusive monitoring of the actual system that advanced emulation offers.
Visibility is also key to debugging complex systems. Debugging memory corruption, and being able to halt the CPU when such corruption is detected, is of primary importance because memory exceptions are not currently supported on Texas Instruments TMS320C6000 family targets. In addition, on the Texas Instruments TMS320C6000 family Digital Signal Processor (DSP), data memory corruption can also corrupt program memory and crash CPU execution, because program and data share a unified memory. There is therefore a need to trace accurately the source code responsible for this errant behavior. Monitoring DMA events, including their submission and completion relative to CPU activity, gives the programmer additional information for tuning the size of the data sets an algorithm works on for better performance. The ability to catch spurious CPU or DMA writes to memory and warn users about them can be invaluable in cutting down software debug time. Once again, advanced emulation features hold the key to all these critical capabilities. The need for good visibility will only grow as multiple CPU cores are introduced. In such systems, knowing which CPU currently has access to a shared data resource will be of prime importance. Detecting and warning about possible memory incoherence is another critical capability that emulation can offer.
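The write-monitoring capability described above can be modeled in software to show the logic an emulation unit would apply in hardware. The sketch below is a hypothetical model, not a TI API: it assumes a single protected, half-open address range and a stream of observed writes tagged with their source (CPU or DMA). It latches ("halts on") the first write that overlaps the range and records the source, address, and CPU program counter so the offending code can be traced. All type and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical source of a memory write observed by the monitor. */
typedef enum { WRITE_SRC_CPU, WRITE_SRC_DMA } write_src_t;

/* Hypothetical protected address range, half-open: [start, end). */
typedef struct {
    uint32_t start;
    uint32_t end;
} watch_range_t;

/* State latched when the monitor "halts" on the first violation. */
typedef struct {
    bool halted;
    write_src_t src;
    uint32_t addr;   /* address of the offending write */
    uint32_t pc;     /* CPU program counter at the time, if available */
} watch_hit_t;

/* True if the write [addr, addr + len) overlaps the protected range. */
static bool overlaps(const watch_range_t *r, uint32_t addr, uint32_t len)
{
    uint64_t w_start = addr;
    uint64_t w_end = (uint64_t)addr + len;  /* 64-bit avoids wraparound */
    return len != 0 && w_start < r->end && w_end > r->start;
}

/* Called for every observed write. Latches the first violating write and
 * returns true once the monitor has halted. */
bool monitor_write(const watch_range_t *r, watch_hit_t *hit,
                   write_src_t src, uint32_t addr, uint32_t len, uint32_t pc)
{
    if (hit->halted || !overlaps(r, addr, len))
        return hit->halted;
    hit->halted = true;
    hit->src = src;
    hit->addr = addr;
    hit->pc = pc;
    return true;
}
```

Recording the source of the write is what separates a spurious DMA transfer from a stray CPU store, which in turn decides whether the program counter points at the culprit or only at whatever code happened to be running.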
The new emulation features will provide enhanced debug and profiling capabilities that give users better visibility into system and memory behavior. They also address several usability issues.
The aim is to provide new debug and profiling capabilities and to fix problems encountered in previous implementations:
- Stall cycle profiling to identify the parts of the user application that require code optimization.
- Event profiling to analyze system and memory behavior, which in turn helps in choosing effective optimization methods.
- Cache viewer and coherence analysis to debug cache coherence problems.
- Debugging of Software Pipelined Loop (SPLOOP) instructions.
- Support for memory protection and security.
- Reduced Real-time Data Exchange intrusiveness.
- A richer set of Advanced Event Triggering events.
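As an illustration of how stall cycle profiling data might be used to pick optimization targets, the sketch below ranks functions by stall cycles. It assumes hypothetical per-function counters (total cycles and stall cycles, such as an emulation-based profiler might collect); the type and function names are illustrative only and do not correspond to a specific TI tool.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-function counters: total cycles spent in the function
 * and how many of those cycles were stalls. */
typedef struct {
    const char *name;
    uint64_t total_cycles;
    uint64_t stall_cycles;
} func_profile_t;

/* Percentage of a function's cycles lost to stalls (0 if it never ran). */
double stall_percent(const func_profile_t *f)
{
    if (f->total_cycles == 0)
        return 0.0;
    return 100.0 * (double)f->stall_cycles / (double)f->total_cycles;
}

/* Index of the function with the most stall cycles, i.e. the first
 * candidate for optimization. Returns n when the list is empty. */
size_t worst_stall_offender(const func_profile_t *funcs, size_t n)
{
    size_t worst = n;
    for (size_t i = 0; i < n; i++) {
        if (worst == n || funcs[i].stall_cycles > funcs[worst].stall_cycles)
            worst = i;
    }
    return worst;
}
```

Ranking by absolute stall cycles rather than by stall percentage favors the functions whose stalls cost the most total run time: a short function with a high stall rate may matter less than a long-running one with a moderate rate.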