In complex multicore system-on-chip (SoC) devices, data movement is often handled by peripherals or dedicated hardware engines, such as DMA engines, that are programmed by software running on one of the CPUs in the device. Problems caused by incorrect programming of these peripherals or hardware engines are very difficult to debug, because the resulting data transactions may affect the operation of CPUs other than the one that programmed the device, or may have system-level consequences that are not visible to the debug tools attached to any of the CPUs.
These consequences include both correctness issues (where the device operates incorrectly because of the problem) and performance issues (where the real-time behavior of the device is degraded to the point that it cannot complete its tasks on time). Multicore performance issues in particular require real-time debugging techniques that do not halt any of the CPUs (as is typically done when a breakpoint is hit).
A specific example of a problem that is particularly hard to debug arises when data used for interprocessor communication is transferred by a DMA engine and arrives after its real-time deadline. Determining why the data arrived late requires insight into the real-time behavior of the software that programmed the transaction, the other transactions handled by the DMA engine, bus contention, cache behavior, and the operation of software on the various CPUs.
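To make the late-arrival case concrete, the following is a minimal sketch of the kind of offline analysis that timestamped trace data would allow. It assumes that some non-intrusive trace mechanism has already recorded, for each DMA transfer, when the descriptor was programmed, when the engine started moving data, when the transfer completed, and what the consumer's deadline was. The record layout, the field names, the function names, and the sample values are all hypothetical and are not taken from any particular device or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DmaTransfer:
    """One DMA transfer rebuilt from hypothetical trace timestamps (in cycles)."""
    name: str
    programmed: int  # producer CPU finished writing the DMA descriptor
    started: int     # DMA engine began moving data
    completed: int   # last data arrived at the destination
    deadline: int    # latest acceptable arrival time for the consumer CPU


def lateness_breakdown(t: DmaTransfer) -> dict:
    """Split a transfer's latency into time spent waiting and time spent moving data."""
    return {
        "queue_delay": t.started - t.programmed,   # waiting behind other work
        "transfer_time": t.completed - t.started,  # actually moving data
        "lateness": max(0, t.completed - t.deadline),
    }


def late_transfers(transfers):
    """Return the transfers that completed after their deadline."""
    return [t for t in transfers if t.completed > t.deadline]


# Illustrative sample data only; the numbers are invented.
TRACE = [
    DmaTransfer("ipc_msg_0", programmed=100, started=110, completed=150, deadline=200),
    DmaTransfer("ipc_msg_1", programmed=300, started=420, completed=470, deadline=450),
]

for t in late_transfers(TRACE):
    print(t.name, lateness_breakdown(t))
```

In this invented data, the late transfer spends far longer waiting to start than it spends moving data, which would point toward other queued transactions or bus contention rather than the transfer itself. Timestamps of this kind show where the time went but not why, so they would still need to be correlated with the activity of the other CPUs and of the DMA engine.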