A well-known method for silicon debug is clock-stop and scan-dump (CSSD), in which an application (e.g., one that is known to cause failures) is run as if in normal functional mode. At a pre-determined point in time, or when a specific event occurs, the clocks are stopped. Once the chip is known to be quiescent, the flip-flops (“FFs”) in the design are configured into one or more shift-registers, called “scan-registers” (usually by asserting a “scan-enable” signal). A scan-clock, which may run at a user-defined frequency, is then used to shift out, or “dump,” the contents of the FFs for analysis. This is generally performed for a single clock domain or a sub-domain, with varying levels of determinism. For example, in a graphics processing unit (“GPU”) or central processing unit (“CPU”), most of the FFs are in a single large clock domain. During debug, the clock is stopped at its root, e.g., at the output of the phase-locked loop (“PLL”), before the contents of the FFs are shifted out. However, in the presence of multiple asynchronous clock domains, stopping one clock domain deterministically may not guarantee enough debug information and may not allow scan-based shifting of FFs in the domain of the clock that was stopped. In addition, stopping a single large clock domain at its root may cause voltage transients in the power-grid, which may in turn cause one or more FFs in the design to lose their debug information content.
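
The basic CSSD sequence described above (run functionally, stop the clock, assert scan-enable, shift out the FF contents) can be illustrated with a small software model. This is only a sketch under simplifying assumptions: a single clock domain, a single scan-register, and FFs that behave as a binary counter in functional mode. The names `ScanChain`, `functional_tick`, `stop_clock`, and `scan_dump` are hypothetical and chosen for illustration; they do not come from any real tool or from the design discussed here.

```python
from typing import List


class ScanChain:
    """Toy model of FFs in one clock domain, stitched into one scan-register.

    Hypothetical illustration only; real designs have many chains and domains.
    """

    def __init__(self, num_ffs: int) -> None:
        self.ffs: List[int] = [0] * num_ffs  # index 0 = scan-in end
        self.clock_running = True
        self.scan_enable = False

    def functional_tick(self) -> None:
        """One functional clock edge.

        Assumption: the functional logic is a binary counter with its
        least-significant bit at index 0. A stopped clock changes nothing.
        """
        if not self.clock_running:
            return
        for i in range(len(self.ffs)):
            self.ffs[i] ^= 1
            if self.ffs[i] == 1:  # no carry into the next bit
                break

    def stop_clock(self) -> None:
        """Gate the functional clock (e.g., at the root of the domain)."""
        self.clock_running = False

    def scan_dump(self, scan_in: int = 0) -> List[int]:
        """Assert scan-enable and shift every FF value out of the chain.

        Returns the bits in the order they appear at scan-out, so the FF
        closest to scan-out (highest index) comes first.
        """
        if self.clock_running:
            raise RuntimeError("stop the functional clock before scan-dump")
        self.scan_enable = True
        dumped: List[int] = []
        for _ in range(len(self.ffs)):
            dumped.append(self.ffs[-1])           # bit currently at scan-out
            self.ffs = [scan_in] + self.ffs[:-1]  # one scan-clock shift
        self.scan_enable = False
        return dumped


if __name__ == "__main__":
    chain = ScanChain(num_ffs=4)
    for _ in range(5):          # run the "application" for five cycles
        chain.functional_tick()
    chain.stop_clock()          # pre-determined stop point reached
    print(chain.scan_dump())    # FF contents, scan-out end first
```

Note that in this model the dump is destructive: after shifting, the FFs hold whatever was fed in at scan-in. The model also does not capture the multi-domain and power-grid transient issues noted above, which arise from electrical and timing effects outside the scope of a purely logical simulation.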