Modern mobile communication devices are often built around a multi-subsystem system-on-chip (or “SOC”), with each subsystem performing a specific task, such as audio, video, peripheral interfacing, modem, communication, global positioning system (or “GPS”), etc. Each of the subsystems may be designed to service specialized hardware elements for accelerated processing and may communicate with the other subsystems over a high-performance inter-processor communication bus. For example, the subsystems may communicate to accomplish tasks such as voice calls, video streaming, audio playback, etc. While such tasks are performed, one or more subsystems in the system-on-chip may be active at a given time.
Reliability of the subsystems in a multi-subsystem system-on-chip may be evaluated based on the Mean Time Between Failure (or “MTBF”) metric, which is defined as the arithmetic mean of the time between system failures. For example, a Mean Time Between Failure metric calculated over a period of time may describe the average time between crashes of a particular subsystem. In general, the longer the time between failures, the more reliable the subsystem; that is, a higher Mean Time Between Failure indicates better subsystem reliability. When a subsystem starts experiencing failures, its performance may degrade, followed by a loss of service until the subsystem is restarted. Such failures are highly undesirable from a user's point of view.
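The Mean Time Between Failure definition above can be illustrated with a minimal Python sketch. It assumes that failures of a subsystem are recorded as timestamps (here, hours since activation) and computes the arithmetic mean of the intervals between consecutive failures. The function name `mean_time_between_failures` and the sample crash log are hypothetical and are used only for illustration.

```python
from statistics import fmean
from typing import Sequence


def mean_time_between_failures(failure_times: Sequence[float]) -> float:
    """Return the arithmetic mean of the intervals between consecutive failures.

    `failure_times` (hypothetical input format) holds the time of each
    observed failure, e.g., crash timestamps in hours, in any order.
    At least two failures are needed to form one interval.
    """
    if len(failure_times) < 2:
        raise ValueError("need at least two failure times to compute MTBF")
    ordered = sorted(failure_times)
    # Time elapsed between each failure and the failure before it.
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return fmean(intervals)


# Hypothetical crash log for one subsystem, in hours since activation.
crashes = [0.0, 100.0, 250.0, 450.0]
print(mean_time_between_failures(crashes))  # 150.0
```

In this example the intervals are 100, 150, and 200 hours, so the MTBF is 150 hours; a larger value would indicate that failures occur farther apart on average.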
The age (or the elapsed time since activation) of active software may impact subsystem reliability, and the Mean Time Between Failure may be inversely proportional to the age of the software executing on (or associated with) a subsystem. Degradation of software reliability and/or performance over time, or “software aging,” may account for many common subsystem failures. Such software aging may be the result of adverse software operating conditions, including memory fragmentation, memory leaks, overflow or underflow of counters, data corruption, and poor garbage collection. For example, in non-optimized subsystem software, memory may be allocated but not released after use, causing a cumulative shortage of available memory for various subsystem operations. Further, many subsystem failures may occur when subsystem software is not restarted after a period of continuous use. This is a particular problem for mobile devices (e.g., smartphones, tablets, and laptops), as these devices are rarely restarted. For example, many mobile devices may only be rebooted when users travel on commercial airplanes and/or during firmware upgrades.