Computing systems are generally composed of, among other things, integrated circuits (ICs) and a variety of other electronic components such as, but not limited to, state machines, application specific integrated circuits (ASICs), logic gates and discrete logic devices. Like most electronic devices, these components are susceptible to electrostatic discharges and other events that transfer (i.e., discharge) electric charge as a result of an electrostatic and/or electromagnetic event (collectively, discharging events). If strong enough, such discharging events place the component into an incorrect, bad or error state, thereby rendering the component at least temporarily non-operational. In other words, a fault occurs. While some system designers have provided devices for protecting or shielding system components from discharging events, it is not uncommon for manufacturers either to omit adequate shielding or to employ ineffective protection mechanisms against such events. In some instances, this design decision may be dictated by the functionality of the device, its size or physical characteristics, or simple economics. In any event, computing systems are, and will continue to be, susceptible to a variety of discharging events that result in a fault (i.e., a non-operational condition).
Mobile devices are particularly susceptible to discharging events and possible faults due to the nature of their use. For instance, a mobile device is, by definition, small and portable. Users are free to travel great distances and traverse a variety of environments with the device in hand. Consequently, movement by a user may generate a buildup of charge on the user's body or clothing. In touching or coming close to the mobile device, the user may act as a conduit, thereby transferring the charge to the mobile device and likely rendering it non-operational. It is further recognized, however, that non-mobile computer systems such as desktop computers, set top boxes or other computing systems may also be susceptible to discharging events and possible faults in similar situations where charge is transferred to such systems by an operator.
As one of ordinary skill in the art will appreciate, a computing system such as a mobile telephone or other handheld device may include two processing units: a central processing unit (CPU) and a graphics processing unit (GPU). The CPU is coupled to the GPU via a north bridge, a south bridge, any suitable bus or buses, or any combination thereof, in order to pass drawing commands and other operation commands or instructions to the GPU for subsequent execution. The GPU may be associated with a plurality of registers, a frame buffer and a graphics processor. The CPU may similarly be associated with a plurality of individual components and is coupled to system memory for storage of, among other things, executable instructions and operational data. In one embodiment, a variety of drivers and other software modules may be stored in system memory for execution on the CPU.
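The CPU/GPU arrangement described above can be modeled, very loosely, as a command path from a CPU-side driver to a GPU. The sketch below is a hypothetical Python simulation, not an actual driver interface: the `Bus` FIFO stands in for whatever bridge or bus couples the two units, and the `"set_reg"` and `"draw"` command formats, the 16-entry frame buffer and all class and function names are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Gpu:
    """Hypothetical GPU with a plurality of registers and a frame buffer."""
    registers: dict = field(default_factory=dict)
    frame_buffer: list = field(default_factory=lambda: [0] * 16)

    def execute(self, command: tuple) -> None:
        # The graphics processor decodes and executes one command.
        op, *args = command
        if op == "set_reg":  # hypothetical register-write command
            index, value = args
            self.registers[index] = value
        elif op == "draw":  # hypothetical drawing command: write one pixel
            offset, color = args
            self.frame_buffer[offset] = color
        else:
            raise ValueError(f"unknown command {op!r}")


@dataclass
class Bus:
    """Stands in for the north bridge/south bridge/bus path between CPU and GPU."""
    pending: deque = field(default_factory=deque)


def cpu_driver_submit(bus: Bus, command: tuple) -> None:
    # A driver executing on the CPU (loaded from system memory) queues a command.
    bus.pending.append(command)


def gpu_drain(bus: Bus, gpu: Gpu) -> int:
    # The GPU executes queued commands in submission (FIFO) order.
    count = 0
    while bus.pending:
        gpu.execute(bus.pending.popleft())
        count += 1
    return count
```

A FIFO is used here only because commands are described as being passed "for subsequent execution"; real systems may use ring buffers, DMA or other mechanisms.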
The CPU of most mobile devices is generally shielded to protect it and its related electronic components from discharging events. However, due to a variety of manufacturing and engineering-related design decisions, the GPU of a mobile device is commonly protected inadequately. Consequently, it has been discovered that, upon a discharging event, the GPU and one or more of its related electronic components may be placed in a bad or non-operational state. For instance, it is recognized that GPU registers are particularly susceptible to faults and must be rebooted before subsequent operation.
While prior art solutions exist to detect a fault condition (i.e., any condition indicating a fault) associated with a GPU and to restore the GPU to a workable state, no known solution restores the processing unit to a known, workable state that permits seamless or near-seamless operation. For instance, it is known to detect a discharging event and the resulting fault condition by monitoring certain registers of the GPU using a driver executed by the CPU. The known prior art generally operates by rebooting both the GPU and the GPU driver affected by the fault. However, rebooting the GPU discards the user context information obtained during normal operation of the GPU driver. Thus, the operating system of the CPU and other clients/applications issuing commands for execution by the CPU or the GPU must generate new user context information before execution can resume. Generally, this requires a user to start new instances of the software modules/drivers that were running before the fault condition was detected. As both those of ordinary skill in the art and ordinary users of computing devices will recognize, this results in lost data and user dissatisfaction with the computing device.
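The prior-art detect-and-reboot approach described above can be sketched as follows. This is a minimal Python simulation under stated assumptions, not any actual driver: the register index `STATUS_REG`, its expected value `STATUS_OK`, the four-entry register file and the `Gpu`/`GpuDriver` classes are all hypothetical, since the source does not identify which registers are monitored. The sketch shows the shortcoming being criticized: the fault is detected and the GPU is returned to its defaults, but the user context held by the driver is discarded along with it.

```python
from dataclasses import dataclass, field

# Hypothetical monitored register and its expected value; the source does
# not specify which GPU registers the driver checks.
STATUS_REG = 0
STATUS_OK = 0x1


def power_on_registers() -> list:
    # Hypothetical power-on defaults for a four-register GPU.
    return [STATUS_OK, 0, 0, 0]


@dataclass
class Gpu:
    registers: list = field(default_factory=power_on_registers)

    def reboot(self) -> None:
        # Reinitialize every register to its power-on default.
        self.registers = power_on_registers()


@dataclass
class GpuDriver:
    """Driver executing on the CPU that monitors the GPU's registers."""
    gpu: Gpu
    user_contexts: dict = field(default_factory=dict)

    def fault_detected(self) -> bool:
        # Fault condition: the monitored register no longer holds its expected value.
        return self.gpu.registers[STATUS_REG] != STATUS_OK

    def poll(self) -> bool:
        """Check the GPU once; on a fault, apply the prior-art recovery."""
        if not self.fault_detected():
            return False
        self.gpu.reboot()
        # Rebooting the driver as well discards the user context built up
        # during normal operation, so clients must recreate it.
        self.user_contexts.clear()
        return True
```

For example, writing an unexpected value into `registers[STATUS_REG]` (standing in for a discharging event) causes the next `poll()` to return `True`, restore the register to `STATUS_OK`, and leave `user_contexts` empty, which is exactly the loss of context that forces applications to be restarted.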
Therefore, a need exists for a fault detection and restoration method and apparatus for use in a computing system in which drivers, clients and other applications running on the co-processing unit are not affected by a fault condition associated with the processing unit. A further need exists for restoring the computing system such that the affected portion thereof is returned to a known, usable state. In that case, neither the operating system nor the clients/applications utilizing the processing unit would be affected by the discharging event. Instead, they would remain operational with minimal impact on the user's experience with the computing system. Such a method and apparatus would thus provide near-seamless recovery after the detection of a fault condition.