Personal computer systems are well known in the art. They have attained widespread use for providing computer power to many segments of today's society. A personal computer (PC) may be defined as a desktop, floor-standing, or portable microcomputer that includes a system unit having a central processing unit (CPU) and associated volatile and non-volatile memory, including random access memory (RAM) and basic input/output system read only memory (BIOS ROM), a system monitor, a keyboard, one or more flexible diskette drives, a CD-ROM drive, a fixed disk storage drive (also known as a “hard drive”), a pointing device such as a mouse, and an optional network interface adapter. One of the distinguishing characteristics of these systems is the use of a motherboard or system planar to electrically connect these components. Examples of such personal computer systems are IBM's PC 300 series, Aptiva series, and IntelliStation series.
The widespread use of PCs across various segments of society has resulted in a reliance on PCs for work (e.g., telecommuting), news, stock market information and trading, banking, shopping, shipping, communication in the form of Voice over Internet Protocol (VoIP) and email, as well as other services. In fact, for many people, PCs represent an essential tool for their livelihood. Thus, it is desirable to minimize loss of productivity by increasing the reliability of PCs and reducing their downtime.
Unfortunately, the proliferation of PCs has been accompanied by a proliferation of quality issues related to early life failures of the PCs. Many of these failures appear to be induced by various operational factors, including operation of the PCs in environments that are not within specified environmental conditions, faulty capacitors, failing hard disk drives (HDDs), and the like. Moreover, although technicians try to obtain as much information as possible about the operational parameters leading up to the failures, the information currently being gathered is not reliable. Early life failures, for instance, must be diagnosed after the failure based upon the physical characteristics of the failed PC and customer accounts of occurrences leading up to the failure. Accurate information related to the operation may be impossible to determine. For instance, a component may appear to have failed as a result of a failure to dissipate sufficient heat. The failure may arise from an excessively high ambient temperature, a high voltage outside the specified operating guidelines, a power surge, a lack of proper ventilation, low fan performance, a fan failure, a blockage in one or more ventilation paths, or some combination thereof.
In the case of catastrophic failures of PCs, forensic analyses may offer information with regard to the states of one or more components in the area of the failed component(s). However, forensic analyses are very expensive and may still fail to provide a clear indication of the specific problem that led to the failure of the PC. For example, a power supply may deliver a lower-than-specified voltage to the fan as a result of an unusual fluctuation in the voltage supplied to the power supply, reducing airflow throughout a PC such as a desktop computer. In addition, a stack of papers placed on the desk next to the desktop computer may block an exit for air, and the combination of the blockage and the reduced airflow changes circulation patterns within the computer, resulting in a lack of airflow in the area of the failed component. The component and other local components are then unable to dissipate the amount of heat that operations produce, resulting in an uncontrolled heat buildup. The most susceptible component then fails in response to the heat buildup. The resulting heat buildup in components near the blocked vent, if detectable, may not sufficiently identify the blockage as the primary cause of the failure, especially when the technician does not have the opportunity to survey the office.
In fact, customers are demanding real-time debugging of problems with personal computing devices, many of which are related to software conflicts rather than failed components, so technicians typically begin diagnostics over the phone without any physical review of the personal computing device. Technical support by telephone relies heavily on the analytical expertise of the technicians and the knowledge of the user about the problem. The technicians have no direct knowledge of how the personal computing device has been used, so they may gather information from the user about the usage and/or request that the user perform a number of standard testing procedures that could identify the problem.
The customer may provide useful information with regard to the operating environment and conditions at the time of the failure of the PC and possibly for a period of time before it. And, in some cases, the customer may be able to repeat steps that led to the failure. For instance, the personal computing device may provide indications of problems or errors, such as a failed read or write to a hard drive or memory address, a failed thread, a processing error, etc., while the customer is using the personal computing device. The customer may then be able to relate events leading up to the failure of the personal computing device, allowing for a somewhat more accurate determination of the cause of the failure. However, when the errors occur in quick succession and/or the customer is not diligent or sufficiently descriptive in recording the errors, the information may be incomplete and possibly incorrect.
There is, therefore, a need for a cost-effective system that captures information accurately describing operational parameters (power-on hours, temperature data, fan performance, etc.), even in the worst cases, and that stores the operational parameters in a robust memory that may survive early life failures. There is an even greater need for such a system that offers fast access to the robust memory to avoid slowing down the PC.