As is known in the art, computer systems generally include a central processing unit (CPU), a memory subsystem, and a data storage subsystem. According to a network or enterprise model of the computer system, the data storage system, whether associated with or in addition to a local computer system, may include a large number of independent storage devices or disks housed in a single enclosure or cabinet. This array of storage devices is typically connected to several computers over a network or via dedicated cabling. Such a model allows for the centralization of data that is to be shared among many users and also allows for a single point of maintenance for the storage functions associated with the many host processors.
The data storage system stores critical information for an enterprise that must be available for use substantially all of the time. If an error occurs on such a data storage system, it must be fixed as soon as possible because such information is at the heart of the commercial operations of many major businesses. A recent economic survey from the University of Minnesota, known as the Bush-Kugel study, indicates that after just a few days (2 to 6) without access to their critical data, many businesses are devastated. The survey showed that 25% of such businesses were immediately bankrupt after such a critical interruption and fewer than 7% remained in the marketplace after 5 years.
Recent innovations by EMC Corporation of Hopkinton, Mass. provide business continuity solutions that are at the heart of many enterprises' data storage infrastructure. Nevertheless, the systems (including devices and software) being implemented are complex and vulnerable to errors that must be quickly serviced for the continuity to be maintained.
EMC has been using a technique for responding to errors as they occur by “calling home” to report the errors. The data storage system is equipped with a modem and a service processor (typically a laptop computer) for error response. Sensors built into EMC's storage systems monitor conditions such as temperature, vibration, and tiny fluctuations in power, as well as unusual patterns in the way data is being stored and retrieved, for a total of over 1,000 diagnostics. Periodically (about every two hours), an EMC data storage system checks its own state of health. If an error is noted, a machine-implemented “call home” is made to customer service over a line dedicated for that purpose. Every day, thousands of such calls home for help reach EMC's customer service center in Hopkinton. About one-third of the calls from EMC's machines trigger the dispatch of a customer engineer to fix some problem, but clearly not all calls can be handled right away. Nor are all errors necessarily caught by the reporting system. The data storage system owner's data may be at risk, and even when it is not, an owner who is dissatisfied with how long a problem is taking to resolve will think poorly of the company that sold the data storage system.
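By way of illustration only, the periodic self-check and “call home” behavior described above might be sketched as follows. This is a minimal sketch under stated assumptions, not a description of EMC's actual implementation: the names ErrorReport, run_health_check, check_and_call_home, and the send callback are hypothetical, and the only value taken from the text is the check period of about two hours.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# From the text: the system checks its own health about every two hours.
CHECK_INTERVAL_SECONDS = 2 * 60 * 60


@dataclass
class ErrorReport:
    """Hypothetical record of one failed diagnostic."""
    diagnostic: str  # name of the diagnostic that failed
    detail: str      # human-readable description of the fault


def run_health_check(diagnostics: Dict[str, Callable[[], str]]) -> List[ErrorReport]:
    """Run every diagnostic once.

    Assumption: each diagnostic returns "" when healthy, otherwise a
    non-empty description of the fault it observed.
    """
    reports = []
    for name, check in diagnostics.items():
        detail = check()
        if detail:
            reports.append(ErrorReport(name, detail))
    return reports


def check_and_call_home(
    diagnostics: Dict[str, Callable[[], str]],
    send: Callable[[ErrorReport], None],
) -> int:
    """Perform one self-check cycle and 'call home' once per detected error.

    `send` is a stand-in for the call to customer service over the
    dedicated line; it is injected so the sketch stays self-contained.
    Returns the number of calls made.
    """
    reports = run_health_check(diagnostics)
    for report in reports:
        send(report)
    return len(reports)


if __name__ == "__main__":
    sample_diagnostics = {
        "temperature": lambda: "",
        "power": lambda: "voltage fluctuation detected",
    }
    check_and_call_home(
        sample_diagnostics,
        send=lambda r: print(f"call home: {r.diagnostic}: {r.detail}"),
    )
```

In such a sketch a scheduler would invoke check_and_call_home once every CHECK_INTERVAL_SECONDS. Note that every detected error produces a call regardless of its severity, which is consistent with the large daily call volume described above.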
Companies that sell data storage systems are very concerned with protecting the customer's data and with the customer's satisfaction with the overall ownership experience because they would like to have a mutually satisfactory business relationship. However, the volume of calls and errors in general and the overall complexity of problems make quick resolutions extremely difficult. At the same time, rushing to fix every problem as it comes in stretches resources undesirably and is costly.
What is needed is a way to handle errors and service problems that fixes each problem in a reasonably timely fashion while ensuring that the owner stays satisfied with the experience.