Debugging embedded firmware defects in a customer's data storage environment presents many challenges. A particularly difficult challenge arises when adequate information, such as event logs providing error traces, cannot be captured to characterize a failure and determine its root cause. The lack of diagnostic information can result in costly engineering time and the need for specialized equipment to emulate and reproduce the fault condition in order to diagnose and correct the underlying problem.
In some cases, however, the problem cannot easily be reproduced, even by the customer. When sufficient debug data can be neither captured nor reproduced, the root cause cannot be determined directly. In this scenario, technicians must often resort to trial and error in an attempt to fix the undiagnosed problem. This type of “hit and miss” troubleshooting can be costly in terms of both engineering time and customer satisfaction.
This problem has conventionally been alleviated by reserving or adding sufficient storage space to each data storage device or expander in the topology to allow each node to record a robust amount of event logging information. With enough event log storage, a sufficient amount of history can usually be recorded to capture the problematic events as they arise. This approach is expensive, however, because it requires reserving or adding a significant amount of data storage capacity to each node in the storage topology for event data logging. In addition, error-causing events can still be missed when the event in question does not halt operation of the system: logging continues, and the event log can wrap around and overwrite the entry for the event that caused the problem.
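The wrap-around failure mode can be illustrated with a minimal sketch, assuming each node keeps a fixed-capacity circular event log. The capacity, the event strings, and the function name `record_events` are hypothetical and chosen only for illustration; a Python `deque` with `maxlen` stands in for the node's log and silently discards its oldest entry once full.

```python
from collections import deque

# Hypothetical, deliberately tiny log capacity for one node.
LOG_CAPACITY = 4


def record_events(events, capacity=LOG_CAPACITY):
    """Append events to a bounded (circular) log and return the surviving entries.

    A deque with maxlen drops its oldest entry whenever a new one is appended
    to a full log, mimicking an event log that wraps.
    """
    log = deque(maxlen=capacity)
    for event in events:
        log.append(event)
    return list(log)


# The error-causing event occurs first but does not halt the system,
# so ordinary events keep being logged after it.
events = ["fault: error-causing event"] + [f"info: heartbeat {i}" for i in range(5)]

surviving = record_events(events)
print(surviving)
print("fault captured:", any(e.startswith("fault") for e in surviving))
```

With six events and room for four, only the last four heartbeats survive, so the fault entry is gone. Enlarging the log would retain it in this case, but only until enough later events arrive. This is why adding storage per node mitigates the problem without eliminating it.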
The existing event logging protocol has each expander maintain its own event log. When a problem arises, a technician typically gathers the logs from every expander in the topology. The logs are then parsed individually or merged into a consolidated log for easier parsing. Operating systems help implement this type of event logging by exposing manual configuration settings that instruct each expander where to store its event log, allowing ongoing event log storage to be consolidated. In the Linux operating system, for example, the “syslog-ng” program allows an administrator to designate event logging data storage locations. This manual configuration allows a user to direct all of the event logs (or only those of certain log levels or types) to a central server, where they are typically consolidated into a single system log or stored as separate event log files for each individual expander. The central server can be configured to conveniently handle a large number of data storage expanders in any given data storage topology.
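The merging step can be sketched as a k-way merge of per-expander logs. This assumes each log is already in chronological order and that all expanders stamp entries against a common, synchronized clock. The `(timestamp, expander_id, message)` tuple format, the expander names, and `merge_logs` are hypothetical illustrations, not a format used by any particular expander or by syslog-ng.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical entry format: (timestamp_seconds, expander_id, message).
Entry = Tuple[float, str, str]


def merge_logs(logs: Iterable[List[Entry]]) -> Iterator[Entry]:
    """Merge per-expander logs, each sorted by timestamp, into one chronological stream.

    heapq.merge performs a lazy k-way merge, so the individual logs are never
    concatenated and re-sorted as a whole.
    """
    return heapq.merge(*logs, key=lambda entry: entry[0])


# Two hypothetical expander logs gathered by a technician.
exp_a = [(1.0, "expander-A", "link up"), (4.0, "expander-A", "phy reset")]
exp_b = [(2.0, "expander-B", "link up"), (3.0, "expander-B", "timeout")]

for ts, node, msg in merge_logs([exp_a, exp_b]):
    print(f"{ts:6.1f} {node:<11} {msg}")
```

The consolidated output interleaves the two expanders' entries by timestamp. If the expanders' clocks are not synchronized, the merged order can misrepresent the actual sequence of events, which is one reason centralized collection is attractive.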
The conventional approach to event logging has a drawback, however: it requires manual configuration on a node-by-node basis, which must be set up and managed by a system administrator. Implementing and managing the event logging system therefore consumes engineering time and requires the specialized knowledge of a trained system administrator. There is, therefore, a continuing need for methods and systems for improving event logging in data storage topologies. More particularly, there is a need to eliminate the specialized, node-by-node administration otherwise required to set up and manage the event logs for expanders in data storage topologies.