A file server is a computer that provides file service relating to the organization of information on storage devices, such as disks. The file server or filer may be embodied as a storage system including a storage operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on the disks. As used herein, the term “storage operating system” generally refers to the computer-executable code operable on a storage system that manages data access and client access requests and may implement file system semantics in implementations involving filers. In this sense, the Data ONTAP™ storage operating system, available from Network Appliance, Inc. of Sunnyvale, Calif., which implements a Write Anywhere File Layout (WAFL™) file system, is an example of such a storage operating system implemented as a microkernel within an overall protocol stack and associated disk storage. The storage operating system can also be implemented as an application program operating over a general-purpose operating system, such as UNIX® or Windows®, or as a general-purpose operating system with configurable functionality, which is configured for disk servicing applications as described herein.
A storage system's disk storage is typically implemented as one or more storage volumes that comprise physical storage disks, defining an overall logical arrangement of storage space. Available storage system implementations can serve a large number of discrete volumes (150 or more, for example). A storage volume is “loaded” in the storage system by copying the logical organization of the volume's files, data and directories into the storage system's memory. Once a volume has been loaded in memory, the volume may be “mounted” by one or more users, applications, devices, etc. permitted to access its contents and navigate its namespace.
A storage system may be configured to operate according to a client/server model of information delivery to thereby allow many clients to access files stored on a server, e.g., the storage system. In this model, the client may comprise an application, such as a file-system protocol, executing on a computer that “connects” to the storage system over a computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Communications between the storage system and its clients are typically embodied as packets sent over the computer network. Each client may request the services of the storage system by issuing file-system protocol messages formatted in accordance with a conventional file-system protocol, such as the Common Internet File System (CIFS) or Network File System (NFS) protocol.
Conventionally, the storage operating system monitors any errors of the storage devices and provides servicing of the storage devices using a set of servicing parameters. The servicing parameters may define error thresholds and recommendations for courses of action to be taken on storage devices exceeding any error thresholds. Conventionally, however, the set of servicing parameters is encoded as code instructions within the storage operating system, so that modification of the servicing parameters requires modification of the code instructions of the storage operating system. Therefore, to implement a modified set of servicing parameters, a new storage operating system version having modified code instructions would have to be developed by software programmers and then installed onto the storage system. As such, there is a need for a method for modifying disk servicing parameters of a storage operating system in a convenient and non-disruptive manner.
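By way of illustration only, the distinction between servicing parameters encoded as code instructions and servicing parameters held as separately modifiable data may be sketched as follows. This is a minimal sketch under stated assumptions and does not represent any particular storage operating system: the JSON layout, the names `ServicingRule`, `load_rules`, and `recommend`, and the error types and actions shown are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ServicingRule:
    """One hypothetical servicing parameter: a threshold and a recommended action."""
    error_type: str
    threshold: int
    action: str


def load_rules(text: str) -> dict[str, ServicingRule]:
    """Parse servicing parameters from data rather than from code instructions."""
    raw = json.loads(text)
    return {
        entry["error_type"]: ServicingRule(
            entry["error_type"], int(entry["threshold"]), entry["action"]
        )
        for entry in raw["rules"]
    }


def recommend(rules: dict[str, ServicingRule], error_counts: dict[str, int]) -> dict[str, str]:
    """Return the recommended action for each error type whose count exceeds its threshold."""
    actions = {}
    for error_type, count in error_counts.items():
        rule = rules.get(error_type)
        if rule is not None and count > rule.threshold:
            actions[error_type] = rule.action
    return actions


# Hypothetical parameter set; replacing this text changes the servicing
# behavior without changing any of the functions above.
EXAMPLE_PARAMETERS = """
{"rules": [
  {"error_type": "medium_error", "threshold": 5, "action": "fail_disk"},
  {"error_type": "timeout", "threshold": 10, "action": "run_diagnostics"}
]}
"""

rules = load_rules(EXAMPLE_PARAMETERS)
observed = {"medium_error": 6, "timeout": 3}
print(recommend(rules, observed))
```

In this sketch, a modified set of servicing parameters is implemented by supplying different parameter text to `load_rules`, whereas in the conventional arrangement the equivalent thresholds and actions would be fixed within the code instructions themselves.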
Also, conventionally, the storage operating system may be configured to send disk and storage system data over a computer network to an outside entity (e.g., the provider of the disks or storage system) for performance or error analysis. Conventionally, however, the disk or storage system data is sent to provide overall storage system performance analysis and is not designed to specifically analyze disk errors. Therefore, the disk and storage system data is collected and sent on a predetermined regular schedule (e.g., collected and sent every week) and does not contain relevant disk or storage system data collected at the time a disk error occurred. As such, the received data typically cannot support proper analysis of disk errors, since it may have been collected long after the disk errors occurred and is no longer relevant to disk error analysis.
Also, the collected disk and storage system data is typically sent in a file comprising a stream of unformatted data, with data of particular disks and storage system data randomly interspersed throughout the file. As such, when analyzing problems of a particular disk, the entire file (or several files) is usually searched (often manually) for data pertaining to the particular disk. When a general disk trend or issue across multiple disk types or storage systems is to be analyzed, a specialized programming script needs to be created to search for the specific disk or storage system data relevant to the trend or issue (e.g., a script created to search for all data relating to a specific disk error type). As such, it is difficult to analyze errors of a particular disk as well as general disk trends or issues. Therefore, there is a need for a method for sending more relevant disk and storage system data relating to disk errors and a more efficient and convenient method for analyzing such disk and storage system data.
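For illustration, the difficulty described above may be contrasted with data that arrives as discrete, labeled records. The following is a minimal sketch under assumed conditions only: the record fields `disk_id`, `error_type`, and `timestamp`, the function name `index_records`, and the sample values are hypothetical and are not taken from any actual data format.

```python
from collections import defaultdict


def index_records(records):
    """Group labeled records by disk and by error type in a single pass."""
    by_disk = defaultdict(list)
    by_error = defaultdict(list)
    for record in records:
        by_disk[record["disk_id"]].append(record)
        by_error[record["error_type"]].append(record)
    return dict(by_disk), dict(by_error)


# Hypothetical records, each captured for one disk error.
SAMPLE_RECORDS = [
    {"disk_id": "disk-01", "error_type": "medium_error", "timestamp": 100},
    {"disk_id": "disk-02", "error_type": "timeout", "timestamp": 105},
    {"disk_id": "disk-01", "error_type": "timeout", "timestamp": 230},
]

by_disk, by_error = index_records(SAMPLE_RECORDS)

# Analysis of a particular disk: no search through unrelated data is needed.
print(len(by_disk["disk-01"]))

# Analysis of a general trend: all disks that reported a given error type.
print(sorted({record["disk_id"] for record in by_error["timeout"]}))
```

With records labeled in this manner, both the per-disk analysis and the cross-disk trend analysis reduce to simple lookups, rather than a manual search or a purpose-built script for each question.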
Further, conventionally, when the storage operating system encounters multiple errors of a storage device during servicing (e.g., monitoring or testing) of the storage device, it does not determine whether the errors are localized to a particular physical area of the storage device and counts each error as a separate error. A storage device having multiple errors that reach a specified error threshold may be failed by the storage operating system (and later removed from the storage system) when, in fact, the storage device may have a single localized physical defect, whereby the multiple errors were confined to a small area. Since, conventionally, the storage operating system counts each error as a separate error without determining whether the errors are localized to one area, the storage operating system may unnecessarily and prematurely fail storage devices in the storage system and cause a higher rate of storage device failure than warranted. As such, there is a need for a method for servicing (e.g., monitoring or testing) storage devices that considers localization of errors.
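The difference between counting each error separately and considering localization of errors may be sketched as follows. This is an illustrative sketch only and not a definitive method: it assumes that each error is reported with a logical block address, and the function names, the `window` parameter, and the sample addresses are hypothetical.

```python
def count_errors_separately(error_addresses):
    """Conventional counting: every reported error counts as a separate error."""
    return len(error_addresses)


def count_error_regions(error_addresses, window):
    """Count groups of errors whose addresses lie close together.

    Addresses are sorted, and an address starts a new region only when it is
    more than `window` blocks beyond the previous error address.
    """
    regions = 0
    previous = None
    for address in sorted(set(error_addresses)):
        if previous is None or address - previous > window:
            regions += 1
        previous = address
    return regions


# Hypothetical case: five errors confined to one small area of the device.
localized = [1000, 1003, 1001, 1010, 1007]
print(count_errors_separately(localized))         # 5
print(count_error_regions(localized, window=16))  # 1

# Hypothetical case: errors spread across distant areas of the device.
scattered = [1000, 500000, 1004, 9000000]
print(count_error_regions(scattered, window=16))  # 3
```

Under an assumed error threshold of five, the first device would be failed when each error is counted separately, although all five errors fall within a single small region; counting regions instead yields one, which is well below that threshold.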