A file server is a computer that provides file service relating to the organization of information on storage devices, such as disks. The file server or filer includes a storage operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on the disks. Each “on-disk” file may be implemented as a set of disk blocks configured to store information, such as text, whereas the directory may be implemented as a specially-formatted file in which information about other files and directories is stored. A filer may be configured to operate according to a client/server model of information delivery to thereby allow many clients to access files stored on a server, e.g., the filer. In this model, the client may comprise an application, such as a file system protocol, executing on a computer that “connects” to the filer over a computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the services of the filer by issuing file system protocol messages (in the form of packets) to the filer over the network.
A common type of file system is a “write in-place” file system, an example of which is the conventional Berkeley fast file system. In a write in-place file system, the locations of the data structures, such as inodes and data blocks, on disk are typically fixed. An inode is a data structure used to store information, such as meta-data, about a file, whereas the data blocks are structures used to store the actual data for the file. The information contained in an inode may include, e.g., ownership of the file, access permission for the file, size of the file, file type and references to locations on disk of the data blocks for the file. The references to the locations of the file data are provided by pointers, which may further reference indirect blocks that, in turn, reference the data blocks, depending upon the quantity of data in the file. Changes to the inodes and data blocks are made “in-place” in accordance with the write in-place file system. If an update to a file extends the quantity of data for the file, an additional data block is allocated and the appropriate inode is updated to reference that data block.
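By way of illustration only, the relationship between an inode, its direct pointers, an indirect block, and in-place updates can be sketched as follows. This is a minimal sketch and not the Berkeley fast file system's actual on-disk format: the names `Disk`, `Inode`, `append_block`, and `overwrite_in_place`, the tiny block size, and the limit of two direct pointers are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NUM_DIRECT = 2  # assumed number of direct pointers held in the inode


class Disk:
    """Hypothetical disk: a list of blocks addressed by block number."""

    def __init__(self) -> None:
        self.blocks: list = []

    def allocate(self, content) -> int:
        self.blocks.append(content)
        return len(self.blocks) - 1


@dataclass
class Inode:
    owner: str
    mode: int                                         # access permission bits
    size: int = 0                                     # file size in bytes
    direct: List[int] = field(default_factory=list)   # data block numbers
    indirect: Optional[int] = None                    # block holding more pointers


def block_numbers(disk: Disk, inode: Inode) -> List[int]:
    """Follow direct pointers first, then the indirect block if present."""
    numbers = list(inode.direct)
    if inode.indirect is not None:
        numbers.extend(disk.blocks[inode.indirect])
    return numbers


def append_block(disk: Disk, inode: Inode, data: bytes) -> int:
    """Extend the file: allocate a data block and update the inode to reference it."""
    blkno = disk.allocate(data)
    if len(inode.direct) < NUM_DIRECT:
        inode.direct.append(blkno)
    else:
        if inode.indirect is None:
            inode.indirect = disk.allocate([])  # indirect block starts empty
        disk.blocks[inode.indirect].append(blkno)
    inode.size += len(data)
    return blkno


def overwrite_in_place(disk: Disk, inode: Inode, index: int, data: bytes) -> int:
    """Write in-place: the block keeps its location; only its contents change."""
    blkno = block_numbers(disk, inode)[index]
    disk.blocks[blkno] = data
    return blkno


def read_file(disk: Disk, inode: Inode) -> bytes:
    return b"".join(disk.blocks[n] for n in block_numbers(disk, inode))
```

In this sketch an overwrite returns the same block number the data occupied before, which is the defining property of the write in-place approach, whereas an append that exceeds the direct pointers causes an indirect block to be allocated.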
Another type of file system is a write-anywhere file system that does not over-write data on disks. If a data block on disk is retrieved (read) from disk into memory and “dirtied” with new data, the data block is stored (written) to a new location on disk to thereby optimize write performance. A write-anywhere file system may initially assume an optimal layout such that the data is substantially contiguously arranged on disks. The optimal disk layout results in efficient access operations, particularly for sequential read operations, directed to the disks. A particular example of a write-anywhere file system that is configured to operate on a filer is the Write Anywhere File Layout (WAFL™) file system available from Network Appliance, Inc. of Sunnyvale, Calif. The WAFL file system is implemented within a microkernel as part of the overall protocol stack of the filer and associated disk storage. This microkernel is supplied as part of Network Appliance's Data ONTAP™ storage operating system, residing on the filer, that processes file-service requests from network-attached clients.
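The write-anywhere behavior described above can be contrasted with the write in-place approach in a short sketch. This is an illustration under stated assumptions, not the WAFL implementation: the class `WriteAnywhereStore`, its append-only block list, and its logical-to-physical map are hypothetical constructs introduced for the example.

```python
from typing import Dict, List, Optional, Tuple


class WriteAnywhereStore:
    """Hypothetical store that never overwrites a block already on disk."""

    def __init__(self) -> None:
        self.blocks: List[bytes] = []        # physical blocks, append-only
        self.block_map: Dict[int, int] = {}  # logical block -> physical block

    def write(self, logical: int, data: bytes) -> Tuple[Optional[int], int]:
        """Store dirtied data at a new physical location and repoint the map.

        Returns (old physical block or None, new physical block).
        """
        self.blocks.append(data)
        new_phys = len(self.blocks) - 1
        old_phys = self.block_map.get(logical)
        self.block_map[logical] = new_phys
        return old_phys, new_phys

    def read(self, logical: int) -> bytes:
        return self.blocks[self.block_map[logical]]
```

Rewriting a logical block in this sketch leaves the previous physical block untouched and simply redirects the map to the newly written location.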
As used herein, the term “storage operating system” generally refers to the computer-executable code operable on a storage system that implements file system semantics and manages data access. In this sense, Data ONTAP software is an example of such a storage operating system implemented as a microkernel. The storage operating system can also be implemented as an application program operating over a general-purpose operating system, such as UNIX® or Windows NT®, or as a general-purpose operating system with configurable functionality, which is configured for storage applications as described herein.
Disk storage is typically implemented as one or more storage “volumes” that comprise physical storage disks, defining an overall logical arrangement of storage space. Currently available filer implementations can serve a large number of discrete volumes (150 or more, for example). Each volume is associated with its own file system and, for purposes hereof, volume and file system shall generally be used synonymously. The disks within a volume are typically organized as one or more groups of Redundant Array of Independent (or Inexpensive) Disks (RAID). RAID implementations enhance the reliability/integrity of data storage through the redundant writing of data “stripes” across a given number of physical disks in the RAID group, and the appropriate caching of parity information with respect to the striped data. In the example of a WAFL file system, a RAID 4 implementation is advantageously employed. This implementation specifically entails the striping of data across a group of disks, and separate parity caching within a selected disk of the RAID group. As described herein, a volume typically comprises at least one data disk and one associated parity disk (or possibly data/parity partitions in a single disk) arranged according to a RAID 4, or equivalent high-reliability, implementation.
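The role of the parity information in a RAID 4 arrangement can be illustrated with a small sketch. It assumes the conventional scheme in which the parity block of a stripe is the bytewise exclusive-OR of the data blocks in that stripe; the function names `xor_blocks`, `write_stripe`, and `rebuild` are hypothetical and do not come from any particular RAID implementation.

```python
from functools import reduce
from typing import List, Sequence, Tuple


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks: Sequence[bytes]) -> Tuple[List[bytes], bytes]:
    """Return the blocks for the data disks and the block for the parity disk."""
    return list(data_blocks), xor_blocks(data_blocks)


def rebuild(surviving_blocks: Sequence[bytes], parity: bytes) -> bytes:
    """Recover the block of one failed data disk from the survivors and parity."""
    return xor_blocks(list(surviving_blocks) + [parity])
```

Because XOR is its own inverse, combining the parity block with the blocks of all surviving data disks yields the block of the single failed disk.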
Internally, the filer is a microprocessor-based computer in which one or more microprocessors are interconnected by a system bus to various system components that may be physically located on a motherboard. These components include a memory having a buffer cache for storing data and commands; a network adapter for communicating over the LAN or another network; a firmware storage device, such as an erasable programmable read only memory (EPROM, which may comprise a flash memory that retains its contents during shutdown), that contains system firmware (including a boot mechanism); and various storage adapters for communicating with the storage volumes of the disk array attached to the filer.
In particular, the system firmware provides the basic initial inputs to the microprocessor so as to boot the computer. This process shall be broadly termed a “boot mechanism.” At power-on, when boot-up occurs, the boot mechanism, stored in the firmware, is responsible for initializing memory, establishing various hardware connections, and performing certain power-on self-tests (POSTs). The boot mechanism, if all is functioning properly, then enables initial access to the stored copy of the storage operating system kernel so that it may be loaded into the filer memory. When appropriate, the storage operating system comes on-line and takes over filer operations. Upon shutdown, the boot mechanism is responsible for taking over from the storage operating system as the shutdown operation occurs. The boot mechanism provides the final steps before a restart (“boot-up”) of the filer.
From time to time, instead of a normal boot-up, a diagnostics boot is executed in the filer, either as a routine maintenance check to verify normal operation of the hardware components in the system, or alternatively, to diagnose and correct problems that may arise during operation. In addition to troubleshooting problems, it may be, for example, that a new network adapter or storage adapter card is being added in a scalable system to accommodate additional clients. Alternatively, additional disks or volumes may be added for an expansion of the system. Each of these new devices or functions needs to be configured and checked when it is brought on-line. Other new hardware components may also be added to replace faulty components or to upgrade the system. These new components need to be configured, checked and synchronized with the preexisting system. Diagnostics are employed to perform various checks in connection with these exemplary tasks.
According to one conventional approach, the diagnostics code for a diagnostics boot is contained on a floppy disk or CD ROM which is inserted into the computer by a maintenance operator at boot-up. In this manner, the diagnostics program is run and the results are observed in real-time. One drawback to such an approach is that the filer may be part of a distributed network in which the subject filer is remote (possibly in another city) with respect to the operator's local site.
As an alternative, the diagnostics routine has been placed directly on the on-motherboard EPROM (or onboard flash) that contains the firmware boot mechanism. However, there are several drawbacks to this approach. First, a conventional on-motherboard firmware EPROM may be limited in storage size. In one example, a typical Original Equipment Manufacturer (OEM)-supplied onboard flash for storage of firmware is only about 0.5 Mbytes in size. This limits the amount of information with respect to diagnostics that can be stored.
In addition, the placement of a diagnostics routine on the firmware that also contains the boot mechanism can present risks. It is often desirable to update diagnostic routines. However, commingling the diagnostics routine and boot mechanism on the same reprogrammable medium may increase the risk of corruption of the boot mechanism during an attempt to update the diagnostics. More specifically, the EPROM provided by the manufacturer of the motherboard often includes memory that is already segmented, and if one were to attempt to add code or to rewrite code, then a whole sector of the memory may have to be erased, which could compromise other aspects of the programming. While a partitioning of the firmware EPROM could alleviate some risks associated with commingling the boot mechanism with the diagnostics, the size and configuration of a conventional on-motherboard EPROM make this impracticable.
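The sector-erase risk described above can be made concrete with a simplified model. The sketch assumes a flash-like device in which a sector must be erased as a whole before any part of it is reprogrammed; the class `Flash`, the function `update_region`, the eight-byte sector size, and the `fail_after_erase` flag (which simulates an interruption such as a power loss) are all hypothetical.

```python
SECTOR_SIZE = 8   # assumed sector size in bytes, tiny for illustration
ERASED = 0xFF     # erased cells read back as all ones


class Flash:
    """Hypothetical sector-erasable memory."""

    def __init__(self, num_sectors: int) -> None:
        self.cells = bytearray([ERASED] * (SECTOR_SIZE * num_sectors))

    def erase_sector(self, sector: int) -> None:
        start = sector * SECTOR_SIZE
        self.cells[start:start + SECTOR_SIZE] = bytes([ERASED] * SECTOR_SIZE)

    def program(self, offset: int, data: bytes) -> None:
        # Programming can only clear bits (1 -> 0), so stale data must be erased first.
        for i, byte in enumerate(data):
            self.cells[offset + i] &= byte


def update_region(flash: Flash, offset: int, data: bytes,
                  fail_after_erase: bool = False) -> bool:
    """Rewrite part of the device; every sector the region touches is erased whole."""
    first = offset // SECTOR_SIZE
    last = (offset + len(data) - 1) // SECTOR_SIZE
    saved = {}
    for sector in range(first, last + 1):
        start = sector * SECTOR_SIZE
        saved[sector] = bytearray(flash.cells[start:start + SECTOR_SIZE])
        flash.erase_sector(sector)
    if fail_after_erase:
        return False  # interruption: neighbouring contents in the sector are gone
    for sector, image in saved.items():
        start = sector * SECTOR_SIZE
        lo = max(offset, start)
        hi = min(offset + len(data), start + SECTOR_SIZE)
        image[lo - start:hi - start] = data[lo - offset:hi - offset]
        flash.program(start, image)
    return True
```

If boot code and diagnostics share a sector in this model, an interrupted update of the diagnostics leaves the boot code erased as well, which is the corruption risk at issue.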
Moreover, during diagnostic sessions, the results produced in the tests being run (e.g., a diagnostics log) may be available to the operator in real-time, but they are often not saved. As such, valuable “error code” information that may have been displayed is often lost. It would be advantageous to maintain a record of diagnostics log data, configuration information, component operating characteristics, and the like, during and after diagnostic procedures. Again, the size of the EPROM dictates that such a log typically (if kept at all) resides in disk storage. Thus, the accessibility of the log could be compromised in the event of a disk failure or other circumstance.
Finally, the presence of diagnostics in conjunction with the firmware of the boot mechanism means that upgrades or changes to the underlying diagnostics must generally occur at boot-up. There is, again, significant risk in attempting to write to the media that stores the boot mechanism during runtime. The need to rely on a reboot to effect a change to the diagnostic code (or to read a diagnostic log) thereby causes further delays in the start of normal file service and interrupts its continuity.
Accordingly, it is an object of this invention to provide an alternate storage location for the diagnostics code and an associated log of diagnostics information that can be accessed readily during runtime and does not jeopardize, or interfere with, the integrity of the boot mechanism or other basic filer on-board functions. This mechanism should enable a relatively large capacity routine to be stored and a relatively large capacity log to be maintained. This log should be accessible readily during runtime and the diagnostics should be upgradable by a variety of techniques at convenient times that do not unduly interrupt file service.