A file server is a computer that provides file service relating to the organization of information on storage devices, such as disks. The file server or filer may be embodied on a storage system including a storage operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on the disks. Each “on-disk” file may be implemented as a set of disk blocks configured to store information, such as text, whereas the directory may be implemented as a specially-formatted file in which information about other files and directories is stored.
A filer may be configured to operate according to a client/server model of information delivery to thereby allow many clients to access files stored on a server, e.g., the filer. In this model, the client may comprise an application, such as a file system protocol, executing on a computer that “connects” to the filer over a computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the services of the filer by issuing file system protocol messages, usually in the form of packets, to the filer over the network.
As used herein, the term storage operating system generally refers to the computer-executable code operable on a storage system that manages data access and client access requests and may implement file system semantics in implementations involving filers. In this sense, the Data ONTAP™ storage operating system, available from Network Appliance, Inc. of Sunnyvale, Calif., which implements a Write Anywhere File Layout (WAFL™) file system, is an example of such a storage operating system implemented as a microkernel within an overall protocol stack and associated disk storage. The storage operating system can also be implemented as an application program operating over a general-purpose operating system, such as UNIX® or Windows NT®, or as a general-purpose operating system with configurable functionality, which is configured for storage applications as described herein.
The disk storage is typically implemented as one or more storage volumes that comprise physical storage disks, defining an overall logical arrangement of storage space. Currently available filer implementations can serve a large number of discrete volumes (150 or more, for example). Each volume is associated with its own file system and, for purposes hereof, volume and file system shall generally be used synonymously. The disks within a volume are typically organized as one or more groups of Redundant Array of Independent Disks (RAID). RAID implementations enhance the reliability/integrity of data storage through the redundant writing of data “stripes” across a given number of physical disks in the RAID group, and the appropriate storing of parity information with respect to the striped data. As described herein, a volume typically comprises at least one data disk and one associated parity disk (or possibly data/parity partitions in a single disk) arranged according to a RAID 4, or equivalent high-reliability, implementation.
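The parity relationship underlying a RAID 4 style arrangement can be illustrated with a minimal sketch. It assumes byte-wise XOR parity over equally sized blocks, one block per data disk; the function names `xor_blocks` and `rebuild_block` are hypothetical and are not taken from any particular storage operating system.

```python
def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks in a stripe must have the same size")
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def rebuild_block(surviving_blocks, parity):
    """Reconstruct one missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# One stripe spread over three hypothetical data disks.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)  # stored on the dedicated parity disk

# Simulate losing the second data disk and rebuilding its block.
recovered = rebuild_block([stripe[0], stripe[2]], parity)
```

Because XOR is its own inverse, combining the surviving data blocks with the parity block yields the missing block, which is why a single dedicated parity disk can protect against the loss of any one data disk in the group.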
Packets of information received by a filer from a network interface are typically stored in a memory buffer data structure or mbuf in the memory of a filer. Mbufs are used to organize received information into a standardized format that can be manipulated by various layers of a network protocol stack within a storage operating system. The information stored in an mbuf can include a variety of different data types including, inter alia, source and destination addresses, socket options, user data and file access requests. Further, mbufs can be used as elements of larger data structures, e.g., linked lists, and are particularly useful in dynamically changing structures since they can be created or removed “on the fly.” A description of mbuf data structures is provided in TCP/IP Illustrated, Volume 2 by Wright et al. (1995), which is incorporated herein by reference.
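The linked-list use of mbufs may be sketched as follows. This is a deliberately simplified model, not the BSD `struct mbuf` layout: the class `Mbuf` and the helpers `chain_from_payloads`, `walk` and `chain_length` are hypothetical names introduced only for illustration, and each node carries just a payload and a link to the next node.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Mbuf:
    """Simplified stand-in for an mbuf: a payload plus a link to the next mbuf."""
    data: bytes
    next: Optional["Mbuf"] = None


def chain_from_payloads(payloads: Iterable[bytes]) -> Optional[Mbuf]:
    """Build a singly linked chain, appending one mbuf per received payload."""
    head: Optional[Mbuf] = None
    tail: Optional[Mbuf] = None
    for payload in payloads:
        node = Mbuf(payload)
        if tail is None:
            head = tail = node
        else:
            tail.next = node
            tail = node
    return head


def walk(head: Optional[Mbuf]) -> Iterator[Mbuf]:
    """Yield each mbuf in the chain in order."""
    while head is not None:
        yield head
        head = head.next


def chain_length(head: Optional[Mbuf]) -> int:
    """Total number of payload bytes held by the chain."""
    return sum(len(m.data) for m in walk(head))


# Three payloads of different lengths arrive and are chained together.
chain = chain_from_payloads([b"abc", b"", b"defgh"])
```

Nodes can be appended or unlinked without moving the payloads already held, which is the property that makes such chains convenient for data arriving in unpredictable sizes.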
Information is often received from a network as data packets of various lengths, and these packets are stored in variable-length chains of mbufs. In contrast, file systems usually operate on data arranged in blocks of a predetermined size. For instance, data in the WAFL file system is stored in contiguous 4-kilobyte (kB) blocks. Therefore, data received by a filer is converted from variable-length mbufs to fixed-size blocks for use by the file system. The process of converting data stored in mbufs to fixed-size blocks may involve copying the contents of the mbufs into the filer's memory, then having the file system reorganize the data into blocks of a predetermined size.
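The copy-based conversion described above can be sketched as follows. The sketch assumes that a list of byte strings stands in for the payloads of an mbuf chain, that the block size is 4096 bytes, and that a short final block is zero-padded; the padding policy and the name `fragments_to_blocks` are illustrative assumptions rather than details of any actual file system.

```python
BLOCK_SIZE = 4096  # assumed block size, matching the 4 kB example in the text


def fragments_to_blocks(fragments, block_size=BLOCK_SIZE):
    """Copy variable-length fragments into a list of fixed-size blocks."""
    # First copy: gather every fragment into one contiguous buffer.
    contiguous = b"".join(fragments)
    blocks = []
    for offset in range(0, len(contiguous), block_size):
        # Second copy: carve the buffer into block-sized pieces.
        block = contiguous[offset:offset + block_size]
        # Assumption: pad a short final block with zero bytes.
        blocks.append(block.ljust(block_size, b"\0"))
    return blocks


# Three received fragments of 1000, 5000 and 3000 bytes (9000 bytes in total).
fragments = [b"a" * 1000, b"b" * 5000, b"c" * 3000]
blocks = fragments_to_blocks(fragments)
```

In this sketch every payload byte is copied at least once after it has been received, which is the overhead that the conversion imposes on the filer.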
File systems typically associate a buffer header with each fixed-size data block. Information in a buffer header may include a pointer for locating the data block at a particular location in memory, a block number for identifying the data block from among other blocks at that memory location, a file name associated with data in the data block, and so forth. Because they are generally much smaller in size than their associated data blocks, buffer headers are often “passed” between layers of the storage operating system instead of their larger data blocks. That is, the operating system layers (e.g., network protocol stack, file system and disk access layers) operate only on the contents of the buffer headers to resolve file access requests. Therefore, once data received by a filer is copied from mbufs into memory and partitioned into fixed-size blocks, buffer headers for the fixed-size data blocks can be sent to a RAID layer and a disk device driver layer of the storage operating system in accordance with a resolved file access request.
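The role of a buffer header may be illustrated with a minimal sketch. The field names below (`data`, `block_number`, `file_name`) mirror the examples given in the text, but the `BufferHeader` class, the `make_headers` helper and the use of a Python `memoryview` as the "pointer" are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class BufferHeader:
    """Small descriptor that refers to a data block without copying it."""
    data: memoryview   # plays the role of the pointer to the block in memory
    block_number: int  # index of the block within its memory region
    file_name: str     # file associated with the block's contents


def make_headers(pool: bytearray, file_name: str, block_size: int = 4096):
    """Create one header per fixed-size block in an already populated pool."""
    view = memoryview(pool)
    return [
        BufferHeader(view[off:off + block_size], off // block_size, file_name)
        for off in range(0, len(pool), block_size)
    ]


# Two 4096-byte blocks held in one hypothetical memory pool.
pool = bytearray(2 * 4096)
headers = make_headers(pool, "example.txt")

# A write to the pool is visible through the header, because the header
# refers to the block rather than holding a copy of it.
pool[4096] = 0x7F
first_byte_of_second_block = headers[1].data[0]
```

Handing the list of headers to another layer moves only these small descriptors; the block contents stay where they are in memory.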
The process of converting data from variable-length mbuf data structures to fixed-size blocks consumes system resources, such as memory and central processing unit (CPU) cycles, that could be used for other operations executed by the filer. Furthermore, the latency resulting from this conversion becomes particularly noticeable when a large number of mbuf data structures are converted to fixed-size data blocks. For example, when a filer receives a request to store (via a “WRITE” operation) a large file to disk, its file system must allocate a sufficient amount of memory for mbufs to receive the incoming file and, in addition, must allocate more memory to copy the contents of the mbufs when the received file is divided into fixed-size blocks. Not only does such a WRITE operation consume a large amount of memory, but it also requires the filer's CPU to execute instructions for moving and partitioning the data file, thereby consuming CPU cache and bandwidth that could be used by other processes.
Therefore, it is generally desirable to decrease the latency of processing incoming data to a filer by decreasing the number of times mbufs are copied and partitioned in the filer's memory. More specifically, it is desirable to minimize the amount of time and system resources needed to write large data files to one or more storage disks in a filer without affecting the resolution of other file access requests, such as file “READ” requests.