Read Input/Output
Data storage systems manage massive amounts of data. Storage resources of the data storage system store the data, and a server coupled to the storage resources processes access requests (e.g., read and/or write requests) to the data. Data storage systems typically serve data access requests for many clients, including human users, remote computing systems, internal applications, or other sources of access requests. An operating system including a filesystem processes and services the access requests and provides access to the data. Data storage systems typically implement some form of caching to improve efficiency and throughput in the system. The operating system and its file system guarantee the validity of the data.
As described herein, a server of a storage system includes a data access manager that accesses data with a physical location identifier instead of a logical block reference identifier. The server includes an operating system with a filesystem that manages data access, including caching the data, and referencing and serving the data from cache. The filesystem uses the logical block reference identifier to manage access to the cached data via one or more levels of indirection. The logical block reference identifier can alternatively be referred to as an indirection identifier, and represents the data indirectly; the logical block reference identifier must be mapped to a physical location identifier to access the data. The data access manager can obtain a physical location identifier (e.g., by obtaining and resolving an indirection identifier) that directly references a physical memory location of the data.
The filesystem maintains a pool of buffers, including management of availability of the buffers (e.g., allocation and deallocation of resources for the buffers). The pool of buffers is a group of logical data units of memory used to cache data. The buffers can be managed or maintained through an index or hash created to identify the logical data unit. It will be understood that a buffer is a representation of a physical resources (e.g., storage or memory resources), such as a location in a cache device. The cache can be represented logically as a “hash” representation, which allows logical operations on data access requests prior to committing the requests to the physical resources. From the perspective of locating data and performing checks on the data, typically such operations are performed by an access layer as logical operations on the hash or logical representation of the data. Ultimately, the data is stored in, and accessed from, physical locations.
The buffers can be provisioned by initializing the buffer and generating an identifier or hash value for the buffer and allocating resources to manage the buffer. The filesystem typically provisions buffers for use in a buffer cache, which is a caching device that buffers data access requests between the operating system and disk storage. As described herein, the data access manager can provision buffers to a cache location separate from the buffer cache. The separate cache location can be a cache location that is logically separated from the buffer cache in that the same physical device can store the data, and the data access manager provisions and maintains it. Thus, the memory resources are not available for the filesystem to provision for the buffer cache. In some cases, the data access manager can be considered independent or separate from the filesystem in that the data access manager can execute in parallel to the filesystem, and can access data in parallel to the filesystem without going through the filesystem to access the data.
The data access manager can be considered to bypass the filesystem by performing data access that is not managed by the filesystem and not part of the buffer cache managed by the filesystem. The buffer cache is a caching mechanism used by the filesystem to manage data access. When the data access manager bypasses the filesystem and the buffer cache, the data accessed does not have the guarantees of validity that are provided by the management of the filesystem. Thus, the data access manager provides validity checking of data obtained with a physical location identifier instead of a logical block reference identifier. If the validity check fails, the data access manager discards the data from its cache, which can be referred to as a private cache, in contrast to the buffer cache managed by the filesystem. When the validity test passes, the data access manager can provide access to the data by the requesting program.
The requesting program is an application or process, whether a system-level or user-level program, which makes a request for data. The expression “requesting program” or “requesting application” can refer to any standalone software application, as well as threads, processes, or subroutines of a standalone software application. The requesting program will frequently be a service or management entity within the storage system, and interfaces with the data and the operating system of the storage system on behalf of clients of the storage system. The clients can refer to remote programs or devices that are separate from the storage system and access the storage system over a network.
Operationally, a request from a client is forwarded as a packet over the network and onto the storage server where it is received at a network adapter. A network driver processes the packet and, if appropriate, passes it on to a network protocol and file access layer for additional processing prior to forwarding to file system. There, file system generates operations to load (retrieve) the requested data from the disks if it is not resident in the buffer cache. If the information is not in memory, file system accesses the inode file to retrieve a logical vbn and passes a message structure including the logical vbn to the RAID system. There, the logical vbn is mapped to a disk identifier and device block number (disk, dbn) and sent to an appropriate driver of disk driver system 890. The disk driver accesses the dbn from the specified disk and loads the requested data block(s) in memory for processing by the storage server. Upon completion of the request, the node (and operating system 800) returns a reply to the client over the network.
Background on Example Devices for Testing—Storage Filers
A file server is a computer that provides file service relating to the organization of information on storage devices, such as disks. The file server or filer includes a storage operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on the disks. Each “on-disk” file may be implemented as a set of data structures, e.g., disk blocks, configured to store information. A directory, on the other hand, may be implemented as a specially formatted file in which information about other files and directories are stored.
A filer may be further configured to operate according to a client/server model of information delivery to thereby allow many clients to access files stored on a server, e.g., the filer. In this model, the client may comprise an application, such as a database application, executing on a computer that “connects” to the filer over a direct connection or computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the services of the file system on the filer by issuing file system protocol messages (in the form of packets) to the filer over the network. Each client may request the services of the file system by issuing file system protocol messages (in the form of packets) to the storage system over the network. By supporting a plurality of file system protocols, such as the conventional Common Internet File System (CIFS) and the Network File System (NFS) protocols, the utility of the storage system is enhanced.
Storage Operating System
As used herein, the term “storage operating system” generally refers to the computer-executable code operable on a computer that manages data access and may, in the case of a filer, implement file system semantics, such as a Write Anywhere File Layout (WAFL™) file system. The storage operating system can also be implemented as an application program operating over a general-purpose operating system, such as UNIX® or Windows NT®, or as a general-purpose operating system with configurable functionality, which is configured for storage applications as described herein.
The storage operating system of the storage system may implement a high-level module, such as a file system, to logically organize the information stored on the disks as a hierarchical structure of directories, files and blocks. For example, each “on-disk” file may be implemented as set of data structures, i.e., disk blocks, configured to store information, such as the actual data for the file. These data blocks are organized within a volume block number (vbn) space that is maintained by the file system. The file system may also assign each data block in the file a corresponding “file offset” or file block number (fbn). The file system typically assigns sequences of fbns on a per-file basis, whereas vbns are assigned over a larger volume address space. The file system organizes the data blocks within the vbn space as a “logical volume”; each logical volume may be, although is not necessarily, associated with its own file system. The file system typically consists of a contiguous range of vbns from zero to n, for a file system of size n−1 blocks.
A common type of file system is a “write in-place” file system, an example of which is the conventional Berkeley fast file system. By “file system” it is meant generally a structuring of data and metadata on a storage device, such as disks, which permits reading/writing of data on those disks. In a write in-place file system, the locations of the data structures, such as inodes and data blocks, on disk are typically fixed. An inode is a data structure used to store information, such as metadata, about a file, whereas the data blocks are structures used to store the actual data for the file. The information contained in an inode may include, e.g., ownership of the file, access permission for the file, size of the file, file type and references to locations on disk of the data blocks for the file. The references to the locations of the file data are provided by pointers in the inode, which may further reference indirect blocks that, in turn, reference the data blocks, depending upon the quantity of data in the file. Changes to the inodes and data blocks are made “in-place” in accordance with the write in-place file system. If an update to a file extends the quantity of data for the file, an additional data block is allocated and the appropriate inode is updated to reference that data block.
Another type of file system is a write-anywhere file system that does not overwrite data on disks. If a data block on disk is retrieved (read) from disk into memory and “dirtied” with new data, the data block is stored (written) to a new location on disk to thereby optimize write performance. A write-anywhere file system may initially assume an optimal layout such that the data is substantially contiguously arranged on disks. The optimal disk layout results in efficient access operations, particularly for sequential read operations, directed to the disks.
Physical Disk Storage
Disk storage is typically implemented as one or more storage “volumes” that comprise physical storage disks, defining an overall logical arrangement of storage space. Currently available filer implementations can serve a large number of discrete volumes (150 or more, for example). Each volume is associated with its own file system and, for purposes hereof, volume and file system shall generally be used synonymously. The disks within a volume are typically organized as one or more groups of Redundant Array of Independent (or Inexpensive) Disks (RAID). RAID implementations enhance the reliability/integrity of data storage through the redundant writing of data “stripes” across a given number of physical disks in the RAID group, and the appropriate caching of parity information with respect to the striped data. In the example of a WAFL™ file system, a RAID 4 implementation is advantageously employed. This implementation specifically entails the striping of data across a group of disks, and separate parity caching within a selected disk of the RAID group. As described herein, a volume typically comprises at least one data disk and one associated parity disk (or possibly data/parity partitions in a single disk) arranged according to a RAID 4, or equivalent high-reliability, implementation.
In the drawings, the same reference numbers and any acronyms identify elements or acts with the same or similar structure or functionality for ease of understanding and convenience. To easily identify the discussion of any particular element or act, the most significant digit or digits in a reference number refer to the Figure number in which that element is first introduced.