As the number of computing devices increases across society, electronic data management has become increasingly challenging. Modern devices create and use ever-increasing amounts of electronic data, ranging from digital photos and videos to large data sets related to any number of topics including energy exploration, human resources, seismic activity, and gene research. This explosion in digital data has naturally led to ever larger amounts of data that must be stored. Correspondingly, the data storage field is under constant pressure to increase the size, performance, accessibility, reliability, security, and efficiency of data storage systems.
In order to meet these demands for data storage, various storage systems have been developed. Large scale storage systems often include storage appliances that include arrays of spinning hard drives, magnetic tape drives, and/or solid state drives. Multiple storage appliances may be networked together to form a cluster. A cluster of storage appliances provides for added capacity as well as added redundancy, as compared to a single appliance. Storage appliances in a cluster may be configured to mirror data so that if one of the storage appliances becomes inoperable for any reason, the data is still available at another location.
Referring to FIG. 1, a storage network 100 is depicted. This storage network includes one or more storage appliances 110, 120 each including one or more disk drives. As mentioned, the appliances may be clustered. The storage network 100 is accessible by clients 130, 132, 134, 136 using a network 140. Generally speaking, the storage appliance (or appliances) manages the storage of data on disk drives. The depicted networks may be local in nature or geographically dispersed such as with large private enterprise networks or the Internet.
The storage appliances 110, 120 may include any conventional storage appliance such as a ZFS storage appliance. ZFS is a combined file system and volume manager designed by Sun Microsystems® in 2005 that provides data integrity verification and repair and high storage capacities, along with numerous other features. ZFS based systems utilize storage pools (often referred to as zpools) constructed of virtual devices (often referred to as vdevs), which are in turn constructed of block devices. A block device is any device that moves data in the form of blocks, including hard disk drives and flash drives. A virtual device may span a number of block devices, and a zpool may include one or more vdevs, each including one or more partitions of hard drives or one or more hard drives.
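The pool, virtual device, and block device hierarchy described above can be sketched as a small data model. This is a simplified illustration only, not the ZFS implementation or any ZFS API: the class names BlockDevice, VDev, and ZPool, the pool and disk names, and the capacities are all hypothetical, and redundancy overhead is ignored.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BlockDevice:
    """A device that moves data in blocks, e.g. a hard disk or flash drive."""
    name: str
    capacity_bytes: int


@dataclass
class VDev:
    """A virtual device built from one or more block devices."""
    name: str
    devices: List[BlockDevice] = field(default_factory=list)

    def raw_capacity(self) -> int:
        # Sum of member capacities; redundancy overhead is ignored here.
        return sum(d.capacity_bytes for d in self.devices)


@dataclass
class ZPool:
    """A storage pool built from one or more vdevs."""
    name: str
    vdevs: List[VDev] = field(default_factory=list)

    def raw_capacity(self) -> int:
        return sum(v.raw_capacity() for v in self.vdevs)


TIB = 1024 ** 4  # bytes per tebibyte

# Hypothetical pool: two vdevs built from three illustrative disks.
pool = ZPool("tank", [
    VDev("vdev0", [BlockDevice("disk0", 4 * TIB), BlockDevice("disk1", 4 * TIB)]),
    VDev("vdev1", [BlockDevice("disk2", 8 * TIB)]),
])
print(pool.raw_capacity() // TIB)  # 16
```

From the outside, only the pool-level total is visible; the individual block devices are an internal detail of each vdev.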
Traffic to and from the storage appliances 110, 120 is typically managed by the one or more dedicated storage servers located within the appliances. A common protocol used for managing these storage appliances 110, 120 is the network file system, commonly abbreviated “NFS.” NFS is a widely used distributed file system protocol, originally developed by Sun Microsystems in 1984, and currently in version 4 (NFSv4). NFS allows users at the clients 130-136 to access the stored data seamlessly by providing a programming interface found on the storage appliances 110, 120. The programming interface enables the creation and deletion of files, reading and writing of files, performing seeks within a file, creating and deleting directories, managing directory contents, and any other file operation. The operating system running on each of the clients 130-136 is configured to utilize the programming interface in order to manage the file system and to facilitate the interaction of executing applications with data residing in the storage appliances 110, 120.
In this example, the storage appliances 110, 120 are configured to operate using NFSv4. Generally, NFS systems are configured to separate the storage of file-system metadata from the storage of the files themselves. The metadata describes the location, on the storage appliances' disk drives, of the files that the clients 130-136 are attempting to access. NFSv4 is a “stateful” protocol, meaning the storage appliances 110, 120 each maintain a log of current operations being performed by the clients 130-136. This log is often referred to as a “state table.”
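One way to picture a state table is as a mapping from each client to the operations it currently has in progress. The sketch below is a hypothetical illustration under that assumption; the StateTable class, its method names, the client identifier, and the file path are invented for this example and do not reflect the data structures of any actual NFS server.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# An (operation, path) pair, e.g. ("open", "/export/report.txt").
Operation = Tuple[str, str]


class StateTable:
    """Hypothetical per-server record of in-progress client operations."""

    def __init__(self) -> None:
        self._ops: Dict[str, Set[Operation]] = defaultdict(set)

    def begin(self, client_id: str, op: str, path: str) -> None:
        # Record that the client has started an operation on a path.
        self._ops[client_id].add((op, path))

    def end(self, client_id: str, op: str, path: str) -> None:
        # Forget the operation; drop the client entry once it is empty.
        self._ops[client_id].discard((op, path))
        if not self._ops[client_id]:
            del self._ops[client_id]

    def active(self, client_id: str) -> Set[Operation]:
        # Return a copy so callers cannot mutate the table directly.
        return set(self._ops.get(client_id, set()))


table = StateTable()
table.begin("client-130", "open", "/export/report.txt")
print(table.active("client-130"))  # {('open', '/export/report.txt')}
table.end("client-130", "open", "/export/report.txt")
print(table.active("client-130"))  # set()
```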
Each storage appliance 110, 120 is aware of the pools that are collectively being served by the storage appliances 110, 120. Each pool has a corresponding distributed stable storage (DSS) path where the storage server writes persistent data about each client 130-136 when the client first contacts the server. This data may be used to identify data owned by a client if the client becomes disconnected from the storage server or storage appliances 110, 120.
Any time that a computing device must perform input/output (I/O) operations, the speed of the computing device is slowed. Any calls to memory, whether the memory is cache, random access memory (RAM), or persistent storage such as a conventional spinning hard drive, are costly in that they cause the computing device to waste clock cycles as the system waits for the requested data to be pulled from memory. The cost of a read depends on the type of memory. For example, reading from cache memory is faster than reading from RAM, which is in turn faster than reading from persistent storage such as a traditional spinning hard drive.
Software applications often require files or application data in order to function at a basic level. Once a user starts the application, the application will start making I/O requests as it needs application data to operate. As the user causes the application to perform additional operations, the number of I/O operations needed for the application to function increases beyond the basic level of I/O. I/O operations are limited by the I/O bandwidth, often expressed in input/output operations per second (IOPS). I/O bandwidth is a limited resource, meaning that only so many IOPS may be performed. Thus, the fewer the I/O requests made for running an application at a basic level, the higher the I/O bandwidth that can be provided to the application for its other operations.
Each I/O request made by an application may be placed in an I/O queue. The I/O queue conventionally operates in a first-in, first-out manner. If a computing system is under a light load, the I/O queue is minimal and I/O requests spend little time in the I/O queue. As a computing system uses more I/O, the computing system will reach a maximum level of IOPS, at which point the I/O queue will grow in length, increasing the time it takes to perform operations.
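The queue behavior described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: time advances in one-second ticks, at most max_iops requests are serviced per tick, and the function name simulate_queue and the arrival figures are hypothetical values chosen for illustration.

```python
from collections import deque
from typing import Deque, List


def simulate_queue(arrivals_per_tick: List[int], max_iops: int) -> List[int]:
    """Return the queue depth at the end of each one-second tick."""
    queue: Deque[int] = deque()
    depths: List[int] = []
    next_id = 0
    for arrivals in arrivals_per_tick:
        # New requests join the back of the queue.
        for _ in range(arrivals):
            queue.append(next_id)
            next_id += 1
        # At most max_iops requests are serviced per tick, oldest first.
        for _ in range(min(max_iops, len(queue))):
            queue.popleft()
        depths.append(len(queue))
    return depths


light = simulate_queue([50, 60, 40], max_iops=100)
heavy = simulate_queue([150, 150, 150], max_iops=100)
print(light)  # [0, 0, 0]
print(heavy)  # [50, 100, 150]
```

Under the light load the queue drains completely each tick. Once arrivals exceed the IOPS ceiling, the backlog grows every tick, so each new request waits longer behind the requests already queued.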
Not all read requests have the same priority. For example, sometimes an application will issue a read request when the application needs the data block being requested. In other cases, applications will issue read requests that are “prefetches,” which request data blocks in anticipation of needing those data blocks at a later time. One issue with an application prefetching data is that it may lead to the computing system reading the data from memory twice, as well as filling up the I/O queue with requests that may never need to be fulfilled. For example, an application may anticipate that it will need a data block, so it issues a prefetch. Before the prefetch read is made, the application may find that it needs the data block immediately, and so the application issues an additional read request. In this case, two read requests have been added to the I/O queue for the same data block, and I/O bandwidth is wasted by reading the data twice, once for each request.
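The duplicate-read scenario can be made concrete with a short sketch. It assumes a plain first-in-first-out queue that services every request independently and makes no attempt to detect duplicates; the ReadRequest type, the service_fifo function, and the block number are hypothetical names and values used only for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict


@dataclass
class ReadRequest:
    block_id: int
    prefetch: bool  # True if issued in anticipation of a later need


def service_fifo(queue: Deque[ReadRequest]) -> Dict[int, int]:
    """Service every queued request in order; count physical reads per block."""
    reads: Dict[int, int] = {}
    while queue:
        request = queue.popleft()
        # Each request triggers its own read, even for an already-requested block.
        reads[request.block_id] = reads.get(request.block_id, 0) + 1
    return reads


queue: Deque[ReadRequest] = deque()
queue.append(ReadRequest(block_id=7, prefetch=True))   # speculative read
queue.append(ReadRequest(block_id=7, prefetch=False))  # demand read, same block
reads = service_fifo(queue)
print(reads)  # {7: 2}
```

Block 7 is read twice even though a single read would have satisfied both requests.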
In a storage system that utilizes ZFS, data is broken into blocks simply referred to as “data blocks.” In the simplest terms, one logical data block corresponds to a specific number of bytes of physical disk space, for example 512 B, 1 KB, 2 KB, or any other size. A data block is often the smallest unit of storage that can be used or allocated. Various files and information may span a portion of a data block, a complete data block, or a plurality of data blocks. For example, a contiguous series of data blocks used for storing a specific type of information is referred to as an “extent,” and a set of extents allocated for a specific object is referred to as a “segment.” In ZFS systems, data is stored in storage pools that may span more than one physical storage device. Despite spanning more than one physical storage device, a storage pool simply looks like one large storage device from the outside.
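The relationship between a byte range and the data blocks it occupies reduces to integer division. The helper below is a minimal sketch assuming fixed-size blocks numbered from zero; the function name blocks_spanned and the example offsets are hypothetical.

```python
from typing import List


def blocks_spanned(offset: int, length: int, block_size: int) -> List[int]:
    """Return the indices of the logical blocks touched by a byte range."""
    if length <= 0 or block_size <= 0 or offset < 0:
        raise ValueError("offset must be >= 0; length and block_size must be > 0")
    first = offset // block_size
    # The last byte of the range sits at offset + length - 1.
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


print(blocks_spanned(offset=0, length=100, block_size=512))     # [0]
print(blocks_spanned(offset=0, length=512, block_size=512))     # [0]
print(blocks_spanned(offset=500, length=1000, block_size=512))  # [0, 1, 2]
```

The three calls correspond to data occupying a portion of a block, exactly one complete block, and a plurality of blocks, respectively.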
When retrieving data from persistent storage, multiple data requests may be made for the same data block. Traditionally, these requests may be combined only if the requests are for data residing on the same physical device. Thus, for blocks and segments that span multiple physical devices, two simultaneous requests for the same data result in two reads from memory, needlessly wasting I/O bandwidth.
It is with these and other issues in mind that various aspects of the present disclosure were developed.