The Zettabyte File System (ZFS) uses a logging mechanism, the ZFS intent log (ZIL), to store synchronous writes until they are safely written to the main data structure in the storage pool. The speed at which data can be written to the ZIL determines the speed at which synchronous write requests can be serviced: the faster the ZIL, the faster most databases, NFS and other important write operations become. Normally, the ZIL is part of the regular storage pool on disk. However, ZFS also allows a dedicated device to be used for the ZIL. This is then called a “log device”.
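The role of the intent log can be illustrated with a minimal sketch. This is a toy model under stated assumptions, not how ZFS is implemented: the class `ToyPool` and its methods are hypothetical names, and the log, the in-RAM pending state and the main pool are modeled as plain Python containers.

```python
class ToyPool:
    """Toy model of an intent log in front of a main storage pool (hypothetical)."""

    def __init__(self):
        self.intent_log = []  # stands in for the ZIL; assumed to survive a crash
        self.pending = {}     # writes held in RAM; assumed lost on a crash
        self.pool = {}        # stands in for the main data structure

    def sync_write(self, key, value):
        # A synchronous write is acknowledged once it is in the log,
        # so the speed of the log bounds the speed of the acknowledgement.
        self.intent_log.append((key, value))
        self.pending[key] = value
        return "ack"

    def commit(self):
        # Pending writes reach the main pool; the log entries are no longer needed.
        self.pool.update(self.pending)
        self.pending.clear()
        self.intent_log.clear()

    def crash(self):
        # Everything that was only in RAM is gone.
        self.pending.clear()

    def replay(self):
        # After a crash, acknowledged writes are recovered from the log.
        for key, value in self.intent_log:
            self.pool[key] = value
        self.intent_log.clear()
```

In this sketch a write that was acknowledged but not yet committed is still recoverable after `crash()` by calling `replay()`, which is the property the log is meant to provide.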
ZFS also has a sophisticated cache called the “Adaptive Replacement Cache” (ARC), in which it stores both the most frequently used blocks of data and the most recently used ones. The ARC is held in RAM, so each block of data that is found there can be delivered quickly to the application instead of having to be fetched again from disk. When RAM is full, data must be evicted from the cache and is no longer available to accelerate reads.
Solid-state arrays (SSA) have moved the external controller-based storage array market from a relatively stagnant, incrementally improving market with slow-changing dynamics to a progressive neoclassical market. Improvements in the dynamics of many factors (such as reduced storage administration, power, cooling and rack space, and increased performance and density) have changed the accepted assumptions of the previous SAN storage array market. Many vendors design and develop their own custom solid-state solutions. Consequently, more vendors are offering alternate solid-state media form factors with denser and faster systems when they create their own NAND flash storage packaging. From a whole system perspective, the largest SSAs now scale to 3.9 PB, and next-generation SSD technology and interconnects will again redefine performance capabilities, creating demand for faster storage networks.
Neither the solid-state array nor the storage array administrator is the bottleneck anymore; network latency has become the challenge. This has extended the requirement and life span for 16 Gbps and 32 Gbps Fibre Channel SANs, as Ethernet-based networks and related storage protocols struggle to keep up. Many new vendors that provide comprehensive service management have entered the market and, along with many traditional storage vendors, continue to transition their portfolios from HDD-based arrays to all solid-state arrays.
Therefore, an SSA that is two to three times more expensive to purchase becomes a cost-effective replacement for a hybrid or general-purpose array at increased utilization rates. With regard to performance, one SSD can typically replace multiple HDDs. Combined with data reduction features and increased storage administrator productivity, this means that the price point at which SSA investment decisions are made is dropping rapidly. Redundant array of independent disks (RAID) rebuild times for high-capacity SSDs are also faster than for high-capacity HDDs. As HDD storage capacities increase, so do HDD recovery times, and SSAs therefore reduce the risk exposure during any media failure and recovery window. Use cases for SSAs are moving into analytics, file and object workloads, and some customers even use SSAs as backup targets to reduce backup and restore windows.
Price and ownership programs translate into very competitive purchase prices for buyers, but vendors face challenges in becoming profitable, as incumbent vendors discount to avoid losing market share and new vendors discount to attract new customers. Because the SSA market has expanded rapidly, with SSD reliability being equal to or better than that of HDD arrays and feature parity also equalizing, the competitive battle to differentiate has moved to ease of ownership and to remote and pre-emptive support capabilities.
In contrast to block and file I/O storage systems, when an object is stored in an object addressable data storage system (OAS), the object is given a name that uniquely identifies it and that also specifies its storage location. This type of data access may therefore eliminate the need for a table index in a metadata store, and it may not be necessary to track the location of data in the metadata. An OAS receives and processes access requests via an object identifier that identifies a data unit or other content unit, rather than via an address that specifies where the data unit is physically or logically stored in the storage system.
In an OAS, a content unit may be identified using its object identifier, and the object identifier may be independent of both the physical and logical locations where the content unit is stored. In other words, the object identifier does not control where the content unit is logically or physically stored. Thus, if the physical or logical location of a content unit changes, the identifier used to access the content unit may remain the same. As a result, an application program may simply track the name and/or location of a file rather than tracking the block addresses of each of the blocks on disk that store the content.
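The independence of the object identifier from the storage location can be illustrated with a short sketch. The following is a toy model under stated assumptions: `ToyObjectStore`, its methods and the location strings are hypothetical, the identifier is derived here from a SHA-256 hash of the content (one possible choice, not one stated above), and the internal identifier-to-location map is an implementation detail of the sketch that the client never sees.

```python
import hashlib


class ToyObjectStore:
    """Toy object store: clients use identifiers, never locations (hypothetical)."""

    def __init__(self):
        self._location_of = {}  # object identifier -> current location (internal)
        self._media = {}        # location -> stored bytes

    def put(self, content: bytes, location: str) -> str:
        # Assumption of this sketch: the identifier is a hash of the content.
        object_id = hashlib.sha256(content).hexdigest()
        self._media[location] = content
        self._location_of[object_id] = location
        return object_id

    def relocate(self, object_id: str, new_location: str) -> None:
        # Moving the content changes only the internal mapping.
        old_location = self._location_of[object_id]
        self._media[new_location] = self._media.pop(old_location)
        self._location_of[object_id] = new_location

    def get(self, object_id: str) -> bytes:
        return self._media[self._location_of[object_id]]


store = ToyObjectStore()
oid = store.put(b"hello", "disk0/slot7")
store.relocate(oid, "disk3/slot1")
content_after_move = store.get(oid)  # same identifier, new location
```

After the move, the client retrieves the content with the identifier it was originally given, without learning that the location changed.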
Many storage systems have separate systems to de-duplicate and compress data, and replication software is often added after the system has been built. Server vendors have used available building blocks to slash server prices dramatically, yet storage incumbents continue to overcharge customers for their storage servers. Architectural complexity, non-integrated products, expensive proprietary networking protocols, cumbersome administration and licensing for every module of software are the norm, and they burden storage consumers with high prices and high maintenance.
Modern computing ecosystems rely on resilient data availability for most of their functions. This translates directly into a need for failure-resilient storage systems, which has fostered the development of strategies in storage server solutions such as clustering (multiple computers per file system), shared storage, and splitting of compute and file-system responsibilities. Simultaneously, network file-system protocols such as CIFS (Common Internet File System) and NFS (Network File System) have undergone modifications that allow applications running on remote clients to receive a seamless flow of data, irrespective of storage node failures at the server. This is primarily achieved by making the storage server cluster guarantee that once a client acquires a handle on a file by opening it, the cluster, and not a specific node, will honor client interactions for this file-handle. This guarantee has major implications for the manner in which a client's file-handle data must be stored on a storage server.
In a traditional storage server, the storage host node that services the client's request to open a file creates an in-memory context for that request and returns a file-handle to the client as part of the open response. The node refers to this context for all further client operations that use the file-handle, until the client relinquishes the file-handle, typically through a file-close.
This in-memory context, or client's file-handle info, can be grouped into the following categories: (a) Mode of usage: the manner in which the client wishes to use the file, e.g. read, write, execute, delete etc.; (b) Mode of shared usage: the manner in which the client allows other clients to use this file concurrently; (c) Locking information: the extent of the file over which the client wishes exclusive access. This state may also contain information about any soft-lock or opportunistic lock that the client holds for caching reads and writes to the file locally; and (d) Any application-specific context that the client wishes to save as opaque metadata for the lifetime of the file-handle.
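The categories above can be pictured as a single record per open file-handle. The following is a minimal sketch only: the names `Access`, `ByteRangeLock`, `FileHandleInfo` and `permits_other` are hypothetical and are not taken from CIFS, NFS or any particular server implementation.

```python
from dataclasses import dataclass, field
from enum import Flag, auto
from typing import List, Optional


class Access(Flag):
    READ = auto()
    WRITE = auto()
    EXECUTE = auto()
    DELETE = auto()


@dataclass
class ByteRangeLock:
    offset: int  # start of the locked extent, in bytes
    length: int  # size of the locked extent, in bytes


@dataclass
class FileHandleInfo:
    path: str
    access: Access                    # mode of usage requested by this client
    share: Access                     # mode of shared usage granted to other clients
    locks: List[ByteRangeLock] = field(default_factory=list)  # locking information
    oplock: Optional[str] = None      # soft-lock or opportunistic lock, if any
    app_context: bytes = b""          # opaque application-specific metadata

    def permits_other(self, requested: Access) -> bool:
        # Another client is allowed only the operations listed in the share mode.
        return (requested & ~self.share) == Access(0)


handle = FileHandleInfo("/data/report", Access.READ | Access.WRITE, Access.READ)
```

In this sketch the handle's owner reads and writes the file while allowing other clients to read it concurrently, so `handle.permits_other(Access.READ)` holds and `handle.permits_other(Access.WRITE)` does not.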
For a failure-resilient storage system, this in-memory state, referred to as ‘persistent-handle-info’ or PHDL-info hereafter, must be made available to other nodes of the system, so that in the event of primary node failure, any other node in the storage cluster can serve the data to clients once the latter present their persistent-handles for reconnection. However, storing the persistent-handle-info for long time-intervals can cause considerable memory consumption on the storage server.
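The requirement described above can be sketched as follows. This is an illustrative toy model, not the method of any particular storage server: `ToyCluster` and its methods are hypothetical, and copying the handle info to every live node is just one simple assumption about how the state could be made available. It also makes the memory cost visible, since every open handle occupies an entry on each node until it is closed.

```python
class ToyCluster:
    """Toy cluster that copies handle info to every live node (hypothetical)."""

    def __init__(self, node_names):
        self.handles = {name: {} for name in node_names}  # per-node handle tables
        self.alive = set(node_names)
        self._next_id = 1

    def open(self, node, path, mode):
        if node not in self.alive:
            raise RuntimeError("node is down")
        handle_id = self._next_id
        self._next_id += 1
        info = {"path": path, "mode": mode}
        # Assumption of this sketch: replicate the info to all live nodes.
        for name in self.alive:
            self.handles[name][handle_id] = dict(info)
        return handle_id

    def fail(self, node):
        # The failed node loses its in-memory state.
        self.alive.discard(node)
        self.handles[node].clear()

    def reconnect(self, handle_id):
        # Any surviving node that holds the info can serve the client.
        for name in sorted(self.alive):
            if handle_id in self.handles[name]:
                return name, self.handles[name][handle_id]
        raise KeyError(handle_id)

    def close(self, handle_id):
        # Closing the handle frees the memory on every node.
        for table in self.handles.values():
            table.pop(handle_id, None)


cluster = ToyCluster(["n1", "n2", "n3"])
h = cluster.open("n1", "/data/f", "rw")
cluster.fail("n1")
serving_node, info = cluster.reconnect(h)
```

Here the client opens the file on node `n1`, that node fails, and the client's reconnection with the same handle is served by a surviving node.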
Throughout the description, similar reference numbers may be used to identify similar elements in the several embodiments and drawings. Although specific embodiments of the invention have been illustrated, the invention is not to be limited to the specific forms or arrangements of parts so described and illustrated. The scope of the invention is to be defined by the claims herein and their equivalents.