A storage server is a computer that provides storage services relating to the organization of data on writable storage media, such as non-volatile memories and disks. A storage server may be configured to operate according to a client/server model of information delivery to enable many clients (e.g., applications) to access the data served by the system. A storage server can employ a storage architecture that serves data with both random and streaming access patterns at either the file level, as in network attached storage (NAS) environments, or the block level, as in a storage area network (SAN). Storage servers store data on various types of non-volatile storage media such as, for example, relatively high-latency (i.e., longer access time) hard disk drive devices (HDDs) and relatively low-latency (i.e., shorter access time) solid-state devices (SSDs), such as flash memory or DRAM.
A network storage system may be a monolithic, non-distributed storage server, or it may be distributed across two or more physical platforms. Furthermore, a network storage system can operate as one of multiple storage servers in a storage server cluster to provide increased scalability. A client may execute an application, such as a database application, that “connects” to the storage system over a computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the data services of the storage system by issuing access requests (read/write requests) as file-based or block-based protocol messages (in the form of packets) to the system over the network.
Some network storage systems use hierarchical (tree-shaped) data structures to organize the data and metadata that they store. The data and metadata may be stored and managed in units called “blocks,” where each block is a node in the hierarchical data structure. The hierarchical data structure can have internal nodes, which reference other nodes within the data structure, and leaf nodes, which do not reference other nodes. The metadata may include reference counts associated with the nodes/blocks. A reference count indicates the number of references (e.g., pointers) to a particular node.
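As an illustration of what a reference count measures, the following minimal sketch walks a small tree of blocks and counts, for each node, how many pointers refer to it. The `Node` class and `reference_counts` function are hypothetical names, and the sketch assumes each node has a unique name and that each owning entity (e.g., a file) holds one reference to its root.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """One block in the hierarchy (hypothetical); a node with no children is a leaf."""
    name: str
    children: list["Node"] = field(default_factory=list)


def reference_counts(roots: list[Node]) -> Counter:
    """Count the references to every node reachable from `roots`.

    Each entry in `roots` is one reference from an owning entity (file, LUN, ...);
    every parent-to-child pointer is one more reference to the child.
    """
    counts: Counter = Counter(node.name for node in roots)
    visited: set[int] = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        # Visit each node once so that its outgoing pointers are counted once.
        if id(node) in visited:
            continue
        visited.add(id(node))
        for child in node.children:
            counts[child.name] += 1
            stack.append(child)
    return counts


# Hypothetical example: leaf "b" is referenced by both internal nodes "x" and "y".
a, b, c = Node("a"), Node("b"), Node("c")
x, y = Node("x", [a, b]), Node("y", [b, c])
root = Node("root", [x, y])
counts = reference_counts([root])
print(counts["b"])  # 2
```

Internal nodes `x` and `y` and the root each have a count of 1, while the shared leaf `b` has a count of 2 because two pointers refer to it.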
A reference count is an important parameter to track in a storage system in which nodes can be shared by two or more entities, such as files, logical units, etc. Nodes may be shared as a result of, for example, creating snapshots or clones of a file or a file system. When a node is shared by two or more entities (e.g., files), there will generally be multiple references to that node, e.g., one reference for each file that includes the data associated with the node. The reference counts can be used for a variety of purposes, such as for determining when a node is no longer referenced and therefore can be deleted or for identifying inaccessible nodes.
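The use of reference counts to decide when a shared node can be deleted can be sketched as follows. This is a minimal model, assuming that a snapshot is represented simply as a second reference to the same root and that a block is freed, releasing its references to its children, once its count drops to zero. `Block`, `add_reference`, `drop_reference`, and `snapshot_lifecycle` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Block:
    """A node whose reference count is stored alongside it (hypothetical model)."""
    name: str
    children: list["Block"] = field(default_factory=list)
    refcount: int = 0


def add_reference(block: Block) -> None:
    block.refcount += 1


def drop_reference(block: Block, freed: list[str]) -> None:
    """Remove one reference; when none remain, free the block and release its children."""
    block.refcount -= 1
    if block.refcount == 0:
        freed.append(block.name)
        for child in block.children:
            drop_reference(child, freed)


def snapshot_lifecycle() -> tuple[list[str], list[str]]:
    """Share one tree between a file and its snapshot, then delete both."""
    leaves = [Block("a"), Block("b")]
    root = Block("root", leaves)
    for leaf in leaves:
        add_reference(leaf)   # one pointer from the root to each leaf
    add_reference(root)       # the original file references the root
    add_reference(root)       # the snapshot shares the same root

    freed_after_file: list[str] = []
    drop_reference(root, freed_after_file)      # delete the original file
    freed_after_snapshot: list[str] = []
    drop_reference(root, freed_after_snapshot)  # delete the snapshot
    return freed_after_file, freed_after_snapshot


print(snapshot_lifecycle())  # ([], ['root', 'a', 'b'])
```

Deleting the original file frees nothing because the snapshot still references the tree; only when the last reference disappears are the root and its leaves reclaimed.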
Some storage systems store the reference counts in the hierarchical data structure along with the nodes to which they correspond; this approach is sometimes called “hierarchical reference counting.” However, traditional implementations of hierarchical reference counting suffer from significant drawbacks that make them unsuitable for many scenarios. For example, when data is modified after cloning, the ability of multiple objects to share a particular node may be broken. The explicit references and reference counts in the lower levels of the reference counting hierarchy (e.g., those of the modified node's children) must then be updated. Consequently, a small data modification can produce a large number of reference count updates (i.e., an update “storm”) as the reference counts in the lower levels of the hierarchy are updated. Traditional hierarchical reference counting schemes therefore consume a significant portion of storage system resources, such as input/output (IO) capacity, CPU cycles, and memory. As such, network storage servers that use traditional hierarchical reference counting face a number of challenges and inefficiencies.
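The update storm can be illustrated with a small simulation. The following is a minimal sketch of one possible lazy hierarchical scheme, not a description of any particular product: it assumes that cloning increments only the root's count, and that a shared node (count greater than 1) is copied on write, after which every child of the copy gains a reference and the original loses one. `Node`, `build`, `cow_write`, and `clone_and_modify` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical tree node; `refcount` counts parents (or owning files, for roots)."""
    children: list["Node"] = field(default_factory=list)
    refcount: int = 1
    data: str = ""


def build(depth: int, fanout: int) -> Node:
    """Build a full tree; every node starts with a single reference."""
    if depth == 0:
        return Node(data="leaf")
    return Node(children=[build(depth - 1, fanout) for _ in range(fanout)])


def cow_write(node: Node, path: list[int], data: str, stats: dict[str, int]) -> Node:
    """Copy-on-write update of the leaf at `path`; returns the node to link in its place."""
    if node.refcount > 1:
        # Shared node: copy it. The copy also points at every child, so each
        # child's count is incremented, and the original loses one referrer.
        copy = Node(children=list(node.children), data=node.data)
        for child in copy.children:
            child.refcount += 1
            stats["updates"] += 1
        node.refcount -= 1
        stats["updates"] += 1
        node = copy
    if not path:
        node.data = data
        return node
    i = path[0]
    node.children[i] = cow_write(node.children[i], path[1:], data, stats)
    return node


def clone_and_modify(depth: int, fanout: int) -> tuple[Node, Node, int]:
    """Clone a tree, modify one leaf in the clone, and count refcount updates."""
    original = build(depth, fanout)
    original.refcount = 2  # the original file and its clone both reference the root
    stats = {"updates": 0}
    clone_root = cow_write(original, [0] * depth, "new", stats)
    return original, clone_root, stats["updates"]


_, _, updates = clone_and_modify(depth=3, fanout=4)
print(updates)  # 16
```

In this model, modifying a single leaf costs `depth * (fanout + 1) + 1` reference count updates, because every shared node on the path is copied and each copy must increment the counts of all of its children. With three internal levels and a fan-out of 4, one leaf write already triggers 16 updates, and the cost grows linearly with the fan-out.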