Multi-processor computing systems are becoming increasingly common in a variety of applications. A multi-processor system is one that includes multiple processors, where the processors can be physical processors, logical processors, or a combination thereof. A single physical processor can implement multiple logical processors, as illustrated in FIG. 1, in which one physical processor 6 includes two logical processors 7. In such an implementation, the logical processors generally have some private state, but a portion of the state is shared. Henceforth in this document, the term “processor” is intended to mean either a physical processor or a logical processor unless the term is otherwise qualified.
It is important to ensure that instructions and data are safe for execution in a multi-processor environment. “Safe” in this context means that processes running concurrently will not operate on the same data or, if they do, that they will be synchronized to avoid conflicting with each other. To ensure that instructions and data are multi-processor safe, the various processes implemented by the operating system can be organized into a number of mutual exclusion domains according to their functionality. A “mutual exclusion domain” in this context is a set of one or more processes and is sometimes called a “mutex”. The mutual exclusion domains are defined according to functionality, so that it is not possible for two processes in different domains to operate on the same data simultaneously. Furthermore, generally only one process can execute at a time in each domain (with a few exceptions for operations that are inherently multi-processor safe).
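The rule that generally only one process executes at a time in a given domain can be illustrated with a minimal sketch. The class name `MutualExclusionDomain`, the use of a lock to enforce the rule, and the `demo` helper are all assumptions made for illustration; they are not taken from any particular operating system or storage server.

```python
import threading


class MutualExclusionDomain:
    """Hypothetical domain: at most one process executes in it at a time."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()  # assumed enforcement mechanism
        self.active = 0                # processes currently executing in the domain
        self.max_active = 0            # highest concurrency ever observed

    def run(self, work, *args):
        # Only the holder of the domain lock may execute work in this domain.
        with self._lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            try:
                return work(*args)
            finally:
                self.active -= 1


def demo(num_threads=8, iterations=1000):
    """Run many unsynchronized increments, all inside a single domain."""
    domain = MutualExclusionDomain("filesystem")
    counter = {"value": 0}

    def increment():
        # Not thread-safe by itself; the domain serializes every call.
        counter["value"] += 1

    def worker():
        for _ in range(iterations):
            domain.run(increment)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"], domain.max_active
```

In this sketch the shared counter needs no lock of its own, because every process that touches it belongs to the same domain. Two separate `MutualExclusionDomain` objects would not block each other, which is acceptable only if, as described above, processes in different domains never operate on the same data.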
A technique for defining and using mutual exclusion domains is known to have been implemented in network storage servers in the prior art. In that technique, the mutual exclusion domains are organized according to the critical path pipeline of the storage server. The critical path can be described as follows: When the storage server receives a data access request (read or write) from a client over a network, a network element of the storage server sends an appropriate message to the storage server's filesystem (storage manager element), which processes the message to determine where the corresponding data is stored, and which then forwards a corresponding message to a RAID element of the storage server. (Note that a “filesystem”, as the term is used herein, does not necessarily manage data as “files” per se; for example, a filesystem can manage data in units of LUNs and/or individual data blocks, rather than files.) Each of these phases of processing the request is carried out by a different stage in the pipeline; as such, a separate mutual exclusion domain can be created for each stage, e.g., a domain for all network-specific processes of the storage server, a domain for all filesystem-related processes of the storage server, a domain for all storage-specific processes of the storage server, etc.
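The per-stage organization described above can be sketched as a chain of queues, one per domain. This is an illustrative model only: the stage names, the use of exactly one worker thread per domain, and the `run_pipeline` function are assumptions, not details of any actual storage server.

```python
import queue
import threading

# One hypothetical domain per stage of the critical path.
STAGES = ["network", "filesystem", "storage"]


def run_pipeline(requests):
    """Pass each request through every stage domain in order."""
    queues = {name: queue.Queue() for name in STAGES}
    completed = []

    def stage_worker(index):
        name = STAGES[index]
        is_last = index + 1 == len(STAGES)
        while True:
            msg = queues[name].get()
            if msg is None:
                # Shutdown marker: pass it downstream, then stop.
                if not is_last:
                    queues[STAGES[index + 1]].put(None)
                return
            msg = msg + [name]  # record that this domain handled the message
            if is_last:
                completed.append(msg)
            else:
                queues[STAGES[index + 1]].put(msg)

    # A single worker per domain means only one process runs in each domain.
    workers = [
        threading.Thread(target=stage_worker, args=(i,))
        for i in range(len(STAGES))
    ]
    for w in workers:
        w.start()
    for request in requests:
        queues["network"].put([request])
    queues["network"].put(None)
    for w in workers:
        w.join()
    return completed
```

Under these assumptions, different stages can work on different requests at the same time, while each stage handles only one message at a time, so the slowest stage sets the throughput of the whole pipeline.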
In certain network storage servers, the different pipeline stages, and hence, the corresponding mutual exclusion domains, tend to have different degrees of processor utilization. For example, in certain network storage servers the filesystem domain tends to have much higher processor utilization (e.g., close to 100 percent) than the network and storage domains (e.g., typically in the range of 20 to 50 percent). The filesystem domain, therefore, tends to be a bottleneck in the critical path of the storage server, thus limiting the throughput of the storage server.
The prior art technique mentioned above addressed this problem by allowing some parallelism within a mutual exclusion domain, particularly one associated with the filesystem processes. In particular, that technique disclosed creating a new mutual exclusion domain for certain filesystem processes related to operations on user data, e.g., reads and writes of user data. The new domain was defined to include multiple “threads” which were allowed to execute in parallel. Each logical data set (e.g., each file or LUN) of user data managed by the storage server was logically divided into one or more contiguous subsets called “stripes”, and each stripe was uniquely assigned to a separate thread in the new domain. Hence, certain predetermined operations on user data were allowed to operate in parallel if they were directed to different stripes of the data set. However, all other operations had to be serialized.
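The stripe-to-thread assignment can be sketched as follows. The stripe size, the thread count, the round-robin mapping, and the choice to serialize any operation that spans more than one stripe are all assumptions of this sketch; the text above states only that each stripe is assigned to one thread and that operations on different stripes may proceed in parallel.

```python
STRIPE_SIZE = 2 * 1024 * 1024  # hypothetical stripe size in bytes
NUM_THREADS = 4                # hypothetical number of threads in the domain


def stripe_of(offset):
    """Index of the contiguous stripe that contains a byte offset."""
    return offset // STRIPE_SIZE


def thread_for(offset, length):
    """Thread that owns an operation, or None if it must be serialized.

    Assumes length >= 1. An operation that crosses a stripe boundary is
    treated as one that must be serialized (an assumption of this sketch).
    """
    first = stripe_of(offset)
    last = stripe_of(offset + length - 1)
    if first != last:
        return None
    # Assumed round-robin assignment of stripes to threads.
    return first % NUM_THREADS


def may_run_in_parallel(op_a, op_b):
    """True if two (offset, length) operations land on different threads."""
    thread_a = thread_for(*op_a)
    thread_b = thread_for(*op_b)
    return thread_a is not None and thread_b is not None and thread_a != thread_b
```

For example, under these assumptions two 4 KB reads at offsets 0 and `STRIPE_SIZE` fall in adjacent stripes and may run in parallel, while two reads within the first stripe are handled by the same thread, one after the other.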
The prior art technique improved upon earlier technology by allowing a select set of filesystem operations to run in parallel, without having to make the majority of the filesystem code multi-processor safe, and without the need for frequent, low-level synchronization operations. However, it did not take into consideration the significant amount of metadata used by a typical storage server. It has since been determined that processes associated with maintaining and managing such metadata tend to consume a substantial portion of the processing throughput of the filesystem of a storage server. Yet there are many different types of metadata associated with a typical filesystem, many of which have complicated interdependencies. Therefore, it is not practical simply to extend the prior art technique to apply to such filesystem metadata.