The present invention relates generally to storage systems and, more particularly, to a system and method for maximizing server resource utilization and the performance of metadata operations.
Distributed file systems and parallel file systems involve a plurality of servers that cooperate with one another to process file system requests from clients.
In one consideration, a parallel file system such as pNFS (parallel network file system) includes a plurality of Data Servers (DSs) that process read/write requests, while a dedicated Metadata Server (MDS) processes all metadata requests. A client first establishes a connection to the MDS. It then performs a file open operation on the file of interest to obtain location information such as the IP address of the DS, the file identifier on the DS, and so on. Once it has the location information and the identifier, the client sends read/write requests directly to the DS. It is the MDS's responsibility to obtain file identifiers from all the DSs as part of operations such as file open and file create. Hence, certain metadata operations require MDS-to-DS communication, typically called the Control Path Protocol (CPP). While processing such operations, existing systems block the thread servicing an operation for the duration of the CPP procedure, so the resources (e.g., CPU, memory) assigned to that thread cannot be used to service other operations. This leads to under-utilization of MDS resources and thereby reduces the overall metadata access performance of a single MDS.
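The cost of blocking during CPP can be illustrated with a simple back-of-the-envelope model. The sketch below is hypothetical: the class name, the thread pool size, and the per-operation CPU time and CPP wait time are illustrative assumptions, not measurements of any real MDS. It further assumes that the worker thread pool, not the number of CPU cores, is the bottleneck and that client demand keeps every thread occupied. Under those assumptions, a thread that blocks during CPP is held for the CPU work plus the entire CPP wait, so throughput is bounded by the pool size divided by that total time.

```python
from dataclasses import dataclass


@dataclass
class BlockingMdsModel:
    """Hypothetical model of an MDS whose worker threads block during CPP.

    threads: size of the MDS worker thread pool (assumed fixed)
    cpu_ms:  CPU time a thread actually spends processing one operation
    cpp_ms:  time the same thread sits blocked waiting for DS replies over CPP
    """
    threads: int
    cpu_ms: float
    cpp_ms: float

    def busy_ms_per_op(self) -> float:
        # A blocking thread is held for the whole operation: CPU work plus CPP wait.
        return self.cpu_ms + self.cpp_ms

    def throughput_ops_per_s(self) -> float:
        # Each thread completes one operation every busy_ms_per_op() milliseconds.
        return self.threads * 1000.0 / self.busy_ms_per_op()

    def cpu_utilization(self) -> float:
        # Fraction of each held thread's time spent doing useful CPU work.
        return self.cpu_ms / self.busy_ms_per_op()


if __name__ == "__main__":
    # Illustrative numbers only (assumed): 16 threads, 0.5 ms CPU, 4.5 ms CPP wait.
    blocking = BlockingMdsModel(threads=16, cpu_ms=0.5, cpp_ms=4.5)
    no_cpp = BlockingMdsModel(threads=16, cpu_ms=0.5, cpp_ms=0.0)
    print(f"with CPP wait: {blocking.throughput_ops_per_s():.0f} ops/s")  # 3200 ops/s
    print(f"without CPP wait: {no_cpp.throughput_ops_per_s():.0f} ops/s")  # 32000 ops/s
    print(f"thread time spent on CPU: {blocking.cpu_utilization():.0%}")  # 10%
```

With these assumed figures, each thread spends only 10% of its held time doing useful work, and the same pool handles one tenth of the operations it could if the CPP wait did not occupy the thread.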
Separating the metadata and read/write service capabilities onto the MDS and the DSs, respectively, greatly improves read/write performance by providing high-throughput parallel I/O, and HPC and streaming applications leverage such architectures. However, a typical HPC workload consists of more than 50% metadata operations. Hence, MDS performance is critical to overall file system performance as seen by the clients. Virtualized multiple metadata server cluster solutions have been proposed to provide a distributed metadata service that increases overall metadata access performance. Even in such a solution, however, each MDS is under-utilized during CPP communication. Thus, there is a need for a solution that effectively utilizes MDS resources during CPP communication.
In another consideration, multiple-MDS solutions that provide a global namespace and a virtualized view of the MDSs need MDS-to-MDS communication for certain metadata requests, such as directory create and directory listing. For example, in a multiple-MDS solution in which metadata is distributed at the directory level, a create directory operation may need to create the directory on an MDS other than the one that received the request. During such MDS-to-MDS communication, threads block as described above, which leads to under-utilization of MDS resources.