Distributed file systems offer many compelling advantages in establishing high-performance computing environments. One example is the ability to expand easily, even at large scale. Another is the ability to support multiple network protocols: a cluster of nodes operating together as a distributed file system can support connections from clients using different network protocols. One storage client can access the distributed file system using the Network File System (“NFS”) protocol, a second using the Server Message Block (“SMB”) protocol, and a third using the Hadoop Distributed File System (“HDFS”) protocol. Not only can different clients access the distributed file system using different protocols, but multiple clients using the same protocol can also access it.
With the ability to service multiple protocols, and to service hundreds, and in some implementations thousands, of clients, competition for resources can occur. Beyond client traffic, internal jobs such as file system maintenance also compete for distributed file system resources. One means of slowing the consumption of resources within a distributed file system is to throttle network traffic between clients and the file system. However, strictly examining and throttling network traffic may not provide an accurate view of the amount of resources clients are consuming. In addition, while throttling individual users can free up resources for other users or internal processes, it may not be desirable to a user or an administrator who wishes to prioritize one set of file system traffic over another.
As the distributed file system grows in cluster size, the number of clients and workflows typically grows as well. However, the expectations of individual users remain unchanged: users expect adequate performance from the cluster of nodes in performing their workflows. Therefore, there exists a need to estimate the impact of current and new workloads on the distributed file system, and to allow an administrator or an automated process to manage the performance provided to multiple workflows in a way that provides adequate performance for most, if not all, users.