Distributed systems, including server farms, web services, and the like, have become an increasingly common way to provide vast amounts of computing resources. For example, such systems may be utilized to provide a wide variety of services, such as to store and retrieve data (e.g., a storage system), process financial data, route and store email, communicate instant messages, provide authentication services, and output web pages, to name a few. As the amount of computing resources desired to provide these services increases, distributed systems may be “scaled out” by adding computing devices, thereby providing a flexible topology in which resources may be added as needed.
It is often desirable to measure quality of service (QoS) and other metrics in such distributed systems to understand how the systems are operating and to identify performance, availability, and responsiveness issues within them. Today, most distributed systems focus on measuring QoS on the server side. QoS is very subjective, however, and QoS measured at the server typically does not accurately reflect the QoS experienced by a client. Client libraries for distributed storage systems can be complex. For instance, a single call to an API method may result in multiple requests to one or more servers. The QoS experienced by the client therefore reflects the quality of the distributed system as a whole, not that of a single server. Another problem with tracking QoS at the server is that the data will not reflect issues with the client library itself, which is used for accessing the distributed system. Performance issues or bugs in the client library will reflect negatively on the perceived QoS of the distributed system; however, the server is unaware of these client-side issues.
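The gap between server-side and client-side measurements can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the `Server` class, the `client_get` call, the assumption that the call contacts each server in turn, and the latency figures. It is not the API of any particular storage system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Server:
    """Hypothetical server that sees only its own per-request service time."""
    service_ms: float
    seen_ms: List[float] = field(default_factory=list)

    def handle(self) -> float:
        # The server records the time it spent on this one request.
        self.seen_ms.append(self.service_ms)
        return self.service_ms


def client_get(servers: List[Server], client_overhead_ms: float) -> float:
    """One hypothetical API call that issues a request to each server in turn.

    Returns the latency the client experiences: every server request
    plus the time spent inside the client library itself.
    """
    total = client_overhead_ms
    for server in servers:
        total += server.handle()
    return total


# Illustrative numbers only (assumed, not measured).
servers = [Server(service_ms=5.0), Server(service_ms=7.0), Server(service_ms=4.0)]
client_ms = client_get(servers, client_overhead_ms=10.0)
worst_server_ms = max(max(s.seen_ms) for s in servers)

print(f"client-observed latency: {client_ms} ms")
print(f"worst latency seen by any single server: {worst_server_ms} ms")
```

Under these assumed numbers, no individual server records more than 7 ms, yet the client waits 26 ms for the call to complete, and 10 ms of that is client-library time that no server-side measurement can see.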