This application is a continuation-in-part of, and claims priority to, co-pending non-provisional U.S. patent applications Ser. Nos. 13/752,147 entitled “Methods and Systems of Distributed Tracing,” filed Jan. 28, 2013, 13/752,255 entitled “Methods and Systems of Generating a billing feed of a distributed network,” filed Jan. 28, 2013, and 13/752,234 entitled “Methods and Systems of Function-Specific Tracing,” filed Jan. 28, 2013, each of which is incorporated herein by reference in its entirety. This application is related to co-pending non-provisional U.S. patent applications Ser. Nos. 13/841,330, entitled “Methods and Systems of Tracking and Verifying Records of System Change Events in a Distributed Network System,” filed Mar. 15, 2013, and 13/841,552, entitled “Methods and Systems of Predictive Monitoring of Objects in a Distributed Network System,” filed Mar. 15, 2013, each of which is incorporated herein by reference in its entirety.
The present disclosure relates generally to cloud computing, and more particularly to systems and methods of monitoring failures in a distributed network system providing cloud services.
Consumers of resources on a distributed network system providing cloud services may, from time to time, request metrics or reports related to the quality of services provided by the service provider. For example, many consumers may have Service Level Agreements (SLAs) in place with service providers that guarantee certain performance or service quality levels. Service availability and responsiveness to service requests may be among the desired metrics.
A cloud service provider may generate reports for customers that provide metrics related to system availability and responsiveness to requests for system resources. One such metric is often a rate of failure to fulfill customer requests. Typically, such reports provide metrics only with respect to server errors, often referred to as Hypertext Transfer Protocol (HTTP) server-side failures or 5XX responses. 5XX refers to the HTTP error codes associated with various server failures. For example, 5XX errors may include error 500 (internal server error), error 501 (not implemented), error 502 (bad gateway), error 503 (service unavailable), error 504 (gateway timeout), etc. One of ordinary skill in the art will recognize the various 5XX server error codes associated with HTTP.
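By way of illustration only, and not limitation, a conventional failure-rate metric of the kind described above might be computed as in the following minimal sketch. The function name `server_failure_rate` and the sample status codes are hypothetical and are chosen solely for this example; the sketch assumes that each fulfilled or failed request is represented by a single integer HTTP status code.

```python
def server_failure_rate(status_codes):
    """Return the fraction of responses whose status code is in the 5XX range.

    Hypothetical helper: assumes one integer HTTP status code per request.
    """
    codes = list(status_codes)
    if not codes:
        # No requests observed, so no failures can be reported.
        return 0.0
    # Only server-side (5XX) responses count as failures in this metric.
    failures = sum(1 for code in codes if 500 <= code <= 599)
    return failures / len(codes)


# Hypothetical sample of responses observed during a reporting period.
sample = [200, 201, 500, 404, 503, 200, 504, 200]
rate = server_failure_rate(sample)
print(f"{rate:.3f}")  # prints 0.375 (3 of 8 responses are 5XX)
```

In this sketch, the 404 response is a client-side (4XX) code and therefore is not counted, so only the 500, 503, and 504 responses contribute to the reported rate.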
The reliability of such metrics may be of interest because there may be financial consequences associated with failures to meet SLA terms. Also, there may be certain situations that a customer would consider an error or fault, but that would not be counted as a failure under conventional failure reporting methods.