CPC G06F 11/079 (2013.01) [G06F 9/542 (2013.01); G06F 11/0778 (2013.01)] | 20 Claims |
1. A method for improving site reliability engineering (SRE) observability by utilizing one or more processors along with allocated memory, the method comprising:
defining a schema in a common manner;
causing any application included across a distributed set of applications to utilize the schema to describe an error associated with a downstream application such that a root failing component associated with the error is always at a bottom error frame in a response;
implementing a common structure for distributed error propagation in a chain of applications across the distributed set of applications in connection with an error message;
generating error logs received from the chain of applications;
storing the error logs in a centralized location accessible by all SRE users and application owners;
calling a corresponding application programming interface (API) to access the error logs from the centralized location; and
automatically implementing a remedial algorithm to correct the root failing component of the error message identified in the error logs.
|
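As an illustration only, the schema and propagation steps recited in claim 1 can be sketched as below. The claim does not specify field names, a wire format, or a data structure, so everything here is an assumption: the `ErrorFrame` fields, the `wrap_error` and `root_failing_component` helpers, and the example application names are all hypothetical. The sketch shows one way a shared frame schema keeps the root failing component at the bottom frame: each application prepends its own frame to the frames it received from downstream.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ErrorFrame:
    """One frame in a propagated error chain (hypothetical field names)."""
    application: str
    code: str
    message: str


def wrap_error(application: str, code: str, message: str,
               downstream: Optional[List[ErrorFrame]] = None) -> List[ErrorFrame]:
    """Prepend this application's frame to the frames received from downstream.

    Because every caller prepends, the frame of the application that failed
    first stays in the last (bottom) position of the list.
    """
    own = ErrorFrame(application, code, message)
    return [own] + list(downstream or [])


def root_failing_component(frames: List[ErrorFrame]) -> ErrorFrame:
    """Return the bottom frame, which by construction describes the root failure."""
    if not frames:
        raise ValueError("empty error chain")
    return frames[-1]


# Hypothetical chain: gateway -> orders -> inventory-db (the failing component).
db_frames = wrap_error("inventory-db", "DB_TIMEOUT", "query timed out")
orders_frames = wrap_error("orders", "DOWNSTREAM_FAILURE",
                           "inventory lookup failed", db_frames)
gateway_frames = wrap_error("gateway", "DOWNSTREAM_FAILURE",
                            "order request failed", orders_frames)

root = root_failing_component(gateway_frames)
print(root.application)
```

In this sketch a log consumer never has to search the chain: reading the last frame is enough to find the component that a remedial step would target.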