1. Field of the Invention
The embodiments of the invention generally relate to methods of automating runtime failure analysis for a computer application operating within a runtime environment.
2. Description of the Related Art
Diagnosing application problems in complex enterprise environments is challenging and contributes significantly to the growth in IT management costs. While application problems have a large number of possible causes, failures due to runtime interactions with the system environment (e.g., configuration files, resource limitations, access permissions) are one of the most common categories. Troubleshooting these problems requires extensive experience and time, and is very difficult to automate.
More specifically, since the advent of the notion of “total cost of ownership” in the 1980s, the fact that IT operation and management costs far outstrip infrastructure costs has been well documented. The continuing increase in IT management costs is driven to a large extent by the growing complexity of applications and the underlying infrastructure (J.-P. Garbani, S. Yates, and S. Bernhardt, The Evolution of Infrastructure Management, Forrester Research, Inc., October 2005). A significant portion of labor in these complex enterprise IT environments is spent on diagnosing and solving problems.
While IT problems that impact business activities arise in all parts of the environment, those that involve applications are particularly challenging and time-consuming. In addition, they account for the majority of reported problems in many environments and across a variety of platforms (H. Huang, R. Jennings, Y. Ruan, R. Sahoo, S. Sahu, and A. Shaikh, PDA: A Tool for Automated Problem Determination, In Large Installation System Administration Conference (LISA 2007), Dallas, Tex., December 2007).
Many factors can cause incorrect application behavior, including, for example, hardware or communication failures, software bugs, faulty application configurations, resource limitations, incorrect access controls, or misconfigured platform parameters. Although some of these are internal to applications, i.e., bugs, failures more commonly arise when an application interacts with its runtime environment and encounters misconfigurations or other types of problems in the system (H. J. Wang, J. C. Platt, Y. Chen, R. Zhang, and Y.-M. Wang, Automatic Misconfiguration Troubleshooting With PeerPressure, In OSDI'04: Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation, pages 17-17, Berkeley, Calif., USA, 2004, USENIX Association).
Troubleshooting these problems involves analysis of problem symptoms and associated error messages or codes, followed by examination of various aspects of the system that could be the cause. Application programmers can leverage signal handlers, exceptions, and other platform support to check for and manage system errors, but it is impossible to anticipate all such failures and create suitable error indications (J. Ha, C. J. Rossbach, J. V. Davis, I. Roy, H. E. Ramadan, D. E. Porter, D. L. Chen, and E. Witchel, Improved Error Reporting For Software That Uses Black-Box Components, In PLDI '07: Proceedings of the 2007 ACM SIGPLAN Conference On Programming Language Design And Implementation, pages 101-111, New York, N.Y., USA, 2007, ACM). As a result, solving these application problems requires a great deal of experience from support professionals and is often ad hoc, which makes the process very difficult to automate.
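By way of illustration only, the following minimal Python sketch shows the kind of exception-based error checking described above and why it tends to be incomplete. The function name load_config and the table ANTICIPATED are hypothetical and are not taken from any cited work; the sketch assumes an application that reads a configuration file and anticipates only two system errors.

```python
import errno

# Hypothetical table: the only system errors this programmer anticipated.
ANTICIPATED = {
    errno.ENOENT: "configuration file is missing",
    errno.EACCES: "configuration file is not readable by this user",
}


def load_config(path):
    """Return (contents, diagnosis); diagnosis is None on success."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read(), None
    except OSError as exc:
        # Anticipated errors receive a specific message. Any other system
        # error (for example, too many open files) falls back to a generic
        # message that gives the support professional little to work with.
        reason = ANTICIPATED.get(exc.errno, "unexpected system error")
        return None, f"{reason} (errno={exc.errno}, path={path})"
```

In this sketch, a missing file yields a specific diagnosis, whereas any error outside the table yields only the generic fallback, which is the gap that makes such failures difficult to troubleshoot.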