The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely sophisticated devices that may be found in many different settings. Computer systems typically include a combination of hardware, such as semiconductors and circuit boards, and software, also known as computer programs. As advances in semiconductor processing and computer architecture push the performance of computer hardware higher, more sophisticated and complex software has evolved to take advantage of that performance. As a result, today's computer systems are much more powerful than those of just a few years ago.
As computer software grows more sophisticated and complex, software developers have more difficulty determining the source of exceptions, problems, errors, bugs, or faults in computer programs. Historically, problem determination for procedural languages has been handled by establishing a set of return codes and unique messages. Today's environment of exception-based, object-oriented languages and framework-based programs, however, has made problem determination more difficult. Adding to the complexity is the need for always-available web-based applications, which cannot be taken out of service while the developer analyzes a problem.
There have been several basic approaches to problem determination for web-based applications. In one approach, the customer recreates the problem in a small test program, which the developer can then use to analyze the problem in a laboratory environment. This approach places a heavy burden on the customer.
In another approach, the customer turns on a trace function in the program to capture more data about the program's state, so that when the failure occurs again, the trace function saves trace data that the developer can use in problem determination. Unfortunately, this approach has several undesirable side effects: first, the customer must still recreate the problem; second, the overhead of the trace function may degrade system performance beyond an acceptable point; and finally, that degradation may change the timing of events within the computer system, which makes timing-related problems more difficult to recreate.
Yet another approach is a logging function, which writes state information to a log when an unexpected event occurs in a program. If the customer reports a problem, the developer can examine the logs for hints in diagnosing it. One undesirable effect of logging is that the logging function uses computer system resources to capture the logged data, and the logged data can quickly exhaust the available resources. Further, in some instances, programs function correctly yet still consume valuable and scarce log space with recurring conditions. Finally, the program experiencing the unexpected event may be unable to distinguish between good and bad exception conditions, so it does not know when logging would be helpful.
The following simple example illustrates a program's inability to distinguish between good and bad exceptions. If a banking application receives an “account not found” exception from a program, information about the request needs to be logged: if requests for non-existent accounts persist, the bank will want to investigate the source of this suspicious activity. In contrast, if an online auction application receives the same “account not found” exception, the application may simply recover programmatically by creating the account, perhaps after a user confirmation, as a convenience for a user who wants to bid on an auction item.
Deciding whether exceptions or events should be logged is problematic for the developer of the program, because the developer may have limited knowledge of the applications that will use the function that experiences the exception or event. In the above example, the developer of the accounting program that raises the “account not found” exception does not necessarily know whether a banking application or an online auction application will be using it, and those two applications have quite different exception-handling requirements.
Because the developer of the program does not know which applications might use it, or how those applications need to handle exceptions, the program that originates an exception typically logs information about it as a matter of course, erring on the side of safety. This practice can create a large volume of log information that takes the developer much time to analyze later. Yet the application that invokes the program may understand the exception and recover from it programmatically, in which case the logged information is meaningless for analyzing future exceptions. Further, the logged information consumes valuable and scarce system resources and may quickly cause the available log memory to wrap, overwriting previously logged data that might have been valuable.
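As a purely illustrative sketch (all names are hypothetical and not drawn from any particular system), the Python code below models the situation described above: a shared `AccountingService` that cannot know its callers, so it records every `AccountNotFoundError` before raising it. A plain list stands in for the scarce log space. The banking caller treats the exception as suspicious and would want the log entry, while the auction caller recovers by creating the account, leaving behind an entry with no diagnostic value.

```python
class AccountNotFoundError(Exception):
    """Hypothetical exception raised by the shared accounting program."""


class AccountingService:
    """Hypothetical shared program that does not know who calls it."""

    def __init__(self, log):
        self.accounts = {}
        self.log = log  # a plain list standing in for scarce log space

    def create(self, account_id):
        self.accounts[account_id] = 0

    def balance(self, account_id):
        if account_id not in self.accounts:
            # Logs unconditionally "to err on the side of safety",
            # because it cannot tell a good exception from a bad one.
            self.log.append(f"account not found: {account_id}")
            raise AccountNotFoundError(account_id)
        return self.accounts[account_id]


def banking_lookup(svc, account_id):
    """Bank caller: a missing account is suspicious, so the log entry is useful."""
    try:
        return svc.balance(account_id)
    except AccountNotFoundError:
        return None  # reject the request; the entry supports later investigation


def auction_lookup(svc, account_id):
    """Auction caller: recovers by creating the account, so the entry is noise."""
    try:
        return svc.balance(account_id)
    except AccountNotFoundError:
        svc.create(account_id)
        return svc.balance(account_id)


if __name__ == "__main__":
    log = []
    svc = AccountingService(log)
    print(auction_lookup(svc, "bidder-42"))  # recovered; new balance is 0
    print(banking_lookup(svc, "unknown-7"))  # rejected; returns None
    print(log)  # both failures were logged, although only one matters
```

Both lookups leave an entry in the log, even though only the banking one is worth keeping; the originating service has no way to tell the two cases apart.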
Without a better way to manage exceptions, information logged in response to exceptions will continue to be of marginal use while consuming valuable and scarce system resources, which increases the cost to the customer. Although the aforementioned problems have been described in the context of web-based applications and object-oriented programming, they may occur in any environment.