Until recently, corporations considered the mid-tier platform, based on operating systems such as Unix and Windows NT, too fragile to host mission-critical business applications. The situation changed with the introduction of Java(R) by Sun Microsystems, Inc., and the widespread use of the Internet to do business. Corporations now use technologies based on Sun Microsystems' Java 2 Enterprise Edition (J2EE) to build critical business applications on the mid-tier platform. Such applications generally run in a distributed computing environment, on server farms with numerous CPUs.
Because many corporations do business with their customers over the Internet, critical business applications are now exposed to end-users through their browsers. Any downtime or problem with these online applications creates large direct costs and indirect opportunity costs. One analyst has stated that a web site must respond within eight seconds or a user will leave for a competitor's web site. By one estimate, about $4.35 billion in online sales is lost each year in the U.S. alone due to downtime or slow sites.
Many system management tools are available to monitor the performance of networks, databases, storage devices, and platforms, which together constitute the infrastructure of mission-critical business applications. These tools are certainly necessary and have their place; however, it is the application that ties all of those resources together, and it is the performance of the application that directly affects the customer. The application drives the network traffic, the database usage, and the platform workload. It is therefore not surprising that most Internet website outages are application-related, yet existing system management tools offer little insight into how applications are behaving. Current off-the-shelf application management systems concentrate mainly on application server functions rather than on the applications themselves.
Because many applications are exposed directly to customers on the web twenty-four hours a day, they are subject to stringent availability and performance requirements. Currently, a data center typically resolves server problems by recycling the troubled application server. Unfortunately, recycling is usually only a temporary solution, as the error almost always recurs. The inability to find a better resolution stems from the lack of production-class, non-intrusive tools for servicing and troubleshooting faulty systems and applications on the mid-tier platform.
Another application management issue arises from servicing the production workload. A clear division of labor between development activities and production services activities is the norm in enterprise IT organizations. However, a major disconnect exists between data center operations and systems development. There is a lack of non-invasive tools that allow administrators to visualize the workload running inside the application server. Diagnostic information passed from production services staff to development staff without any J2EE context is too low-level to give developers useful problem-solving hints.
Application-level tracing inherently involves high overhead and is therefore unacceptable for high-volume systems. Yet a wide variety of components must be monitored for diagnostic purposes, including servlets, JavaServer Pages (JSPs), Enterprise JavaBeans (EJBs), objects, methods, SQL statements, sessions, and context. Distributed servers are also a major source of intermittent problems.
Current application management products offer limited capabilities for handling multiple distributed applications in real time. Even as applications become more tightly integrated, most current system management products still examine discrete components in isolation. This approach makes application troubleshooting difficult and root-cause analysis almost impossible.
In summary, a variety of problems can occur in J2EE application server farms that hamper the performance of e-business applications. The most common of these are loops, slow processing, hangs, stalls, exceptions, intermittent problems, deadlocks, timeouts, API-related problems, and memory leaks. Existing environment monitoring and troubleshooting tools are not available for application servers in distributed environments.