In the past, when a computer experienced a problem with one of its applications while running online, the computer was taken offline to simulate the problem. With the advent of the Internet, however, computers cannot be taken offline so readily to identify a problem. Typically, these computers are running numerous applications and servicing several requests from different Internet users at any one time. It is therefore undesirable to take them offline; rather, it is desirable for them to remain operational (i.e., “live”) at all times. Thus, these computers are commonly referred to as “live” systems.
Even if it were permissible to take these computers offline, diagnosing the problem offline would still be difficult. For example, problems occurring online are typically related to the load and the unique circumstances of the computer at the time the problem occurred. Thus, if the computer were taken offline, the problem would disappear. In addition, for computers operating in a heterogeneous distributed computing environment, the problem is even more difficult to diagnose offline. The computers in such an environment may have various architectures and run various operating systems. The applications on these computers may have heterogeneous components with routines in different instruction sets (e.g., Intel x86, Intel IA-64, Visual Basic (VB) byte code, Java class files, and other Virtual Machine (VM) binaries). In addition, the heterogeneous components may be operating on different computers. Thus, it is difficult to generate a test scenario that has the same distribution of applications and components and the same load. Therefore, offline testing is not very successful in duplicating and solving problems that occur on computers operating on the Internet.
Until now, there has been no workable solution for analyzing live systems in a heterogeneous distributed computing environment.