In a traditional software development phase, the testing team or automation testing framework runs all the test cases and reports software defects to the development team. Developers then start to diagnose the problems and try to fix them. For those defects which can only be reproduced in very special environments and related with various kinds of system configuration parameters, environment variables, and various contextual runtime information, first of all developers need to set up an identical environment to reproduce those defects. In most cases, it takes very huge efforts to set up such a kind of environments. Developers need to collect various types of customer/tester environment parameters; and use these parameters to set up a new environment, including installing the operating system of the same version, the target software product, patches and all the related software; adjusting the runtime parameters for building the whole environment (including, e.g., the operating system, JRE, etc.); continuously adjusting the trace levels for any possible related components to gradually collect the desired traces. During the above process, any human errors or slight negligence will cause the failure of problem reproduction, so as to prevent problem diagnosis. This obviously becomes a bottleneck in the process of defect fixing and increases the risk of product delivery in time. If this scenario happens in a customer's environment after that product's release, since developers have to work on the spot, and guide the customer to collect all the required information, this is time-consuming and costly for both parties, and the problem solving delay caused by reproducing the environment set will lead to invaluable loss of the customer's confidence on this product.
In addition, cluster systems, due to their advanced feature of high availability and huge business throughput, are becoming more and more popular and have been deployed in cloud environments. However, due to the complex environment factors, complex task scheduling or coordination among a plurality of nodes of a cluster system, it is more difficult to rebuild the cluster system to reproduce the problems. Moreover, the elasticity characteristic of a cloud supports scaling the cluster system to dynamically change the number of nodes, and thus it is very difficult to capture the instantaneous state of the cluster to rebuild the cluster system for reproducing the problems.
It can be seen that a solution for monitoring the operation of a software product in a cloud environment and rebuilding an operating environment of the software product for reproducing the problems in a cloud environment, so as to diagnose a problem of a software product is needed in the art.