The invention relates to an analysis method and apparatus for a parallel system.
A substantial portion of the life cycle of software development is devoted to testing. The purpose of software testing is to detect errors in programs and, in the absence of errors, gain confidence in the proper functionality of the programs. A basic premise of software testing is that programs are adequately covered once the test cycle is complete. Thus, test cases must be properly selected so that some level of confidence is derived about the reliability of software from a sample set of test cases.
In testing software, particularly software in user systems that are relatively large, the test environment (often at the site of the software developer) is usually quite different from the actual operating environment. For example, in many data warehousing applications, the systems used to run the database management software are multi-node parallel processing systems having tens or even hundreds of nodes. The amount of data stored can be in the gigabyte to terabyte range. In addition, the configurations and architectures of the systems used by different users or customers usually differ.
One of the goals of a database management system is to optimize the performance of queries for access and manipulation of data stored in the database. Given a target environment, a plan is developed for each query to achieve better performance. This is sometimes referred to as selecting the access plan (query plan, join plan, or strategy) with the lowest cost (e.g., response time). The response time is the amount of time it takes to complete the execution of the query on a given system. The number of alternative access plans for a query grows at least exponentially with the number of relations participating in the query. A cost-based model can be used to compare different methods for doing a unit of work, with the most efficient method (or one of the more efficient methods) selected.
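The cost-based selection described above can be illustrated with a minimal sketch. It is not the optimizer of any particular database system: the relation names, row counts, uniform join selectivity, and the toy cost function (total rows in all intermediate results of a left-deep plan) are all assumptions made for illustration. The sketch enumerates every left-deep join order, of which there are n! for n relations, and keeps the cheapest.

```python
from itertools import permutations

# Hypothetical row counts for three relations (illustrative values only).
ROWS = {"orders": 1_000_000, "customers": 50_000, "regions": 20}

# Hypothetical fraction of row pairs that survive each join (assumed uniform).
JOIN_SELECTIVITY = 0.001


def plan_cost(join_order):
    """Toy cost: total rows across all intermediate results of a left-deep plan."""
    intermediate = ROWS[join_order[0]]
    cost = 0.0
    for name in join_order[1:]:
        # Joining the running result with the next relation.
        intermediate = intermediate * ROWS[name] * JOIN_SELECTIVITY
        cost += intermediate
    return cost


def choose_plan(relations):
    """Enumerate every left-deep join order and return the cheapest one."""
    candidates = list(permutations(relations))
    best = min(candidates, key=plan_cost)
    return best, plan_cost(best), len(candidates)


best_order, best_cost, plan_count = choose_plan(list(ROWS))
print(plan_count, best_order, best_cost)
```

With these assumed figures, the cheapest orders join the two small relations first and the largest relation last, because that keeps the intermediate results small. Exhaustive enumeration is used here only because three relations yield six orders; the growth noted above is why practical optimizers prune the search.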
The performance of various access plans differs depending upon environmental factors relating to the hardware and software specifics of a target system (customer system). Differences in target systems usually cause the performance of query plans to differ significantly. One technique to emulate or simulate a target (customer) environment is to use expensive, custom hardware. However, such hardware-based test facilities are usually not cost-effective.
A need thus exists for an improved method and apparatus to test target systems.
In general, according to one embodiment, a method of analyzing query performance in a target system comprises receiving information relating to an environment of the target system and storing cost data based on the environment information. A performance estimate for a query is then determined based on the cost data.
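The three steps named above (receiving environment information, storing cost data based on it, and determining a performance estimate from the cost data) might be sketched as follows. This is only an illustration under stated assumptions: the `Environment` and `CostData` fields, the `COST_STORE` dictionary, the throughput figures, and the linear cost formula are all hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """Hypothetical description of a system's environment."""
    nodes: int
    cpus_per_node: int
    disk_mb_per_sec_per_node: float
    million_rows_per_sec_per_cpu: float


@dataclass(frozen=True)
class CostData:
    """Hypothetical cost parameters derived from an Environment."""
    scan_mb_per_sec: float
    million_rows_per_sec: float


# Hypothetical store of cost data, keyed by system name.
COST_STORE = {}


def store_cost_data(system_name, env):
    """Derive aggregate throughput from the environment and store it."""
    cost = CostData(
        scan_mb_per_sec=env.nodes * env.disk_mb_per_sec_per_node,
        million_rows_per_sec=(
            env.nodes * env.cpus_per_node * env.million_rows_per_sec_per_cpu
        ),
    )
    COST_STORE[system_name] = cost
    return cost


def estimate_seconds(system_name, mb_scanned, million_rows):
    """Assumed linear model: scan time plus row-processing time."""
    cost = COST_STORE[system_name]
    return (mb_scanned / cost.scan_mb_per_sec
            + million_rows / cost.million_rows_per_sec)


# Receive environment information for two hypothetical systems.
store_cost_data("target", Environment(100, 4, 50.0, 1.0))
store_cost_data("test", Environment(2, 4, 50.0, 1.0))

# Estimate the same query (10,000 MB scanned, 200 million rows) on each.
target_estimate = estimate_seconds("target", 10_000, 200)
test_estimate = estimate_seconds("test", 10_000, 200)
print(target_estimate, test_estimate)
```

Because the estimate depends only on the stored cost data, the figure for the 100-node target can be computed wherever that cost data is available, without access to the target hardware itself.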