1. Field of the Invention
This invention relates to a system and method for improving identification and retrieval of data from a computer implemented database system, and more particularly to a system and method for identifying and retrieving data from a large engineering database.
2. Description of the Related Art
Database management systems support the definition, retrieval, and updating of data stored in a database. A relational database management system is a particular form of database system in which data is stored in tabular form. The database tables consist of sets of rows which share common characteristics. The database is physically stored as pages of data on nonvolatile storage devices such as direct access storage devices. An index, which serves as a directory for locating specific data and thereby aids in the retrieval of that data, can also be stored on the direct access storage devices. Structured Query Language (SQL), a sequential language, has been developed for relational database systems to access the data in a database. A relational database allows a user to modify and access a database by writing an expression that specifies the relationship between two or more tables.
Computer programmers write application programs to access and maintain the data in the database. The application programs are executed by the database management system. The application programs must first be processed so that they can be executed by the central processing unit (CPU) of the computer system. The SQL statements are extracted from the application program and are used to access the data from the database. SQL statements specify what data is wanted but not how to get the data. Typically, the relational database management system determines the optimal method for accessing and retrieving the requested data. A strategy is deemed optimal when it minimizes resource utilization costs and time.
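By way of illustration only, the declarative character of SQL can be sketched as follows. The sketch uses Python's standard sqlite3 module rather than any particular production database, and the table names, column names, and data (lot, test_result, device_model) are hypothetical. The query states only what data is wanted; the access strategy is chosen by the database engine and can be inspected with SQLite's EXPLAIN QUERY PLAN statement.

```python
import sqlite3


def demo_declarative_query():
    """Run a join and return (rows, plan) from an in-memory SQLite database.

    All table and column names here are hypothetical illustrations.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE lot (lot_id INTEGER PRIMARY KEY, device_model TEXT);
        CREATE TABLE test_result (
            unit_id INTEGER PRIMARY KEY,
            lot_id  INTEGER,
            passed  INTEGER
        );
        CREATE INDEX idx_test_lot ON test_result (lot_id);
        INSERT INTO lot VALUES (1, 'A100'), (2, 'B200');
        INSERT INTO test_result VALUES (10, 1, 1), (11, 1, 0), (12, 2, 1);
        """
    )
    # The statement names the desired data, not the access path.
    query = """
        SELECT t.unit_id
        FROM lot AS l
        JOIN test_result AS t ON t.lot_id = l.lot_id
        WHERE l.device_model = 'A100' AND t.passed = 0
    """
    rows = conn.execute(query).fetchall()
    # The engine's chosen strategy; its exact wording varies by SQLite version.
    plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    conn.close()
    return rows, plan


rows, plan = demo_declarative_query()
```

The same query text could be satisfied by different access paths (for example, with or without the index), which is the decision the text above attributes to the relational database management system.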
In order to improve relational database management system performance in evaluating and satisfying queries, it is desirable to exploit the inherent parallelism in multiple CPUs or I/O devices available in the computer system during execution. Parallelism can also be exploited by using multiple CPUs to evaluate the data according to criteria provided by a query, so that total CPU time is lowered. A more complex parallelism operation involves partitioning the query execution plan among CPUs and executing operations in parallel. The query optimizer needs to consider whether a parallel strategy should be invoked when determining the optimal strategy that minimizes CPU time and resource utilization costs.
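The simpler form of parallelism described above, in which several workers evaluate the same query criteria over different portions of the data, can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular database product: the function names (partition, scan_partition, parallel_select) are hypothetical, and Python threads are used only to show the structure of the computation.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into n_parts interleaved partitions (hypothetical helper)."""
    return [rows[i::n_parts] for i in range(n_parts)]


def scan_partition(rows, predicate):
    """Evaluate the query criteria against one partition."""
    return [r for r in rows if predicate(r)]


def parallel_select(rows, predicate, n_workers=4):
    """Apply the same predicate to every partition using a pool of workers."""
    parts = partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(scan_partition, parts, [predicate] * len(parts))
        # Merge the per-partition results; row order follows partition order.
        return [r for part in results for r in part]


selected = parallel_select(list(range(20)), lambda x: x % 3 == 0)
```

Because the partitions are interleaved, the merged result is not in the original row order; a caller that needs ordering would sort after the merge.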
An example of data stored in a relational database system is semiconductor manufacturing and engineering test data. One problem which arises in standard semiconductor manufacturing techniques is that the various processes take place at discrete locations. Thus it is difficult to track a semiconductor device through the fabrication process, from single crystal to finished product. Such tracking may be necessary for quality control purposes in order to determine the causes of production problems which may result in low yields or circuit defects. Some of the data regarding operating conditions during the fabrication process are intrinsic data, for example, lot numbers, device model numbers or the like. Other data may be extrinsic data, such as production test data, production conditions, or the like. In the various processes, the various lot numbers may be changed, thus making it difficult for a production engineer to track down and solve difficulties in the production process.
Through the production process, testing and manufacturing steps are performed, generating data for each semiconductor device. If a problem arises in the manufacture of the semiconductor device, the production engineer may wish to track the semiconductor devices to determine why the production problem existed and correct the problem before performing additional process steps or shipping the product to the consumer. Also, a consumer may require that each semiconductor device can be effectively traced at each step.
Therefore, there is a need not only to track engineering and production data effectively, but also to retrieve required data efficiently and in a timely manner. There is a significant time penalty for extracting large amounts of unit level data from large databases, such as, for example, an engineering database, on the order of forty-eight hours in some cases using a standard query generator. Unit level data includes data for an individual die before and during assembly and for a packaged part after assembly. A unit level view of an engineering database can combine over 15 tables of data in a very complicated join of all the tables to provide the desired look into the data for retrieval purposes. A massive number of nested loops are processed, and data derived from the innermost tables must be successively restricted by data from the outer tables, which can lead to a large data set being carried from the inner tables to the outer tables and, hence, to the long query times. The process is also inherently single threaded and cannot take advantage of parallel processing techniques. A query generator will also spend a considerable amount of time just determining an optimized plan for retrieving the data.
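The cost of carrying unrestricted rows through a nested loop join can be illustrated with a small sketch. The data, the two-table join, and the function names (nested_loop_join, late_restriction, early_restriction) are hypothetical and far simpler than the joins of over 15 tables discussed above; the sketch merely contrasts restricting the data after the join with restricting it before the join.

```python
def nested_loop_join(left, right, key):
    """Naive nested loop join: compare every left row with every right row."""
    out = []
    comparisons = 0
    for l_row in left:
        for r_row in right:
            comparisons += 1
            if l_row[key] == r_row[key]:
                out.append({**l_row, **r_row})
    return out, comparisons


def late_restriction(units, lots, model):
    """Join everything first, then restrict; the whole join result is carried."""
    joined, comparisons = nested_loop_join(units, lots, "lot_id")
    result = [row for row in joined if row["model"] == model]
    return result, len(joined), comparisons


def early_restriction(units, lots, model):
    """Restrict the lots first, so fewer rows enter the join."""
    wanted = [lot for lot in lots if lot["model"] == model]
    joined, comparisons = nested_loop_join(units, wanted, "lot_id")
    return joined, len(joined), comparisons


# Hypothetical data: 10 lots, 100 units, one lot of the model of interest.
lots = [{"lot_id": i, "model": "A100" if i == 0 else "B200"} for i in range(10)]
units = [{"unit_id": u, "lot_id": u % 10} for u in range(100)]

late_rows, late_carried, late_cmp = late_restriction(units, lots, "A100")
early_rows, early_carried, early_cmp = early_restriction(units, lots, "A100")
```

With this data both strategies return the same 10 rows, but the late restriction carries 100 joined rows and performs 1000 comparisons, while the early restriction carries 10 rows and performs 100 comparisons. The gap widens with each additional table in the join.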
Another problem encountered in retrieving data from large databases is read consistency during query generation and retrieval of the data, such as, for example, with the Oracle™ database. The data to be retrieved should be the data present in the database at the time the query statement is entered, and long running transactions must attempt to maintain a read consistent view of all tables. However, since there is continuous loading and archiving of existing data, it is often difficult to maintain a read consistent view of the data. For example, over time as data is loaded into the database, copies of old data are retained for a time in complex multiple rollback segments of the database with the data of interest in more than one segment. If the query generation is time consuming, then the data, from the time the query began, may no longer be in the rollback segments and therefore the resultant data is never retrieved.
One or more of the foregoing problems are overcome and one or more of the foregoing needs are satisfied by the present invention.