The invention relates generally to relational database application software development and testing and, more particularly, to a method for allowing individual user entities (e.g. individual users or user groups) access to tables in the database in an isolated manner, that is, without interfering with each other.
Databases are computerized information storage and retrieval systems that are essential for the operations of many businesses. A database manager, also known as a database management system (DBMS), is a complex and sophisticated computer program that provides a variety of tools for defining, manipulating data in, controlling access to and otherwise managing the database in a variety of ways. A relational database management system (RDBMS) uses relational techniques for storing and retrieving data organized into tables which consist (conceptually) of rows and columns of data. The manner in which data is physically stored in the database is handled by the database manager, and is not the concern of the user.
In the context of the invention, a relevant function of the database manager is executing the programming statements (program code) of an application program written in a language such as SQL (Structured Query Language). SQL is used for three kinds of operations, which have their own respective subsets of SQL. Thus, data manipulation operations are directed by Data Manipulation Language (DML) statements, and include data retrieval and data modification. Data definition operations are directed by Data Definition Language (DDL) statements, and include creating and removing objects such as tables, indexes and views. Data administration or control operations are directed by Data Control Language (DCL) statements and allow a database administrator to maintain and coordinate use of the database, including granting authorities or privileges to user entities to access database objects in specific ways. In addition, there is Job Control Language (JCL) which is used to identify a job to the operating system and to describe requirements of the job.
A database typically has many tables, sometimes numbering in the hundreds, and each table typically has multiple rows and multiple columns. Within each table, there usually is something that uniquely identifies each row, and that something is known as the primary key. If employed, a primary key is defined (as a matter of database design and as part of the table definition) as the column or combination of columns that distinguishes a particular row from all others.
In addition to user tables, that is, tables created by users to contain user data, a database contains a variety of system tables, whose function is to hold certain data that the database manager itself needs in order to manage the database. The system tables are collectively referred to as the database catalog.
A particular relational database management system in combination with which the invention may be employed is known as DB2, a product of International Business Machines Corporation (IBM). DB2 runs on mainframe computers under operating systems such as OS/390, and versions are also available for various personal computer operating systems. The invention, however, is not in any way limited to use in combination with DB2, and may, for example, be employed in combination with other relational database management systems such as Oracle, Sybase, Informix and SQL Server.
A database can be very complex, involving, as noted above, hundreds of tables, and requiring a large data storage capacity when “live,” for example, several terabytes. (A terabyte is a trillion bytes.)
Developing, testing and maintaining a relational database application program typically requires a team effort, involving many individual application programmers (also known as “users” in the context of the development environment) or groups of individual application programmers or users. The terminology “user entity” is employed herein to refer to both individual users and groups of users. Providing an application development and testing environment in which a plurality of user entities access the database, as a practical matter, presents a number of challenging demands, if conflicts are to be avoided. Each stage of the testing process has varying degrees of complexities and demands on coordinating testing efforts.
As a simple example, through DML program statements, during development and testing one user entity may be uploading data to a table, another user entity may be deleting data from the same table, while a third user entity is attempting to test a program that accesses the table. Clearly there are conflicts.
As more general examples, during unit testing, tests are performed on individual program modules to determine if they meet defined specifications. Individuals working alone, typically a programmer, usually perform unit tests. A very limited amount of coordination is necessary for unit testing. String testing advances the process with a series of program modules to confirm that they communicate necessary information between each other. Then the testing process advances to the system level. At various stages, application program modifications are made and regression testing is required to revalidate results. There are additional kinds of testing such as performance, stress, production simulation, and parallel production testing done prior to implementation of an application.
There are a number of demands specific to various stages of testing. In the context of the invention, it can be difficult to keep track of the test data necessary to repeat the tests and re-establish the test data, especially when tests require isolation or a predefined order to the testing schedule.
As another example, different program modules serve different functions and may need to be isolated or scheduled. Examples are purge programs, report programs, and update programs. Testing a report program while testing the update program may produce unpredictable results. Was the reason the reports were wrong due to errors in the report program or just the fact that the update program changed the data before it was reported? Was the reason the update program failed because the purge program removed the data before it could be updated? These can be time consuming and frustrating questions to answer.
In real-world projects, different components of the system are typically in different stages of testing. For example, some program modules might still be in unit test while others are in string test. Allowing more than one group of testers to perform these tests in the same environment on shared data is neither practical nor advised. Different test activities can adversely affect one another. Testing in this way is very error-prone and requires extensive coordination activity that slows project progress and can impede, if not prevent, parallel testing.
Individual unit tests also have the problem of determining whether the program functioned as expected. Reports and screens are fairly easy to verify, but did the data stored go through the expected transformation? Did the employee given the 10% raise have his salary updated by the correct amount in the data store? Was some other employee attribute accidentally updated at the same time?
Using a database management system like DB2 can pose additional demands. The very facilities that guarantee production data integrity make testing more difficult. Thus, the DB2 database manager uses a lock mechanism to ensure that users reading information get accurate data and that only one user can update at a time. For example, assuming a locksize of page is chosen, if two testers (user entities) have data that happens to be on the same page, they could prevent each other from testing. To make matters worse, locks are not released until commit. On-line debugging facilities can hold locks for hours, preventing one or more tests from being able to execute.
The DB2 database manager has a load utility which is often used to re-establish test data. There also are third-party replacements for the DB2 load utility. These loads have to be done table by table and Job Control Language statements developed to accomplish the task. Moreover, while the load utility is executing, the tables are unavailable for other testing work. After loads are done, tablespaces may require image copies or additional utility operations such as the check data utility to make the data available again.
Referential integrity constraints can cause additional consideration as to how the data is saved and loaded, or even which tables are required for testing. Although these constraints are beneficial to the integrity of the application, they will require data in other tables, and consideration in the data loading for test purposes.
There are a number of conventional solutions to these demands, each having its own drawbacks. Conventional solutions generally involve (1) scheduling, (2) creating additional environments (multiple copies of the same tables), (3) various combinations of the above, or (4) modifying program code and adding additional column(s) to tables.
Simply scheduling testing is by far the easiest solution. Thus, Monday is the report program, Tuesday is the update program, Wednesday is the Purge program, etc. Monday is string test #1; Tuesday is string test #2. Monday is system test #1 and Tuesday is regression test #1. The problem is time; given enough time, scheduling would be an easy solution to the problems described above.
A hybrid approach is combining scheduling with key assignment. If key values are assigned to various test processes, at least some testing can occur in parallel. As an example, the report program test uses employees 100 through 200. The purge program uses employees 200 through 300, etc. But, what about the department table? It is keyed by department number. All employees in the 100 through 200 ranges must be assigned to departments 10 through 20 and employees in the 200 through 300 ranges must be assigned to departments 20 through 30. Drawbacks here are that each additional key must be assigned a range for each test group and the relationships propagated properly. Even so, errors can occur in programs that will occasionally cause updates to data that is not intended. DB2 lock contention can still happen in spite of careful choices of key assignments.
Providing multiple environments is sure way to avoid lock contention. Each user entity is given a set of tables. This avoids DB2 load and lock contention problems entirely, but it can be an expensive alternative. It is not unusual for an application environment with 400 “live” tables to grow or perhaps explode to more than 8000 test and development tables. Even with the availability of DB2 alter and migrate tools, this alternative becomes very time consuming for database administrators (humans). Application program modules each have to be bound properly, using the correct DB2 table creator. The same program module can be bound many times in such an environment. If there is a common module that is used in many functions and a change is made to it, the module will have to be bound numerous times and tested everywhere. Another risk of this solution is not properly making a table change to all environments. This could result in application program code being developed against an outdated table definition. In general, when multiple environments are provided, a lot of coordination is required for database changes. The more environments, the more coordination is required.
A variation of using multiple copies of tables is to provide multiple DB2 subsystems. Although this variation is an even more expensive solution, an advantage is that DB2 creators (sometimes referred to as owner) do not have to be changed.
Another approach is modifying program code and adding additional column(s) to tables. As an example, a pre-processor may be employed that reads program source code and adds a WHERE clause to each SQL query to point at a specific data value supplied at execution time. This minimizes the number of environments and usually solves lock contention problems. Unfortunately, it requires modification of application program source code, and these modifications can be quite extensive. It additionally requires that several coding conventions be adopted to set the additional columns properly when a statement to INSERT a row is executed. Pre-processors have to be written and executed to add the WHERE clauses to the SQL source code. When SQL syntax changes are made, changes to the pre-processor have to be made to recognize the new syntax.
A related prior art approach is to create a set of test and development tables corresponding to the “live” database tables. Within each test and development table the same columns are defined as in the original database Table. Another column is added to each test and development table, defined to include a value that is unique for each tester or group of testers (user entity), and also made part of the primary key of the table. In addition, a view is created having the same name and column definitions as the corresponding “live” database table, where the view selects only the rows in the test and development table in which the unique value matches a particular user entity. This gives each user entity its own view, or slice, of the table, which is accessed using application program DML statements. Each slice is completely isolated from the activities of other testers. While an improvement to other prior art approaches, this particular approach still has drawbacks. In particular, the column added to each test and development table is part of the primary key, and therefore must be present. However, the added column is omitted from the view. As a result of this omission, a Data Manipulation Language statement to INSERT a row and accessing only the view would not store a value in the added column of the test and development table. Accordingly, a pre-processor is still required, although not as extensive a pre-processor as in other prior art approaches, to modify the INSERT statements of the application program being developed or tested so as to reference the test and development table for storing a value in the added column which requires a value to be present. Moreover, when development and testing are completed, a pre-processor must again be run to remove the modifications to the INSERT statements of the application program in order to correctly access the corresponding “live” database table.