This application is related to a co-pending application, U.S. Ser. No. 09/374,351 entitled "Multi-Processor System for Data Base Management", and to a co-pending application, U.S. Ser. No. 09/374,351 entitled "Enhanced System and Method for Management of System Database Utilities", which are incorporated herein by reference.
In the operation of computer systems and networks, the computer data is often "backed-up", that is to say, it is copied to a storage medium other than the central computer's storage disk in order to permit the recovery of the data as the data existed at some point in time. This is done for purposes of diagnosis in the event of system failure or inadvertent loss of data.
It is often a standard practice to automatically back-up data on a daily or other periodic basis and store this data on tape or disk.
There are several ways to back-up data for diagnostic and recovery purposes. One way is considered as (i) physical level back-up. The physical level back-up refers to the data as it is stored at specific locations on some physical media, such as a host computer disk.
Another way is (ii) designated logical level back-up. This refers to the data as seen by the user application programs in files or in database tables. Normally, the operating system of the computer will include a file system that does mapping between the physical level and the logical level. A physical level back-up would involve making a raw copy from a computer disk to some other storage medium without going through the file system or some other physical-to-logical interpreter module. A logical level back-up, on the other hand, would involve using an interpreter module or some sort of file system to perform the physical-to-logical mapping while doing the back-up.
A Database Management System consists of a set of tools used to develop and manage a database. The present system utilizes a DMSII which is a Database Management System available on the Unisys Corporation's ClearPath HMP NX, and the Unisys A-Series systems. A background for the Unisys DMSII systems is available in a publication of the Unisys Corporation, Document 8807 6625 000, entitled "Getting Started With DMSII" and published in September, 1997 by the Unisys Corporation. The DMSII Utilities provide database back-up and recovery capability for the entire database or for partial databases. The background operations of the DMSII utility enhancements are published in a Unisys Corporation publication Document 98037/4 and entitled "DMSII Utility Enhancements" published on Mar. 31, 1999.
Database back-ups can be accomplished on an "on-line" or an "off-line" basis. The on-line back-up will allow users to update data in the database, whereas the off-line back-up disallows all updates to the database. The back-ups can be done to either tapes or disks or any combination of both types of such media.
Database Management Systems are used by many large and small businesses such as airline reservation systems, financial institutions, retail chains, insurance companies, utility companies and government agencies. The present Database Management System (DMS) in its form as DMSII is used to build database structures for items of data according to some appropriate logical model, such as relational, hierarchical, or network. Further, the Database Management System is used to manage the database structures and keep the structures in some stable order while various application programs may be retrieving or changing the data. The present embodiment of DMSII has a data definition language designated as Data And Structure Definition Language (DASDL).
There are various tasks that are performed in database management and these involve (i) monitoring and optimizing database performance; (ii) the use of database control for monitoring multi-program database access; (iii) the function of data integrity and safety, done by integrity checking and by preventing access to the same data by multiple applications occurring at the same time; (iv) the function of defining data structures and the data fields within them, including the function of modifying data structures; (v) data access operations and developing an application program to retrieve data or to change data; (vi) the function of data shareability to provide multi-program access without conflicts and provide database definitions to the application program; (vii) in database and data security, to prevent unauthorized database access; (viii) ensuring independence of application programs from certain data changes and preventing the revision of application programs every time a structure changes; (ix) in database and data recovery, performing the resumption of database operations after an interruption; (x) tracking data changes by keeping a record of every change made to the data; (xi) for data change integrity, ensuring that update changes are applied to, or removed from, the database in their entirety; (xii) providing a recent copy of the database as a reserve by backing-up the database and storing copies of audit files and all other database files; (xiii) providing for database scalability by growing or shrinking the database according to the ongoing needs at the time.
The DMSII provides standard software files that perform services and operations for all the databases connected to the system's Enterprise Server. This enables a viewing of a list of all these files on the user terminal.
In the ordinary course of operations, the application program user will submit changes to data or retrieve data while running a particular application program. Then, changes can be made which add, modify and delete data. A Database Administrator (DBA) keeps the database running smoothly and enforces the rules for data integrity and security. Users access the database through a given application program which itself does not access the data directly. Instead, the program interacts with the DMSII software and the database tailored software, which is directed by the access routines of the Data Management System to provide accesses, retrievals and the storage of data in the physical database file.
In regard to access, an application user will access the data in order to (i) make an inquiry to get a Read of data in the database, or (ii) provide an update by doing a Write to the database, thus adding, deleting or changing data. The access for either purpose contributes to an operation on the database which is called a "transaction".
A transaction is a sequence of operations grouped by a user program because the operations constitute a single logical change to the database. At the end-of-transaction point, when the transaction is complete and without error, it is considered as being committed to the database.
Actual real world data goes into special logical structures that are used by the Data Management System to store data. The database is designed to map categories of data into suitable structures. For example, the real world data would have a character with a structure called a "data set". An example of this would be a particular person's name. Then, real world data that can serve as an index of a whole data set has a structured name called a "set". This, for example, might be the social security number of any employee. Then there is data that can serve as an index of a data set under a certain condition, and this is called a "subset". This might be an employee's work number, for example. Then, there is data about each instance of a particular category. The structure name for this is "data item". An example of this might be the name and address of the category (person). Then, there is data related to the database as a whole, and this involves a structure called "global data item". An example of this might be the total number of employees in a company. Once there has been identification of the real-world data which is to be stored in the database, it is then necessary to define that data in relationship to the data structures of the data management system that holds data. When this data is defined within "structures", the data management system, the system software programs, and an application program can then understand how to make this data accessible for various inquiries and/or changes. This is done with the Data and Structure Definition Language (DASDL).
The Data Management System structures are the building blocks of the Data Management System database. Here, the "data set" has the purpose of storing data pertaining to a data category in a collection of records. A "set" has the purpose of indexing all records in a data set. A "subset" serves the purpose to index some records in a data set according to some given criteria. The "data item" is a structured name which defines a unit of information about a category in a given field (column) of a data set record. A "global data item" serves the purpose of storing a unit of information about the entire database or any of its involved structures. In general discussion about the types of data and the names of data structures, it is often seen that in a relational database, a "data set" is called a "table". A "set" or "subset" is frequently called an "index". A "data item" is often called a "field" or a "column", or is often called by its data name, for example, a project number. "Structures" are made of common file components designated as records and fields.
A record is a group of logically-related data items in a file. Often, a record is called a row. Data items reside in different fields in the records. For example, a record might involve a series of data such as an employee's name, the employee's I.D., the employee's social security number and years of employment. A group of such records would constitute a file.
The operating system which uses the data management system will treat the record as a unit. The system makes data available to users in records and not in individual single items of data. In programming languages, the record is the unit of data that the system reads from or writes to a file in one execution cycle of a Read or Write statement in a program.
If the application program wants to change a data item in a given record, the Data Management System brings a copy of the record from the physical storage over to memory, then enables that data item to be changed, and then writes the changed record back to the file.
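The record-at-a-time update described above can be illustrated with a minimal sketch. The fixed-length record layout, the function name update_years and the in-memory file are all assumptions made for illustration; they are not the DMSII implementation.

```python
import io
import struct

# Hypothetical fixed-length record: 20-byte name, 4-byte employee id, 2-byte years.
RECORD = struct.Struct("<20sIH")

def update_years(f, record_no, new_years):
    """Copy one record into memory, change one data item, write the record back."""
    offset = record_no * RECORD.size
    f.seek(offset)
    name, emp_id, _old_years = RECORD.unpack(f.read(RECORD.size))  # record copied to memory
    f.seek(offset)
    f.write(RECORD.pack(name, emp_id, new_years))  # the whole record is rewritten

# An in-memory stand-in for the physical file, holding two records.
store = io.BytesIO()
store.write(RECORD.pack(b"Ada", 1, 3))
store.write(RECORD.pack(b"Grace", 2, 7))

update_years(store, 1, 8)  # change the "years" item of the second record
```

Note that even though only one data item changes, the unit read and written is the complete record.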
A "field" is a consecutive group of bits or bytes within a particular component of a record which will represent a logical piece of data. A field or column is defined by the description of the data item it is to hold. For example, if one field carries the name of an employee, this field in the record could be called the name field.
The "data set" is a physical file, that is to say, a collection of related data records stored on a random-access storage device, such as a disk in which the data resides.
A data set is kept up-to-date in several ways: (i) here, application programs add, change, or delete individual pieces of data or records stored in the data set; (ii) the Database Administrator (DBA) maintains the structure of the data set by keeping the data set within certain maximized limits, by adding, deleting or changing the definition of a data item, creating new sets or subsets, monitoring automatic processes that guard data integrity and creating guard files to enhance the security of the data.
A "set" is a separate stored file that indexes all the records of a single data set. The Data Management System uses sets in order to locate records in a data set. A set has no meaning apart from its related data set. The set structure enables an application program to access all records of a data set in some logical sequence.
A "subset" can be considered identical to a set, except that the subset need not contain a record for every record of the data set. A subset is a file that indexes none, one, several, or all of the records in a data set. The subset structure enables an application program to access only records of a data set that meet a particularly required condition.
For example, an application program may compile a list of people who are "managers". For this purpose, the database designer creates a "manager" subset. Then, in order to retrieve the records of managers, the data management system can use the smaller file, that is, the subset, to quickly point to the corresponding records in the larger file which is the data set. As with the set, the subset must also be kept up-to-date.
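The relationship between a data set, a set and a subset can be sketched as follows. This is an illustrative model only: the sample records, the build_index helper and the use of Python lists are assumptions, not the DMSII file structures.

```python
# Hypothetical data set: a collection of related records.
data_set = [
    {"name": "Ada", "ssn": "333", "manager": True},
    {"name": "Bob", "ssn": "111", "manager": False},
    {"name": "Cy", "ssn": "222", "manager": True},
]

def build_index(records, key, condition=None):
    """Return (key value, record position) pairs in key order.

    With no condition every record is indexed, as in a set; with a
    condition only matching records are indexed, as in a subset.
    """
    entries = [
        (rec[key], pos)
        for pos, rec in enumerate(records)
        if condition is None or condition(rec)
    ]
    return sorted(entries)

ssn_set = build_index(data_set, "ssn")                                   # indexes all records
manager_subset = build_index(data_set, "ssn", lambda rec: rec["manager"])  # indexes some records

# Use the smaller subset to point at the matching records in the data set.
managers = [data_set[pos]["name"] for _key, pos in manager_subset]
```

Because both indexes hold positions into the data set, each must be refreshed whenever records are added, deleted or changed.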
A "data item" is an element of data. In the Data Management System, a data item can also be the field (column) in the database record. For example, the social security number could be considered as a data item in the sample data set designated "person". The purpose of the data item is to describe the data to be stored. The data item provides the identity (type, size, location, and attributes) of one element of data for a database entity. When an application submits an update to a data item, the Data Management System will accept the update if it corresponds to the definition of a data item. Otherwise, the change is rejected and reported as an exception. The Database Administrator will add, delete or change the data item definitions. There are a number of data items that are used by the Data Management System. These include the type called "alpha-numeric" which includes words and characters, names, addresses, dates and titles. Then, there are data items designated as "numeric" which involve integers and decimals with or without signs. Then, there are data items designated as "real" which involve single precision floating point numbers that occupy one word. An example of this would be an employee's salary. Then, there are data items which are called "Boolean" which involve TRUE and FALSE values.
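The accept-or-reject behavior for updates to a data item might be sketched as below. The class names, the four kind strings and the specific checks are hypothetical choices for illustration; the source states only that an update is accepted when it corresponds to the data item definition and is otherwise reported as an exception.

```python
from dataclasses import dataclass
from decimal import Decimal

class DataItemError(Exception):
    """Raised when an update does not match the data item definition."""

@dataclass(frozen=True)
class DataItem:
    name: str
    kind: str      # assumed kinds: "alpha", "numeric", "real", "boolean"
    size: int = 0  # maximum length; used here only for alpha items

def check_update(item, value):
    """Return the value if it fits the definition, otherwise raise an exception."""
    if item.kind == "alpha":
        ok = isinstance(value, str) and len(value) <= item.size
    elif item.kind == "numeric":
        # bool is a subclass of int in Python, so exclude it explicitly.
        ok = isinstance(value, (int, Decimal)) and not isinstance(value, bool)
    elif item.kind == "real":
        ok = isinstance(value, float)
    elif item.kind == "boolean":
        ok = isinstance(value, bool)
    else:
        ok = False
    if not ok:
        raise DataItemError(f"{value!r} does not match the definition of {item.name}")
    return value

ssn = DataItem("SSN", "numeric")
name = DataItem("NAME", "alpha", size=10)
check_update(ssn, 123456789)  # accepted
check_update(name, "Ada")     # accepted
```

An update such as check_update(ssn, "abc") would raise DataItemError, mirroring the rejection of a change that does not fit the definition.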
The "global data item" is a data item, a group item, or a population item that is not part of any data set but still pertains to the database as a whole. Such global data items are stored in one special record called the "global record" in the DASDL declaration which is outside the structured definitions. Sometimes the global record is placed just before the structured definitions in the DASDL file. The global data item has the purpose of holding permanent information about the database as a whole or about a particular data set. It also acts as a place holder for information that can be derived from the database.
One of the most significant options in DASDL (Data And Structure Definition Language) is that it is possible to define the database as to whether the database is to be audited. The data management system supports both logging changes to a database (auditing the database) or not logging changes (maintaining an unaudited database). There are advantages in auditing a database since this assures the user that if a database failure occurs, there will be a record of database changes with which one can restore the database to a completely integral state and thus avoid loss of information and corruption of information.
The "audit trail" is a log of changes made to the database. This type of audit trail is somewhat similar to the SUMLOG in the host system which is the history of all system activity except for the fact that the audit trail will record the database update activity only and will consist of separate numbered files. Thus the data management system software can use an audit trail to recover the database from an unusable state, provide restart information to user programs, reconstruct portions of the database that had been lost because of hardware errors, back out aborted transactions and roll back the entire database to a user specified point or rebuild the entire database to a user-specified point.
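Two of the uses of an audit trail listed above, rebuilding to a user-specified point and backing out aborted transactions, can be illustrated with a simplified sketch. The tuple layout (sequence number, key, before-image, after-image) and the dictionary standing in for the database are assumptions for illustration and do not reflect the actual DMSII audit format.

```python
def rebuild(backup, audit_trail, stop_after):
    """Apply after-images to a backup copy, up to and including stop_after."""
    db = dict(backup)
    for seq, key, _before, after in audit_trail:
        if seq > stop_after:
            break
        if after is None:      # the update deleted the record
            db.pop(key, None)
        else:
            db[key] = after
    return db

def back_out(db, audit_trail, aborted):
    """Undo the updates whose sequence numbers are in `aborted`, newest first."""
    result = dict(db)
    for seq, key, before, _after in reversed(audit_trail):
        if seq in aborted:
            if before is None:  # the update created the record
                result.pop(key, None)
            else:
                result[key] = before
    return result

backup = {"A": 1}
trail = [
    (1, "A", 1, 2),     # A changed from 1 to 2
    (2, "B", None, 5),  # B created with value 5
    (3, "A", 2, 3),     # A changed from 2 to 3
]
state_at_2 = rebuild(backup, trail, stop_after=2)
state_at_3 = rebuild(backup, trail, stop_after=3)
undone = back_out(state_at_3, trail, aborted={3})
```

The fewer audit records that must be applied after restoring a backup, the shorter the recovery, which is why a more recent backup shortens recovery time.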
The "audit file" provides a chronological history of all update database transactions. The audit file is a numbered segment of the database audit trail where the data management system assigns each audit file to have an audit file number (AFN) in the range of 1 to 9999.
Access Routines Program: The data management system controls access to database data with a software program called Access Routines which is a collection of specialized routines that enables many users to access the database all at the same time and ensures that the access is controlled so that accesses do not conflict with one another.
Control File: Each active data management system database has a control file. The control file contains the time stamps for the database software and files and the access routines since the access routines use time stamps to check the validity of data. A control file also contains the update levels of the database and the structures since the access routines use update levels to check the validity of data. Further, the control file functions to store audit control information, dynamic database parameters plus other information. It further guards the database from interruption while a process that needs exclusive access to the database goes on to complete its task successfully, such as, for example, a halt/load recovery and/or a reorganization. The control file assures that a database that has been interrupted for any reason is not accessed until the integrity of the database is further guaranteed by the successful completion of the recovery process.
I/O Operation: An I/O (Input/Output) operation is one in which the system reads data from or writes data to a file on a peripheral device, such as a disk drive. When there is a failure of a read or a write operation, then this is considered to be an I/O error, which must be handled.
Backup: The most important preventive maintenance task which can be performed for a database is to back up the database frequently and to keep the backups for some period of time. To "back up" the database means to use the data management system DMUTILITY program to make a copy of all or part of the database. This backup will include a check of the physical integrity of all the database's structures being backed up. A complete database backup includes a reserve copy of all the files pertaining to the database. All the files include not only the database files and the control files (which may change from time to time) but also the DASDL source file, the description file, various tailored files, application programs, and audit files. This enables a user to put the database back in operation quickly in case the current database files should become unavailable or damaged.
Here there is involved the concept of "DUMP". A DUMP involves either a copy of stored data in which a change has been made since the previous DUMP of that data or a transfer of all or part of the contents of one section of computer storage to another section or to some other output device. The processes used to make a database backup are called backing up and dumping. A backup to tape is called a Tape DUMP while a backup to disk is called a Disk DUMP.
Often the backing up operation for the database is done by increments. An increment is one of a series of regular consecutive additions. For example, if a database is too large to back up on a daily basis, the operator could create a schedule that backs up a certain number of database files (an increment) each day until the entire database is backed up.
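A schedule of this kind can be sketched in a few lines. The function name plan_increments and the file titles are hypothetical; the sketch only shows how a list of database files could be divided into daily increments.

```python
def plan_increments(files, files_per_day):
    """Split the database files into consecutive daily increments."""
    if files_per_day < 1:
        raise ValueError("files_per_day must be at least 1")
    return [files[i:i + files_per_day] for i in range(0, len(files), files_per_day)]

# Hypothetical file titles; five files backed up two per day take three days.
schedule = plan_increments(["F1", "F2", "F3", "F4", "F5"], files_per_day=2)
for day, increment in enumerate(schedule, start=1):
    print(f"day {day}: {increment}")
```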
The dump of a database is done to tape or disk depending on what type of storage resources are available. Tapes are most frequently used since they are a less expensive resource than disk. When dumping is done to tape, it is necessary to furnish information common to any disk-to-tape process and this information would include the tape name, the cycle number, the version number, workers, the serial number, compression and non-compression, the density, and the SCRATCHPOOL option.
However, when dumping to disk it is only necessary to specify the file title for the entire dump and the number of DUMP files into which the system should place the DUMP.
Recovering a database means getting it back up to date and ready for access with complete and correct data. The recovery of the database can be done either automatically or manually using various software utilities and commands.
The present system and method provides enhancements which accomplish performance improvements in the DMSII database utilities, plus support for new tape devices and other efficient back-up methods. The presently-described system achieves special optimization of Input and Output in both disk and tape operations, plus special DUMP features which enhance the ability to perform database back-up. Of special focus herein is the use of a numerical generator in a routine designated ACCESSROUTINES and a DUMPSTAMP operation which stores a unique number for each data block in the database. Then when any block gets modified, the Data Management System II (DMSII) software will calculate a "transtamp" value for each data block that is modified. This value can then be used to identify changed data blocks when performing a Dump to disk or tape.
As a result, total back-up time has been reduced due to the ability to DUMP the data blocks which have been modified since the last DUMP. Thus, the presently described system will provide users with more options for improving the efficiency of their database administration and operational practices.
A system and method is provided to enhance the ability to perform database back-up using special features designated as the "Incremental DUMP" and also a feature designated as "Accumulated DUMP". As a result, the total back-up time has been considerably reduced due to the ability to dump the data blocks which have been modified since the last DUMP. This enables the data recovery process to operate more efficiently due to the lesser number of audit images that are being applied.
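The idea of stamping modified blocks so that a later DUMP can select only changed blocks can be illustrated with a rough sketch. Everything in it is an assumption made for illustration: the BlockStore class, the use of a simple increasing counter as the stamp generator, and in particular the reading that an incremental dump selects blocks changed since the most recent dump of any kind while an accumulated dump selects blocks changed since the most recent full dump. The source does not spell out these definitions or how the stamp values are calculated.

```python
import itertools

class BlockStore:
    """Toy model of stamped data blocks; not the DMSII implementation."""

    def __init__(self, n_blocks):
        self._stamps = itertools.count(1)  # hypothetical stamp generator
        self.blocks = {i: b"" for i in range(n_blocks)}
        self.stamp = {i: 0 for i in range(n_blocks)}  # 0 means never modified
        self.last_full = 0  # stamp taken at the most recent full dump
        self.last_dump = 0  # stamp taken at the most recent dump of any kind

    def write(self, block_no, data):
        """Modify a block and record a fresh stamp for it."""
        self.blocks[block_no] = data
        self.stamp[block_no] = next(self._stamps)

    def _changed_since(self, mark):
        return sorted(b for b, s in self.stamp.items() if s > mark)

    def dump(self, kind):
        """Return the block numbers a dump of the given kind would copy."""
        mark = next(self._stamps)
        if kind == "full":
            selected = sorted(self.blocks)
            self.last_full = mark
        elif kind == "incremental":
            selected = self._changed_since(self.last_dump)
        elif kind == "accumulated":
            selected = self._changed_since(self.last_full)
        else:
            raise ValueError(f"unknown dump kind: {kind}")
        self.last_dump = mark
        return selected

store = BlockStore(4)
store.write(0, b"a")
store.write(1, b"b")
full = store.dump("full")          # all four blocks
store.write(2, b"c")
inc1 = store.dump("incremental")   # only block 2
store.write(3, b"d")
inc2 = store.dump("incremental")   # only block 3
acc = store.dump("accumulated")    # blocks 2 and 3
```

Under these assumptions each partial dump copies far fewer blocks than a full dump, which is the source of the reduced back-up time described above.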
The Incremental and Accumulated DUMP procedures involve an interface which is directed to the performance of running DUMP to Tape or DUMP to Disk. Often when DLT tape devices are used, it has been noted that a functional DMUTILITY program is not writing enough data to keep the drive streaming so that repositioning of the tape unit is necessary. This repositioning takes time and then slows down the DUMP processes.
Due to this type of problem, changes have been made to expand the Input buffer used for reading files from disk and further to increase the block size used for writing out to tape. This tape block size can have a maximum of up to 65,535 words.
A previous procedure designated DMUTILITY was limited to a block size of 903 words for a "DUMP To Disk" operation. This has been expanded from 903 words to 20,040 words in order to improve the performance by reducing the number of Input/Output operations (I/Os) for disk drives. Further, the user can also specify the block size by utilizing a new Block Size clause in the DUMP Command which may be expanded up to 65,520 words.
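The effect of block size on the number of I/O operations can be shown with simple arithmetic. The sketch assumes one I/O per block written and uses a hypothetical dump size of 1,000,000 words; neither figure comes from the source, only the three block sizes do.

```python
def io_count(total_words, block_words):
    """Number of block transfers needed, assuming one I/O per block."""
    return (total_words + block_words - 1) // block_words  # ceiling division

TOTAL_WORDS = 1_000_000  # hypothetical dump size, not from the source

for block_words in (903, 20_040, 65_520):
    print(block_words, io_count(TOTAL_WORDS, block_words))
```

For this hypothetical size, the count falls from 1,108 I/Os at 903 words per block to 50 at 20,040 words and 16 at 65,520 words.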
A similar enhancement is also made for DUMP to Tape, where the user can specify the block size by utilizing a new Block Size clause in the DUMP command which may be expanded up to 65,535 words. The tape density is a required parameter for the DUMP command, and the maximum block size allowed is based on the density value, as this limitation is imposed by the tape device.