This disclosure relates to network communications where logical audit blocks are created at a source host and transferred to a remote host where the audit trail is used to create and maintain a continuously synchronized remote database backup.
A database such as the Unisys Data Management System II, Extended, is a centralized collection of data placed into one or more files. Multiple application programs can access this data concurrently. Consequently, redundant files are not required for each individual application. Application programs running in batch, time sharing, and remote job entry environments can all access the database concurrently. A database of the present configuration consists of the following major components:
(a) Data sets;
(b) Sets;
(c) Subsets;
(d) Data items;
(e) Global data.
A data set, a set, or a subset, that is not an item of another set is termed disjoint. Structures need not be disjoint, that is to say a hierarchy can exist between the various data sets, sets, and subsets. A data set, a set, or a subset, that is an item in another data set, is said to be embedded. When a database contains embedded structures, a hierarchical file structure results.
A data set is a collection of related data records stored in a file in a random access storage device. A data set is similar to a conventional file. It contains data items and has logical and physical properties similar to files. However, unlike conventional files, data sets can contain other data sets, sets, and subsets.
A set is a structure that allows access to all records of a data set in some logical sequence. The set contains one entry for each record in the data set. Each set entry is an index that locates a data set record. If key items are specified for the set, records in the data set are accessed based upon these keys. Otherwise the records are accessed sequentially. Multiple sets can be declared for a single data set, thereby enabling the data in a data set to be accessed in several different sequences. A subset is similar to a set. Unlike a set, a subset need only refer to selected records in the data set. A data item is a field in a database record used to contain an individual piece of information.
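The relationship between a data set, a set, and a subset described above can be sketched as follows. This is only an illustrative model, not the DMSII implementation: the names `DataSet`, `build_set`, and `build_subset`, and the sample records, are hypothetical. A set is modeled as a list holding one index entry per record, ordered by a key item when one is specified and sequential otherwise; a subset holds entries only for selected records.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]  # a record is a mapping of data item name -> value


@dataclass
class DataSet:
    """Hypothetical stand-in for a data set: a collection of related records."""
    records: List[Record] = field(default_factory=list)


def build_set(data_set: DataSet, key: Optional[str] = None) -> List[int]:
    """Return one index entry per record; ordered by `key` if given, else sequential."""
    positions = list(range(len(data_set.records)))
    if key is not None:
        positions.sort(key=lambda i: data_set.records[i][key])
    return positions


def build_subset(data_set: DataSet,
                 predicate: Callable[[Record], bool],
                 key: Optional[str] = None) -> List[int]:
    """Like a set, but with entries only for the records selected by `predicate`."""
    return [i for i in build_set(data_set, key) if predicate(data_set.records[i])]


# Sample data (invented for illustration only).
employees = DataSet([
    {"name": "Ng", "dept": "ops", "id": 20},
    {"name": "Abel", "dept": "dev", "id": 10},
    {"name": "Cruz", "dept": "dev", "id": 30},
])

sequential = build_set(employees)              # no key items: sequential access
by_id = build_set(employees, key="id")         # one logical sequence
by_name = build_set(employees, key="name")     # a second sequence over the same data
dev_only = build_subset(employees, lambda r: r["dept"] == "dev", key="id")
```

Declaring both `by_id` and `by_name` over the same data set mirrors the statement that multiple sets allow the same records to be accessed in several different sequences.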
Data items that are not a part of any data set are then called global data items. Global data items generally consist of information such as control totals, hash totals, and populations, which apply to the entire database. All global data items are stored in a single record.
The audit trail is a record of changes made to the database. The audit trail is used to automatically recover the database following a hardware or software failure. The audit trail specification clause describes the physical attributes of the audit trail.
The audit trail, as mentioned, consists of a record of changes to the database. It is only created for audited databases and is used in the various forms of database recovery.
An audit trail specification describes the attributes of the audit trail. The specification is optional. If no specification appears, attributes are assigned by default.
All audited databases must include a "restart" data set definition. There is a specialized syntax for specifying the audit trail attributes. These attributes include area size, area length, block size, buffers, checksum, and sections, in addition to whether disk or tape is involved and the types of tape being used.
The AREAS, AREASIZE, and AREALENGTH attributes reflect the fact that disk or pack files are divided into areas. Areas are only allocated as they are needed. Thus, a potentially large file can be small initially and then grow as needed. The user can control the maximum amount of disk space allocated to a file by using the AREAS and AREASIZE (or the AREALENGTH) options.
AREAS specifies the maximum number of areas to be assigned to the file. The maximum value allowed for this is 1,000.
The user can specify the length of an area using the AREASIZE (or AREALENGTH) option. The default option for AREASIZE is BLOCKS. The default value is 100 blocks.
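The effect of the AREAS and AREASIZE options on file growth can be illustrated with a small calculation. The sketch below is a simplified model under stated assumptions: the function names are hypothetical, and only the limit of 1,000 areas and the default of 100 blocks per area are taken from the text above.

```python
MAX_AREAS = 1000          # maximum value allowed for AREAS (from the text)
DEFAULT_AREASIZE = 100    # default AREASIZE, expressed in blocks (from the text)


def max_file_blocks(areas: int, areasize: int = DEFAULT_AREASIZE) -> int:
    """Upper bound on the file size, in blocks, for a given AREAS/AREASIZE pair."""
    if not 1 <= areas <= MAX_AREAS:
        raise ValueError(f"AREAS must be between 1 and {MAX_AREAS}")
    if areasize < 1:
        raise ValueError("AREASIZE must be at least 1 block")
    return areas * areasize


def areas_allocated(blocks_written: int, areasize: int = DEFAULT_AREASIZE) -> int:
    """Areas are allocated only as needed: round the blocks written up to whole areas."""
    if blocks_written < 0:
        raise ValueError("blocks_written cannot be negative")
    return (blocks_written + areasize - 1) // areasize
```

For example, under these assumptions a file that has received 250 blocks with the default area size occupies three areas, while the same file declared with 1,000 areas could grow to at most 100,000 blocks.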
BLOCKSIZE: The records in the audit trail are normally blocked. The user can control the size of a block using the BLOCKSIZE option. BLOCKSIZE can be specified as one of the following items:
(i) SEGMENTS: The maximum value is 2,184 segments. SEGMENTS can define an audit buffer size that is larger than that defined by either the BYTES or WORDS option.
(ii) WORDS: This is the default option. If the user does not define a BLOCKSIZE, the audit trail will use a default BLOCKSIZE of 900 words. The maximum value here is 4,095 words.
(iii) BYTES: The maximum value allowed here is 24,570 bytes.
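The BLOCKSIZE limits listed above can be checked with a short validation sketch. The maximum values and the 900-word default come from the text; the function names are hypothetical, and the conversion factors (6 bytes per word and 30 words per segment) are assumptions that are not stated in this document. The 6-byte assumption is at least consistent with the stated limits, since 4,095 words x 6 = 24,570 bytes.

```python
# Maximum BLOCKSIZE per unit, as stated in the text.
BLOCKSIZE_LIMITS = {"SEGMENTS": 2184, "WORDS": 4095, "BYTES": 24570}
DEFAULT_BLOCKSIZE = (900, "WORDS")   # used when no BLOCKSIZE is specified

# Assumed conversion factors (not given in the document).
BYTES_PER_WORD = 6
WORDS_PER_SEGMENT = 30


def validate_blocksize(value=None, unit="WORDS"):
    """Return a (value, unit) pair, applying the default and the per-unit maximum."""
    if value is None:
        return DEFAULT_BLOCKSIZE
    unit = unit.upper()
    if unit not in BLOCKSIZE_LIMITS:
        raise ValueError(f"unknown BLOCKSIZE unit: {unit}")
    if not 1 <= value <= BLOCKSIZE_LIMITS[unit]:
        raise ValueError(f"BLOCKSIZE in {unit} must be 1..{BLOCKSIZE_LIMITS[unit]}")
    return (value, unit)


def blocksize_in_bytes(value, unit):
    """Convert a validated BLOCKSIZE to bytes using the assumed conversion factors."""
    value, unit = validate_blocksize(value, unit)
    if unit == "BYTES":
        return value
    if unit == "WORDS":
        return value * BYTES_PER_WORD
    return value * WORDS_PER_SEGMENT * BYTES_PER_WORD  # SEGMENTS
```

Under the assumed factors, the SEGMENTS maximum converts to a larger byte count than either the WORDS or BYTES maximum, which agrees with the statement that SEGMENTS can define a larger audit buffer.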
A Remote Database Backup or RDB is a database recovery system which can be a key component of a disaster recovery plan since it minimizes the amount of time needed to recover from a loss of database access. The RDB system also minimizes the loss of productivity, the loss of revenue, and the loss of business which could occur because of interruptions in the ability to access one's database. The RDB works in conjunction with the Data Management System II (DMSII) databases plus Structured Query Language Database (SQLDB), the Semantic Information Manager (SIM) database, and the Logic and Information Network Compiler II (LINCII) databases.
The components of the RDB system consist of a database and also a copy of the database. One database is update capable and the other database can be used only for inquiry purposes. The update-capable database is called the primary database. The host on which this database resides is called the primary host. The "current on-line" remote database copy, which is called the secondary database, is "inquiry-capable" only. The host on which this database resides is called the secondary host. The configuration of the primary and the secondary databases on their separate hosts is called the RDB System. A single host can participate in multiple RDB systems.
The RDB or remote database backup system enables users to maintain a current on-line inquiry-only copy of a database on an enterprise server, which is separate from the enterprise server on which the update-capable database resides. The host locations can be at the same site or at two geographically distant sites. The remote database backup keeps the database copy up-to-date by applying the audit images from the audited database to the database copy. There is a choice of four audit transmission modes, which enable one to choose the means of audit transfer between hosts.
In the RDB system, the term "primary" and the term "secondary" indicate the intended function of each copy of the database and the host on which it resides.
The primary database supports both database inquiry and update, while the secondary database supports database inquiry only.
The secondary database cannot be updated by any application programs and the secondary database is modified only by the application of audit images of transactions performed on the primary database.
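The division of roles between the primary and secondary databases can be sketched as a small model. This is a minimal illustration, not the RDB implementation: the `Role`, `AuditImage`, and `Database` names are hypothetical, and an audit image is reduced here to a single key/value change. The point shown is that application updates are accepted only by the primary copy, while the secondary copy changes only when an audit image is applied to it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Role(Enum):
    PRIMARY = "primary"      # inquiry and update
    SECONDARY = "secondary"  # inquiry only


@dataclass
class AuditImage:
    """Hypothetical audit image: the record of one change made on the primary."""
    key: str
    value: object


@dataclass
class Database:
    role: Role
    rows: Dict[str, object] = field(default_factory=dict)
    audit_trail: List[AuditImage] = field(default_factory=list)

    def inquire(self, key: str):
        """Inquiry is allowed on both copies."""
        return self.rows.get(key)

    def update(self, key: str, value: object) -> AuditImage:
        """Application update: allowed on the primary only; produces an audit image."""
        if self.role is not Role.PRIMARY:
            raise PermissionError("the secondary database is inquiry-only")
        self.rows[key] = value
        image = AuditImage(key, value)
        self.audit_trail.append(image)
        return image

    def apply_audit_image(self, image: AuditImage) -> None:
        """The only path by which the secondary copy is modified."""
        if self.role is not Role.SECONDARY:
            raise PermissionError("audit images are applied to the secondary only")
        self.rows[image.key] = image.value
        self.audit_trail.append(image)
```

In this model, a change made with `update` on the primary becomes visible on the secondary only after the returned image is passed to `apply_audit_image`, which is why a lost image leaves the two copies out of step.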
One complete RDB system thus consists of one database and one copy of that database: the primary database on one host plus the secondary database, which resides on another host.
A host is the system on which a primary or a secondary database resides. A host can function as a primary host in one RDB system and then also concurrently function as a secondary host for another RDB system. Additionally, one host can function as a secondary host (or a primary host) for multiple RDB systems.
When an RDB system is first initialized for a database, then by default, the primary host is the host upon which the database resides. The other host which is defined for that database is designated as a secondary host and it remains a secondary host until a takeover is performed or until the RDB capability is disabled. Both the primary and secondary hosts must have sufficient resources to support the RDB system and its application environment.
As an illustration, consider how the primary database on a system called Host One and the secondary database on a system called Host Two can work together in response to, or in anticipation of, an interruption on the primary host. In this example, the application normally runs against the primary database on Host One, with the RDB transferring audit images to the secondary database. Under normal operation, that is, when the audit images are transferred from the primary database to the secondary database without loss of data during transmission due to network or system failure, this arrangement works well. However, if a network or system failure results in the loss of data during transmission from the primary database to the secondary database, the secondary database is said to be out of synchronization with the primary database. Hence there is a need for a mechanism by which the lost data can be re-transmitted so that the secondary database can be re-synchronized with the primary one.
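One way to picture the out-of-synchronization condition is to compare what the primary host has written with what the secondary host has received. The sketch below assumes, purely for illustration, that every audit block carries a consecutive serial number starting at 1; the function names are hypothetical, and the document does not describe the actual detection mechanism at this level of detail.

```python
from typing import Iterable, List


def missing_block_numbers(last_primary_block: int,
                          received: Iterable[int]) -> List[int]:
    """Serial numbers written on the primary host that never reached the secondary."""
    have = set(received)
    return [n for n in range(1, last_primary_block + 1) if n not in have]


def needs_resynchronization(last_primary_block: int,
                            received: Iterable[int]) -> bool:
    """True when at least one audit block must be re-transmitted."""
    return bool(missing_block_numbers(last_primary_block, received))
```

With these assumptions, a secondary host that received blocks 1, 2, and 5 of a six-block audit trail would need blocks 3, 4, and 6 to be re-transmitted before the two audit trails match again.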
The object of the present invention is to expedite the transfer of audit blocks from buffers in the primary host for placement into segregated blocks in the secondary host. These segregated blocks of secondary audit data can then be asynchronously moved by a sequence of "Catchup" processes working concurrently in parallel to deposit the audit data onto the secondary backup database.
In order to accomplish this object, there is utilized a logical resynchronization process, which hereinafter is referred to as Catchup, which according to the present system consists of multiple backup database system at the remote host. Initially, the backup system, via a Tracker sensing mechanism, recognizes that a resynchronization process is required and then, from its shared database library task (RDB Support Library), initiates a single physical Catchup task for each physical audit file partition in a parallel transfer operation.
AUDIT TRAIL SYNCHRONIZATION: It is important to decide on the level of audit synchronization that is desired for the remote database backup system. This involves the question: "How closely must the backup database match its source database?" Or, to express it in another fashion: "How closely should the secondary database audit trail replicate the primary database audit trail?"
MODES OF AUDIT TRANSMISSION: The remote database backup (RDB) system provides four specific audit transmission modes that enable the user to regulate whether the transmission of the audit images is to be automatic or manual; whether the transmission of audit images is to be done as individual audit blocks or as whole audit files; whether the transmission of audit images can be interrupted, that is to say, suspended or not; and what is to be the degree of audit trail synchronization between the primary host and the secondary host. The focus of the present invention involves the use of one mode designated as the ABW or Audit Block Write mode.
AUDIT BLOCK WRITE (ABW): The secondary audit trail is to be constantly and automatically kept synchronized with the primary database audit trail on a block-by-block basis. The ABW mode enables this type of close synchronization level to occur by (i) handling interruptions to audit transmissions through one of two error handling options; or (ii) initiating a Catchup process for the audit block transfer whenever the usual synchronization level is disrupted. This invention is devoted to the Catchup process.
In the RDB utility, the user can specify the time interval between the detection of a need for the Catchup process and the beginning of that process.
A method and system are provided for enhancing the transfer rate of audit blocks from a primary host to a secondary host in order to establish a secondary host database which will be in synchronization with, i.e., accurately duplicate, the audit file data in the source database of a primary host.
When the network connections between a primary host source and a remote secondary target host are interrupted, disabled or have transmission delays, then the secondary host will be out-of-sync with the source audit file data in said primary host. Ordinary transfer rates for sending audit file data from primary to secondary host would be inadequate to develop data file synchronization between primary and secondary hosts.
Thus, when a Tracker sensing means indicates the lack of synchronism, a Catchup program is initiated which then utilizes audit blocks of sectioned audit files for placement in multiple segregated buffers which are transferred to segregated audit blocks in said secondary host. These segregated audit blocks are each handled asynchronously by a sequence of Catchup processes, working concurrently in parallel which operate to expedite the placement of the audit blocks onto the remote secondary database.
By asynchronously receiving multiple logical audit blocks and asynchronously writing them to multiple physical files, the audit trails become synchronized more quickly than if said process were performed in a serial mode of operation.
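The parallel arrangement summarized above, with one Catchup task per audit file partition running concurrently, can be sketched in a few lines. This is a simplified model under stated assumptions and not the patented implementation: the function names are hypothetical, each partition is represented as an in-memory list of blocks rather than a physical file, and Python threads stand in for the Catchup processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def catchup_partition(source_blocks: List[bytes]) -> List[bytes]:
    """One Catchup task: copy the blocks of a single partition, in order."""
    target: List[bytes] = []
    for block in source_blocks:
        target.append(block)   # stands in for a network read plus a file write
    return target


def run_catchup(primary_partitions: Dict[int, List[bytes]]) -> Dict[int, List[bytes]]:
    """Start one task per partition and collect each partition's result separately."""
    workers = max(1, len(primary_partitions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pid: pool.submit(catchup_partition, blocks)
                   for pid, blocks in primary_partitions.items()}
        return {pid: future.result() for pid, future in futures.items()}


# Sample audit file split into three partitions (invented data).
primary_partitions = {
    0: [b"block-1", b"block-4"],
    1: [b"block-2", b"block-5"],
    2: [b"block-3", b"block-6"],
}
secondary_partitions = run_catchup(primary_partitions)
```

Because each task writes only to its own target partition, the tasks need no coordination with one another, which is the property that allows them to proceed asynchronously instead of waiting on a single serial transfer.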