One or more aspects of the invention relate to the field of computer science, and more particularly, to a data processing and storage apparatus, a method for operating a data processing and storage apparatus and to computer-executable instructions on a computer-readable non-transitory storage medium for continuous data protection in distributed file systems of compute clusters.
The recent growth of data volumes challenges system architects and administrators in terms of data protection. Large distributed storage systems do not merely store data for archiving purposes; rather, they have to sustain a high frequency of data access. To this end, large-scale distributed file systems have been developed in the past to provide large data capacity on the one hand and maximum access rates on the other hand.
A typical extension to a distributed file system is known as the “Data Management Application Programming Interface”, often abbreviated as “DMAPI”. Detailed information can be found in the document “Systems Management: Data Storage Management (XDSM) API” of February 1997, which is hereby incorporated herein by reference in its entirety. Typically, DMAPI is used for hierarchical storage management applications, like IBM's TSM (Tivoli Storage Manager) for Space Management or IBM's HPSS (High Performance Storage System).
As another extension to large scale compute clusters and distributed file systems, backup systems based on a client-server architecture are known from the prior art.
In a client-server backup architecture, the backup server manages the common backup storage resources as, for example, disk storage and tape storage, used to store the backup data. Additionally, the backup server can store global metadata and statistical information about the backup clients, the backup data and the common backup storage.
In a client-server backup architecture specifically designed for a distributed file system of a compute cluster, backup clients are typically installed in a number of the compute nodes. Then, each of the backup clients can read the entire data from the distributed file system mounted on all compute nodes in the compute cluster. For example, with the distributed file system IBM GPFS (General Parallel File System) a TSM backup client can be installed and operative on each compute node in order to perform backup to the TSM server.
Commonly known backup processing is performed in a scheduled manner. When initiated, the backup clients typically perform several steps for determining changes between file system objects and backup data.
In a “scan” step, the file system is scanned to identify file system objects that require backup. Typically, the “scan” step includes a directory tree walk or, in a more particular situation, an inode table walk.
In a subsequent “compare” step, conventional backup clients typically identify differences in the file system objects. Typically, three types of differences are distinguished.
The first type relates to file system objects which have been newly created since the last backup. The action to be taken for new files is commonly referred to as “SEND”. A backup operation of the “SEND” type will cause a backup server to add these new file system objects to the backup data.
The second type of backup operation relates to objects which have been updated since the last backup. The action to be taken for updated files is commonly referred to as “UPDATE”. A backup operation of the “UPDATE” type will cause a backup server to update the backup data relating to these file system objects. In some cases, backup operations of the “UPDATE” type cause the backup server to create a new version of the file system object while the previous version is still retained in the backup server.
The third type of backup operation relates to objects which have been deleted since the last backup. The action to be taken for deleted files is commonly referred to as “EXPIRE”. A backup operation of the “EXPIRE” type will cause a backup server to delete the backup data for the respective file system object. In some cases, the latest versions of deleted files in the backup server are retained for a pre-defined time period until they are deleted.
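The conventional “compare” step described above can be sketched as follows. This is a minimal, hypothetical illustration, not an actual backup client implementation; the representation of scan results and backup inventories as dictionaries keyed by object path is an assumption made for clarity.

```python
# Hypothetical sketch of the conventional "compare" step: classify file
# system objects into SEND, UPDATE and EXPIRE backup operations by
# comparing the current scan result with the inventory of the last backup.
# Data structures are illustrative assumptions (path -> modification time).

def classify_changes(scan, backup):
    """Return (send, update, expire) lists of object paths."""
    send = [p for p in scan if p not in backup]                         # newly created
    update = [p for p in scan if p in backup and scan[p] != backup[p]]  # changed
    expire = [p for p in backup if p not in scan]                       # deleted
    return send, update, expire

# Example: "/fs/b" was modified, "/fs/c" was deleted since the last backup.
scan = {"/fs/a": 100, "/fs/b": 250}
backup = {"/fs/a": 100, "/fs/b": 200, "/fs/c": 50}
print(classify_changes(scan, backup))  # ([], ['/fs/b'], ['/fs/c'])
```

Each resulting path would then be submitted to the backup server as a backup request of the corresponding type.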
After having determined the type of required backup operation for a specific file system object, the conventional backup client will issue a backup request of the determined type for the respective file system object to the backup server.
Finally, the backup request will be processed by the backup server, thereby bringing the backup data into a consistent state with respect to the underlying file system object.
It has to be understood that, regardless of the number of backup clients, the conventional backup architecture suffers from limitations in that, for certain steps, a considerable amount of backup-client processing has to be performed on a single node in the compute cluster. In particular, the conventional “scan” step has to be coordinated by a single backup client, for example for initially starting the “scan” step and finally collecting the results from all backup clients for comparison.
Moreover, the conventional architecture performs the “compare” step to a large extent on a single backup client in order to be able to determine all changes, especially when objects have been deleted. In an exemplary situation, a first backup client running on a first compute node cannot determine on its own whether a file system object missing from its scan result has been detected by another backup client running on another compute node. Therefore, the first backup client cannot safely mark this file system object for expiration.
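The expiration problem described above, namely that deletions can only be determined after merging the partial scan results of all backup clients on a single coordinating node, can be sketched as follows. The data layout is an illustrative assumption.

```python
# Hypothetical sketch: EXPIRE candidates require a global view. Each backup
# client scans only part of the namespace, so an object absent from one
# client's partial scan may still have been seen by another client.

def global_expire(partial_scans, backup_paths):
    """partial_scans: list of sets of paths seen by each backup client.
    backup_paths: set of paths recorded in the last backup."""
    seen = set().union(*partial_scans)   # merge step on a single coordinating node
    return backup_paths - seen           # in backup, but gone from the file system

# Two clients scanned disjoint partitions; only "/fs/c" was seen by neither.
scans = [{"/fs/a"}, {"/fs/b"}]
print(global_expire(scans, {"/fs/a", "/fs/b", "/fs/c"}))  # {'/fs/c'}
```

The merge step is precisely the serial bottleneck: no single partial scan suffices, so the comparison cannot be fully parallelized across compute nodes.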
Against this background, it becomes clear that growth in file system contents, namely in the number and the size of managed file system objects, may cause prior-art backup operations to exceed the typical backup time slots.
As an approach for speeding up backup operations in specific situations, journaling file systems have been proposed in the prior art. Journaling typically means immediately tracking changes to file system objects in a so-called file system journal. Even though the primary goal of journaling was crash recovery, journaling may also be used to avoid the “scan” step explained above for file system backup. A major benefit of using journaling for file system backup is that changes in data and metadata are collected by the journaling engine and can be used as baseline information for a later backup, thereby avoiding a file system scan and the determination of differences in file system objects between the last and the current backup. Consequently, journaling may reduce the overall time for the file system backup. Some popular examples of journaling file systems are known as JFS, JFS2, XFS, ReiserFS, EXT3, EXT4, and VxFS.
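The journal-based approach can be sketched as follows. The record format and operation names are hypothetical assumptions; real journaling engines use their own on-disk formats. The point illustrated is that a time-ordered sequence of change records can be reduced directly to pending backup actions, replacing both the “scan” and the “compare” steps.

```python
# Hypothetical sketch: derive pending backup actions (SEND/UPDATE/EXPIRE)
# by replaying journal records collected since the last backup, instead of
# scanning the file system. Record format is an illustrative assumption.

def changes_from_journal(journal_records):
    """Reduce time-ordered (op, path) records to one pending action per path."""
    pending = {}
    for op, path in journal_records:     # op in {"CREATE", "WRITE", "DELETE"}
        if op == "CREATE":
            pending[path] = "SEND"
        elif op == "WRITE":
            # keep SEND if the object was newly created in this interval
            pending.setdefault(path, "UPDATE")
        elif op == "DELETE":
            if pending.get(path) == "SEND":
                pending.pop(path)        # created and deleted between backups
            else:
                pending[path] = "EXPIRE"
    return pending

journal = [("CREATE", "/fs/x"), ("WRITE", "/fs/y"), ("DELETE", "/fs/z")]
print(changes_from_journal(journal))
# {'/fs/x': 'SEND', '/fs/y': 'UPDATE', '/fs/z': 'EXPIRE'}
```

Note that the reduction also compensates for transient objects: a file created and deleted between two backups yields no backup action at all.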
However, journaling is not available for numerous distributed file systems on high-performance compute clusters as, for example, IBM's GPFS.
Therefore, conventional backup approaches may not take advantage of the parallelism of resources in typical high-performance compute clusters.