The generation, use and management of Big Data are increasingly important issues. These issues need to be addressed by the many organisations that obtain, and are required to store, a large and ever increasing amount of data every day. The stored data is used by the server systems of the organisations for a wide variety of purposes, such as conducting transactions and managing goods and personnel. The server systems need both to be configured to perform specific tasks and to be able to retrieve the stored data required to perform these tasks from one or more large databases. The failure of a server system, or part of a server system, or the loss of any data, will result in a loss of service. For some organisations, a loss of service for any length of time may be unacceptable and may result in considerable financial and other loss to the organisation. In particular, any irrecoverable data loss may cause severe harm to an organisation.
To help prevent losses due to server downtime or irrecoverable data loss, it is normal for organisations to regularly back up all of their server systems. If an organisation's servers and/or databases are damaged by a disaster, such as a fire, flood, malicious act or human error, then a disaster recovery, DR, operation is performed. A full DR operation will typically involve obtaining a recent backup of the data, providing a replacement server system with appropriately configured servers, recovering the data to the replacement server system and using the replacement server system to perform the tasks of the original server system.
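The four stages of a full DR operation described above can be illustrated with a minimal sketch. It is not taken from any known DR product. The names Backup, ServerSystem and run_dr_operation are hypothetical, and the backup data and server configuration are modelled as plain dictionaries purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Backup:
    """Hypothetical record of one backup: when it was taken and what it holds."""
    taken_at: datetime
    data: dict


@dataclass
class ServerSystem:
    """Hypothetical server system: a configuration plus the data it serves."""
    config: dict
    data: dict = field(default_factory=dict)
    active: bool = False


def run_dr_operation(backups, required_config):
    """Walk through the four stages of a full DR operation (illustrative only)."""
    if not backups:
        raise ValueError("no backup available: the data is irrecoverable")
    # Stage 1: obtain the most recent backup of the data.
    latest = max(backups, key=lambda backup: backup.taken_at)
    # Stage 2: provide a replacement server system with the required configuration.
    replacement = ServerSystem(config=dict(required_config))
    # Stage 3: recover the data to the replacement server system.
    replacement.data = dict(latest.data)
    # Stage 4: the replacement takes over the tasks of the original system.
    replacement.active = True
    return replacement
```

In this sketch each stage is a single assignment. In practice each stage may involve lengthy manual work, which is the source of the problems discussed below.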
A number of problems are experienced by known approaches to the provision of a DR operation.
Due to the very large data storage requirements, the backup of the data in an organisation's databases is usually stored on a plurality of reels of magnetic tape. These are then transported offsite to a vault in a secure location where they are safely stored for retrieval if ever required for a DR operation. A full DR operation following a total loss of data requires the manual operations of finding and retrieving the necessary tapes from the vault, obtaining and configuring a replacement server system, and restoring the data to the replacement server system. The entire DR process may take days, which is an unacceptable loss of service time for most organisations. In many situations the entire DR process has also never been tested, so its effectiveness is unknown. DR is normally performed either at a second data centre belonging to the organisation or at the premises of a DR provider that supplies syndicated server equipment on a multi-year contract basis.
To achieve faster DR operation, it is also known for organisations to use one or more disk based backup systems as an alternative to offsite tape storage. The use of a disk based backup system can be more expensive than the use of tapes. However, a backup disk allows more of the DR operations to be automated and allows multiple server systems to be restored in parallel. Where disk based backup is used, a second, offsite copy is maintained by replicating the disk containing the backup data, located on the site of the organisation's servers, to a second site that has equivalent disk space and is connected to a backup system.
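The replication of disk based backup data to a second site can be illustrated with a minimal sketch. It assumes, purely for illustration, that the backup data is a tree of ordinary files and that the second site is reachable as a directory path. The function names sha256_of and replicate are hypothetical and do not correspond to any particular backup product.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(primary: Path, secondary: Path) -> list[str]:
    """Copy new or changed backup files to the second site and verify each copy.

    Returns the relative paths of the files that were copied.
    """
    copied = []
    for source in sorted(p for p in primary.rglob("*") if p.is_file()):
        relative = source.relative_to(primary)
        target = secondary / relative
        # Skip files whose replica already matches the primary copy.
        if target.exists() and sha256_of(target) == sha256_of(source):
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        # Verify the replica so that a corrupt offsite copy is detected early.
        if sha256_of(target) != sha256_of(source):
            raise IOError(f"replica of {relative} does not match the source")
        copied.append(relative.as_posix())
    return copied
```

A second call to replicate with unchanged data copies nothing, which reflects the requirement that the second site holds disk space equivalent to the first.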
A known data protection and recovery system that is widely used is the IBM® Tivoli® Storage Manager, referred to herein as TSM. It should be noted that TSM is not a full DR system, but only a backup system that permits an organisation to recover its data either onsite or at a DR site. Services provided by TSM include tracking and managing the retention of data from organisations, providing centralised data protection, assisting with the retrieval of previously backed up and archived data, and allowing for local site recovery and DR operations at a second site. An overview of the services TSM provides, how TSM works and the structure of a TSM system can be found at http://www.redbooks.ibm.com/redbooks/pdfs/sg248134.pdf, as viewed on 12 Sep. 2014.
TSM and other known data protection systems provide both tape based and disk based data backup and so experience at least some of the above-identified problems. Furthermore, whilst suites such as TSM are extremely powerful, their use in an organisation of any significant size quickly becomes very complex and requires active management. Experts are therefore required to configure and manage the data protection system and to develop and test bespoke data protection policies and recovery procedures. Known data protection solutions and DR contracts with third party organisations can also be expensive for an organisation, since DR resources are paid for even though they may never be needed or even tested.
In addition, in order for an organisation to have confidence that it has an effective DR system in place, it is preferable to be able to test all or any part of the DR system by building one or more replacement server systems with the correct configuration and data. However, it is not possible with known DR systems to perform such a DR test reliably, quickly, easily and inexpensively.