Data and information are rapidly becoming the lifeblood of enterprises. Transactions with customers, operational data, financial data, and corporate intelligence data, in fact all types of information, are now captured, indexed, stored, and mined by enterprises in today's highly competitive world economy.
Since information is vital to the enterprise, it is often made available twenty-four hours a day, seven days a week, and three hundred sixty-five days a year. To achieve such a feat, enterprises have to implement a variety of data replication, data backup, and data versioning techniques against their data warehouses.
For example, an enterprise may periodically capture the state of its data for a particular volume as a snapshot. If something should happen to the volume, it can be completely restored to the saved snapshot state. Another technique mirrors a volume on multiple volumes, such that if one volume is down or not accessible, another volume is automatically made available unbeknownst to the users. This is often referred to as data replication or failover support.
Today, it is not unusual for an enterprise's data warehouse to contain several terabytes of data. Yet snapshot services work at the block level of data granularity for a particular volume and do not typically permit changes to a saved snapshot. Moreover, while a snapshot of a volume is processing, access to the volume is permitted to proceed; otherwise, accessibility to the volume would become severely impaired. This means that for any given volatile operation that occurs while the snapshot is processing, there are typically (with a Copy-On-Write (COW) approach) three Input/Output (I/O) operations that take place. The first I/O operation is a read of the original pre-snapshot block (origin block). Next, the original pre-snapshot block is written to the snapshot storage (second I/O operation), and finally the modifying operation (the volatile operation occurring post snapshot against the origin block) is written to the origin block (third I/O operation).
One appreciates in this scenario that access to the volume can be severely hampered during snapshot processing if three I/O operations have to take place for every volatile operation. In fact, in a large transaction-based environment, the volume may even become unresponsive during a snapshot as bandwidth and processing capabilities become fully taxed by all the I/O operations taking place.
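The three-I/O Copy-On-Write sequence described above can be illustrated with a minimal sketch. This is a toy in-memory model and not any particular snapshot service: the class `CowVolume`, its methods, and the `io_count` counter are hypothetical names introduced only for illustration. The sketch also assumes that a block is copied to snapshot storage only on its first modification after the snapshot, so later writes to the same block cost a single I/O operation.

```python
class CowVolume:
    """Toy block volume with a copy-on-write snapshot (illustrative only)."""

    def __init__(self, num_blocks):
        self.origin = [b""] * num_blocks  # live (origin) volume blocks
        self.snapshot = None              # block index -> preserved pre-snapshot data
        self.io_count = 0                 # number of simulated block I/O operations

    def take_snapshot(self):
        # Start with an empty snapshot store; origin blocks are copied lazily.
        self.snapshot = {}

    def write(self, index, data):
        # First post-snapshot write to a block: preserve the original first.
        if self.snapshot is not None and index not in self.snapshot:
            original = self.origin[index]    # I/O 1: read the origin block
            self.io_count += 1
            self.snapshot[index] = original  # I/O 2: write it to snapshot storage
            self.io_count += 1
        self.origin[index] = data            # I/O 3: write new data to the origin block
        self.io_count += 1

    def read_snapshot(self, index):
        # Preserved copy if the block changed after the snapshot, else the live block.
        if self.snapshot is not None and index in self.snapshot:
            return self.snapshot[index]
        return self.origin[index]


vol = CowVolume(4)
vol.write(0, b"old")   # no snapshot yet: one I/O
vol.take_snapshot()
vol.write(0, b"new")   # first write after the snapshot: three I/Os
```

In this model, the write issued after `take_snapshot()` triples the I/O cost of the write issued before it, which is the overhead the passage above attributes to COW-based snapshots.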
To alleviate this, there is one alternative approach in the industry referred to as Write Anywhere File Layout (WAFL). With WAFL, a single I/O operation is performed when a volatile operation is made against an origin block during snapshot processing. WAFL works at a file-level granularity and not a block-level granularity, so any file-level operations with WAFL are translated on the backend to block-level operations. A single write operation is then done on a new and unallocated block associated with the volume. In other words, the write operations are not done in a sequential or serially grouped manner within the storage volume; they occur on any available block on the volume. Accordingly, fragmentation of the volume becomes commonplace with WAFL-based techniques. Furthermore, the performance of other storage volume operations is impaired under a WAFL-based approach because of the volume fragmentation that WAFL produces.
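The trade-off described above, a single write per volatile operation at the cost of fragmentation, can be sketched with a toy write-anywhere model. This is not the actual WAFL implementation; it is a simplified block-mapping model under stated assumptions. The class `WriteAnywhereVolume`, its logical-to-physical maps, and its first-free-block allocation policy are all hypothetical and introduced only for illustration.

```python
class WriteAnywhereVolume:
    """Toy write-anywhere volume: every write lands on a free physical block."""

    def __init__(self, num_blocks):
        self.blocks = [None] * num_blocks     # physical blocks
        self.free = list(range(num_blocks))   # free physical block indices
        self.live_map = {}                    # logical block -> physical block (live view)
        self.snapshot_map = None              # logical block -> physical block (snapshot view)
        self.io_count = 0                     # number of simulated block I/O operations

    def take_snapshot(self):
        # The snapshot is only a copy of the block map; no data blocks are copied.
        self.snapshot_map = dict(self.live_map)

    def write(self, logical, data):
        physical = self.free.pop(0)           # any available block (first free, assumed policy)
        self.blocks[physical] = data          # the single I/O: write to the new block
        self.io_count += 1
        old = self.live_map.get(logical)
        self.live_map[logical] = physical
        # Release the old block only if the snapshot does not still reference it.
        pinned = self.snapshot_map is not None and self.snapshot_map.get(logical) == old
        if old is not None and not pinned:
            self.free.append(old)

    def read(self, logical):
        return self.blocks[self.live_map[logical]]

    def read_snapshot(self, logical):
        return self.blocks[self.snapshot_map[logical]]


vol = WriteAnywhereVolume(8)
for i in range(3):
    vol.write(i, b"v1")    # logical 0, 1, 2 land on physical 0, 1, 2
vol.take_snapshot()
vol.write(1, b"v2")        # one I/O, but logical 1 now lives on physical 3
```

After the last write, the live view maps logical blocks 0, 1, 2 to physical blocks 0, 3, 2. The logically adjacent blocks are no longer physically adjacent, which is a small instance of the fragmentation the passage describes.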
As a result, there is a need for improved snapshotting techniques that use a minimum number of I/O operations and that minimize volume fragmentation.