Information drives business. Companies today rely to an unprecedented extent on online, frequently accessed, constantly changing data to run their businesses. Unplanned events that inhibit the availability of this data can seriously damage business operations. Additionally, any permanent data loss, from natural disaster or any other source, will likely have serious negative consequences for the continued viability of a business. Therefore, when disaster strikes, companies must be prepared to eliminate or minimize data loss, and recover quickly with useable data.
Companies have come to rely upon high-availability clusters to provide the most critical services and to store their most critical data. In general, there are different types of clusters, such as, for example, compute clusters, storage clusters, scalable clusters, and the like. High-availability clusters (also known as HA clusters or failover clusters) are computer clusters that are implemented primarily for the purpose of providing high availability of services which the cluster provides. They operate by having redundant computers or nodes which are then used to provide service when system components fail. Normally, if a server with a particular application crashes, the application will be unavailable until someone fixes the crashed server. HA clustering remedies this situation by detecting hardware/software faults, and immediately restarting the application on another system without requiring administrative intervention, a process known as failover. As part of this process, clustering software may configure the node before starting the application on the node. For example, appropriate file systems may need to be imported and mounted, network hardware may have to be configured, and some supporting applications may need to be running as well.
Many distributed computer systems utilize a centralized shared storage system for their provisioning. Thin provisioning is a mechanism that applies to large-scale centralized computer disk storage systems, SANs, and storage virtualization systems. Thin provisioning allows space to be easily allocated to servers, on a just-enough and just-in-time basis.
Thin Provisioning, in distributed computing systems using a shared storage environment, is a method for optimizing utilization of available storage. It relies on on-demand allocation of blocks of data versus the traditional method of allocating all the blocks up front. This methodology eliminates almost all whitespace, which helps avoid the poor utilization rates, often as low as 10%, that occur in the traditional storage allocation method where large pools of storage capacity are allocated to individual servers but remain unused (not written to). This traditional method is often called “fat” or “thick” provisioning.
With thin provisioning, storage capacity utilization efficiency can be automatically driven up towards 100% with very little administrative overhead. Organizations can purchase less storage capacity up front, defer storage capacity upgrades in line with actual business usage, and save the operating costs (electricity and floor space) associated with keeping unused disk capacity spinning.
Previous systems generally required large amounts of storage to be physically pre-allocated because of the complexity and impact of growing volume (LUN) space. Thin provisioning enables over-allocation or over-subscription.
A volume manager is often used to manage large-scale centralized computer storage systems. However, problems exist where, in such systems, the thinly provisioned arrays change in size and grow. A thinly provisioned array can fail write requests to space that has not yet been allocated to physical storage if it runs out of unallocated physical storage while the allocating write request is cached by the file system. If this allocation failure affected a data write, data may be corrupted or lost. If this allocation failure affected a metadata write of the file system, the file system may get marked for a file system check. Depending upon the size of the file system, a full file system check can take hours and also result in data corruption or data loss.