This disclosure relates to data processing and storage, and more specifically, to data storage systems, such as flash memory systems, that employ thin provisioning. Still more particularly, this disclosure is directed to optimizing thin provisioning in a data storage system through selective use of multiple grain sizes.
NAND flash memory is an electrically programmable and erasable non-volatile memory technology that stores one or more bits of data per memory cell as a charge on the floating gate of a transistor. In a typical implementation, a NAND flash memory array is organized in blocks (also referred to as “erase blocks”) of physical memory, each of which includes multiple physical pages each in turn containing a multiplicity of memory cells. By virtue of the arrangement of the word and bit lines utilized to access the memory cells, flash memory arrays can generally be programmed on a page basis, but are erased on a block basis.
In data storage systems employing NAND flash memory and/or other storage technologies such as magnetic hard disk drives (HDDs), the availability and performance of the data storage system are often enhanced by employing some level of data redundancy. For example, data storage systems often employ one or more arrangements (often referred to as “levels”) of redundant array of inexpensive (or independent) disks (RAID). Commonly employed RAID levels include RAID 0, which employs data striping across a set of RAID disks; RAID 1, which involves mirroring of RAID disks; RAID 4, which implements block-level striping across RAID disks and a dedicated parity drive; RAID 5, which implements block-level striping across RAID disks and distributed storage of parity information; and RAID 6, which implements block-level striping across RAID disks and distributed storage of two independent sets of parity information. The data redundancy provided by the various RAID levels allows the data storage system to recover from various modes of failure, thus improving data availability and storage system reliability.
Thin provisioning is also an important feature in modern data storage systems. Thin provisioning allocates physical storage from a data storage system only as it is needed (i.e., when data associated with a logical address is first written). Thus, instead of allocating physical storage for an entire logical volume at the same time (e.g., when the logical volume is created), individual extents of the logical volume are provisioned in response to accesses to the extents. By reserving portions of physical storage until needed, thin provisioning enables a given amount of physical storage to support a logical storage capacity that is larger than the physical storage capacity that is actually available in the data storage system.
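By way of illustration only, the allocate-on-first-write behavior described above can be sketched in Python. The class name `ThinVolume`, its methods, and the first-free allocation policy are hypothetical and are not taken from any particular data storage system; the sketch assumes fixed-size extents identified by integer indices.

```python
class ThinVolume:
    """Hypothetical thin-provisioned volume: extents are mapped on first write."""

    def __init__(self, logical_extents, physical_extents):
        self.logical_extents = logical_extents
        # Physical extents not yet backing any logical extent.
        self.free = list(range(physical_extents))
        # Logical extent index -> physical extent index.
        self.mapping = {}

    def write(self, logical_extent):
        """Return the physical extent backing the write, allocating if needed."""
        if not 0 <= logical_extent < self.logical_extents:
            raise IndexError("logical extent out of range")
        if logical_extent not in self.mapping:
            if not self.free:
                raise RuntimeError("physical capacity exhausted")
            # Assumed policy: take the lowest-numbered free physical extent.
            self.mapping[logical_extent] = self.free.pop(0)
        return self.mapping[logical_extent]

    def allocated(self):
        """Number of physical extents actually in use."""
        return len(self.mapping)


# A volume advertising 1000 logical extents backed by only 10 physical extents.
vol = ThinVolume(logical_extents=1000, physical_extents=10)
vol.write(500)
vol.write(500)  # rewrite of an already-provisioned extent allocates nothing
vol.write(3)
```

In this sketch the logical capacity (1000 extents) exceeds the physical capacity (10 extents), and only the two extents that were actually written consume physical storage.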
In operation, a sequential write operation is often accomplished by issuing write input/output operations (IOPs) from multiple different software threads executing on a host to the data storage system. The present disclosure recognizes that because the write IOPs are issued from multiple different software threads and may be received by the data storage system in a non-sequential order, the write IOPs will not necessarily appear to be fully sequential to the data storage system. If the data storage system employs thin provisioning, the data storage system may provision physical storage in response to the write IOPs in non-contiguous regions of the address space. The allocation of non-contiguous regions of the address space can lead to inefficiencies in the data storage system.
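One way this effect can arise is sketched below in Python. The sketch assumes, purely for illustration, that physical extents are handed out in the order in which write IOPs arrive; the function names and the two-thread interleaving are hypothetical.

```python
def provision_in_arrival_order(arrival_order):
    """Map each logical extent to the next free physical extent, in arrival order."""
    mapping = {}
    for logical in arrival_order:
        if logical not in mapping:
            mapping[logical] = len(mapping)
    return mapping


def physically_contiguous_pairs(mapping):
    """Count logically adjacent extents that are also physically adjacent."""
    return sum(1 for l, p in mapping.items() if mapping.get(l + 1) == p + 1)


# One thread writing extents 0..7 in order.
sequential = provision_in_arrival_order([0, 1, 2, 3, 4, 5, 6, 7])

# Two threads (one writing 0..3, the other 4..7) whose IOPs arrive interleaved.
interleaved = provision_in_arrival_order([0, 4, 1, 5, 2, 6, 3, 7])

seq_pairs = physically_contiguous_pairs(sequential)
mix_pairs = physically_contiguous_pairs(interleaved)
```

Under these assumptions the single-threaded stream keeps all seven logically adjacent pairs physically adjacent, whereas the interleaved stream keeps none, even though both streams write the same eight extents.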
Further, if the data storage system employs data striping such as that used in RAID 4, RAID 5 or RAID 6, writing a full stripe at a time is preferred. Writing full stripes rather than partial stripes is preferred because partial stripe writes require multiple operations to extract the old data and parity from a stripe, combine the old and new data, and then write the new data and new parity to the stripe. For RAID arrays having wide stripes, a large amount of data must be transferred sequentially to form an entire RAID stripe. In such cases, if thin provisioning employs address space extents smaller than a full stripe, then all or most of the address space may be allocated in a way that prevents full stripe writes.
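The cost difference between full and partial stripe writes can be illustrated with a minimal Python sketch of single-parity (RAID 5 style) XOR parity. The helper names are hypothetical, and the sketch models only the parity arithmetic, not an actual RAID implementation.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def full_stripe_parity(data_blocks):
    """Full stripe write: parity is computed from the new data alone (no reads)."""
    return xor_blocks(*data_blocks)


def partial_update_parity(old_data, new_data, old_parity):
    """Partial stripe write: old data and old parity must first be read back."""
    return xor_blocks(old_parity, old_data, new_data)


# A stripe of three data blocks plus one parity block.
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = full_stripe_parity(stripe)

# Overwrite only the middle block: read old data + old parity, then write
# new data + new parity (four I/Os to update a single data block).
new_block = b"\xaa\xbb"
updated_parity = partial_update_parity(stripe[1], new_block, parity)

# Writing the whole stripe at once yields the same parity with no reads.
recomputed = full_stripe_parity([stripe[0], new_block, stripe[2]])
```

Both paths produce the same parity, but the partial update requires reading the old data and old parity before anything can be written, which is the overhead that full stripe writes avoid.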
U.S. Pat. No. 7,802,063 discloses thin provisioning in a data storage system employing RAID. At col. 9, lines 11-31, U.S. Pat. No. 7,802,063 discloses:
As shown in FIG. 4, the available physical capacity of the computer 2 is made up of a number of hard disk drives 4A-4D. The available physical capacity is divided into a number of unique, equally sized areas, called territories. The available physical capacity is further subdivided into units referred to herein as provisions. Provisions comprise unique, equally sized areas of the available physical capacity and are smaller in size than the territories. In particular, according to a preferred embodiment, the provisions are one megabyte (“MB”) in size while territories are one gigabyte (“GB”) in size. Accordingly, each territory includes one thousand and twenty-four provisions. It should be appreciated that provisions of other sizes may also be utilized, and multiple provision and territory granularities may co-exist in the same server. By subdividing the available physical capacity of the computer 2 into areas of different sizes, the territories and provisions, the physical capacity may be provisioned in units of different sizes when appropriate. Capacity may be provisioned in units of territories in response to new writes being received at a logical volume. Capacity may alternately be allocated in units of provisions when appropriate.
U.S. Pat. No. 7,802,063 further discloses a technique for write gathering at col. 14, line 39 through col. 15, line 16, stating:
In a system that utilizes thin provisioning together with RAID, however, regardless of the randomness in the order in which new write requests are received by the server, the allocation of provisions, and consequently the arrangement of the data on the RAID array, is sequential. Hence, a cache on the physical LBA level is always guaranteed to collect new I/O operations together into sequential stripes, which may be written without incurring a RAID 5 write penalty.
However, U.S. Pat. No. 7,802,063 does not address when allocation of the different sizes of units of physical capacity is appropriate. Further, according to the technique of thin provisioning disclosed in U.S. Pat. No. 7,802,063, the physical capacity is allocated when new writes are received at a logical volume. The present disclosure recognizes that performance of a data storage system employing thin provisioning and data redundancy can be optimized by intelligently allocating varying grain sizes of physical capacity after new writes are received based on analysis of access patterns.