In modern computer systems, data may be stored on a physical data storage device, such as a hard disk drive. For example, a user may store data to a 1 terabyte (TB) disk drive that is divided into 512 byte sectors. The size of a disk drive may be determined by its physical characteristics, and the number of sectors it is divided into may be determined by a disk controller at the time of formatting. After formatting, the disk controller may be responsible for receiving and processing write requests to the disk drive for writing data to one or more sectors.
Multiple users may share a single disk drive, while maintaining per-user data segregation, by dividing physical storage into multiple logical disk drives, or partitions. For example, the 1 TB disk drive mentioned above may be divided into two partitions of 0.5 TB each. Each user may be assigned a different 0.5 TB partition. Thus, multiple users may store data to separate logical drives while sharing the same physical storage. This configuration may be a more efficient use of physical storage resources because separate physical drives need not be assigned for each user.
Similarly, a single user may store data on multiple physical disk drives (i.e., a pool of disks) if the user requires more storage capacity than can be provided by a single disk drive or if disk drive redundancy is desired. For example, a storage pool may include five 1 TB disk drives. However, because it may be cumbersome for the user to manage multiple disk drives, physical storage may be abstracted in a variety of ways for representation to the user. Each type and/or level of abstraction of physical storage described below is based on a storage array containing multiple physical data storage devices.
A data storage array (herein also referred to as a “disk storage array”, “disk array”, or “array”) is a collection of hard disk drives operating together logically as a unified storage device in order to simply and flexibly store large quantities of data. Storage arrays are typically managed by one or more storage array processors (SPs) for handling allocation and input/output (I/O) requests. SPs and associated memory may be implemented on blades, which are plugged into slots in one or more shelves. The slots and shelves may be connected together via a communications bus. An exemplary hardware platform for housing a storage array, SPs, memory, and a communications bus is the CLARiiON® platform available from EMC Corporation of Hopkinton, Mass.
As mentioned above, disk arrays may include groups of physical disks that are logically bound together to represent contiguous data storage space to client applications. As used herein, the terms “client”, “host” and “application” are used to refer to any application that initiates I/O requests to the storage array. At a first layer of abstraction, multiple disks within a disk array may be combined to form a redundant array of inexpensive disks (RAID) group (RG), which is created by logically binding individual physical disks together. An RG represents a logically contiguous address space distributed across a set of physical disks and is managed by a RAID controller. Data stored in an RG may be spread across the address space of the RG, along with parity information, depending on the RAID configuration or level.
Once multiple physical disks have been combined to form an RG, the RG may then be divided into smaller portions, called “logical units” (LUs), which may be allocated without regard for the physical division of the storage (i.e., each disk). Applications access and store data incrementally by use of these logical storage array partitions. LUs are made up of collections of storage blocks of a RAID array and are exported from the RAID array for use at the application level. An LU may be identified by a logical unit number (LUN). Therefore, LU and LUN may be used herein interchangeably to refer to a logical unit.
Much like a disk controller is responsible for receiving and processing write requests to a physical disk drive for writing data to one or more sectors, a provisioning layer may be responsible for managing write requests for writing data to a LUN. The provisioning layer may also determine when and how to allocate LUNs from the disk array to a user for storing data to the array.
A first method for provisioning LUNs includes allocating physical storage, before any write requests are received, into fully provisioned LUNs (FLUs). Similar to the partitioning of a single physical disk into multiple logical disk drives, the total size of FLUs provisioned for users may not be greater than the total physical storage available.
For example, an exemplary data storage array may include five 1 TB disk drives combined into a single 5 TB RG. Assuming that a first user needs 3 TB, and each of users 2-5 needs only 0.5 TB, fully provisioning the data storage may include dividing the 5 TB RG into one 3 TB FLU and four 0.5 TB FLUs. Thus, user 1 may be presented with a virtualized data storage entity of 3 TB and users 2-5 may be presented with 0.5 TB virtual data storage entities. It is appreciated that at the physical layer, the FLUs may be distributed across a minimum number of disks, may be distributed evenly across all five disks, or may be distributed unevenly across any portion of the disks. Moreover, it is possible for the same physical data storage to be shared by multiple users.
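The full-provisioning example above can be sketched in a few lines of Python. This is a minimal illustration only: the function name provision_flus, the decimal definition of a terabyte, and the simple back-to-back layout are assumptions made for the sketch, and, as noted above, a real array may distribute FLUs across the disks in other ways.

```python
TB = 10**12  # decimal terabyte; an assumption made for this illustration


def provision_flus(rg_capacity, requests):
    """Carve fully provisioned LUNs (FLUs) out of a RAID group up front.

    Hypothetical helper: returns {user: (start_offset, size)} and refuses
    any set of requests whose total exceeds the physical capacity.
    """
    total = sum(requests.values())
    if total > rg_capacity:
        raise ValueError("total FLU size exceeds physical capacity")
    layout = {}
    offset = 0
    for user, size in requests.items():
        layout[user] = (offset, size)  # FLUs are laid out back to back
        offset += size
    return layout


# One 3 TB FLU and four 0.5 TB FLUs exactly fill the 5 TB RG.
requests = {"user1": 3 * TB}
for n in range(2, 6):
    requests[f"user{n}"] = TB // 2

layout = provision_flus(5 * TB, requests)
```

Because the capacity check happens before any write request arrives, a request for even one additional byte beyond 5 TB is rejected at provisioning time.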
A second method for provisioning LUNs includes allocating physical storage at the time data is written. This allows for the possibility of overprovisioning physical storage resources because all that is provisioned need not be allocated, or even physically exist, until it is needed. Overprovisioning of physical storage may be accomplished by further virtualizing LUNs into one or more virtually provisioned LUNs (VLUNs), which allow for over, or “thin,” provisioning of physical storage. As used herein, a VLUN is an abstraction of a disk device for which the underlying physical storage is allocated at the time data is written. Unwritten regions of a VLUN do not have physical storage associated with them and should contain zeroes.
For example, if the 5 TB RG described above were virtually (i.e., thin) provisioned, user 1 may be provisioned a 10 TB VLUN (TLU), and users 2-5 may each be assigned a 2 TB VLUN. Thus, the total amount of virtual data storage may be larger than the total physical data storage capacity (i.e., 18 TB of VLUN capacity versus 5 TB of physical storage). As a result, the mapping between the virtual layer and the physical layer need not be 1:1. In order to correctly store data to physical storage designated in a client write request, a virtual provisioning layer may map between LUNs and VLUNs. For example, the virtual provisioning layer may be located between a client layer and a physical layer and configured to intercept client read and write requests.
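The overprovisioning arithmetic and the allocate-on-write mapping described above can be illustrated with a toy model. The class ThinPool, its method names, and the use of abstract allocation units (one unit standing in for 1 TB) are all assumptions made for this sketch; it is not intended to represent how any particular array implements its virtual provisioning layer.

```python
class ThinPool:
    """Toy thin-provisioning layer; sizes are in abstract allocation units."""

    def __init__(self, physical_blocks):
        self.physical_blocks = physical_blocks
        self.next_free = 0  # index of the next unallocated physical block
        self.sizes = {}     # VLUN name -> advertised size in blocks
        self.maps = {}      # VLUN name -> {virtual block: physical block}

    def create_vlun(self, name, virtual_blocks):
        # Creating a VLUN consumes no physical storage.
        self.sizes[name] = virtual_blocks
        self.maps[name] = {}

    def write(self, name, vblock):
        """Return the physical block backing vblock, allocating on first write."""
        if not 0 <= vblock < self.sizes[name]:
            raise IndexError("block outside the VLUN's advertised size")
        mapping = self.maps[name]
        if vblock not in mapping:
            if self.next_free >= self.physical_blocks:
                raise RuntimeError("physical pool exhausted")
            mapping[vblock] = self.next_free
            self.next_free += 1
        return mapping[vblock]

    def overcommit_ratio(self):
        return sum(self.sizes.values()) / self.physical_blocks


# One unit stands in for 1 TB: 10 + 4 * 2 = 18 virtual units over 5 physical.
pool = ThinPool(physical_blocks=5)
pool.create_vlun("user1", 10)
for user in ("user2", "user3", "user4", "user5"):
    pool.create_vlun(user, 2)

first = pool.write("user1", 7)  # first write allocates a physical block
again = pool.write("user1", 7)  # a rewrite reuses the same physical block
```

In this model the five VLUNs advertise 18 units against 5 physical units, and only the single block that was actually written consumes physical storage.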
Because VLUNs are not allocated physical storage at the time of their creation, physical storage is allocated at the time data is written to a VLUN by an application. Thus, when a write request to a VLUN is received, the virtual provisioning layer may determine whether the sector(s) to be written to are associated with a new block. If so, a new block is allocated. For example, the minimum size of a block that may be allocated by the virtual provisioning layer may be 8 kB. Next, it may be determined whether the sector(s) to be written have a size less than the entire block. If so, fill data may be inserted into the remaining sectors of the block in order to prevent errors in the event that an application reads data from one of these sectors. Because this process of determining whether fill data must be inserted for a newly allocated block, in response to receiving a write request to a VLUN, is typically performed in buffer memory, it may be referred to herein as “buffer expansion.”
Buffer expansion is necessary when there is a discrepancy between the minimum allocation of a new block by the virtual provisioning layer and the amount of data included in a write request. In other words, because a VLUN is presented to the client as a disk device divided into 512 byte sectors, much like a physical disk device, write requests are typically directed to 512 byte aligned addresses. However, if the sector(s) to be written require allocation of a new block and would not fill the entire block with data, then the virtual provisioning layer allocates the minimum that it can, which is larger than the amount needed for the data write. The remaining portions of the block not containing data are filled with fill data, such as zeroes, in order to prevent possible errors.
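As an illustration of buffer expansion, the following sketch computes which byte ranges of a newly allocated block would need fill data for a given sector-aligned write. The function name fill_ranges and the assumption that the write falls entirely within a single new 8 kB block are made for this example only.

```python
SECTOR = 512      # sector size presented to the client, from the text
BLOCK = 8 * 1024  # minimum allocation size, from the 8 kB example above


def fill_ranges(offset, length):
    """Return the (start, end) byte ranges of a new block that need fill data.

    Assumes the write is sector aligned and lies within one newly
    allocated block; 'end' is exclusive.
    """
    if offset % SECTOR or length % SECTOR or length <= 0:
        raise ValueError("write must cover whole 512 byte sectors")
    block_start = offset // BLOCK * BLOCK
    block_end = block_start + BLOCK
    if offset + length > block_end:
        raise ValueError("write crosses a block boundary")
    ranges = []
    if offset > block_start:
        ranges.append((block_start, offset))         # fill before the data
    if offset + length < block_end:
        ranges.append((offset + length, block_end))  # fill after the data
    return ranges


# Two sectors written at byte offset 2048 leave a gap on each side.
middle = fill_ranges(2048, 1024)
# A write that covers the whole second block needs no fill data.
whole = fill_ranges(8192, 8192)
```

A write in the middle of the block yields two fill ranges, a write that starts or ends on a block boundary yields one, and a full-block write yields none.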
One problem associated with conventional methods for inserting fill data as a result of buffer expansion for a VLUN is that an unnecessary number of separate write requests may be sent to physical storage. Because zerofill and data write requests are executed separately, likely to the same physical storage, the number of disk accesses, and the resources associated with those disk accesses, is unnecessarily increased. If an application writes less than 8 kB of data and that data does not begin and end on an 8 kB boundary, plural write requests must be executed to fill the 8 kB block. Moreover, because each disk access may require a certain amount of time for the read/write head to seek out the appropriate physical block on the disk, time and energy are wasted. It is an object of the subject matter described herein to reduce the number of write requests sent to physical storage associated with performing conventional buffer expansion on data writes to VLUNs at a data storage array.
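The cost of the conventional approach can be made concrete by counting the requests it generates. In the sketch below, conventional_requests models the separate zerofill and data writes described above. The function coalesced_request is included only as an illustrative contrast showing that the same bytes could, in principle, be delivered in one request; it is an assumption of this sketch and not a description of any particular method. Both function names are hypothetical.

```python
BLOCK = 8 * 1024  # minimum allocation size, from the 8 kB example above


def _block_bounds(offset, data):
    start = offset // BLOCK * BLOCK
    end = start + BLOCK
    if offset + len(data) > end:
        raise ValueError("write crosses a block boundary")
    return start, end


def conventional_requests(offset, data):
    """Separate zerofill and data writes for one newly allocated block."""
    start, end = _block_bounds(offset, data)
    requests = []
    if offset > start:
        requests.append((start, bytes(offset - start)))  # leading zerofill
    requests.append((offset, data))                      # the data itself
    tail = offset + len(data)
    if tail < end:
        requests.append((tail, bytes(end - tail)))       # trailing zerofill
    return requests


def coalesced_request(offset, data):
    """Illustrative contrast: one block-sized write with the data in place."""
    start, _ = _block_bounds(offset, data)
    buf = bytearray(BLOCK)  # a bytearray starts out zero-filled
    rel = offset - start
    buf[rel:rel + len(data)] = data
    return [(start, bytes(buf))]


payload = b"\x01" * 1024
separate = conventional_requests(2048, payload)  # zerofill, data, zerofill
single = coalesced_request(2048, payload)
```

For a 1 kB write in the middle of a new block, the conventional path issues three requests while the contrasting path issues one, and both leave identical contents in the block.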
Accordingly, there exists a need for methods, systems, and computer readable medium for optimizing the number of data writes to virtually provisioned logical units of a physical data storage array.