Within many data systems, the management of buffer space (e.g., physical memory set aside for use in data transfer processes) is a significant concern, especially within storage controllers. Memory is required to handle any disk operation. For example, a write from a host server to a disk proceeds as follows: a buffer is allocated from storage controller memory; a direct memory access (DMA) engine transfers the data from the host to the allocated buffer; on completion of this first data transfer, a further DMA operation transfers the data from the same buffer to the disk; and on completion of the second data transfer, the buffer can be freed.
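The four-step lifecycle of a host write can be sketched as a small Python simulation. This is a minimal model under stated assumptions: `BufferPool`, `host_write`, and the 4096-byte buffer size are hypothetical, and each DMA transfer is modeled as a plain byte copy rather than a hardware operation.

```python
BUFFER_SIZE = 4096  # assumed buffer size for this sketch


class BufferPool:
    """Hypothetical fixed pool of storage-controller buffers."""

    def __init__(self, count):
        self.free = [bytearray(BUFFER_SIZE) for _ in range(count)]

    def allocate(self):
        if not self.free:
            raise MemoryError("no free buffers")
        return self.free.pop()

    def release(self, buf):
        self.free.append(buf)


def host_write(pool, host_data, disk):
    """Stage one host write through a controller buffer (DMA modeled as copies)."""
    buf = pool.allocate()                   # 1. allocate a buffer from controller memory
    buf[:len(host_data)] = host_data        # 2. "DMA" host -> buffer
    disk.extend(buf[:len(host_data)])       # 3. after step 2 completes, "DMA" buffer -> disk
    pool.release(buf)                       # 4. after the disk transfer completes, free the buffer


pool = BufferPool(2)
disk = bytearray()
host_write(pool, b"hello", disk)
```

After the call, the disk holds the host data and both buffers are back in the pool.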
More complex operations are possible in support of enhanced functions. For instance, mirroring involves transferring the same data to two (or more) disks. One known technique is to allocate a second buffer and copy the data to be transferred into it. As each of the two disk transfers completes, it frees its own, independent buffer. However, this scheme has the significant disadvantage that copying the buffer is expensive in terms of both memory and bandwidth.
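To make the cost of the copy-based approach concrete, the following sketch mirrors one host write by copying the whole buffer. The function name `mirrored_write_by_copy` and the 4096-byte buffer size are illustrative assumptions; disks are modeled as byte arrays.

```python
BUFFER_SIZE = 4096  # assumed buffer size for this sketch


def mirrored_write_by_copy(free_buffers, host_data, disks):
    """Mirror one host write to two disks, giving each disk write its own buffer.

    Returns the number of bytes copied memory-to-memory to build the mirror.
    """
    first = free_buffers.pop()
    first[:len(host_data)] = host_data      # host -> first buffer
    second = free_buffers.pop()             # extra buffer allocated only for the mirror
    second[:] = first                       # whole-buffer copy: the expensive step
    for disk, buf in zip(disks, (first, second)):
        disk.extend(buf[:len(host_data)])   # each disk write reads its own buffer...
        free_buffers.append(buf)            # ...and frees it independently on completion
    return len(second)


free_buffers = [bytearray(BUFFER_SIZE) for _ in range(2)]
disks = [bytearray(), bytearray()]
copied = mirrored_write_by_copy(free_buffers, b"abc", disks)
```

Even for a three-byte write, the sketch uses two buffers and copies a full buffer, which is the memory and bandwidth overhead the text describes.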
It is possible instead to perform both disk writes directly from the same buffer. However, each disk-write operation can be subject to a large number of transformations, such as caching, FlashCopy, SnapShot, lookup through a directory, etc. Keeping track of both limbs of the write operation, and identifying when both are complete so that the shared buffer can be freed, can therefore add considerable complexity to the system design.
Other known techniques involve page reference counting and copy-on-write. These combine the flexibility of an independent buffer for each operation with the efficiency of not copying data merely to create an identical image. These schemes typically include: a unit of memory called a page; a data structure containing an entry for each page (e.g., a page table), where the entry associated with a page contains a reference count; and page references, each of which is given to the owner of one of the references to a page and with which that owner can perform a data transfer operation.
For example, pages are initialized with all reference counts set to zero. Unallocated pages are typically kept on a linked list of free pages. When a new I/O operation (e.g., a host-write) requires allocation of a page, a page is chosen and its reference count is set to one. A page reference is returned, which is used to set up the data transfer with the host.
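The page table, free list, and page references described in the last two paragraphs can be modeled as follows. This is a sketch, not a controller implementation: `PageTable`, `PageRef`, and the 4096-byte page size are assumptions, and a `collections.deque` stands in for the linked list of free pages.

```python
from collections import deque
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size


@dataclass(frozen=True)
class PageRef:
    """A page reference: the handle an owner uses to set up a data transfer."""
    page: int


class PageTable:
    """One entry (a reference count) per page, plus a free list of unallocated pages."""

    def __init__(self, num_pages):
        self.memory = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.ref_count = [0] * num_pages          # all pages start at zero
        self.free_list = deque(range(num_pages))  # unallocated pages

    def allocate(self):
        page = self.free_list.popleft()           # choose a free page
        self.ref_count[page] = 1                  # first (and only) reference
        return PageRef(page)

    def data(self, ref):
        return self.memory[ref.page]


table = PageTable(4)
ref = table.allocate()
table.data(ref)[:5] = b"hello"   # host-to-page transfer set up through the reference
```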
In order to perform the two disk-writes of a mirrored-write, a copy operation is performed by incrementing the reference count (which becomes two) and creating a new page reference that refers to the same page. The two page references are then each used by one disk-write operation to transfer the data to its disk. As each data transfer completes, the corresponding operation frees its page reference by decrementing the page reference count. For example, when the first I/O process completes, the reference count is decremented from two to one. When the second I/O process completes, the reference count is decremented again, this time to zero. At this point there are no more references to the page, so the page can be freed (e.g., returned to the set of free pages).
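A minimal Python sketch of the reference-counted mirrored write follows. The names `PageTable`, `copy_ref`, and `release` are hypothetical, disks are modeled as byte arrays, and a `deque` stands in for the free list. The sketch traces the reference count from one to two and back to zero.

```python
from collections import deque
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size


@dataclass(frozen=True)
class PageRef:
    page: int


class PageTable:
    """Hypothetical reference-counted page table with a free list."""

    def __init__(self, num_pages):
        self.memory = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.ref_count = [0] * num_pages
        self.free_list = deque(range(num_pages))

    def allocate(self):
        page = self.free_list.popleft()
        self.ref_count[page] = 1
        return PageRef(page)

    def copy_ref(self, ref):
        """'Copy' a page by adding a reference instead of duplicating its data."""
        self.ref_count[ref.page] += 1
        return PageRef(ref.page)

    def release(self, ref):
        """Free a page reference; return the page to the free list at zero."""
        self.ref_count[ref.page] -= 1
        if self.ref_count[ref.page] == 0:
            self.free_list.append(ref.page)


table = PageTable(2)
host_ref = table.allocate()
table.memory[host_ref.page][:4] = b"data"        # host write lands in page 0
mirror_ref = table.copy_ref(host_ref)            # reference count: 1 -> 2
disk_a, disk_b = bytearray(), bytearray()
disk_a.extend(table.memory[host_ref.page][:4])   # first disk-write completes...
table.release(host_ref)                          # ...count 2 -> 1
count_after_first = table.ref_count[0]
disk_b.extend(table.memory[mirror_ref.page][:4]) # second disk-write completes...
table.release(mirror_ref)                        # ...count 1 -> 0, page freed
```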
The page-reference-count technique works well for a mirrored-write operation because, after the initial store of data from the host to the page (e.g., memory), the I/O operations only transfer data out of the page. That is, the mirrored-write only writes to disk(s), and so only needs to read the page (e.g., memory) without writing to the page. As such, the two I/O operations of the mirrored-write can perform their data transfer operations in parallel without risk of disrupting each other's operation.
However, it is often the case that one of the I/O processes accessing memory alters a portion of the memory. For example, a disk-read operation may change the contents of a page. In cases where multiple processes are accessing the same page, it is desirable that a change to the page caused by one process (e.g., a disk-read) not affect another process (e.g., a disk-write referring to the same page). Put another way, each process referring to a page should retain the illusion of having its own independent copy of the memory. The copy-on-write technique accomplishes this goal. According to the copy-on-write technique, a process that seeks to change the page will: allocate a new page from the set of free pages; copy the contents of the currently held page (as recorded in the page reference) to the new page; set the reference count of the new page to one; decrement the reference count in the old page; update the page reference to refer to the new page, rather than the old one; and then proceed to update the contents of the new page, leaving the old page intact.
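The copy-on-write steps listed above can be sketched in Python as follows. `PageTable`, `PageRef`, and `write` are hypothetical names. As is usual for copy-on-write, the sketch copies only when the page is actually shared (reference count above one); that shortcut is an assumption, not something the text states.

```python
from collections import deque
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size


@dataclass
class PageRef:
    page: int  # mutable: copy-on-write redirects the reference to a new page


class PageTable:
    """Hypothetical reference-counted page table with a free list."""

    def __init__(self, num_pages):
        self.memory = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.ref_count = [0] * num_pages
        self.free_list = deque(range(num_pages))

    def allocate(self):
        page = self.free_list.popleft()
        self.ref_count[page] = 1
        return PageRef(page)

    def copy_ref(self, ref):
        self.ref_count[ref.page] += 1
        return PageRef(ref.page)

    def write(self, ref, offset, data):
        """Update the page seen through `ref` without disturbing other holders."""
        if self.ref_count[ref.page] > 1:                       # shared: copy first
            new_page = self.free_list.popleft()                # allocate a new page
            self.memory[new_page][:] = self.memory[ref.page]   # copy current contents
            self.ref_count[new_page] = 1                       # new page has one reference
            self.ref_count[ref.page] -= 1                      # drop ours on the old page
            ref.page = new_page                                # redirect the page reference
        self.memory[ref.page][offset:offset + len(data)] = data  # update the (new) page


table = PageTable(2)
disk_write_ref = table.allocate()
table.memory[disk_write_ref.page][:3] = b"old"
disk_read_ref = table.copy_ref(disk_write_ref)   # both references share page 0 (count 2)
table.write(disk_read_ref, 0, b"new")            # triggers copy-on-write into page 1
```

The disk-write still sees the old contents on page 0, while the updating process now owns page 1 exclusively.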
Using this technique, the updates are only performed to a new page which is not initially shared with any other processes, such that the updates can be performed without risk of impacting those processes that share the old page. Copy-on-write is commonly used in a number of fields, including managing memory as well as disk space.
A reference-counted page has the further benefit of saving memory: the function of two pages of memory can be served by just one physical page. Virtual memory managers within operating systems often use this benefit to save memory. When an update becomes necessary, a copy-on-write is performed and some of the memory savings are lost. However, updates are infrequent enough that the overall savings remain significant, and the copy-on-write penalty can be absorbed by other virtual memory techniques. For instance, when physical memory is scarce, an operating system may page a process out to disk (e.g., to virtual memory). This frees physical memory for another process that requires it, either for new pages of memory or to satisfy a copy-on-write operation.
Within storage controllers, however, virtual memory techniques are less useful because they can lead to deadlock. Deadlock can occur, for example, when a storage controller runs out of memory midway through an I/O process, which may force the process to be aborted, processes to be serialized, and so on. Unlike an operating system, a storage controller generally cannot page out to disk when it runs out of memory, since doing so would multiply the number of disk operations and add unacceptably to the cost of performing disk operations under heavy load. Therefore, conventional practice within storage controller systems is to set aside the same amount of memory in a reference-counted scenario as if the memory had not been reference counted.
For example, in the scenario where a single page is referred to by two operations, a second page that holds no data is retained in the system. This second page is used in the event that either of the two operations needs to perform an update, and hence a copy-on-write operation. In that case, the second page is assigned to that process and the data is copied into it, without first needing to allocate a page and consider the possibility that none is available. Conceptually, these pages are kept in a separate store that covers the reference-counted copies. In practice, these pages need not be physically allocated; instead, a counter of such pages can be kept and used to constrain allocations from the free pool. The counter describes a portion of the unused memory that is not truly free, but is merely held in reserve in case a copy-on-write must be performed.
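The reservation counter can be sketched as a free pool that refuses ordinary allocations which would dip into the reserve. This is a minimal illustration under stated assumptions: `ReservingPool` and its methods are hypothetical, and one page is reserved per extra shared reference, as in the two-operation example above.

```python
class ReservingPool:
    """Hypothetical free-page pool that holds back one page per extra shared reference."""

    def __init__(self, num_pages):
        self.free_pages = list(range(num_pages))
        self.reserved = 0  # unused pages that are not really free: held for copy-on-write

    def available(self):
        return len(self.free_pages) - self.reserved

    def allocate(self):
        """Ordinary allocation; never dips into the copy-on-write reserve."""
        if self.available() < 1:
            raise MemoryError("no unreserved pages")
        return self.free_pages.pop()

    def reserve_for_share(self):
        """A page gains an extra reference (e.g., a mirrored write): reserve one page."""
        if self.available() < 1:
            raise MemoryError("cannot reserve a page for the new reference")
        self.reserved += 1

    def take_reserved(self):
        """Copy-on-write: cannot fail, because a page was reserved in advance."""
        self.reserved -= 1
        return self.free_pages.pop()

    def unreserve(self):
        """The shared reference was freed without needing copy-on-write."""
        self.reserved -= 1


pool = ReservingPool(3)
shared_page = pool.allocate()          # host-write page
pool.reserve_for_share()               # mirrored write: 2 free pages, 1 of them reserved
other_page = pool.allocate()           # another I/O takes the last unreserved page
try:
    pool.allocate()                    # would eat into the reserve, so it is refused
    blocked = False
except MemoryError:
    blocked = True
cow_page = pool.take_reserved()        # copy-on-write still succeeds
```

Because the reserve is only a counter, no page is physically set aside until a copy-on-write actually occurs.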
As is commonly understood in the art, a page is a unit of memory allocation and is often 4 KB in size. Disk operations often work in smaller units, such as 512-byte sectors, eight to a page. It is therefore possible for multiple I/O operations to act on separate sectors of the same page. For example, a host-read operation may transfer sectors 2 and 3 from memory to the host, a disk-write operation may transfer sectors 4 and 5 from memory to the disk, and a disk-read operation may transfer all the remaining sectors (i.e., sectors 0, 1, 6, and 7) from the disk into controller memory. The storage controller tracks which processes are operating on each page using data structures such as buffer descriptors and a page table holding a reference count of the processes associated with each page.
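The sector arithmetic in the example (4096 / 512 = 8 sectors per page) and the split of one page among three operations can be checked with per-page bitmasks. The helpers `sector_mask` and `byte_range` are hypothetical and only illustrate the layout; the text does not say how a controller records sector ownership.

```python
PAGE_SIZE = 4096
SECTOR_SIZE = 512
SECTORS_PER_PAGE = PAGE_SIZE // SECTOR_SIZE  # eight sectors per page


def sector_mask(sectors):
    """Hypothetical helper: one bit per 512-byte sector of a page."""
    mask = 0
    for s in sectors:
        if not 0 <= s < SECTORS_PER_PAGE:
            raise ValueError(f"sector {s} is outside the page")
        mask |= 1 << s
    return mask


def byte_range(sector):
    """Byte offsets [start, end) of a sector within its page."""
    start = sector * SECTOR_SIZE
    return start, start + SECTOR_SIZE


ops = {
    "host-read": sector_mask([2, 3]),
    "disk-write": sector_mask([4, 5]),
    "disk-read": sector_mask([0, 1, 6, 7]),
}

combined, overlap = 0, False
for mask in ops.values():
    overlap = overlap or bool(combined & mask)  # any sector claimed twice?
    combined |= mask
```

The three operations touch disjoint sectors that together cover the whole page, yet all three still share one page and one reference count.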
In this example, however, the disk-read operation is problematic because it cannot simply transfer data directly into the page, since doing so might corrupt the more up-to-date data being sent to the disk or the host by the disk-write or host-read operations. Instead, the disk-read is performed by: reading into a separate page; merging the appropriate portions of the previously existing cache page into the new page, which holds the other portions of the data; swapping the new page for the old page; and freeing the old page reference. Freeing the reference decrements the page reference count and, if the count then reaches zero, the old page is freed and returned to the pool. From the point of view of the disk-write or host-read operations, this is an update of the page.
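The read-merge-swap-free sequence can be sketched as follows. `Cache`, `span`, and `disk_read_merge` are hypothetical names, and the model assumes a single cached page whose extra reference is held by an in-flight disk-write. Byte values `C` and `R` mark cached and freshly read data respectively.

```python
SECTOR_SIZE = 512
SECTORS_PER_PAGE = 8
PAGE_SIZE = SECTOR_SIZE * SECTORS_PER_PAGE


def span(n):
    """Slice covering sector n of a page."""
    return slice(n * SECTOR_SIZE, (n + 1) * SECTOR_SIZE)


class Cache:
    """Hypothetical cache holding one reference-counted page."""

    def __init__(self):
        self.memory = {}      # page id -> contents
        self.ref_count = {}   # page id -> reference count
        self.next_id = 0
        self.cache_page = None

    def new_page(self):
        pid = self.next_id
        self.next_id += 1
        self.memory[pid] = bytearray(PAGE_SIZE)
        self.ref_count[pid] = 1
        return pid

    def release(self, pid):
        self.ref_count[pid] -= 1
        if self.ref_count[pid] == 0:
            del self.memory[pid], self.ref_count[pid]  # page returned to the pool


def disk_read_merge(cache, disk_sectors, read_sectors):
    old = cache.cache_page
    new = cache.new_page()                            # read into a separate page
    for n in read_sectors:
        cache.memory[new][span(n)] = disk_sectors[n]
    for n in range(SECTORS_PER_PAGE):                 # merge the cache's other sectors
        if n not in read_sectors:
            cache.memory[new][span(n)] = cache.memory[old][span(n)]
    cache.cache_page = new                            # swap the new page for the old one
    cache.release(old)                                # free the cache's old page reference


cache = Cache()
cache.cache_page = cache.new_page()               # page 0, held by the cache
cache.memory[0][:] = b"C" * PAGE_SIZE             # up-to-date cached data
cache.ref_count[0] += 1                           # an in-flight disk-write also refers to page 0
disk = [b"R" * SECTOR_SIZE] * SECTORS_PER_PAGE    # on-disk contents of each sector
disk_read_merge(cache, disk, {0, 1, 6, 7})
```

Page 0 survives with a reference count of one, so the disk-write still transfers unchanged data; it is freed only when that operation releases its reference.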
There are a number of algorithms for handling these operations while maintaining correctness. For example, all of the operations could be serialized, which is straightforward but would significantly impact system performance. The latency of such operations is a significant concern to host servers, so performing operations in parallel is generally desirable. Alternatively, each operation could make its own copy of the page and thus be assured of holding its own reference to it. The operations would then be able to proceed in parallel, each protected from the disk-read operation. However, as stated earlier, to ensure deadlock-free operation, each operation still needs to allocate a page in case a copy-on-write becomes necessary. Running many such operations in parallel therefore increases the amount of memory consumed, even though the need to perform a copy-on-write is rare. Another alternative would be to use a finer unit of granularity for tracking allocations. This would reduce the number of cases in which multiple processes contend for a shared resource, as in the example given above. However, such a scheme would impose a heavy burden in terms of the detailed data structures required at that level of tracking.
Accordingly, there exists a need in the art to overcome the deficiencies and limitations described hereinabove.