Computer systems (i.e., computers) often share storage devices, particularly in enterprise applications, such as over a storage area network (SAN) or using network attached storage (NAS). Sharing storage allows maintenance operations such as backing up data, ensuring hardware redundancy, and so forth to be consolidated for ease of maintenance by information technology (IT) personnel. In addition, shared storage allows computer systems to share data efficiently by storing data in a common location accessible to each computer. Shared storage also increases storage capacity utilization since multiple computer systems share the storage space. Shared storage also makes workload balancing possible: because storage is shared, computing tasks can be moved from one computer to another when the first is overloaded and the second is underutilized. Typically, the storage is persistent, meaning that its content survives power failures. The storage is also page-oriented, meaning that the storage device divides data into fixed-size page frames and supports operations to read or write a page in a given page frame. Typical examples of today's storage technology are magnetic disk and flash memory.
Computers access storage devices via a controller, which is usually a special-purpose device whose only function is to read and write data on the storage device. To read from or write to storage, a computer sends a read operation or a write operation to the controller. Currently, the most popular interfaces to storage are based on disk standards, such as SCSI, SATA, and PATA. These interfaces allow the computer to read or write a page frame at a given address.
One popular approach to updating shared storage is to treat storage as a sequential device, where new data is added to the end of a populated region of storage. This is sometimes called “log-structured storage,” because it treats storage as if it were a sequential log of pages, even though the underlying technology can support random writes. Log-structured storage is a useful technique for flash memory for two reasons. First, it avoids having to overwrite a page frequently, which is expensive for flash memory because overwriting requires erasing the multi-page block that contains the page, which in turn requires the controller to save the other useful pages in the block elsewhere before erasing the block. Second, it helps with “wear leveling.” That is, it helps ensure that all blocks of storage are erased and rewritten approximately the same number of times. Log-structured storage is also useful for magnetic disks because disks can write data sequentially at a much faster rate than they can write data randomly.
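By way of illustration only, the append-at-the-end behavior of log-structured storage can be sketched as follows. The class name LogStructuredStore, its methods, and the 4096-byte page size are hypothetical and chosen for this example; an in-memory list stands in for the storage device.

```python
class LogStructuredStore:
    """Toy append-only page store; page frames are filled strictly in order."""

    def __init__(self, num_frames, page_size=4096):
        self.page_size = page_size          # assumed page size, for illustration
        self.frames = [None] * num_frames   # None marks an unwritten page frame
        self.tail = 0                       # index of the next free page frame

    def append(self, page):
        """Write a page at the end of the populated region; return its address."""
        if len(page) > self.page_size:
            raise ValueError("page exceeds page size")
        if self.tail >= len(self.frames):
            raise RuntimeError("log is full")
        address = self.tail
        self.frames[address] = bytes(page)
        self.tail += 1
        return address

    def read(self, address):
        """Reads may still target any page frame in the populated region."""
        if not 0 <= address < self.tail:
            raise IndexError("address is outside the populated region")
        return self.frames[address]


log = LogStructuredStore(num_frames=4)
first = log.append(b"version 1 of record A")
second = log.append(b"version 2 of record A")  # an update is appended, not overwritten
```

In this sketch an update never touches a previously written page frame; the newer version simply occupies the next frame in the sequence, which is the property that avoids frequent erasures on flash memory.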
Flash memory and other solid-state memory devices are becoming more popular for use in storage systems because, lacking mechanical moving parts, they offer greater reliability and lower energy usage than disk-based devices. In addition, flash memory can perform random read and write operations at a much higher rate than magnetic disks. The lower latencies that result from this increased I/O performance also motivate more streamlined synchronization mechanisms. These and other characteristics of flash memory are quite different from those of disk-based devices and affect strategies for storing data to flash memory.
One limitation of flash memory is that although it can be read or programmed (i.e., written) a page or partial page at a time in a random access fashion, it can only be erased a block at a time (where each device defines the block size). Where the description herein describes reading and writing a page at a time, those of ordinary skill in the art will recognize that similar principles apply to reading or writing partial pages. Starting with a freshly erased block, in which every bit is one, a program can write any location within that block. However, once a bit has been set to zero, it can be changed back to one only by erasing the entire block. In other words, flash memory (specifically NOR flash) offers random-access read and write operations, but cannot offer arbitrary random-access rewrite or erase operations. A location can, however, be rewritten as long as the new value's 0 bits are a superset of the overwritten value's 0 bits, that is, as long as the rewrite only changes bits from one to zero. For example, an application may erase a nibble value to 1111, and then write the nibble as 1110. Successive writes to that nibble can change it to 1010, then 0010, and finally 0000. In practice, few algorithms take advantage of this successive write capability and, in general, applications erase and rewrite the entire block at once or choose a fresh block for writing.
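The rewrite rule in the nibble example above can be checked with a short sketch. The function name program and the 4-bit mask are hypothetical and chosen for this illustration; the sketch models only the rule that programming may clear bits but never set them, not any particular device.

```python
ERASED_NIBBLE = 0b1111  # a freshly erased nibble has every bit set to one


def program(current, new):
    """Return the stored nibble after programming `new` over `current`.

    Programming can only change bits from 1 to 0, so `new` must not hold a 1
    in any position where `current` already holds a 0.
    """
    if new & ~current & 0b1111:
        raise ValueError(
            f"cannot program {new:04b} over {current:04b} without erasing the block"
        )
    return current & new


cell = ERASED_NIBBLE
history = []
for value in (0b1110, 0b1010, 0b0010, 0b0000):
    cell = program(cell, value)          # each step only clears additional bits
    history.append(f"{cell:04b}")
# history now holds the sequence 1110, 1010, 0010, 0000 from the example
```

Attempting program(0b0010, 0b0110) raises an error in this sketch, because it would require turning a zero bit back into a one, which on flash memory is possible only by erasing the whole block.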
Another limitation is that flash memory has a finite number of write-erase cycles. Most commercially available flash products are rated to withstand around 100,000 write-erase cycles before wear begins to degrade the integrity of the storage. This effect is partially offset in some chip firmware or file system drivers by counting the writes and dynamically remapping blocks in order to spread write operations across sectors, a technique called wear leveling. Another approach is to perform write verification and remapping to spare sectors in case of write failure, a technique called bad block management (BBM). For portable consumer devices, these wear management techniques typically extend the life of the flash memory beyond the life of the device itself, and some data loss may be acceptable in these applications. For high reliability data storage, however, it is not advisable to use flash memory that has been through a large number of programming cycles. This limitation does not apply to effectively read-only applications such as thin clients and routers, whose flash memory administrators typically write once or at most a few times during the lifetime of the device.
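As a minimal sketch of the wear leveling idea described above, and not a description of any particular firmware, the following counts erasures per physical block and places each rewrite on the least-worn free block. The class name WearLeveler, its fields, and the simplification that a block is erased at the moment its old copy is released are assumptions made for this example.

```python
class WearLeveler:
    """Toy remapper that spreads rewrites across physical blocks."""

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks   # erasures seen by each physical block
        self.mapping = {}                      # logical block -> physical block
        self.free = set(range(num_blocks))     # physical blocks not currently mapped

    def rewrite(self, logical_block):
        """Rewrite a logical block onto the least-worn free physical block."""
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        old = self.mapping.get(logical_block)
        if old is not None:
            self.erase_counts[old] += 1        # the stale copy is erased and released
            self.free.add(old)
        self.mapping[logical_block] = target
        return target


leveler = WearLeveler(num_blocks=4)
for _ in range(12):
    leveler.rewrite(logical_block=0)           # one hot logical block, rewritten repeatedly
```

Although a single logical block is rewritten twelve times in this example, the erasures are distributed over all four physical blocks rather than concentrated on one, so the per-block erase counts differ by at most one.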
Synchronization is a common problem in shared storage systems. It is desirable for each computer system to be able to write when it wants to and to read data stored by other computer systems. If multiple computers are allowed to write to log-structured storage, then synchronization is needed to keep write operations consistent. The synchronization ensures that two computers do not write to the same page frame, which would cause one of the writes to be overwritten by the other. In the case of flash memory, a write operation that attempts to overwrite a page frame would be lost, since the page frame can only be written once before its block is erased. Synchronization also ensures that there are no holes in the sequence of written page frames. In addition, computer systems may cache data stored in the storage system for faster local access, and the storage system performs steps to ensure cache consistency based on the actions of each computer system.
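The two guarantees named above, that no page frame is written twice and that the written sequence has no holes, can be illustrated with a simple sketch. Here threads stand in for the separate computers and a lock stands in for whatever synchronization mechanism the storage system provides; the class name SharedLog, the helper writer, and the frame counts are hypothetical and chosen for this example.

```python
import threading


class SharedLog:
    """Toy shared log in which frame allocation and writing happen atomically."""

    def __init__(self, num_frames):
        self.frames = [None] * num_frames   # None marks an unwritten page frame
        self.tail = 0                       # next page frame to be written
        self._lock = threading.Lock()       # stand-in for the real synchronization

    def append(self, page):
        with self._lock:
            if self.tail >= len(self.frames):
                raise RuntimeError("log is full")
            address = self.tail
            if self.frames[address] is not None:
                raise RuntimeError("page frame already written")
            self.frames[address] = page     # each frame is written exactly once
            self.tail += 1                  # the tail advances by one, leaving no hole
            return address


def writer(log, name, count, addresses):
    """Simulated computer that appends `count` pages to the shared log."""
    for i in range(count):
        addresses.append(log.append(f"{name}-{i}".encode()))


shared_log = SharedLog(num_frames=64)
addresses = []
threads = [
    threading.Thread(target=writer, args=(shared_log, f"computer{n}", 10, addresses))
    for n in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the four simulated computers finish, the addresses they received are exactly the first forty page frames, each handed out once. Without the lock, two writers could read the same tail value and target the same page frame, which is the collision that synchronization is meant to prevent.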