1. Field of the Invention
The present invention relates generally to data storage, and more particularly to an architecture and approach for using hybrid media in high performance, highly scalable storage accelerators for computer networks.
2. Description of Related Art
In computing architectures that use externally attached storage such as Network Attached Storage (NAS) or Storage Area Networks (SANs), there is a growing mismatch between the increasing speed of computer servers and the ability of storage systems to deliver data in a timely fashion. The inability of storage systems to keep pace with fast servers can cause applications to stall and result in overall throughput of the system reaching a plateau or regressing under significant load.
An examination of the root causes of this scalability problem reveals a common factor: the latency of fetching data from spinning magnetic disk drives, and more particularly the latency associated with rotation and seek time. While drives can deliver large contiguous amounts of data with an initial latency of 1-5 ms in seek time (moving the drive heads to the correct location on disk), frequent access to non-contiguous data can cost on the order of ~40 ms per access. For datasets that involve a lot of randomly accessed data (such as relational databases), the drive seek time becomes a major bottleneck in delivering data in a timely fashion.
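The effect of the per-access latency described above can be illustrated with simple arithmetic. The following is a minimal sketch, not part of the described system: the function name and the 8 Kbyte block size are assumptions chosen for illustration, and only the ~40 ms figure comes from the text.

```python
def random_access_throughput(latency_ms: float, block_bytes: int) -> tuple[float, float]:
    """Return (accesses per second, bytes per second) for a single drive
    that pays `latency_ms` for every non-contiguous access."""
    accesses_per_sec = 1000.0 / latency_ms
    bytes_per_sec = accesses_per_sec * block_bytes
    return accesses_per_sec, bytes_per_sec


# ~40 ms per access is the figure given in the text;
# the 8 Kbyte block size is an assumed, illustrative value.
iops, bps = random_access_throughput(40.0, 8 * 1024)
print(f"{iops:.0f} accesses/s, {bps / 1024:.0f} KiB/s")
```

Under these assumptions a single drive completes only about 25 random accesses per second, or roughly 200 Kbytes per second, which suggests why randomly accessed datasets are limited by seek time rather than by the drive's sequential transfer rate.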
Traditional attempts to solve this problem include adding a hierarchy of RAM-based data caches in the data path. This conventional approach is illustrated in FIG. 1. As shown in FIG. 1, when a compute server 110 attempts to access data from storage system 102 via a network 120, there are typically at least three different caches in the overall data path. A hard drive data cache 108 provides about Mbytes of cache, a storage system cache 106 provides between about 128 Mbytes and 16 Gbytes, and a compute server data cache 112 provides between about 100 Mbytes and 2 Gbytes (for a typical lightly loaded system). While such caches are generally beneficial, certain drawbacks remain. For example, the performance problems mentioned above still occur when the active data set is accessed randomly, when it is too large to fit into the caches normally present, or when the I/O requirements of the dataset exceed the capabilities of the controller attached to the cache.
There have been a number of attempts to create caching products that attack this problem through custom hardware solutions. Examples include RAMSAN from Texas Memory Systems (http://www.superssd.com/) and the e- and n-series products from Solid Data (http://www.soliddata.com/). These products are inadequate because they rely on solid-state disk technology, which tends to be both expensive and limited in maximum storage size.
Flash memory is a non-volatile computer memory that can be erased and reprogrammed. It is offered in various forms ranging from memory cards to SATA-based drives. Flash memory has unique characteristics that make using the devices a challenge in enterprise computing environments. Most notably, flash memory supports a limited number of write and/or erase cycles, and exceeding this limit can render the device unusable. Also, the write tolerance of a flash memory can be significantly impacted by the size of the write operations performed. Flash devices were traditionally targeted at storage environments where data was not frequently overwritten. For example, flash memory has been commonly used as a server boot device where the operating system is written once and infrequently updated. Cache appliances, on the other hand, can encounter frequent media writes, both while serving cache misses (on READS) and while processing application WRITES. Also, unlike persistent storage, the contents of a cache device can turn over frequently. Therefore, flash memory has not been viewed as suitable for use in cache applications.
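The endurance concern described above can be made concrete with a rough estimate. The following is a simplified sketch under stated assumptions: it assumes ideal wear leveling, and the function name, the capacity, the cycle limit, the daily write volume, and the write amplification factors are all hypothetical values chosen for illustration, not figures from this document or from any particular device.

```python
def flash_lifetime_days(capacity_bytes: int,
                        erase_cycles: int,
                        host_write_bytes_per_day: int,
                        write_amplification: float = 1.0) -> float:
    """Estimate days until the erase-cycle limit is reached.

    Assumes writes are spread evenly over the whole device (ideal wear
    leveling). `write_amplification` models the extra physical writes
    caused by, for example, write operations smaller than an erase block.
    """
    total_writable_bytes = capacity_bytes * erase_cycles
    physical_bytes_per_day = host_write_bytes_per_day * write_amplification
    return total_writable_bytes / physical_bytes_per_day


GB = 10**9
TB = 10**12

# Hypothetical device: 64 GB, 10,000 erase cycles, 1 TB of cache writes per day.
print(flash_lifetime_days(64 * GB, 10_000, 1 * TB))        # large, aligned writes
print(flash_lifetime_days(64 * GB, 10_000, 1 * TB, 4.0))   # small writes, assumed 4x amplification
```

With these assumed numbers the device would reach its cycle limit in 640 days with large writes and in 160 days with an assumed fourfold write amplification, which illustrates how both write volume and write size bear on the viability of flash in a cache that turns over frequently.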