A substantial problem in the data storage art has been how to make data storage performance keep up with the processing performance of computers to achieve efficient systems. Efficient systems in the field of data storage for computer systems generally refer to those where all major components are used in a proportional manner under normal workloads. That is, the computer system and its associated data storage device each operate at their peak capabilities. The invention, and the associated background described here, generally relate to persistent storage, such as disks of various kinds, and not to the short-term storage (usually referred to as Random Access Memory) that is embedded in computers. Currently, the limiting factor is storage performance, as computer systems and their associated central processing units have far surpassed the speed and efficiency capabilities of these data storage systems.
Prior art solutions for improving data storage performance have been to make storage, and the connections between the computers and storage, faster. Examples of these include various ways of aggregating storage, such as RAID striping; improving the raw performance of the storage controllers; adding caches in the storage controller (as is done with most RAID controllers), in the storage appliance, or on the network server in front of the storage; and distributing the storage activity load onto multiple storage nodes.
There has also been a strong trend towards centralizing storage to ease management, as best exemplified in the emergence of SAN (Storage Area Network) and NAS (Network Attached Storage) systems for organizing and aggregating storage. The infrastructure model related to these solutions (faster and more centralized storage) can be described as a flow graph of a large number of applications running on computers connected by a network to the storage system.
In such a model it is clear that, in order for the storage system performance to match the potential performance of the computers, the performance of the individual network links between the computers and the storage system has to increase (higher bandwidth and lower latency) to enable a balance between storage system performance and computer performance.
The problem is that the potential load offered by even a very small number of computers is much higher than is practical for an economical network or central storage system to service. A computer's internal network, that is, its bus, operates at speeds and capacities one or two orders of magnitude higher than those of the external networks that computers generally support.
Certain prior art solutions include the use of storage accelerators attached to the storage device, such as those performing caching or tiering functions, so that the network performance at a central storage matches the performance of the storage itself. Other attempted solutions to this problem have been experimented with in the context of Linux kernel facilities, with several block-based implementations, for example bcache, fastcache, dmcache; and with a particular implementation intended for modified filesystems, known as FS-cache. There is also a Windows™ facility with related functionality called BranchCache, which is designed for read-only caching over wide area network links.
It is therefore an object of the invention to provide a novel system and method for improving the efficiency of data storage systems.