A common architecture for enterprise computing systems includes a client computer and a storage system, where the client system performs most of the computationally intensive tasks using application programs and the information needed for the computation is retrieved from the storage system. Often the storage system is not directly attached to the computer. The connection between the two system components depends on the data storage concept and is often described as a storage area network (SAN), where data is stored as fixed-size blocks, or as network attached storage (NAS), where data is stored as files.
The storage component has typically comprised a server computer and a plurality of hard disk drives for the actual data storage. Files are often used for unstructured data such as images, text and the like, whereas block storage has been associated with, for example, database processing. In all of these applications, the response time of the storage component of a computing system has been limited by the mechanical performance of the associated disk drives. Mixed workloads, such as a combination of block and file applications, have been avoided due to unfavorable interactions between the access requirements.
Flash memory arrays are replacing disk storage devices in many applications due to the more rapid response time to client requests for reading and writing data as well as a capability to perform a much higher number of input/output (I/O) operations per second. However, at present, the hardware cost of flash memory is greater than that of disk, and flash memory is perceived to have a wear-out problem, at least if not properly managed. The amount of data to be stored on a global basis appears to be growing substantially, despite all efforts to restrict this growth, and more efficient data storage techniques, such as data deduplication and data compression, have been developed. Cost and performance are important considerations in the design and economics of data storage systems.
With disk storage systems, post-storage deduplication may be performed; the data to be written to the storage device is usually first written to a disk and may subsequently be deduplicated or compressed so as to optimize disk performance. Often the deduplication is not performed until the data is backed up, since the process is costly both computationally and in terms of volatile metadata storage. However, the wear-out characteristics of flash memory have led to these data reduction techniques being performed in-line, in order to reduce the amount of data that is actually initially stored in the storage array.
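The difference between post-storage and in-line deduplication can be made concrete with a minimal sketch. It is illustrative only and does not describe any particular storage system: the class name `InlineDedupStore`, the 4096-byte block size, the use of SHA-256 as the block fingerprint, and the in-memory dictionaries standing in for flash media and metadata are all assumptions made for the example. The point shown is that a duplicate block is detected before it is written, so it never consumes a physical write.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size in bytes (illustrative)


class InlineDedupStore:
    """Toy block store that deduplicates on the write path (hypothetical)."""

    def __init__(self):
        self.blocks = {}          # fingerprint -> block data (stands in for flash)
        self.refcount = {}        # fingerprint -> number of logical references
        self.lba_map = {}         # logical block address -> fingerprint
        self.physical_writes = 0  # blocks actually written to the media

    def write(self, lba, data):
        # Fingerprint the block before it reaches the media.
        fp = hashlib.sha256(data).digest()
        old = self.lba_map.get(lba)
        if old == fp:
            return  # rewriting identical content: nothing to do
        if fp not in self.blocks:
            # First copy of this content: the only case that costs a write.
            self.blocks[fp] = data
            self.refcount[fp] = 0
            self.physical_writes += 1
        self.refcount[fp] += 1
        self.lba_map[lba] = fp
        if old is not None:
            self._release(old)

    def _release(self, fp):
        # Drop one reference and reclaim the block when none remain.
        self.refcount[fp] -= 1
        if self.refcount[fp] == 0:
            del self.blocks[fp]
            del self.refcount[fp]

    def read(self, lba):
        return self.blocks[self.lba_map[lba]]
```

In this sketch the fingerprint table is the metadata whose size and lookup cost the text refers to; a practical system would also have to bound that table and handle fingerprint collisions, neither of which is attempted here.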
Data deduplication may be effective for certain application types such as email attachments, operating system or virtual desktop images and the like, and is usually less effective for database information. Similarly, data compression may be effective for database information, text files and the like, but may be relatively ineffective for encrypted or already compressed data. Selection of data deduplication or data compression may be done either heuristically or as selected by the user at some level of the storage architecture, and the specific response of a storage system to such selections depends, for example, on the details of the processes performed by the storage system, on the data being processed, and on the temporal characteristics of the workload.
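One possible form of heuristic selection is sketched below. It is an assumption-laden illustration rather than a method taken from the text: the function names `compression_ratio` and `choose_reduction`, the 4096-byte sample size, and the 0.9 threshold are hypothetical. The idea is to trial-compress a small sample with zlib and skip compression when the sample barely shrinks, as would be expected for encrypted or already compressed data, while letting an explicit user choice override the heuristic.

```python
import os
import zlib


def compression_ratio(sample: bytes, level: int = 1) -> float:
    """Return compressed size divided by original size for a sample."""
    if not sample:
        return 1.0
    return len(zlib.compress(sample, level)) / len(sample)


def choose_reduction(data: bytes, user_choice=None,
                     sample_size: int = 4096, threshold: float = 0.9) -> str:
    """Pick "compress" or "store" for a buffer (hypothetical policy)."""
    # A selection made by the user takes precedence over the heuristic.
    if user_choice is not None:
        return user_choice
    # Trial-compress only a prefix to keep the decision cheap.
    ratio = compression_ratio(data[:sample_size])
    return "compress" if ratio < threshold else "store"


if __name__ == "__main__":
    text_like = b"the quick brown fox jumps over the lazy dog " * 200
    random_like = os.urandom(8192)  # stands in for encrypted data
    print(choose_reduction(text_like))
    print(choose_reduction(random_like))
    print(choose_reduction(random_like, user_choice="compress"))
```

A prefix sample is a deliberately crude estimator; data whose compressibility varies along its length, or a workload whose character changes over time, would call for a more careful policy.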