In current business environment, all types of business data are becoming more and more critical to business success. The tremendous growth and complexity of business-generated data is driving the demand for information storage, defining the way of sharing, managing and protection of information assets.
Typically, no single technology or architecture is able to address all needs of any organization. Main storage technologies are described, for example, in the White Paper by EMC, “Leveraging Networked storage for your business”, March, 2003, USA and basically can be identified by location and connection type (intra-computer storage, direct attached storage (DAS), IP, channel networks, etc.) and by the method that data is accessed. There are three basic types of storage architectures to consider in connection with methods of data access: Block Access, File Access, and Object Access.
In block mode access architecture, the communication between a server/client and a storage medium occurs in terms of blocks; information is pulled block by block directly from the disk. The operation system keeps track of where each piece of information is on the disk, while the storage medium is usually not aware of the file system used to organize the data on the device. When something needs to get read or be written, the data are directly accessed from the disk by that processor which knows where each block of data is located on the disk and how to put them together. Examples of block mode access storage technologies are DAS (Direct Attached Storage), SAN (Storage Area Network), Block Storage over IP (e.g. FCIP, iFCP, iSCSI, etc.), intra-memory storage, etc.
File access requires the server or client to request a file by name, not by physical location. As a result, a storage medium (external storage device or storage unit within computer) is usually responsible to map files back to blocks of data for creating, maintaining and updating the file system, while the block access is handled “behind the scene”. The examples of file access storage technologies are NAS (Network Attached Storage with NFS, CIFS, HTTP, etc. protocols), MPFS (Multi-Pass File Serving), intra-computer file storage, etc. The file access storage may be implemented, for example, for general purpose files, web applications, engineering applications (e.g. CAD, CAM, software development, etc.), imaging and 3D data processing, multi-media streaming, etc.
Object access further simplifies data access by hiding all the details about block, file and storage topology from the application. The object access occurs over API integrated in content management application. The example of object access storage technology is CAS (Content Addressed Storage).
More efficient use of storage may be achieved by data compression before it is stored. Data compression techniques are used to reduce the amount of data to be stored or transmitted in order to reduce the storage capacity and transmission time respectively. Compression may be achieved by using different compression algorithms, for instance, a standard compression algorithm, such as that described by J. Ziv and A.. Lempel, “A Universal Algorithm For Sequential Data Compression,” IEEE Transactions on Information Theory, IT-23, pp. 337-343 (1997). It is important to perform compression transparently, meaning that the data can be used with no changes to existing applications. In either case, it is necessary to provide a corresponding decompression technique to enable the original data to be reconstructed and accessible to applications. When an update is made to a compressed data, it is generally not efficient to decompress and recompress the entire block or file, particularly when the update is to a relatively small part of data.
Various implementations of optimization of storage and access to the stored data are disclosed for example in the following patent publications:
U.S. Pat. No. 5,761,536 (Franaszek) discloses a system and method for storing variable length objects such that memory fragmentation is reduced, while avoiding the need for memory reorganization. A remainder of a variable length object may be assigned to share a fixed-size block of storage with a remainder from another variable length object (two such remainders which share a block are referred to as roommates) on a best fit or first fit basis. One remainder is stored at one end of the block, while the other remainder is stored at the other end of the block. The variable length objects which are to share a block of storage are selected from the same cohort. Thus, there is some association between the objects. This association may be that the objects are from the same page or are in some linear order spanning multiple pages, as examples. Information regarding the variable length objects of a cohort, such as whether an object has a roommate, is stored in memory.
U.S. Pat. No. 5,813,011 (Yoshida et al.) discloses a method and apparatus for storing compressed data, wherein compressed file consists of: a header that carries information showing the position of a compression management table; compressed codes; and the compression management table that holds information showing the storage location of the compressed code of each original record.
U.S. Pat. No. 5,813,017 (Morris et al.) discloses a method and means for reducing the storage requirement in the backup subsystem and further reducing the load on the transmission bandwidth where base files are maintained on the server in a segmented compressed format. When a file is modified on the client, the file is transmitted to the server and compared with the segmented compressed base version of the file utilizing a differencing function but without decompressing the entire base file. A delta file which is the difference between the compressed base file and the modified version of the file is created and stored on a storage medium which is part of the backup subsystem.
U.S. Pat. No. 6,092,071 (Bolan et al.) discloses a system for control of compression and decompression of data based upon system aging parameters, such that compressed data becomes a system managed resource with a distinct place in the system storage hierarchy. Processor registers are backed by cache, which is backed by main storage, which is backed by decompressed disk storage, which is backed by compressed disk storage then tape, and so forth. Data is moved from decompressed to compressed form and migrated through the storage hierarchy under system control according to a data life cycle based on system aging parameters or, optionally, on demand: data is initially created and stored; the data is compressed at a later time under system control; when the data is accessed, it is decompressed on demand by segment; at some later time, the data is again compressed under system control until next reference. Large data objects are segmented and compression is applied to more infrequently used data.
U.S. Pat. No. 6,115,787 (Obara et al.) discloses a disk storage system, wherein data to be stored in the cache memory is divided into plural data blocks, each having two cache blocks in association with track blocks to which the data belongs and are compressed, thus providing the storage of plural compressed records into a cache memory of a disk storage system in an easy-to-read manner. The respective data blocks after the compression are stored in one or plural cache blocks. Information for retrieving each cache block from an in-track address for the data block is stored as part of retrieval information for the cache memory. When the respective data blocks in a record are read, the cache block storing the compressed data block is determined based on the in-track address of the data block and the retrieval information.
U.S. Pat. No. 6,349,375 (Faulkner et al.) discloses a combination of data compression and decompression with a virtual memory system. A number of computer systems are discussed, including so-called embedded systems, in which data is stored in a storage device in a compressed format. In response to a request for data by a central processing unit (CPU), the virtual memory system will first determine if the requested data is present in the portion of main memory that is accessible to the CPU, which also happens to be where decompressed data is stored. If the requested data is not present in the decompressed portion of main memory, but rather is present in a compressed format in the storage device, the data will be transferred into the decompressed portion of main memory through a demand paging operation. During the demand paging operation, the compressed data will be decompressed. Likewise, if data is paged out of the decompressed portion of main memory, and that data must be saved, it can also be compressed before storage in the storage device for compressed data.
U.S. Pat. No. 6,532,121 (Rust et al.) discloses a compression system storing meta-data in the compressed record to allow better access and manage merging data. Markers are added to the compression stream to indicate various things. Each compressed record has a marker to indicate the start of the compressed data. These markers have sector number as well as the relocation block numbers embedded in their data. A second marker is used to indicate free space. When compressed data is stored on the disk drive, free space is reserved so that future compression of the same or modified, data has the ability to expand slightly without causing the data to be written to a different location. Also the compressed data can shrink and the remaining space can be filled in with this free space marker. A third type of marker is the format pattern marker. Compression algorithms generally compress the format pattern very tightly. However, the expectation is that the host will write useful data to the storage device. The compressor is led typical data in the region of the format pattern but a marker is set in front of this data to allow the format pattern to be returned rather than the typical data.
U.S. Pat. No. 6,584,520 (Cowart et al.) discloses a method of storage and retrieval of compressed files. The method involves dynamically generating file allocation table to retrieve compressed file directly from compact disk read only memory.
U.S. Pat. No. 6,678,828 (Pham et al.) discloses a secure network file access appliance supporting the secure access and transfer of data between the file system of a client computer system and a network data store. An agent provided on the client computer system and monitored by the secure network file access appliance ensures authentication of the client computer system with respect to file system requests issued to the network data store. The secure network file access appliance is provided in the network infrastructure between the client computer system and network data store to apply qualifying access policies and selectively pass through to file system requests. The secure network file access appliance maintains an encryption key store and associates encryption keys with corresponding file system files to encrypt and decrypt file data as transferred to and read from the network data store through the secure network file access appliance.
U.S. Patent Application No. 2004/030,813 (Benveniste et al.) discloses a method and system of storing information, includes storing main memory compressed information onto a memory compressed disk, where pages are stored and retrieved individually, without decompressing the main memory compressed information.
U.S. Patent Application No. 2005/021,657 (Negishi et al.) discloses a front-end server for temporarily holding an operation request for a NAS server, which is sent from a predetermined client, is interposed between the NAS server and clients on a network. This front-end server holds information concerning a correlation among data files stored in the NAS server, optimizes the operation request received from the client based on the information, and transmits the operation request to the NAS server.