This invention relates to data storage devices. It finds particular application in a system which can store data in varying block sizes as may result from the use of data conversion or compression techniques.
Data compression techniques can provide two main benefits to data storage systems. First, the effective capacity of a data storage device can be increased beyond its physical capacity, because the volume of data to be stored in the device is less than the logical volume of data transferred in and out of the storage system. Second, the data transfer time in and out of the storage device is effectively reduced by this decrease in the physical volume of data. This is of particular benefit where operation of the storage device is relatively slow, for example in a magnetic disk.
PCT Patent Application No. 91/20076 (Storage Technology Corporation) shows the use of a dynamically-mapped virtual memory system to permit the storage of data so that each data record occupies only the physical space required for the data. A compaction algorithm, using a multipath storage director, compresses data prior to storage. Null fields are listed in the virtual memory map and not stored on the physical medium.
PCT Patent Application No. 91/20025 (Storage Technology Corporation) shows a dynamically-mapped virtual memory system in which deleted dataset space is immediately released for re-use. The data storage subsystem receives an indication that the data file is being scratched from the virtual VTOC. Additional data security is provided by preventing unauthorised access to the data of scratched storage files, both in cache and on the storage devices.
European Patent Application No. 0 682 306 (IBM) shows a log-structured file system which compares the size of blocks of data with the space available for their storage to determine the most efficient positioning arrangement. It has separate buffers for compressed and uncompressed data and a controller which selectively writes whichever version has the fewer bytes to the disc.
U.S. Pat. No. 4,467,421 (Storage Technology) shows a virtual storage system which is interposed between a host CPU and (disc drive) storage devices to permit the storage medium to mimic a tape drive and write the data in a contiguous manner. The system may be applied to a mixed-mode storage system the components of which have different response times, such as solid state RAM, CCD memory, disc drives and tape.
In many data storage devices, such as magnetic disks, optical disks and magnetic tapes, data is stored in units of fixed size called data blocks. A common size of data block is 512 bytes. The address that a host gives to a block of data is called the logical block address, whereas the address of the memory area that actually stores the data block is called the physical block address. The logical block addresses and the physical block addresses are normally in the same order (that is, consecutive logical block addresses usually correspond to successive physical block addresses) but the physical address space may not be continuous. The discontinuities may arise because of the physical characteristics of the storage medium; for example, certain blocks in the physical address space may be unusable because of the presence of defects. The logical address of a block of data may be translated to a physical address by using an algorithm or by using a lookup table to define discontinuities in the physical ordering of blocks.
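By way of illustration only, the lookup-table form of this translation can be sketched as follows. The sketch assumes that the table is simply a sorted list of defective physical block addresses and that logical blocks occupy the usable physical blocks in ascending order; the function names are hypothetical and are not taken from any particular device.

```python
from bisect import bisect_right


def build_defect_table(defective_physical_blocks):
    """Return the sorted list of unusable physical block addresses (assumed table format)."""
    return sorted(set(defective_physical_blocks))


def logical_to_physical(logical_block, defect_table):
    """Map a logical block address to a physical one, skipping defective blocks.

    The physical address equals the logical address plus the number of
    defective blocks at or below that physical address. The loop repeats
    until that count stops changing.
    """
    physical = logical_block
    while True:
        skipped = bisect_right(defect_table, physical)
        candidate = logical_block + skipped
        if candidate == physical:
            return physical
        physical = candidate


if __name__ == "__main__":
    table = build_defect_table([7, 2, 3])
    for logical in range(6):
        print(logical, "->", logical_to_physical(logical, table))
```

The logical and physical orderings stay the same; only the defective addresses are stepped over, which is the discontinuity the table defines.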
However, when data compression is used the data size after compression may not be constant for each logical block because some data is more compressible than other data. Thus the amount of data resulting from compression of each logical block will be variable.
The amount of data that results from compression of a logical block depends on the nature of the data and the compression technique used. The physical size of a block of compressed data may also change when the data is read, modified and rewritten. There is therefore considerable difficulty in incorporating data compression within the immediate control structure of a random block access storage device such as a magnetic disk because of the problem of managing the variable amounts of data which result from compression of the fixed logical blocks of data.
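The variability described above can be demonstrated with a short sketch. It assumes 512-byte logical blocks and uses the general-purpose zlib compressor from the Python standard library purely as a stand-in; the document does not specify any particular compression technique, and the three sample blocks are invented for illustration.

```python
import random
import zlib

BLOCK_SIZE = 512  # the common block size mentioned in the text


def compressed_size(block):
    """Return the number of bytes produced by compressing one logical block."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("expected a block of exactly %d bytes" % BLOCK_SIZE)
    return len(zlib.compress(block))


# Three invented 512-byte blocks of differing compressibility.
uniform_block = bytes(BLOCK_SIZE)                         # all zero bytes
text_block = (b"data storage " * 40)[:BLOCK_SIZE]         # repetitive text
rng = random.Random(0)                                    # fixed seed for repeatability
noisy_block = bytes(rng.getrandbits(8) for _ in range(BLOCK_SIZE))

sizes = {
    "uniform": compressed_size(uniform_block),
    "text": compressed_size(text_block),
    "noisy": compressed_size(noisy_block),
}

if __name__ == "__main__":
    for name, size in sizes.items():
        print(name, size)
```

Each input is the same fixed logical size, yet the stored sizes differ, and rewriting a block with different contents would change its stored size again.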
The invention therefore provides a data storage system comprising: a store, a memory, a user interface, and a memory controller, where the memory is used to buffer all data transferred between the user interface and the store, the system being characterised in that the memory controller copies data directly between the store and the memory, whereas the memory controller re-organises data when the data is transferred between the memory and the user interface.
The memory may have a capacity identical to a region of the store, where the region is a part of the store that can be conveniently accessed in a single operation in accordance with the accessing mechanisms of the store.
Data may be transferred to and from the store in units of the full capacity of the memory.
The data storage system may comprise several stores, each of which may perform data transfers to or from the memory.
The data storage system may comprise several memories, each of which may be loaded with data from an independent region of a store.
Data transfers from a store to a memory may be scheduled to provide the highest possible probability of a data block which is required for a transaction at the user interface being resident in a memory.
The store may be a magnetic disk.
The memory may be a random access semiconductor memory.
Data compression and decompression may be incorporated between the memory and the user interface.
A region of the store may contain a predetermined number of logically contiguous data blocks which are permanently resident in the region, and as many non-contiguous data blocks as may conveniently be accommodated in the available physical storage space of the region.
The non-contiguous data blocks may be relocated from a first region to a second region to create physical storage space if one or more of the logically contiguous data blocks which are permanently resident in the first region increases in size.
The non-contiguous data blocks may be relocated from a first region to a second region to fill physical storage space which results from a change in the physical size of one or more of the logically contiguous data blocks which are permanently resident in the second region.
Relocation of a non-contiguous data block between regions may be accomplished by transfer of the block from one memory to another memory.
A memory may be designated as the source or destination of all data blocks to be relocated between regions.
The memory controller may ensure that the memory is loaded with data from such a region of the store as will provide sufficient free physical memory space for relocation of a data block.
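The relocation behaviour set out in the preceding paragraphs can be sketched as follows. This is a minimal model under stated assumptions, not the claimed implementation: sizes are plain integers, the `Region` class and `resize_resident` function are hypothetical names, the block chosen for relocation is arbitrary, and the first region with enough free space is used as the destination.

```python
class Region:
    """Toy region: permanently resident blocks plus relocatable guest blocks."""

    def __init__(self, capacity, resident_sizes):
        self.capacity = capacity
        self.resident = list(resident_sizes)  # sizes of the resident blocks
        self.guests = {}                      # logical block -> stored size

    def free_space(self):
        return self.capacity - sum(self.resident) - sum(self.guests.values())


def resize_resident(region, index, new_size, other_regions):
    """Change the size of a resident block, relocating guests on overflow.

    Returns the logical blocks that were moved. Raises MemoryError if the
    region still overflows and no guest can be placed elsewhere; in that
    case this toy model does not undo the moves already made.
    """
    region.resident[index] = new_size
    moved = []
    while region.free_space() < 0:
        if not region.guests:
            raise MemoryError("resident blocks alone exceed the region")
        block, size = next(iter(region.guests.items()))  # arbitrary choice
        target = next((r for r in other_regions if r.free_space() >= size), None)
        if target is None:
            raise MemoryError("no region has room for the relocated block")
        del region.guests[block]
        target.guests[block] = size
        moved.append(block)
    return moved


if __name__ == "__main__":
    first = Region(100, [40, 40])
    first.guests["x"] = 15
    second = Region(100, [30, 30])
    print(resize_resident(first, 0, 50, [second]), first.free_space())
```

In the system described, the two regions involved would each be held in a memory while the move takes place, so that the relocation is a memory-to-memory transfer.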
The logical page address of the predetermined number of logically contiguous data blocks which are permanently resident in a region may have a direct correspondence with the sequential address of the region within the store.
The logical address of an independent data block which is not permanently resident in a region may be translated to a logical address within a region by means of a lookup table.
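The two forms of address translation just described can be contrasted in a short sketch. It assumes, for illustration only, that each region holds eight permanently resident blocks and that the lookup table is an in-memory dictionary; `BLOCKS_PER_PAGE`, `resident_location` and `IndependentBlockTable` are hypothetical names.

```python
BLOCKS_PER_PAGE = 8  # assumed number of resident blocks per region


def resident_location(logical_block):
    """Direct correspondence: the logical page address is the region address.

    Returns (region address, position of the block within the region).
    """
    return divmod(logical_block, BLOCKS_PER_PAGE)


class IndependentBlockTable:
    """Lookup table for blocks that are not permanently resident in a region."""

    def __init__(self):
        self._table = {}

    def place(self, logical_block, region, address_in_region):
        """Record (or update, after relocation) where a block is held."""
        self._table[logical_block] = (region, address_in_region)

    def locate(self, logical_block):
        """Return (region, address within region); raises KeyError if unknown."""
        return self._table[logical_block]


if __name__ == "__main__":
    print(resident_location(19))
    table = IndependentBlockTable()
    table.place(1000, region=2, address_in_region=8)
    print(table.locate(1000))
```

Resident blocks need no table entry because their region follows from arithmetic on the address; only the independent blocks, which can move between regions, require a stored mapping.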
The memory may be independently addressable in tiles which comprise a fixed number of data words.
Each logical block of data may be stored in a chain of linked groups of tiles.
The sizes and number of groups in each chain may be selected in accordance with the size of the block of data.
An address may be stored for each logical block of data identifying the physical address within the memory of the first group of tiles of the block.
Any unused groups of tiles in the memory may be linked together in a number of free space chains.
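The tile, group and chain organisation described above can be modelled in a few lines. The sketch below is illustrative only: it assumes that the memory has been pre-divided into groups of given sizes, keeps one free space chain per group size, and represents link pointers as list indices; the class and method names are hypothetical.

```python
class TiledMemory:
    """Toy model of a memory divided into groups of tiles linked in chains."""

    def __init__(self, groups_per_size):
        # groups_per_size: {group size in tiles: number of groups of that size}
        self.size_of = []      # group index -> size in tiles
        self.next_of = []      # group index -> next group in its chain, or None
        self.free_head = {}    # group size -> first group of that free chain
        self.first_group = {}  # logical block -> first group of its chain
        for size, count in groups_per_size.items():
            self.free_head[size] = None
            for _ in range(count):
                self.size_of.append(size)
                self.next_of.append(None)
                self._release(len(self.size_of) - 1)

    def _release(self, index):
        """Push a group onto the free space chain for its size."""
        size = self.size_of[index]
        self.next_of[index] = self.free_head[size]
        self.free_head[size] = index

    def _take(self, size):
        """Pop a group from the free space chain for the given size."""
        index = self.free_head.get(size)
        if index is None:
            raise MemoryError("no free group of %d tiles" % size)
        self.free_head[size] = self.next_of[index]
        self.next_of[index] = None
        return index

    def store(self, logical_block, group_sizes):
        """Build a chain of groups for a block and record its first group."""
        if not group_sizes:
            raise ValueError("a block needs at least one group")
        if logical_block in self.first_group:
            self.free(logical_block)
        taken = []
        try:
            for size in group_sizes:
                taken.append(self._take(size))
        except MemoryError:
            for index in taken:  # hand back anything already claimed
                self._release(index)
            raise
        for current, following in zip(taken, taken[1:]):
            self.next_of[current] = following
        self.first_group[logical_block] = taken[0]

    def chain(self, logical_block):
        """Return the group indices of a block by following its links."""
        groups, index = [], self.first_group[logical_block]
        while index is not None:
            groups.append(index)
            index = self.next_of[index]
        return groups

    def free(self, logical_block):
        """Return every group of a block to the free space chains."""
        for index in self.chain(logical_block):
            self._release(index)
        del self.first_group[logical_block]

    def free_groups(self, size):
        """Count the groups currently on the free space chain for a size."""
        count, index = 0, self.free_head.get(size)
        while index is not None:
            count, index = count + 1, self.next_of[index]
        return count
```

For example, `TiledMemory({8: 2, 4: 1, 1: 2}).store(5, [8, 4, 1])` links one group of each requested size into a chain for logical block 5; only the address of the first group has to be recorded for the block.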
A battery may be provided to maintain a source of electrical power for sufficient time to allow transfer of all data from the memory to the store in the event of failure of a primary power supply.
Arrangements for management of the storage of data with variable block size, such as may result from data compression, have been devised. One of these arrangements treats a block of data as an indivisible unit and manages the fragmentation of free memory by means of relocation and reordering of blocks of data within the memory. This requires rewriting of the data blocks to different locations and the updating of a lookup table which maps logical to physical addresses.
One disadvantage of this arrangement is that repeated relocation of a data block can increase the chances of data being corrupted. The relocation is necessary, however, because a block of data has to be stored in a physically continuous area of physical memory. The arrangement requires fast access to the entire memory to perform the relocation of data and is most appropriate to large semiconductor memories such as solid state disks. The fast random data access operations which it requires are not compatible with magnetic disk memories.
Another arrangement for storing data with variable block size uses distributed storage of a block of data. The data block is subdivided into discrete segments which are stored at different locations in the physical memory. A memory is organised into a plurality of groups of tiles, where a tile is a basic unit of memory and contains a fixed number of data words. There are a plurality of different group sizes and each group size contains a different number of tiles. When a data block is stored it is split up into a selection of the groups of tiles so as to minimise the wastage of storage space arising from partial use of a group of tiles. The discrete segments used for storage of a data block are linked together by link pointers stored in a group header associated with each segment or group. The physical location of the first segment used for storage of a block of data is stored separately, but preferably on the same medium, and can be used to compile a lookup table of logical to physical block addresses. The arrangement can preformat the memory into discrete segments having a plurality of predetermined sizes, which can then be linked together to ensure that the minimum amount of storage space required for a block is used. To manage free data space, the segments not used for storage are linked via their group headers, thereby giving a chain of segments of free memory. The arrangement requires multiple random accesses to the memory device for each block access and so is most effective when used with high speed semiconductor memory. It is particularly suitable for use in solid state disk memories. One disadvantage, however, is that it is not appropriate for use directly with memory devices such as magnetic disks which cannot provide fast random access.
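The selection of group sizes for a block can be sketched as below. The figures are assumptions made for illustration: tiles of 16 bytes and group sizes of 8, 4, 2 and 1 tiles. With that particular set of sizes a simple largest-first selection covers the required number of tiles exactly, so the only wastage is within the last tile; other sets of sizes would need a different selection rule, and group header overhead is ignored here.

```python
def select_groups(block_bytes, tile_bytes=16, group_sizes=(8, 4, 2, 1)):
    """Choose group sizes (in tiles) whose total just covers a block.

    Largest-first selection; assumes the size set lets the selection
    finish with no tiles left over (true for 8, 4, 2, 1).
    """
    if block_bytes <= 0:
        raise ValueError("block must contain at least one byte")
    remaining = -(-block_bytes // tile_bytes)  # tiles needed, rounded up
    groups = []
    for size in sorted(group_sizes, reverse=True):
        count, remaining = divmod(remaining, size)
        groups.extend([size] * count)
    if remaining:
        raise ValueError("group sizes cannot cover the block exactly")
    return groups


def wasted_bytes(block_bytes, groups, tile_bytes=16):
    """Storage allocated to the block but not occupied by its data."""
    return sum(groups) * tile_bytes - block_bytes


if __name__ == "__main__":
    for size in (200, 512, 37):
        chosen = select_groups(size)
        print(size, chosen, wasted_bytes(size, chosen))
```

A 200-byte compressed block needs 13 tiles and is therefore split across one group of 8, one of 4 and one of 1, rather than occupying two whole groups of 8.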
The two arrangements described immediately above for management of the storage of data with variable block size rely on the principles of partitioning of a data block to locate it efficiently in a storage medium and relocation of a data block to compensate for any change in size of the stored data blocks. Both principles demand multiple random accesses to the storage medium for data block read and write operations and hence can only provide a high performance data storage system if fast memory is used for the storage device. The methods are intended primarily for solid state disk systems employing random access semiconductor memory. It is difficult to adopt these methods with a storage device such as a magnetic disk because random access on a magnetic disk is a mechanical operation and is relatively slow.