1. Field of the Invention
This invention relates generally to storage server systems, for example, having a data cache where the data is maintained in compressed form, and particularly to an improved method for storing units of data in the data cache where the unit of storage is the size of a disk sector and headers and trailers having metadata and redundancy check information are employed.
2. Discussion of the Prior Art
Storage servers are computer systems that securely and efficiently manage large amounts of data stored on hard disks, optical disks, magnetic tapes or other forms of mass storage media.
FIG. 1 is a block diagram depicting the typical structure of a storage server device 100. As shown in FIG. 1, the storage server 100 is connected to the hosts via one or more host adapters 101, which perform the communication tasks required by the specific protocol of the selected interconnection network (e.g., Gigabit Ethernet, Token Ring or Fibre Channel). The host adapters 101 are connected to one or more processors or processor clusters 103 via a cluster interconnection network 102, which provides the media, protocols and services that ensure communication between host adapters and processor clusters. To ensure continuous service in case of failure, a storage server usually has two or more processors or processor clusters 103, each operating across different power boundaries, so that loss of power at one cluster does not affect the rest of the system. A Non-Volatile Store (NVS) 107 may additionally be used to speed up write operations while maintaining high reliability. In medium-to-large size storage servers, processor clusters are used instead of individual processors. As known, processor clusters may be arranged, for instance, in a Symmetric Multi-Processor (SMP) configuration, where multiple processors share the same memory. The processor clusters provide all the functionality required to guarantee data integrity, including data recovery in case of failure. Each processor cluster is connected to one or more device adapters 104, which control the operations of hard disks, optical disks, magnetic tapes or other forms of mass storage media devices 105. Processor clusters may additionally share device adapters. The device adapters can provide additional data integrity functionality, such as RAID services.
The hosts served by a storage server are often heterogeneous. The data is transferred to and from the storage server in atomic units, which are usually pages containing 4 Kilobytes (KB) of data. Pages are usually divided into sectors, usually containing 512 Bytes of data, because a (disk) sector is the atomic unit of I/O supported by the disks. Some operating systems add headers or trailers to the sectors. For example, as described in G. Soltis, “Inside the AS/400”, Duke Press, Loveland CO, 1996, p. 217, the operating system OS/400, which is the operating system of the IBM AS/400, adds 8 Bytes of system header in front of each 512 Bytes of sector data, hence each sector contains 520 Bytes. To reduce the risk of data corruption, further headers or trailers may be added to the data within the storage server. For example, the host adapters can compute cyclic redundancy check bits and append them to each sector, further increasing the size of the sector, e.g., to 524 Bytes. The disk cache may now simultaneously include sectors of different sizes (each containing 512 Bytes of sector data), depending on which type of host wrote them, which complicates its management. Alternatively, dummy headers and trailers may be added to sectors as appropriate, so that all sectors have the same size. This approach wastes a small amount of space, but significantly simplifies the cache management.
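By way of illustration only, the sector-size arithmetic above can be sketched in Python. The host names, the 4-byte check trailer (inferred from the 520-byte and 524-byte figures in the text) and the function name are assumptions made for this sketch, not part of any described system.

```python
SECTOR_DATA_BYTES = 512   # payload per disk sector, as stated in the text
SECTORS_PER_PAGE = 8      # a 4 KB page divided into 512-byte sectors


def padded_sector_size(header_bytes_by_host, trailer_bytes_by_host):
    """Uniform sector size when every sector is padded with dummy
    headers/trailers up to the largest header and trailer in use."""
    max_header = max(header_bytes_by_host.values())
    max_trailer = max(trailer_bytes_by_host.values())
    return max_header + SECTOR_DATA_BYTES + max_trailer


# Hypothetical host mix: an OS/400-style host adds an 8-byte header,
# another host adds none; a 4-byte check trailer is assumed for all sectors.
headers = {"os400": 8, "other": 0}
trailers = {"os400": 4, "other": 4}

uniform_sector_bytes = padded_sector_size(headers, trailers)
page_bytes = uniform_sector_bytes * SECTORS_PER_PAGE
# Space spent on dummy padding for each sector written by the "other" host.
padding_per_other_sector = uniform_sector_bytes - (
    headers["other"] + SECTOR_DATA_BYTES + trailers["other"]
)
```

Under these assumptions every cached sector occupies 524 bytes and a cached disk page occupies 4192 bytes, which is the "small amount of space" traded for simpler cache management.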
While in the past 20 years the speed of processors has increased by a factor of 1000 or more, the speed of disks has barely increased by a factor of 3 to 4. Consequently, accessing data on disk is in general a very expensive operation in terms of latency. To hide part of the latency, in the storage server 100 of FIG. 1, a disk cache 106 may be employed, as taught, for instance, in the reference “Disk cache—miss ratio analysis and design considerations”, ACM Trans. Comput. Syst. 3, (Aug. 1985), pp. 161-203, by A. J. Smith. A disk cache is a fast memory (for example DRAM) that contains a copy of part of the data stored on disk. Usually, the most recently read part of the disk is stored in the disk cache, and prefetching algorithms can be used to load into the cache data with addresses close to those of the most recently read data. If a host request is for data contained in the cache (an event called a “cache hit”), the latency of the data transfer is equal to the time required to process the request plus the time to read the data from memory and transmit it. If the data is not in the cache (a “cache miss”), then the time to serve the host request is dominated by the disk access latency. A cache miss can have a latency three to four orders of magnitude larger than that of a cache hit.
As described in the above-mentioned reference to A. J. Smith, the larger the cache, the smaller the miss rate, and the better the performance of the overall system. However, the cost of RAM is about two orders of magnitude larger than the cost of disk for the same capacity, and the gap appears to be growing. It therefore seems beneficial to increase the capacity of the cache by compressing its content.
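The relationship between miss rate and response time can be made concrete with the standard weighted-average latency formula. The following Python sketch is illustrative only: the latency values and miss ratios are hypothetical numbers, chosen merely to respect the three-orders-of-magnitude gap mentioned above.

```python
def mean_latency(miss_ratio, hit_latency_s, miss_latency_s):
    """Average service time: hits and misses weighted by their frequency."""
    return (1.0 - miss_ratio) * hit_latency_s + miss_ratio * miss_latency_s


# Hypothetical figures: a 10 microsecond hit and a 10 millisecond miss.
HIT_LATENCY_S = 10e-6
MISS_LATENCY_S = 10e-3

# Halving the miss ratio (for example, with a larger effective cache)
# roughly halves the mean latency, because the miss term dominates.
latency_before = mean_latency(0.20, HIT_LATENCY_S, MISS_LATENCY_S)
latency_after = mean_latency(0.10, HIT_LATENCY_S, MISS_LATENCY_S)
```

Because the miss term dominates, any technique that enlarges the effective cache capacity, such as compression, acts almost directly on the mean latency.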
Compressed caches are taught in the art, for example, as described in U.S. Pat. No. 5,875,454, to D. J. Craft and R. Greenberg, entitled Compressed Data Cache Storage System. FIG. 2 illustrates an example of a system architecture for a compressed cache 200, for a processor accessing data at high speed and in small memory block units, and a mass storage medium holding data in large transfer units. Uncompressed data is read from a mass storage system 201 in large transfer units (e.g., 64K to 200K bytes). The data so received is divided into 4K blocks, which are individually compressed by a Lempel-Ziv-type lossless compressor 202. Each compressed 4K block is stored in the cache 203 in an integer, variable number of allocation units 204, which are fixed-size sections of contiguous memory having a size of, for example, 512 bytes. The compressed block need not be stored in contiguous allocation units.
The actual location within the cache of the first allocation unit for a transfer unit is recorded in the directory 205. The other allocation units for the transfer unit are connected through a linked list. When a read request is received via the I/O interface 208, the data is read from the allocation units where the required block is stored, decompressed by a fast decompressor 207, and sent to the computer via the I/O interface 208. All the operations are controlled by a compressed data cache controller 206, which also maintains the cache directory 205 and performs the usual caching functions (such as replacement policies, free space management, etc.).
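The storage scheme just described (fixed-size allocation units, a directory holding the first unit, and a linked list for the rest) can be sketched in Python as follows. This is a simplified software illustration, not the hardware design of the cited patent: the class and method names are hypothetical, `zlib` (DEFLATE, a Lempel-Ziv-type compressor) stands in for compressor 202, the directory is keyed per block rather than per transfer unit, and overwriting or freeing blocks is omitted.

```python
import zlib

ALLOCATION_UNIT_BYTES = 512   # example unit size given in the text


class CompressedCache:
    """Toy compressed cache: a directory plus linked allocation units."""

    def __init__(self, unit_count):
        # Each used slot holds (payload, index of next unit or None).
        self.units = [None] * unit_count
        self.free_units = list(range(unit_count))
        self.directory = {}   # block id -> index of the first unit

    def write_block(self, block_id, data):
        compressed = zlib.compress(data)
        chunks = [compressed[i:i + ALLOCATION_UNIT_BYTES]
                  for i in range(0, len(compressed), ALLOCATION_UNIT_BYTES)]
        if len(chunks) > len(self.free_units):
            raise MemoryError("not enough free allocation units")
        # Units come from the free list and need not be contiguous.
        indices = [self.free_units.pop() for _ in chunks]
        for pos, (index, chunk) in enumerate(zip(indices, chunks)):
            next_index = indices[pos + 1] if pos + 1 < len(indices) else None
            self.units[index] = (chunk, next_index)
        self.directory[block_id] = indices[0]
        return len(indices)   # number of allocation units used

    def read_block(self, block_id):
        index = self.directory[block_id]
        parts = []
        while index is not None:          # follow the linked list
            chunk, index = self.units[index]
            parts.append(chunk)
        return zlib.decompress(b"".join(parts))


cache = CompressedCache(unit_count=16)
block = (b"storage server " * 300)[:4096]   # a compressible 4K block
units_used = cache.write_block(7, block)
restored = cache.read_block(7)
```

The sketch also makes the drawback visible: a read must walk the chain one unit at a time before decompression can complete.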
The downsides of the scheme described in U.S. Pat. No. 5,875,454 are the cost of the special-purpose hardware that needs to be developed, the use of a moderate speed compressor, and the use of a linked list to connect the allocation units for each transfer unit. Nevertheless, the described prior art scheme is useful for moderate size caches (for example several MB, as described in U.S. Pat. No. 5,875,454). In an enterprise-class storage server, however, it is desirable to have disk caches having a capacity equal to 0.1% to 2% of that of the entire disk subsystem. Typical sizes of the disk subsystem are on the order of terabytes, and are growing, hence the desired disk cache size is on the order of Gigabytes to hundreds of Gigabytes, and will grow in the future. Additionally, such servers are designed to serve a large number of hosts. Hence, the compressor speed becomes very important, data integrity is essential, the management of the cache becomes more complex, and new services must be provided by the storage server. It is possible to combine a general-purpose computer with one or more special purpose compressed caches. However, a more cost-effective solution would be to use just general purpose hardware. In particular, cost-performance considerations suggest using part of the main memory of the processor clusters as disk cache: for example, Symmetric Multi-Processors (SMPs) are designed to support large memories of the desired size, and a fast interface between the processors and the memory exists. SMPs and other parallel computer architectures provide the computing power necessary to provide the services and functionalities required for an enterprise class storage server. Hence, the disk cache 106 (FIG. 1) may be part of the computer cluster memory, rather than a separate device. The processor cluster has in general enough computing power to provide the functionalities that a hardware disk cache controller provides.
If the memory of the computer cluster contains the disk cache, it is desirable that the speed of the compressor 202 be high enough to guarantee compression at a rate at least equal to the memory bandwidth.
Simple methods for compressing the main memory of a general purpose computer are known in the art, and are taught, for example, in U.S. Pat. No. 5,812,817, to W. P. Hovis et al. entitled Compression Architecture for System Memory Applications. FIG. 3 illustrates the typical approach for compressing the main memory of a general purpose computer. As shown in FIG. 3, the memory of a conventional computer 301 is partitioned into an uncompressed cache directory portion 302, an uncompressed cache portion 303, a setup table portion 304 and a compressed storage portion 305. The part of the physical memory to be extended by compression typically includes portions 303, 304 and 305. The data contained in the part of the physical memory to be extended by compression are compressed and stored in the compressed storage portion 305. The location of each compressed piece is stored in the setup table 304. When the processor accesses the main memory, the cache directory 302 is first accessed to determine whether the desired addresses are stored in the uncompressed cache 303. If the data is contained in the uncompressed cache, the uncompressed cache is accessed; otherwise the setup table 304 is accessed to find the location of the desired addresses in the compressed storage 305. The compressed data is then decompressed and stored in the uncompressed cache 303, the uncompressed cache directory 302 is updated, and the uncompressed cache is accessed by the processor. It should be understood that, with this method, the disk cache would be treated exactly like any other data: it would be compressed, and parts of it would be decompressed and stored in the uncompressed cache when they are accessed.
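The access sequence described for FIG. 3 can be sketched in Python as follows. The sketch is illustrative only and rests on several assumptions: the class and attribute names are hypothetical, `zlib` stands in for an unspecified compressor, and replacement of entries in the uncompressed cache is not modelled, since the cited patent (as noted below) does not specify these details.

```python
import zlib


class CompressedMainMemory:
    """Toy model of the four memory portions 302-305 of FIG. 3."""

    def __init__(self):
        self.cache_directory = {}      # 302: address -> slot in the cache
        self.uncompressed_cache = []   # 303: recently used, uncompressed
        self.setup_table = {}          # 304: address -> location in 305
        self.compressed_storage = []   # 305: compressed pieces

    def store(self, address, data):
        """Compress a piece and record its location in the setup table."""
        self.setup_table[address] = len(self.compressed_storage)
        self.compressed_storage.append(zlib.compress(data))

    def load(self, address):
        """Serve an access: directory first, then setup table on a miss."""
        slot = self.cache_directory.get(address)
        if slot is None:
            location = self.setup_table[address]
            data = zlib.decompress(self.compressed_storage[location])
            self.uncompressed_cache.append(data)
            slot = len(self.uncompressed_cache) - 1
            self.cache_directory[address] = slot   # update directory 302
        return self.uncompressed_cache[slot]


memory = CompressedMainMemory()
memory.store(0x1000, b"abc" * 100)
first_read = memory.load(0x1000)    # miss: decompressed via setup table
second_read = memory.load(0x1000)   # hit: served from uncompressed cache
```

The second access is served from the uncompressed cache without touching the compressed storage, which is the behaviour the directory lookup is meant to provide.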
The method proposed in above-referenced U.S. Pat. No. 5,812,817, to W. P. Hovis et al. does not describe how compression and decompression are accomplished, how the directory is structured, how the setup table is structured, nor how the compressed storage is managed.
Particularly, a fast compression/decompression algorithm is required. One such fast compression/decompression algorithm that may be employed is described in U.S. Pat. No. 5,729,228 to P. Franaszek, et al. entitled PARALLEL COMPRESSION AND DECOMPRESSION USING A COOPERATIVE DICTIONARY. Furthermore, in U.S. Pat. No. 5,864,859 to P. A. Franaszek entitled SYSTEM AND METHOD OF COMPRESSION AND DECOMPRESSION USING STORE ADDRESSING there is described a technique for dividing each memory page into a plurality of memory lines and compressing the lines to form compressed pages, comprised of the lines, that are placed in a random-access storage. A directory to the compressed pages is provided, wherein a location for a directory entry for each page is in a translation table between page virtual addresses and directory entries, and the beginning of the descriptors of where the kth line within each page is stored is located at a fixed offset from the location of the directory entry. A set of descriptors is provided for each line, which indicates the storage locations for the compressed line. The compressed portions of each line are stored in a set of fixed-size blocks, which are not placed in the directory descriptor space.
The following three figures, FIG. 4, FIG. 5 and FIG. 6, illustrate prior art that can be found in U.S. Pat. No. 5,761,536 to P. A. Franaszek entitled SYSTEM AND METHOD FOR REDUCING MEMORY FRAGMENTATION BY ASSIGNING REMAINDERS TO SHARE MEMORY BLOCKS ON A BEST FIT BASIS, in co-pending U.S. patent application Ser. No. 08/603,976, entitled COMPRESSION STORE ADDRESSING, and in co-pending U.S. patent application Ser. No. 09/229,057 entitled METHOD AND APPARATUS FOR ADDRESSING MAIN MEMORY CONTENTS INCLUDING A DIRECTORY-STRUCTURE IN A COMPUTER SYSTEM, the whole contents and disclosures of each of which are incorporated by reference as if fully set forth herein.
FIG. 4 depicts the general structure of a computer system with compressed main memory. Memory accesses from the processor 401 are served by a hierarchy of processor cache memories 402. Caches will access the main memory 405 upon cache misses, on writes or on cache line replacements. Memory accesses are mediated by a compression controller 403 which compresses data sent from the processor cache to the main memory and decompresses the data sent from the main memory to the processor cache. The Input/Output subsystem 404 also interfaces with the main memory through the compression controller.
FIG. 5 illustrates in greater detail the structure of the processor cache hierarchy 402, components of the compression controller 403, and compressed main memory 405. As shown in FIG. 5, the compressed main memory 405 is implemented using a conventional RAM memory, which is used to store a directory 501 and a number of fixed-size memory blocks 502. The processor cache 402 is implemented conventionally using a cache directory 503 for a set of cache lines 504. The compression controller 403 includes a decompressor 504 which is used for reading compressed lines from the compressed main memory to the processor cache, and a compressor 505 which is used to write processor cache lines into the compressed main memory. The content of each processor cache line is associated with a given real memory address 506. Unlike a conventional memory, however, the address 506 does not refer to an address in the memory. Rather, the address 506 is used to index into the directory 501. Each directory entry contains information which allows the associated cache line to be retrieved. For example, the directory entry 507 for line 1 associated with address A1 508 is for a line which has compressed to a degree in which the compressed line can be stored entirely within the directory entry. The directory entry 509 for line 2 associated with address A2 510 is for a line which is stored in compressed format using a first full memory block 511 and a second partially filled memory block 512. The directory entry 513 for line 3 associated with addresses A3 515 is for lines stored in compressed formats using a number of full memory blocks 517 and 518 and one shared memory block 519. The directory entry 514 for line 4 associated with addresses A4 516 is for lines stored in compressed format using one shared memory block 519.
FIG. 6 illustrates an example directory entry format 601. For this example, it is assumed that the memory blocks 502 of FIG. 5 are of size 256 bytes, and that the cache lines 504 of FIG. 5 are of size 1024 bytes. This means that a line can be stored in an uncompressed format using four memory blocks. For this example, directory entries of size 16 bytes are used, in which the first byte consists of a number of flags; the contents of the first byte 601 determine the format of the remainder of the directory entry. A flag bit 602 specifies whether the line is stored in compressed or uncompressed format. If stored in uncompressed format, the remainder of the directory entry is interpreted as for line 1 606, in which four 30-bit addresses give the addresses in memory of the four blocks containing the line. If stored in compressed format, a flag bit 603 indicates whether the compressed line is stored entirely within the directory entry; if so, the format of the directory entry is as for line 3 608, in which up to 120 bits of compressed data are stored. Otherwise, for compressed lines longer than 120 bits, the formats shown for line 1 606 or line 2 607 may be used. In the case of the line 1 606 format, additional flag bits 604 specify the number of blocks used to store the compressed line, from one to four 30-bit addresses specify the locations of the blocks, and finally the size of the remainder, or fragment, of the compressed line stored in the last memory block (in units of 32 bytes), together with a bit indicating whether the fragment is stored at the beginning or at the end of the memory block, is given by four fragment information bits 605.
Directory entry format 607 illustrates an alternative format in which part of the compressed line is stored in the directory entry (to reduce decompression latency); in this case, addresses to only the first and last blocks used to store the remaining part of the compressed line are stored in the directory entry, with intervening blocks (if any) found using a linked list technique, that is, each block used to store the compressed line has, if required, a pointer field containing the address of the next memory block used to store the given compressed line.
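The uncompressed directory entry layout described for FIG. 6 (one flag byte followed by four 30-bit block addresses, for a total of 8 + 4 × 30 = 128 bits) can be sketched in Python as follows. The bit position chosen for the compressed/uncompressed flag, the big-endian byte order and the function names are assumptions of this sketch; the text does not specify them, and the compressed formats are not modelled.

```python
ENTRY_BYTES = 16            # directory entry size given in the text
ADDRESS_BITS = 30           # width of each block address
ADDRESSES_PER_ENTRY = 4     # a 1024-byte line spans four 256-byte blocks
FLAG_COMPRESSED = 0x80      # assumed position of flag bit 602


def pack_uncompressed_entry(block_addresses):
    """Pack a flag byte of zero (uncompressed) and four 30-bit addresses."""
    if len(block_addresses) != ADDRESSES_PER_ENTRY:
        raise ValueError("an uncompressed line needs exactly four blocks")
    value = 0   # the flag byte occupies the top 8 bits and stays zero
    for address in block_addresses:
        if not 0 <= address < (1 << ADDRESS_BITS):
            raise ValueError("block address does not fit in 30 bits")
        value = (value << ADDRESS_BITS) | address
    return value.to_bytes(ENTRY_BYTES, "big")


def unpack_uncompressed_entry(entry):
    """Recover the four block addresses from a 16-byte entry."""
    value = int.from_bytes(entry, "big")
    flags = value >> (ADDRESS_BITS * ADDRESSES_PER_ENTRY)
    if flags & FLAG_COMPRESSED:
        raise NotImplementedError("compressed formats are not sketched here")
    mask = (1 << ADDRESS_BITS) - 1
    shifts = range(ADDRESS_BITS * (ADDRESSES_PER_ENTRY - 1), -1, -ADDRESS_BITS)
    return [(value >> shift) & mask for shift in shifts]


addresses = [1, (1 << 30) - 1, 12345, 0]
entry = pack_uncompressed_entry(addresses)
decoded = unpack_uncompressed_entry(entry)
```

The same 120 bits that hold the four addresses are what the line 3 608 format reuses to hold a very short compressed line directly in the entry.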
Main memory in paged memory systems is partitioned into pages, which have the same size (4K bytes=4096 bytes) as the pages transferred to and from disks. Most modern computer systems have paged memory systems, so it is reasonable to assume that the storage server will have a paged memory system. Compression and decompression then operate on pages or on lines, where the size of a page is an integer multiple of the size of a line. The size of a memory line can be equal to the size of a processor cache line or to a multiple of the size of a processor cache line. However, in the storage server, pages are composed of 8 sectors, each of which may contain more than 512 bytes, and therefore the cached disk pages are larger than the memory pages. This has several negative effects on the performance of the storage system if the main memory is compressed.
The first negative effect is an increase in memory traffic when a page is accessed. Consider for example the case where a memory line (the unit of compression/decompression) is 1024 Bytes long. A memory page includes four (4) lines. A disk page stored in the cache will span five (5) memory lines, and share at least one line with another disk page. In most cases, it will share two memory lines with other disk pages. There is an increase in traffic both during reads and during writes. During reads, five (5) memory lines must be decompressed to recover the disk page. During writes, the memory lines that the written disk page shares with other disk pages must first be decompressed, to reconstruct the original content, the disk page is then written in the appropriate locations, and 5 memory lines must then be compressed. Overall, this increases the response time of the system.
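The line-spanning arithmetic above can be checked with a short Python sketch. It assumes, for illustration only, 520-byte sectors (the OS/400-style example given earlier) and disk pages packed back to back in memory starting at address zero; the function names are hypothetical.

```python
MEMORY_LINE_BYTES = 1024       # unit of compression in the example
DISK_PAGE_BYTES = 8 * 520      # eight 520-byte sectors = 4160 bytes


def lines_spanned(start_byte, length):
    """Number of memory lines touched by [start_byte, start_byte + length)."""
    first_line = start_byte // MEMORY_LINE_BYTES
    last_line = (start_byte + length - 1) // MEMORY_LINE_BYTES
    return last_line - first_line + 1


def lines_shared(start_byte, length):
    """Number of touched lines that also hold bytes of a neighbouring page.

    Valid for extents that span at least two memory lines.
    """
    shared = 0
    if start_byte % MEMORY_LINE_BYTES:              # unaligned start
        shared += 1
    if (start_byte + length) % MEMORY_LINE_BYTES:   # unaligned end
        shared += 1
    return shared


# Examine the first 16 back-to-back disk pages.
page_starts = [n * DISK_PAGE_BYTES for n in range(16)]
spans = [lines_spanned(start, DISK_PAGE_BYTES) for start in page_starts]
shares = [lines_shared(start, DISK_PAGE_BYTES) for start in page_starts]
```

Under these assumptions every disk page spans five memory lines; the first and last of the sixteen pages share one line with a neighbour, and the remaining fourteen share two.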
The second negative effect is on the compressibility of the data. The algorithms commonly used for compression rely on redundancy in the data, often at the byte level, to reduce its size. In particular, Lempel-Ziv-like algorithms look for repeated patterns in the data. Headers and trailers contain information that is unrelated to the patterns present in the data. They can also occur at any position within a memory line: a 1024 Byte long memory line will usually contain two headers and two trailers, and only occasionally just one header and one trailer. Not only do headers and trailers not compress well, but they also perturb the patterns of the data. Finally, for most memory pages, one of the four memory lines contains sectors belonging to two different disk pages, which results in an abrupt change of patterns at some point within the memory line, and can significantly decrease the compressibility of the data.
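The effect of embedded headers and trailers on a Lempel-Ziv-type compressor can be explored with the following Python sketch. It is a synthetic experiment under stated assumptions, not a measurement of any real workload: the sector payloads are artificial repeating records, the headers and trailers are pseudo-random bytes standing in for pattern-free metadata, and `zlib` stands in for the compressor.

```python
import random
import zlib

HEADER_BYTES, DATA_BYTES, TRAILER_BYTES = 8, 512, 4


def make_sectors(count=8, seed=0):
    """Build (header, data, trailer) triples with synthetic content."""
    rng = random.Random(seed)   # fixed seed keeps the experiment repeatable
    sectors = []
    for n in range(count):
        header = bytes(rng.getrandbits(8) for _ in range(HEADER_BYTES))
        data = ((b"record-%04d;" % n) * 43)[:DATA_BYTES]
        trailer = bytes(rng.getrandbits(8) for _ in range(TRAILER_BYTES))
        sectors.append((header, data, trailer))
    return sectors


sectors = make_sectors()

# Disk page as cached with headers and trailers left in place.
attached_page = b"".join(h + d + t for h, d, t in sectors)
# Sector data alone, with headers and trailers detached.
data_only_page = b"".join(d for _, d, _ in sectors)
# Detached metadata still has to be kept somewhere, uncompressed here.
metadata_bytes = sum(len(h) + len(t) for h, _, t in sectors)

attached_size = len(zlib.compress(attached_page))
data_only_size = len(zlib.compress(data_only_page))
```

Comparing `attached_size` with `data_only_size + metadata_bytes` gives a rough indication of whether detaching the metadata pays off for a given kind of data; the outcome depends on the data and on the compressor.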
It would therefore be highly desirable to provide a system and method for efficiently implementing a disk cache on a computer system supporting main memory compression.
It would be further highly desirable to provide, in a computer system having a compressed disk cache memory and implementing a compressed memory directory for managing the compressed disk cache, a system for efficiently managing use and storage of headers and trailers through the compressed memory directory in a manner that avoids the negative effects resulting from a compressible disk cache system.
It would additionally be highly desirable to provide, in a computer system implementing a compressed disk cache memory, a system and method for detaching headers and trailers from sectors before storing the sectors in the disk cache, storing the headers and trailers, and, reattaching the headers and trailers to sectors when the sectors are sent from the disk cache to a host or to a mass storage device.
It is an object of the present invention to provide a system and method for efficiently implementing a disk cache on a computer system supporting main memory compression.
It is a further object of the present invention to provide, in a computer system having a compressed disk cache memory and implementing a compressed memory directory for managing the compressed disk cache, a system for efficiently managing use and storage of headers and trailers through the compressed memory directory in a manner that avoids the negative effects resulting from a compressible disk cache system.
Particularly, the present invention is directed to a storage server where the disk cache is compressed, and sectors can have headers and trailers, either attached by the hosts or by components of the storage server. The compressed disk cache comprises a compressed memory directory and a plurality of fixed-size blocks. The present invention teaches how to detach headers and trailers from sectors before storing the sectors in the disk cache, how to store the headers and trailers separately, and how to reattach headers and trailers to sector data when the sectors are sent from the disk cache to a host or to a mass storage device.
Particularly, the headers and trailers are managed through the compressed memory directory used to manage the compressed disk cache. Additional space is reserved in each entry of the compressed memory directory. This space is used to store the headers and trailers of the sectors corresponding to the entry. Alternatively, this space contains flags indicating the presence of headers and trailers, and pointers to memory devoted to contain headers and trailers. Particularly, an array of headers and trailers is contained in the compressed main memory of the storage server, and the pointers within the entries of the compressed disk cache point to entries of the array.
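The detach, store and reattach steps summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the header and trailer sizes, the `DirectoryEntry` layout and the function names are hypothetical, `zlib` stands in for the compressor, and the compressed data is held inline in the entry rather than in fixed-size blocks.

```python
import zlib
from dataclasses import dataclass, field

HEADER_BYTES, DATA_BYTES, TRAILER_BYTES = 8, 512, 4   # assumed sizes
SECTOR_BYTES = HEADER_BYTES + DATA_BYTES + TRAILER_BYTES


@dataclass
class DirectoryEntry:
    """Directory entry with extra space reserved for sector metadata."""
    compressed_data: bytes = b""
    headers: list = field(default_factory=list)
    trailers: list = field(default_factory=list)


def stage_page(sectors):
    """Detach headers/trailers, then compress only the sector data."""
    entry = DirectoryEntry()
    payload = bytearray()
    for sector in sectors:
        if len(sector) != SECTOR_BYTES:
            raise ValueError("unexpected sector size")
        entry.headers.append(sector[:HEADER_BYTES])
        payload += sector[HEADER_BYTES:HEADER_BYTES + DATA_BYTES]
        entry.trailers.append(sector[HEADER_BYTES + DATA_BYTES:])
    entry.compressed_data = zlib.compress(bytes(payload))
    return entry


def destage_page(entry):
    """Decompress the sector data and reattach headers and trailers."""
    payload = zlib.decompress(entry.compressed_data)
    sectors = []
    for n, (header, trailer) in enumerate(zip(entry.headers, entry.trailers)):
        data = payload[n * DATA_BYTES:(n + 1) * DATA_BYTES]
        sectors.append(header + data + trailer)
    return sectors


page = [bytes([n]) * HEADER_BYTES + bytes([n + 1]) * DATA_BYTES
        + bytes([n + 2]) * TRAILER_BYTES for n in range(8)]
entry = stage_page(page)
restored_page = destage_page(entry)
```

With the metadata detached, the data handed to the compressor is exactly 4096 bytes, so a cached disk page lines up with a memory page again.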
Alternatively, headers and trailers are stored within fixed-size blocks, and the additional space within the compressed memory directory entries contains the address of the blocks and the position of headers and trailers within the block.
Alternatively, the additional space is reserved in an array separate from the compressed memory directory, but parallel to it, so that the same indexing scheme is used to address both the parallel array and the compressed memory directory.
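The parallel-array alternative can be sketched in a few lines of Python. The array length, the mapping from real address to index and all names below are hypothetical; the only point illustrated is that one index addresses both structures.

```python
DIRECTORY_ENTRIES = 1024     # hypothetical directory size
MEMORY_LINE_BYTES = 1024     # hypothetical unit covered by one entry

# Compressed memory directory and a separate array parallel to it.
directory = [None] * DIRECTORY_ENTRIES
sector_metadata = [None] * DIRECTORY_ENTRIES


def directory_index(real_address):
    """Assumed indexing scheme: one entry per memory line."""
    index = real_address // MEMORY_LINE_BYTES
    if not 0 <= index < DIRECTORY_ENTRIES:
        raise IndexError("address outside the directory range")
    return index


def store(real_address, compressed_line, headers_and_trailers):
    index = directory_index(real_address)
    directory[index] = compressed_line             # directory entry
    sector_metadata[index] = headers_and_trailers  # same index, other array


def load(real_address):
    index = directory_index(real_address)
    return directory[index], sector_metadata[index]


store(2048, b"compressed-bytes", [(b"hdr", b"trl")])
line, metadata = load(2048)
```

Keeping the metadata in a separate array leaves the directory entry format unchanged, at the cost of a second memory access per lookup.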
Advantageously, the process of detaching headers and trailers from sectors for separate storage results in increased efficacy of data compression, thus yielding better compression ratios, and decreased memory traffic generated by host reads, host writes, cache stages and cache destages.