Data compression in disk arrays has, in the prior art, required the use of complex file systems, e.g., log-structured systems. A log-structured file system is a technique for disk storage management wherein all modifications to a disk are written sequentially to a log-like file structure. The log-like file structure contains indexing information so that stored files can be read back from the log efficiently. An aspect of such log-structured arrays is that large free areas are maintained on the disk in order to speed up the write process. To maintain the large free areas, the log is divided into segments, and a segment "cleaner" is employed to compress information from heavily fragmented segments, thereby freeing up segments for subsequent writes.
One problem with log-structured arrays is that they scatter data across the disk, such that sequential data records become widely distributed across separated physical storage locations. Thus, when data records are accessed sequentially, overall disk performance suffers due to the large number of movements required to position and reposition the read/write heads.
As indicated above, in a log-structured disk controller that supports data compression, writes to disk are not written in place but are instead written to new locations on the disk that were previously empty. The disk controller divides the disk into segments, some of which are kept "empty". New writes from the system are written into sectors located within empty segments. As a result, each write or update causes the data to be written to new physical locations, and the old physical locations of the data are subsequently collected and reused for future writes.
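The write-out-of-place behavior described above can be illustrated with a minimal sketch. The class name `LogStructuredDisk`, its fields, and the first-empty-segment policy are hypothetical and chosen only for illustration; they are not taken from any of the cited systems.

```python
class LogStructuredDisk:
    """Toy model: every write lands in the next free sector of an empty segment."""

    def __init__(self, num_segments, sectors_per_segment):
        self.sectors_per_segment = sectors_per_segment
        # Segments that are kept "empty" and available for new writes.
        self.empty_segments = list(range(num_segments))
        self.current_segment = self.empty_segments.pop(0)
        self.next_sector = 0
        self.location = {}  # logical record id -> (segment, sector)
        self.stale = []     # old physical locations awaiting collection

    def write(self, record_id):
        """Write (or update) a record; never overwrites the old location."""
        if self.next_sector == self.sectors_per_segment:
            if not self.empty_segments:
                raise RuntimeError("no empty segment; a cleaner pass is needed")
            self.current_segment = self.empty_segments.pop(0)
            self.next_sector = 0
        new_location = (self.current_segment, self.next_sector)
        self.next_sector += 1
        old_location = self.location.get(record_id)
        if old_location is not None:
            # The previous copy becomes garbage to be collected and reused.
            self.stale.append(old_location)
        self.location[record_id] = new_location
        return new_location


disk = LogStructuredDisk(num_segments=4, sectors_per_segment=2)
first_a = disk.write("A")
first_b = disk.write("B")
second_a = disk.write("A")  # update of "A" goes to a new segment
```

In this sketch the update of record "A" lands in a different segment from its neighbor "B", which is the dispersion of logically sequential records that the preceding paragraphs describe.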
In U.S. Pat. No. 5,574,952 to Brady et al. (assigned to the same Assignee as this application), an improved method for control of log-structured data storage includes the steps of: allocating a first amount of disk space for a compressed data unit as a first predetermined percentage of an uncompressed size of the data unit; and then increasing the allocation by a second predetermined percentage that is less than the first predetermined percentage to obtain a total amount of allocated disk space. The first predetermined percentage is a function of an expected compression ratio for the data unit, and the second predetermined percentage is a function of an expected compression ratio for the data unit and an expected change in the size of the compressed data unit. During an update operation, the method further compares the size of a compressed updated data unit to the total amount of allocated disk space, and if the size of the compressed updated data unit is equal to or less than the total amount of allocated disk space, the compressed updated data is stored therein. Otherwise, other disk space is allocated for storage of the compressed data unit.
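The two-step allocation and the update-time comparison summarized above may be sketched as follows. This is an illustration only: the function names are hypothetical, the percentages are example values, and the sketch assumes that both percentages are taken of the uncompressed size, which the summary above does not state explicitly.

```python
def allocate_space(uncompressed_size, first_pct, second_pct):
    """Return the total space allocated for a compressed data unit.

    Assumption: both percentages are applied to the uncompressed size.
    """
    if not 0 <= second_pct < first_pct:
        raise ValueError("second percentage must be less than the first")
    # First amount: expected compressed size (rounded up).
    first_amount = (uncompressed_size * first_pct + 99) // 100
    # Increase: headroom for expected growth of the compressed unit.
    increase = (uncompressed_size * second_pct + 99) // 100
    return first_amount + increase


def fits_in_place(compressed_updated_size, total_allocated):
    """True if the updated compressed unit can be stored in its existing space."""
    return compressed_updated_size <= total_allocated


total = allocate_space(4096, first_pct=50, second_pct=10)
```

With these example values, a 4096-byte data unit receives 2048 bytes for its expected compressed size plus 410 bytes of headroom; an updated unit that compresses to more than the total would require other disk space.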
The Brady et al. procedure described above reduces the problem of physical dispersion of data, but does not ensure that data which is recorded to disk in compressed form is logically sequential in the initially allocated physical sequence on the disk. The teachings and disclosure of U.S. Pat. No. 5,574,952 are incorporated herein by reference.
U.S. Pat. No. 5,537,588 to Engelmann et al., assigned to the same Assignee as this application, also describes a log-structured file system for partitioning of disk space. The method disclosed by Engelmann et al. includes the steps of partitioning the disk data storage system into multiple partitions, including first and second partitions. The first partition is managed as a log-structured file system for storage of segments that are comprised of active data units, each having an access activity value that exceeds a predetermined threshold. Within the second partition, data units are stored that are less active and exhibit an access activity value that is less than the predetermined threshold.
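The activity-based partitioning described above can be sketched as a simple routing decision. The function name, the partition labels, and the threshold value are hypothetical, and the treatment of a data unit whose activity value exactly equals the threshold is an assumption, since the summary above does not address that case.

```python
def choose_partition(access_activity, threshold):
    """Route a data unit to a partition by its access activity value.

    Assumption: a value exactly equal to the threshold goes to the second
    partition; the source text does not specify this case.
    """
    if access_activity > threshold:
        return "first"   # managed as a log-structured file system
    return "second"      # holds the less active data units


hot = choose_partition(access_activity=120, threshold=100)
cold = choose_partition(access_activity=15, threshold=100)
```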
U.S. Pat. No. 5,237,460 to Miller et al. discloses a disk storage allocation procedure wherein a disk memory is partitioned to provide a first memory space that contains a large number of memory locations of a first size that are capable of storing a compressed version of a block of data. A further partition is provided which includes a second memory space containing a large number of memory locations of a fixed size that are capable of storing an uncompressed version of a data block. When data blocks are received, they are compressed, and it is then determined whether each compressed block is small enough to fit within a memory location of the first size. Thereafter, the compressed data blocks that fit are stored in the first memory space, and those which do not fit are stored in the second memory space in uncompressed form.
It is known in the art to spread data across large arrays of small inexpensive disks. Such a system is described by Patterson et al. in "A Case for Redundant Arrays of Inexpensive Disks (RAID)", ACM SIGMOD Conference, Chicago, Ill., Jun. 1-3, 1988, pages 109-116. Patterson et al. describe various arrangements for distributing data across multiple disk drives in a RAID structure. It is desirable that any memory allocation procedure which accommodates compressed data be compatible with RAID organizations.
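The fit test described above can be sketched as follows. The function name and the slot size are hypothetical, and zlib is used only as a stand-in compressor; the summary above does not identify the compression method actually used by Miller et al.

```python
import os
import zlib


def place_block(block, compressed_slot_size):
    """Decide where a data block is stored and in what form.

    Returns ("first", compressed bytes) if the compressed block fits in a
    location of the first size, else ("second", the uncompressed block).
    """
    compressed = zlib.compress(block)  # stand-in compressor (assumption)
    if len(compressed) <= compressed_slot_size:
        return ("first", compressed)
    return ("second", block)


zeros = bytes(4096)        # highly compressible block
noise = os.urandom(4096)   # effectively incompressible block
space_a, payload_a = place_block(zeros, compressed_slot_size=2048)
space_b, payload_b = place_block(noise, compressed_slot_size=2048)
```

Here the block of zeros compresses well below the slot size and is stored compressed in the first memory space, while the random block does not fit and is stored unchanged in the second.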
Accordingly, it is an object of this invention to provide an improved memory storage allocation method for compressed data.
It is another object of this invention to provide an improved memory storage allocation method and apparatus which assures a reasonable likelihood that physically sequential, initially allocated disk space will accommodate compressed, updated data records.
It is yet another object of this invention to provide an improved method and apparatus for allocation of compressed data across disk surfaces which is compatible with RAID organizations.