The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for efficient adaptive read-ahead in log structured storage.
A log structured storage system (LSSS) or log structured file system (LFS) is a form of disk storage management intended to improve disk access time. LSSSs rely on the assumption that files are cached in main memory and that increasing memory sizes will make the caches more effective at responding to read requests. As a result, disk use is dominated by writes. An LSSS writes all new information to disk in a sequential structure called a log. New information is stored at the end of the log rather than updated in place, to reduce disk seek activity. This approach increases write performance by eliminating almost all seeks. As information is updated, portions of data records at intermediate locations of the log become outdated. The sequential nature of the log also permits faster crash recovery.
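The append-only behavior described above can be illustrated with a minimal in-memory sketch. The class name `AppendOnlyLog`, its methods, and the in-memory `index` are hypothetical names introduced only for illustration; an actual LSSS keeps the log on disk and is considerably more involved.

```python
class AppendOnlyLog:
    """Toy in-memory log: updates are appended, never written in place."""

    def __init__(self):
        self.records = []  # sequential log of (key, value) entries
        self.index = {}    # key -> position of the current (live) entry

    def write(self, key, value):
        # New information always goes to the end of the log.
        self.records.append((key, value))
        self.index[key] = len(self.records) - 1

    def read(self, key):
        # Follow the index to the most recent entry for this key.
        return self.records[self.index[key]][1]

    def outdated_positions(self):
        # Entries no longer referenced by the index have been superseded.
        live = set(self.index.values())
        return [i for i in range(len(self.records)) if i not in live]


log = AppendOnlyLog()
log.write("a", 1)
log.write("b", 2)
log.write("a", 3)  # supersedes the entry at position 0
```

In this sketch, the second write to key "a" leaves the entry at position 0 in the log but no longer referenced, which corresponds to the outdated data at intermediate log locations noted above.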
In an LSSS, data is stored permanently in the log and there is no other structure on disk. For an LSSS to operate efficiently, it must ensure that there are always large extents of free space available for writing new data.
Log structured disks (LSD) and log structured arrays (LSA) are disk architectures that use the same approach as the LSSS. LSAs combine the LSSS architecture and disk array architecture with a parity technique to improve reliability and availability. Generally, an LSA includes an array of physical disks and a program that manages information storage to write updated data into new disk locations rather than updating the data in place. Therefore, the LSA keeps a directory which it uses to locate data items in the array.
As an illustration of the N+1 physical disks of the LSA, an LSA system may include a group of direct access storage devices (DASDs), each of which includes multiple disk platters stacked into a column. Each disk is divided into large consecutive areas called segment-columns. A segment-column is typically as large as a physical cylinder on a physical disk. Corresponding segment-columns from the N+1 disks constitute a segment. The array has as many segments as there are segment-columns on a disk in the array.
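The relationship between disks, segment-columns, and segments can be sketched as follows. The values chosen for `N` and `SEGMENT_COLUMNS_PER_DISK`, and the function name `segment_members`, are assumptions made purely for illustration.

```python
def segment_members(segment_id, num_disks):
    """Return the (disk, segment_column) pairs that make up one segment.

    Segment k is formed from segment-column k on every disk in the array.
    """
    return [(disk, segment_id) for disk in range(num_disks)]


N = 4                          # hypothetical value; the array has N + 1 disks
SEGMENT_COLUMNS_PER_DISK = 6   # hypothetical value

num_disks = N + 1
# The array has as many segments as there are segment-columns on a disk.
num_segments = SEGMENT_COLUMNS_PER_DISK

members = segment_members(2, num_disks)
```

With these assumed values, segment 2 is made up of segment-column 2 on each of the five disks, and the array contains six segments in total.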
A logical track is stored entirely within some segment-column of some physical disk of the array; the size of a logical track is such that many logical tracks can be stored in the same segment-column. The location of a logical track in an LSA changes over time. A directory, called the LSA directory, indicates the current location of each logical track.
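One possible way to picture the LSA directory is as a mapping from a logical track identifier to its current physical location. The sketch below is illustrative only; the names `Location`, `LsaDirectory`, `relocate`, and `lookup`, and the `offset` field, are hypothetical and are not taken from any particular LSA implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    disk: int            # physical disk holding the track
    segment_column: int  # segment-column on that disk
    offset: int          # assumed slot of the track within the segment-column


class LsaDirectory:
    """Toy directory: logical track id -> current physical location."""

    def __init__(self):
        self._current = {}

    def relocate(self, track_id, location):
        # Record the new location; return the previous one (now free), if any.
        previous = self._current.get(track_id)
        self._current[track_id] = location
        return previous

    def lookup(self, track_id):
        return self._current[track_id]


directory = LsaDirectory()
directory.relocate(7, Location(disk=0, segment_column=3, offset=0))
directory.relocate(8, Location(disk=0, segment_column=3, offset=1))  # same segment-column
old = directory.relocate(7, Location(disk=2, segment_column=5, offset=0))
```

Here logical tracks 7 and 8 initially share a segment-column, and a later write of track 7 changes its directory entry while the previous location is returned as no longer in use.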
In LSAs and LSSSs, data to be written is grouped together into relatively large blocks which are written out as a unit in a convenient free block location on disk. When data is written, the previous disk locations of the data become free, creating unused data (or garbage) in the blocks on disk. Eventually the disk fills up with blocks and it may be necessary to create free block locations by reading source blocks containing at least some unused data and compacting their still-in-use content into a smaller number of destination blocks without any unused data. This process is called free space (or garbage) collection.
To ensure that there is always an empty block to write to, all logical tracks from a block selected for free space collection that are still in that block (i.e., are still pointed to by the LSA directory) are typically read from disk and placed in a memory block. These logical tracks will be written back to disk when the memory block fills. Free space collected blocks are returned to the empty block pool and are available when needed.
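Free space collection as described above can be sketched in simplified form. The function `collect_free_space`, the dictionary-based `blocks` and `directory` structures, the `IN_MEMORY` marker, and the block capacity are all assumptions made for illustration; the sketch gathers live tracks first and writes out only full memory blocks, which is one of several possible orderings.

```python
IN_MEMORY = "memory"  # hypothetical marker for tracks held in the memory block


def collect_free_space(blocks, directory, sources, empty_pool, capacity):
    """Compact live tracks out of `sources` and return those blocks to the pool.

    blocks:    block id -> list of track ids physically stored in that block
    directory: track id -> block id holding the current copy of the track
    Returns the tracks still waiting in the (not yet full) memory block.
    """
    memory_block = []
    for block_id in sources:
        for track in blocks[block_id]:
            # A track is live only if the directory still points at this block.
            if directory.get(track) == block_id:
                memory_block.append(track)
                directory[track] = IN_MEMORY
        blocks[block_id] = []
        empty_pool.append(block_id)  # collected block becomes available again

    # Write the gathered tracks back to disk one full block at a time.
    while len(memory_block) >= capacity:
        dest = empty_pool.pop(0)
        chunk, memory_block = memory_block[:capacity], memory_block[capacity:]
        blocks[dest] = chunk
        for track in chunk:
            directory[track] = dest
    return memory_block


blocks = {
    0: ["t1", "t2", "t3", "t4"],  # t2 and t3 here are outdated copies
    1: ["t5", "t6", "t7", "t8"],  # t7 and t8 here are outdated copies
    2: ["t2", "t3", "t7", "t8"],  # current copies written later
}
directory = {"t1": 0, "t4": 0, "t5": 1, "t6": 1,
             "t2": 2, "t3": 2, "t7": 2, "t8": 2}
empty_pool = []
pending = collect_free_space(blocks, directory, [0, 1], empty_pool, capacity=4)
```

In this example, two half-empty source blocks are compacted into a single destination block, so the empty block pool gains one block overall.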