1. Field of the Invention
The present invention relates to a computer program product, system, and method for managing data in storage according to a log structure.
2. Description of the Related Art
In a log structured storage system, updates to data are not written in place, but written to a selected logical and physical address. Data may be stored in a key-value store, where data is stored in unstructured records, each consisting of a key plus the values associated with that record. An index of the keys and log addresses for records in the log may be used to look up the log address of a record. Each index entry has an indexed key that is unique in a namespace or set of data and an address of the data in the log.
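By way of illustration only, the relationship between the log and the index may be sketched as follows. The class name LogStore, its methods, and the use of a list position as the log address are assumptions made for this sketch and are not drawn from any particular storage system.

```python
class LogStore:
    """Toy key-value store: an append-only log plus a key-to-address index."""

    def __init__(self):
        self.log = []    # log address is the position of the entry in this list
        self.index = {}  # indexed key -> log address of the current record

    def put(self, key, value):
        """Write the record at a new log address and point the index at it."""
        address = len(self.log)
        self.log.append((key, value))
        self.index[key] = address
        return address

    def get(self, key):
        """Look up the log address in the index, then read the record."""
        address = self.index[key]
        return self.log[address][1]


store = LogStore()
store.put("a", 1)
store.put("b", 2)
store.put("a", 3)  # update: a new entry is written, the old one is not overwritten
```

In this sketch, the index holds one entry per key, so a lookup for key "a" resolves to the most recently written log address while the earlier entry for "a" is still present in the log.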
In the log structured storage, updates are written to an end of the log, and previous versions of the record remain in the log. When an entry at a log address in the log no longer has a corresponding index entry identifying the log address, the entry is deemed unused and can be garbage collected. However, because the granularity of garbage collection (“region/slot”) is much coarser than the granularity of data (a key-value record), garbage collecting an entire slot may require relocating some potentially valid entries in the slot. To look up a relocated entry where the index still points to the old location, the storage layer may maintain an indirection table mapping the old address of the entry to its new address. The indirection table redirects a request directed to the old address of a record to the new address when an index is not available to provide the current address for the record. Indirection results in longer lookup times and a performance penalty because the old address must first be mapped to the new address, which then must be mapped to the physical address to access the record.
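By way of illustration only, slot-based garbage collection with an indirection table may be sketched as follows. The slot size, the function names, and the choice to leave the index unchanged after relocation (to model an index that still points to the old location) are assumptions made for this sketch.

```python
SLOT_SIZE = 4  # hypothetical number of log entries per garbage-collection slot

log = {}          # log address -> (key, value)
index = {}        # key -> log address of the current version
indirection = {}  # old log address -> new log address after relocation
next_address = 0


def append(key, value):
    """Write a record at the end of the log and return its log address."""
    global next_address
    address = next_address
    log[address] = (key, value)
    next_address += 1
    return address


def put(key, value):
    """Append the record and update the index to the new log address."""
    index[key] = append(key, value)


def collect_slot(slot):
    """Reclaim one slot, relocating entries that the index still references."""
    live_addresses = set(index.values())
    start = slot * SLOT_SIZE
    for address in range(start, start + SLOT_SIZE):
        if address not in log:
            continue
        key, value = log.pop(address)
        if address in live_addresses:
            # The index is deliberately not updated, so later lookups
            # through the old address must go through the indirection table.
            indirection[address] = append(key, value)


def read(address):
    """Follow indirection entries, if any, and count the extra hops."""
    hops = 0
    while address in indirection:
        address = indirection[address]
        hops += 1
    return log[address], hops


put("a", 1)  # address 0, becomes unused after the update below
put("b", 2)  # address 1
put("a", 3)  # address 2
put("c", 4)  # address 3
collect_slot(0)
record, hops = read(index["a"])
```

In this sketch, the unused entry at address 0 is dropped, the three valid entries are relocated to addresses 4 through 6, and a read through the stale index entry for key "a" takes one additional hop, which corresponds to the lookup penalty described above.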
Another technique for performing garbage collection is a compaction process that reads and rewrites all data to reclaim space. This requires that both live and old data be moved to free up space for subsequently received data. Compaction is Input/Output intensive because both live and deleted data need to be read and rewritten to the new space.
In embodiments where the storage layer places a tombstone record in the log indicating an outdated record, garbage collection must scan the log to process the tombstone records to determine the records to delete.
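By way of illustration only, a tombstone scan may be sketched as follows. The Entry layout, in which a tombstone carries the log address of the outdated record, and the function name addresses_to_delete are assumptions made for this sketch; an actual storage layer may encode tombstones differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    key: str
    value: Optional[int] = None
    outdated_address: Optional[int] = None  # set only on tombstone entries


def addresses_to_delete(log):
    """Scan every log entry and collect the addresses named by tombstones."""
    targets = set()
    for entry in log:
        if entry.outdated_address is not None:
            targets.add(entry.outdated_address)
    return sorted(targets)


log = [
    Entry("a", 1),                   # address 0
    Entry("b", 2),                   # address 1
    Entry("a", 3),                   # address 2, new version of "a"
    Entry("a", outdated_address=0),  # address 3, tombstone for address 0
    Entry("b", outdated_address=1),  # address 4, tombstone for address 1
]
deletable = addresses_to_delete(log)
```

In this sketch, the scan visits all five entries in order to identify the two outdated records, so the cost of the scan grows with the length of the log rather than with the number of records to delete.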
There is a need in the art for improved techniques for performing garbage collection in a storage system, and in a log structured storage system in particular.