In Rosenblum et al., “The Design and Implementation of a Log-Structured File System,” Proceedings of the 13th ACM Symposium on Operating Systems Principles, October 1991, a log-structured file system was proposed in which modified data blocks are rewritten to the disk sequentially in a log-like structure. Metadata describing the data being written is also written with each write operation. This metadata is used in managing the system.
The concept of log-structured file systems and arrays (LSAs) is now well-known to those of ordinary skill in the art, and need not be further described here. For the purposes of the present description, the term “LSA” will be used throughout, for the sake of brevity, but it will be understood by one of ordinary skill in the art that the term may encompass any log-structured data storage arrangement, such as an array or a file system. An example of a conventional arrangement of storage apparatus incorporating an LSA is shown in FIG. 1, in which virtual address space 100 is mapped to storage under control of conventional LSA 102. Conventional LSA 102 makes use of three data areas: superblock area 104, track data area 106 and segment data area 108. The operation of these data areas will be explained further below.
One significant problem with conventional LSA is that of taking backups of LSA snapshots.
With a conventional LSA, it is possible to make snapshot copies of extents of a customer-visible address space such that the same data is accessible through multiple extents of the address space but only one physical copy is stored. The ability to make these snapshots has numerous benefits described in the literature. One such benefit is the ability to make a snapshot of a running system at a particular point in time and store it to backup media as a consistent point-in-time record of the state of the system, while the system continues to run during the backup. Another benefit of snapshot operations is the ability to make very many copies of some information and allow multiple clients access to one copy each, while storing only one physical copy of the data plus any changes the clients make to their own copies. An example of the latter use of snapshot might be a service provider providing remote storage for a stateless personal computing device: the service provider could provide each new client with a snapshot of a standard volume, and the client could modify its own volume as needed. The service provider would need far less storage with this approach than if it were to allocate physical storage for each client volume up front.
The problem with conventional LSA in this area is that the two features cannot be combined: it is not possible to take a usable point-in-time backup of data which itself contains multiple snapshot extents. Taking the point-in-time backup itself works, but the information about snapshots in the backed-up extent is not preserved, so when the data is restored it will no longer fit on the physical storage.
A second significant problem not addressed by conventional LSA techniques is that of scalability.
With a conventional LSA, there is a two-tier LSA directory which contains metadata that maps the virtual address space seen by the client application to the physical address space presented by the underlying physical storage. In order to preserve the LSA metadata across power outages it is necessary to allocate space for it in non-volatile storage somewhere. Typically it is stored on the underlying physical storage along with the customer data.
There are valid uses of the LSA snapshot feature which require an LSA implementation to present a vast address space to the customer. The problem with a conventional two-tier directory is that, in order to present a vast address space, an amount of physical storage that is much smaller than that address space, but not negligible, must be allocated for the directory. This imposes scalability constraints because, for example, the minimum amount of physical storage the customer buys must be large enough to hold the metadata for the maximum virtual address space the customer might ever want. Whilst the ratio of the virtual address space to the size of the metadata is quite large (say 1000×), it is offset by the number of times the data is snapshot. For example, if data were on average snapshot 1000×, the metadata would be as large as the stored data itself, and the minimum configuration would have to be at least half the size of the maximum configuration in order to hold all of the metadata for the maximum configuration.
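The arithmetic behind the "half the size" figure can be checked with a short sketch. The function and parameter names below are hypothetical, and the model assumes that the maximum configuration consists only of the stored data plus its directory metadata, which is one reading of the example above rather than a stated fact.

```python
def min_config_fraction(metadata_ratio: float, avg_snapshot_count: float) -> float:
    """Fraction of the maximum physical configuration needed for metadata.

    metadata_ratio: virtual bytes mapped per byte of metadata (the text's 1000x).
    avg_snapshot_count: average number of virtual copies of each stored block.
    Assumption (not from the source): maximum configuration = data + metadata.
    """
    physical_data = 1.0                                # normalised unit of stored data
    virtual_space = physical_data * avg_snapshot_count
    metadata = virtual_space / metadata_ratio          # directory must map all virtual space
    return metadata / (physical_data + metadata)


# The example from the text: 1000x ratio, data snapshot 1000x on average.
heavy_snapshot_fraction = min_config_fraction(1000, 1000)
# For comparison: the same ratio with no snapshot sharing at all.
no_snapshot_fraction = min_config_fraction(1000, 1)
```

Under these assumptions the first call yields one half, matching the figure in the text, whereas without snapshots the metadata is only about a thousandth of the configuration.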
One possible approach to this problem is to allocate the metadata dynamically, for example in a B-tree, and to grow the amount of physical storage allocated to metadata as the customer's requirement for virtual address space increases. This approach is valid, but it adds significant implementation cost to the design of a fault-tolerant LSA because of the complexity of manipulating the meta-metadata structures.
A third significant problem not addressed by conventional LSA techniques is that of the performance of snapshot over very large address spaces.
It is important that snapshot operations complete as quickly as possible. When they are used for taking point-in-time backups, the customer application is usually suspended whilst the snapshot is in progress, and this backup window usually represents lost business for the customer.
Conventional LSA implementations with a two-tier directory do a reasonable job with snapshot but again have a scalability problem: the time taken to perform a snapshot is proportional to the size of the virtual extent being snapshot, whether or not that extent has been written with data. This is because the directory must be scanned entry by entry to perform the snapshot, whether or not the directory entries have ever been written.
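The cost described above can be illustrated with a minimal sketch. It models the directory as a flat Python list with one slot per virtual track, which is a simplification assumed here for illustration and not the actual two-tier layout; `snapshot_extent` and its parameters are hypothetical names.

```python
from typing import List, Optional


def snapshot_extent(directory: List[Optional[int]], src: int, dst: int, length: int) -> int:
    """Snapshot `length` directory entries from `src` to `dst`.

    Each entry holds a physical address, or None if the track was never
    written. Returns the number of entries visited.
    """
    visited = 0
    for i in range(length):
        # Share the physical address; no customer data is copied.
        directory[dst + i] = directory[src + i]
        visited += 1
    return visited


EXTENT = 100_000                                  # virtual tracks in the source extent
directory: List[Optional[int]] = [None] * (2 * EXTENT)
directory[7] = 4096                               # only one track has ever been written

visited = snapshot_extent(directory, src=0, dst=EXTENT, length=EXTENT)
```

Although only one track holds data, the loop visits all 100,000 entries: the work scales with the size of the virtual extent, not with the amount of data written to it.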
It is reasonable to imagine a small storage service provider starting with, for example, a few terabytes of physical storage and intending to scale to a few exabytes, using a virtual address space large enough for that amount of data and for a single snapshot of it. Such a provider might wish to perform nightly backups of its clients' data by taking a large snapshot at midnight and spooling it off. With a conventional LSA, there would be a few problems with this naive approach, the most significant of which would be the long time taken to traverse all of the unused address space. Of course, these problems can all be addressed if the storage service provider applies knowledge and experience in selecting appropriate extents to back up, but the requirement for an intelligent administrator translates into higher cost of ownership and possibly lower availability (because of the potential for human error).
A fourth significant problem not addressed by conventional LSA is that LSA metadata writes reduce LSA scalability and performance.
When data is written to an LSA, the metadata must be updated to reflect the new location of the data. The conventional LSA approach is to write metadata changes to a journal in fast non-volatile memory and to harden the journal periodically to a copy of the metadata stored at a fixed location on the underlying physical storage.
This approach has the problem that, when the virtual address space is vastly larger than the working set and the working set is sparsely distributed in that virtual address space, hardening the journal results in a random disk write for each customer data write and therefore eliminates LSA's advantage of collating writes into segments in the first place. This limits the scalability and usability of conventional LSA to applications which do not exhibit this kind of workload.
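The effect of a sparse working set on journal hardening can be sketched as follows. The model is an assumption made for illustration: directory entries are packed at a fixed location in pages of `ENTRIES_PER_PAGE` entries (a hypothetical figure), and hardening must rewrite every metadata page that contains a journalled entry.

```python
ENTRIES_PER_PAGE = 512  # hypothetical number of directory entries per metadata page


def pages_to_harden(journal, entries_per_page=ENTRIES_PER_PAGE):
    """Count the distinct fixed-location metadata pages a journal touches.

    `journal` is a sequence of virtual track addresses whose directory
    entries changed since the last harden.
    """
    return len({vaddr // entries_per_page for vaddr in journal})


# 1000 customer writes to adjacent virtual addresses.
dense_journal = list(range(1000))
# 1000 customer writes scattered across a vast virtual address space.
sparse_journal = [i * 1_000_000 for i in range(1000)]

dense_pages = pages_to_harden(dense_journal)
sparse_pages = pages_to_harden(sparse_journal)
```

In this model the dense journal touches 2 metadata pages, while the sparse journal touches 1000, that is, one scattered metadata page write per customer data write, which is the behaviour described above.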
It would thus be desirable to have an improved technology for managing data storage, and more particularly for managing a log-structured array (LSA) storage system.