This disclosure relates to data systems that store files with unpopulated or unallocated portions. In particular, it relates to access to files within a system that provides content addressable storage.
Content addressable storage (CAS) allows data to be stored using identifiers that are generated from the content of the data. The data can then be retrieved using these identifiers without knowledge of a physical address within the storage device. For instance, when a file (or data object) is stored in a CAS system, the CAS system can generate a signature that uniquely identifies the file content, at least in a statistical sense. The CAS system can also specify the storage location for each identifier. An identifier used in this way is sometimes referred to as a “content address.”
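As a non-limiting illustration of the content addressing described above, the following Python sketch derives an identifier from the data itself and uses it as the retrieval key. The choice of SHA-256, the class name `SimpleCAS`, and the in-memory dictionary standing in for a storage device are all assumptions made for illustration; the disclosure does not prescribe a particular signature function or storage layout.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a signature derived only from the content (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class SimpleCAS:
    """Hypothetical in-memory store keyed by content address."""

    def __init__(self) -> None:
        # Maps each content address to the stored bytes; a real system
        # would map the address to a location on a storage device.
        self._locations: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store the data and return the content address needed to retrieve it."""
        address = content_address(data)
        self._locations[address] = data
        return address

    def get(self, address: str) -> bytes:
        """Retrieve data by content address, with no knowledge of a physical address."""
        return self._locations[address]


store = SimpleCAS()
address = store.put(b"example file content")
retrieved = store.get(address)
```

In this sketch the caller holds only the returned content address; where the bytes actually reside is left entirely to the store.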
Two or more files (or data objects) that have identical data content (whether they are duplicates of one another or incidentally contain the same data) will result in the same signature being generated. Retrieval of the data content for any of these files will use this common signature. Thus, a single location can store the data for multiple data objects, and the CAS system can reduce the storage space consumed by files, particularly for data backups and archives. CAS systems also facilitate authentication of files. For instance, because only one copy of a file is stored, verifying the legitimacy of the file can be simplified.
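The deduplication and verification properties described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the class `DedupStore`, its reference counting, and the use of SHA-256 as the signature are hypothetical and are not taken from the disclosure.

```python
import hashlib


class DedupStore:
    """Hypothetical store that keeps one copy of each distinct content."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}  # signature -> single stored copy
        self._refs: dict[str, int] = {}    # signature -> number of objects sharing it

    def put(self, data: bytes) -> str:
        """Store data once per distinct content and return its signature."""
        signature = hashlib.sha256(data).hexdigest()  # SHA-256 is an assumption
        if signature not in self._data:
            self._data[signature] = data
        self._refs[signature] = self._refs.get(signature, 0) + 1
        return signature

    def ref_count(self, signature: str) -> int:
        """Return how many stored objects share this signature."""
        return self._refs.get(signature, 0)

    def stored_bytes(self) -> int:
        """Return the total bytes actually held, counting each content once."""
        return sum(len(d) for d in self._data.values())

    def verify(self, signature: str) -> bool:
        """Recompute the signature of the stored copy and compare it to the key."""
        return hashlib.sha256(self._data[signature]).hexdigest() == signature


backups = DedupStore()
first = backups.put(b"x" * 1000)   # original file
second = backups.put(b"x" * 1000)  # identical copy from a later backup
```

Here both calls return the same signature and the store holds 1000 bytes rather than 2000. Because only one copy exists, a single call to `verify` checks the content behind every object that shares that signature.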