In the field of computer systems, a computer application persists data that must be retained for future retrieval in a data storage system. The data to be stored is organized into files and databases, which are grouped into logical representations known as volumes of data. A volume of data can contain one or more files or databases. The smallest logical unit of storage is a data block, which typically holds up to a few thousand bytes (e.g., 4 kilobytes) of data. A data block is the unit of data that is persisted to a storage system for future retrieval.
A storage system processes data blocks in groups known as volumes of data. A volume of data may also be referred to as a virtual disk because, when a storage system presents the volume to a computer application, the volume has the attributes and behavior of a disk device. A volume of data is a logical representation of a number of data blocks that are concatenated to form a larger set of data than can be stored as a group of data blocks. A storage system treats the volume of data as a single atomic unit. Data in a storage system may be stored unencoded, such that the data block is persisted in unmodified form and can be retrieved without decoding. Often, however, data is stored by the storage system in an encoded form (e.g., compressed or encrypted), such that the data block is first encoded before it is persisted and is later decoded (e.g., decompressed or decrypted) following retrieval.
Data may be encoded for a plurality of purposes including but not limited to: adding data to a data block to verify the validity of the data block; applying data reduction methods and algorithms to reduce the size of the data block (e.g., compression); and applying cryptographic methods and algorithms to scramble the data block for security purposes (e.g., encryption).
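By way of illustration only, the first two encoding purposes listed above can be sketched in Python as follows. The sketch assumes CRC-32 as the validity check and zlib as the data reduction algorithm; both choices, the 4096-byte block size, and all function names are assumptions made for this example and are not prescribed by any particular storage system. Encryption is omitted because it would require a cryptographic library outside the Python standard library.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size, matching the 4-kilobyte example


def add_checksum(block: bytes) -> bytes:
    """Append a 4-byte CRC-32 so the block can be validated on retrieval."""
    return block + zlib.crc32(block).to_bytes(4, "big")


def verify_and_strip_checksum(encoded: bytes) -> bytes:
    """Check the trailing CRC-32 and return the original block."""
    payload, stored = encoded[:-4], encoded[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != stored:
        raise ValueError("data block failed validation")
    return payload


def compress_block(block: bytes) -> bytes:
    """Reduce the size of the block (data reduction)."""
    return zlib.compress(block)


def decompress_block(encoded: bytes) -> bytes:
    """Recover the original block from its compressed form."""
    return zlib.decompress(encoded)


if __name__ == "__main__":
    block = b"A" * BLOCK_SIZE  # a highly compressible example block
    stored = compress_block(add_checksum(block))
    retrieved = verify_and_strip_checksum(decompress_block(stored))
    assert retrieved == block
```

In this sketch the checksum is computed before compression, so that validation on retrieval covers the decoded contents of the block; other orderings are equally possible.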
Storage systems apply encoding to a group of data blocks, typically on a per-volume basis. The storage system then persists a definition of which encoding method or algorithm was used for each volume, so that it knows which method or algorithm to use when decoding data blocks from the same volume upon future retrieval. Applying encoding on a per-volume basis has several limitations because different encoding types can only be applied to large groups of data. In addition, the user of the storage system typically has to choose which encoding type (e.g., data reduction algorithm or cryptographic algorithm) is to be applied upon initial definition of the volume. Once data blocks have been persisted to the volume, this definition cannot be changed or amended without retrospectively decoding each data block in the volume and then re-encoding the data blocks according to the new encoding definition.
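The per-volume arrangement described above may be sketched, purely for illustration, as follows. The `Volume` class, the `CODECS` table, and the codec names "none" and "zlib" are hypothetical names introduced for this example; the sketch is a simplified in-memory model under those assumptions, not an implementation of any actual storage system.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical codec table: encoding name -> (encode, decode) functions.
CODECS = {
    "none": (lambda b: b, lambda b: b),
    "zlib": (zlib.compress, zlib.decompress),
}


@dataclass
class Volume:
    encoding: str  # persisted once, when the volume is defined
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def write(self, index: int, data: bytes) -> None:
        """Encode a block with the volume-wide encoding, then persist it."""
        encode, _ = CODECS[self.encoding]
        self.blocks[index] = encode(data)

    def read(self, index: int) -> bytes:
        """Retrieve a block and decode it with the volume-wide encoding."""
        _, decode = CODECS[self.encoding]
        return decode(self.blocks[index])

    def change_encoding(self, new_encoding: str) -> int:
        """Decode and re-encode every block; return the number rewritten."""
        _, old_decode = CODECS[self.encoding]
        new_encode, _ = CODECS[new_encoding]
        for index, stored in self.blocks.items():
            self.blocks[index] = new_encode(old_decode(stored))
        self.encoding = new_encoding
        return len(self.blocks)
```

Because only one encoding name is recorded per volume, `change_encoding` has to touch every persisted block, which illustrates why amending the definition after data has been written is costly.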
Further, it may be desirable for different encodings to be applied variably to individual data blocks, rather than to a whole volume of data blocks, based on a plurality of variable conditions including but not limited to: the content of the data; the ability of a data encoding algorithm to process a given data block; variable requirements from a computer application; and changing conditions within the storage system environment.
Thus, conventional storage systems that persist volumes of encoded data are limited in flexibility as it is typically not possible to apply different encoding mechanisms to different portions, or individual blocks of data, within the same volume of data.