Modern computing and storage systems manage increasingly large volumes of data. For example, “big data” is often collected from a myriad of information sensing devices (e.g., mobile phones, online computers, RFID tags, sensors, etc.) and/or operational sources (e.g., point of sale systems, accounting systems, CRM systems, etc.). Many modern computing systems further include virtualized entities (VEs), such as virtual machines (VMs) or executable containers, to improve the utilization of computing resources. VMs can be characterized as software-based computing “machines” implemented in virtualization environments of the computing system that use software to emulate the underlying hardware resources (e.g., CPU, memory, etc.). Executable containers, implemented in container virtualization environments of the computing system, comprise groups of processes and/or resources (e.g., memory, CPU, disk, etc.) that are isolated from the host computer and from other executable containers. Some computing and storage systems might scale to several thousand or more autonomous VEs, each having a corresponding set of entity management data (e.g., entity metadata) and a set of workload data, all managed by the computing and storage system.
The resulting highly dynamic storage capacity and high I/O (input/output or IO) demands of the VEs have in turn increased the need for high-performance distributed storage systems. Distributed storage systems can aggregate various physical storage facilities to create a logical storage pool where data may be efficiently distributed according to various metrics and/or objectives (e.g., resource usage balancing). In some cases, data compression becomes important to reduce the overall storage capacity demand of the computing and storage system.
Unfortunately, some legacy compression techniques merely compress the data stored in an entire disk (e.g., physical disk, virtual disk, etc.) or an entire file according to a certain batch schedule. For example, a 4 MB file might be compressed to 3 MB at some later moment in time (e.g., in a later-scheduled batch operation); however, the entire 4 MB of storage capacity remains in use so long as the application is writing and/or modifying the file. What is needed is a technological solution for efficient data compression in highly dynamic computing and storage systems such that compressible but not yet compressed data is not stored for long periods of time.
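The capacity behavior of such a batch-scheduled approach can be illustrated with a minimal sketch. The class `BatchCompressedFile` and its methods are hypothetical names introduced only for illustration, and zlib is assumed merely as a stand-in compressor; the sketch does not represent any particular legacy system.

```python
import zlib

MB = 1024 * 1024


class BatchCompressedFile:
    """Hypothetical model of a file compressed only by a scheduled batch job."""

    def __init__(self) -> None:
        self.raw = bytearray()   # logical (uncompressed) file contents
        self.compressed = None   # compressed copy; exists only after a batch run

    def write(self, data: bytes) -> None:
        # Any write invalidates the compressed copy, so the file is stored
        # uncompressed again until the next batch run.
        self.compressed = None
        self.raw.extend(data)

    def run_batch_compression(self) -> None:
        # Stand-in for the later-scheduled batch operation.
        self.compressed = zlib.compress(bytes(self.raw))

    def capacity_in_use(self) -> int:
        # Full uncompressed size is consumed unless a compressed copy exists.
        if self.compressed is not None:
            return len(self.compressed)
        return len(self.raw)
```

In this sketch, a file that has received 4 MB of writes reports 4 MB from `capacity_in_use()` until `run_batch_compression()` is invoked, and any subsequent write returns the file to its full uncompressed footprint until the next batch run.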
What is needed is a technique or techniques to improve over legacy techniques and/or over other considered approaches. Some of the approaches described in this background section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.