1. Technical Field
Present invention embodiments relate to compressing floating-point data, and more specifically, to compressing binary floating-point data based on a previous loss of precision incurred during capture or processing of that floating-point data.
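As a minimal illustration of what a previous loss of precision can look like at the bit level, the Python sketch below assumes IEEE 754 binary64 storage and a value that was captured at single precision (binary32) before being widened to a double. Such a value carries at least 29 zero bits at the low end of its 52-bit fraction field. The helper names (`double_bits`, `trailing_zero_fraction_bits`, `capture_as_float32`) are hypothetical, and the sketch only shows the bit pattern; it is not presented as the compression method of the present embodiments.

```python
import struct

FRACTION_BITS = 52  # width of the binary64 fraction (mantissa) field

def double_bits(x: float) -> int:
    """Reinterpret a binary64 value as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def trailing_zero_fraction_bits(x: float) -> int:
    """Count zero bits at the low end of the 52-bit fraction field."""
    fraction = double_bits(x) & ((1 << FRACTION_BITS) - 1)
    if fraction == 0:
        return FRACTION_BITS
    # fraction & -fraction isolates the lowest set bit
    return (fraction & -fraction).bit_length() - 1

def capture_as_float32(x: float) -> float:
    """Hypothetical capture step: round to binary32, then widen back to binary64."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

sample = 0.1
print(trailing_zero_fraction_bits(sample))                      # full double precision
print(trailing_zero_fraction_bits(capture_as_float32(sample)))  # after float32 capture
```

For 0.1, the first call reports 1 trailing zero bit and the second reports 29, because binary32 keeps only 23 fraction bits and the remaining 29 bits are filled with zeros when the value is widened.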
2. Discussion of the Related Art
Storing large amounts of data for live retrieval can be expensive due to the storage and processing hardware that must be acquired and maintained, and the electrical power required to operate that hardware (e.g., in datacenter operations). To reduce storage, bandwidth, and computational requirements, data compression techniques are employed: data is compressed prior to storage and decompressed when an application or user retrieves it. Data compression techniques may be lossless or lossy. Lossy techniques may be used when it is acceptable to lose some precision relative to the original source data. For example, a photo presented on a low-resolution device (e.g., a cell phone display) may not need all of the detail in the higher-resolution source photo (e.g., a high-resolution family portrait). The cell phone essentially displays the substance of the family portrait, but with the reduced processing and storage requirements of the lower-resolution image (i.e., data that is imperceptible to the viewer may be discarded).
Lossless techniques for compressing data may be used when a loss of data would be detrimental (e.g., the loss of a bank account digit, of a social security number, or of data for mission-critical systems such as emergency response systems). In other examples, systems that handle floating-point data, such as the results of medical or seismic studies, may have no knowledge of how the resulting data will be used, or of the internal structure or relationships that may exist between values (e.g., variable correlation). In such systems, it may not be known which information is relevant and which is irrelevant and could therefore be discarded as an acceptable lossy result.
Depending on system requirements, each data field may be compressed down to its intrinsic entropy. For integer data, significant compression can be achieved by eliding the high-order bits of small-magnitude values, or of the differences between successive values. For character data, useful compression may be achieved by eliding trailing spaces or by using predictive coding. However, very little compression can be achieved by applying integer or character compression techniques to floating-point data, and predictive coding depends on information that is not known to the system, namely the relationships between individual values.
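The contrast between integer and floating-point data can be sketched in a few lines of Python. This is a minimal sketch, assuming 64-bit fields, a simple delta encoding, and a zigzag mapping of signed deltas (a common convention chosen here for illustration); the function names and sample values are hypothetical. It counts the bits that remain after eliding high-order zero bits, first for small integers and then for the raw bit patterns of a few small doubles.

```python
import struct

def double_bits(x: float) -> int:
    """Reinterpret an IEEE 754 binary64 value as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def delta_encode(values: list[int]) -> list[int]:
    """Replace each value with its difference from the previous one (first vs. 0)."""
    out, prev = [], 0
    for v in values:
        out.append(v - prev)
        prev = v
    return out

def zigzag(n: int) -> int:
    """Map a signed delta (assumed |n| < 2**63) to unsigned so small negatives stay small."""
    return (n << 1) ^ (n >> 63)

def elided_bits(values: list[int]) -> int:
    """Total bits kept after delta encoding and eliding high-order zero bits."""
    return sum(max(zigzag(d).bit_length(), 1) for d in delta_encode(values))

ints = [1000, 1003, 1001, 1007]
doubles = [0.1, 0.2, 0.3]

print("integers:", elided_bits(ints), "of", 64 * len(ints), "bits")
print("doubles: ", elided_bits([double_bits(x) for x in doubles]), "of", 64 * len(doubles), "bits")
```

The integers shrink from 256 bits to 20, whereas the doubles keep most of their 192 bits: the exponent occupies the top of each 64-bit pattern and the fraction bits are densely populated, so neither leading-zero elision nor differencing removes much.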