The present invention is directed to a data compression profiler for configuration of compression to be applied to a particular type of data set to achieve a desired compression ratio or distortion level.
Compression algorithms in common applications, such as audio and video compression, use compression parameters based on a priori knowledge of the data characteristics to produce compressed data having specified compression ratios. Examples include the widely used JPEG and MPEG for image/video compression and MP3 and AAC for audio compression. In particular, MPEG profiles set parameters for the video compressor that will provide desired output bit rates or compression ratios of the compressed video data.
In other applications, users may not have a full understanding of how to select appropriate parameters for a compression processor that will result in compressed data having an acceptable distortion level or compression ratio for the particular signal characteristics. Applications referred to herein as high performance computing (HPC) applications, including supercomputing, high energy physics, climate modeling, weather forecasting, finite element analysis, thermal and fluid flow, and oil exploration, generate immense data sets for a wide variety of signal types. As of 2012, datasets for such simulations typically contain hundreds of gigabytes (10^9 bytes each) and for some applications may contain petabytes (10^15 bytes each) of data. Such large datasets cause immense bandwidth and capacity bottlenecks in computing systems, so compression of such datasets has significant economic value. Optimal configuration of the compression processing based on the signal characteristics can provide more efficient use of computing resources and data storage capacity. Therefore, there is a need for a systematic process for determining appropriate compression parameters for a given signal or data set, either for use with a compression algorithm or for selection of one.
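By way of illustration only, the kind of systematic parameter search described above can be sketched as follows. The sketch is not the method of the present specification or of any referenced patent: the function name `profile`, the use of least-significant-bit removal as the lossy step, the use of zlib as a stand-in lossless coder, and the signal-to-noise ratio as the distortion measure are all assumptions chosen for brevity. It sweeps one parameter and reports the first setting that reaches a desired compression ratio, together with the resulting distortion.

```python
import math
import struct
import zlib


def profile(samples, target_ratio, max_shift=12):
    """Sweep a hypothetical 'shift' parameter over 16-bit signed samples.

    Returns (shift, ratio, snr_db) for the smallest shift whose compression
    ratio meets target_ratio, or None if no shift up to max_shift does.
    """
    raw = struct.pack(f"<{len(samples)}h", *samples)
    for shift in range(max_shift + 1):
        # Lossy step (assumption): discard `shift` least significant bits.
        quantized = [s >> shift for s in samples]
        packed = struct.pack(f"<{len(quantized)}h", *quantized)
        # Stand-in lossless coder (assumption): zlib at its highest level.
        ratio = len(raw) / len(zlib.compress(packed, 9))
        # Reconstruct and measure distortion as signal-to-noise ratio in dB.
        restored = [q << shift for q in quantized]
        noise = sum((s - r) ** 2 for s, r in zip(samples, restored))
        signal = sum(s * s for s in samples)
        snr_db = math.inf if noise == 0 else 10 * math.log10(signal / noise)
        if ratio >= target_ratio:
            return shift, ratio, snr_db
    return None


if __name__ == "__main__":
    # Synthetic test signal (assumption): a slowly varying sine wave.
    data = [int(1000 * math.sin(0.01 * i)) for i in range(4096)]
    print(profile(data, target_ratio=2.0))
```

A user who specifies a distortion limit instead of a compression ratio could apply the same sweep and stop at the largest shift whose signal-to-noise ratio remains acceptable.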
Commonly owned patents and applications describe a variety of compression techniques applicable to fixed-point (integer) and floating-point representations of numerical data, signal samples or image samples. These include U.S. Pat. No. 5,839,100 (the '100 patent), entitled “Lossless and loss-limited Compression of Sampled Data Signals,” by Wegener, issued Nov. 17, 1998. The commonly owned U.S. Pat. No. 7,009,533 (the '533 patent), entitled “Adaptive Compression and Decompression of Bandlimited Signals,” by Wegener, issued Mar. 7, 2006, incorporated herein by reference, describes compression algorithms that are configurable based on the signal data characteristics and measurement of pertinent signal characteristics for compression. The commonly owned U.S. patent application Ser. No. 12/605,245 (the '245 application), entitled “Block Floating Point Compression of Signal Data,” by Wegener, publication number 2011-0099295, published Apr. 28, 2011, incorporated herein by reference, describes a block-floating point encoder and decoder for integer samples. The commonly owned U.S. patent application Ser. No. 13/534,330 (the '330 application), filed Jun. 27, 2012, entitled “Computationally Efficient Compression of Floating-Point Data,” by Wegener, incorporated herein by reference, describes algorithms for direct compression of floating-point data by processing the exponent values and the mantissa values of the floating-point format. The commonly owned patent application Ser. No. 13/617,061 (the '061 application), filed Sep. 14, 2012, entitled “Conversion and Compression of Floating Point and Integer Data,” by Wegener, incorporated herein by reference, describes algorithms for converting floating-point data to integer data and compression of the integer data. The profiler described in the present specification may determine parameters for these compression algorithms for application to particular data sets.
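For readers unfamiliar with block floating-point encoding of integer samples, the general idea can be sketched as follows. This is a generic textbook-style illustration and is not taken from the '245 application: the function names `bfp_encode` and `bfp_size_bits`, the block size of four samples, and the five-bit exponent field are all assumptions. Each block of samples shares one exponent value, here the number of bits needed to represent the largest magnitude in the block, and every sample in the block is stored at that width.

```python
def bfp_encode(samples, block_size=4):
    """Group integer samples into (exponent, mantissas) blocks.

    The shared 'exponent' is the bit width used for every mantissa in the
    block (hypothetical layout, chosen for illustration).
    """
    blocks = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        # Bits for the largest magnitude in the block, plus one sign bit.
        exponent = max(abs(s) for s in block).bit_length() + 1
        blocks.append((exponent, list(block)))
    return blocks


def bfp_size_bits(blocks, exponent_bits=5):
    """Encoded size: one exponent field plus exponent-wide mantissas per block."""
    return sum(exponent_bits + exp * len(mants) for exp, mants in blocks)


if __name__ == "__main__":
    samples = [3, -2, 1, 0, 1000, -900, 512, 7]
    blocks = bfp_encode(samples)
    # Compare against storing every sample as a 16-bit integer.
    print(bfp_size_bits(blocks), "bits versus", 16 * len(samples), "bits")
```

In this sketch the block of small values is stored at 3 bits per sample and the block of large values at 11 bits per sample, so the block size and exponent field width are the kind of parameters a profiler might tune for a particular data set.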
The commonly owned patent application Ser. No. 12/891,312 (the '312 application), entitled “Enhanced Multi-processor Waveform Data Exchange Using Compression and Decompression,” by Wegener, publication number 2011-0078222, published Mar. 31, 2011, incorporated herein by reference, describes configurable compression and decompression for fixed-point or floating-point data types in computing systems having multi-core processors. In a multi-core processing environment, input, intermediate, and output waveform data are often exchanged among cores and between cores and memory devices. The '312 application describes a configurable compressor/decompressor at each core that can compress/decompress integer or floating-point waveform data. The '312 application describes configurable compression/decompression at the memory controller to compress/decompress integer or floating-point waveform data for transfer to/from off-chip memory in compressed packets. The profiler described in the present specification may determine parameters or select compression algorithms for the configurable compressor and decompressor of the '312 application.
The commonly owned patent application Ser. No. 13/617,205 (the '205 application), filed Sep. 14, 2012, entitled “Data Compression for Direct Memory Access Transfers,” by Wegener, incorporated herein by reference, describes providing compression for direct memory access (DMA) transfers of data and parameters for compression via a DMA descriptor. Parameters for compression provided to the DMA descriptor may be determined by the profiler described herein. The commonly owned patent application Ser. No. 13/616,898 (the '898 application), filed Sep. 14, 2012, entitled “Processing System and Method Including Data Compression API,” by Wegener, incorporated herein by reference, describes an application programming interface (API), including operations and parameters for the operations, which provides for data compression and decompression in conjunction with processes for moving data between memory elements of a memory system. The profiler described herein may provide parameters for the compression operations of the API.