As the field of computing rapidly expands, the ability to compress and decompress large amounts of data for transfer and storage has become a necessity. Many applications require extremely large sets of data, which often result in slow access and processing speeds. Increasing parallelism in computing (e.g., multi-threading) has led to a dramatic increase in the performance of existing applications by allowing an application to run multiple threads concurrently.
Given a large file containing rows of variable-width or fixed-width data, current compression techniques do not store information regarding the offsets of individual pieces of data. Current techniques in the art compress a chunk of bytes, rather than a chunk of rows. Compressing bytes in this way ignores the structure of data stored in rows and thus ignores the regularities that a row-based data file exhibits. For example, a row-based data file may contain a column holding a key value, and a large data file could be partitioned and compressed according to that key value. The absence of offset data requires a decompression utility to decompress an entire file before accessing data, because without offsets random access to the compressed file is not possible. Such decompression is a serial operation, and serial operations do not exploit the benefits of a multi-threaded or multi-process environment, as they by nature stall the threads or processes that require access to the uncompressed data. For smaller compressed files, the single-threaded approach to decompression does not present a considerable problem, but as modern applications utilize larger and larger data files, the delay caused by a single-threaded approach constitutes a significant performance problem. Therefore, there is a need in the art for a technique that allows parallel access to data stored within a compressed file.
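By way of illustration only, the contrast drawn above can be sketched in a few lines of Python. The sketch is not taken from any particular prior-art utility and is not a definitive implementation. It assumes rows are byte strings that contain no newline character, uses the standard zlib module as a stand-in for any compressor, and introduces hypothetical names (compress_rows, read_chunk, read_all_parallel). Rows are compressed in groups, and the offset and length of each compressed group are recorded, so that any group may be decompressed without touching the others.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_rows(rows, rows_per_chunk):
    """Compress rows in groups and record where each compressed group starts.

    Returns (blob, index), where index holds one (offset, length) pair per chunk.
    Assumption: no row contains a newline, which is used as the row separator.
    """
    blob = bytearray()
    index = []
    for start in range(0, len(rows), rows_per_chunk):
        raw = b"\n".join(rows[start:start + rows_per_chunk])
        packed = zlib.compress(raw)
        index.append((len(blob), len(packed)))  # offset of this chunk in blob
        blob += packed
    return bytes(blob), index


def read_chunk(blob, index, i):
    """Random access: decompress only chunk i, using its recorded offset."""
    offset, length = index[i]
    return zlib.decompress(blob[offset:offset + length]).split(b"\n")


def read_all_parallel(blob, index, workers=4):
    """Decompress every chunk on a thread pool; results keep chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(lambda i: read_chunk(blob, index, i), range(len(index)))
        return [row for chunk in chunks for row in chunk]
```

Without the recorded index, a reader holding only the compressed bytes would have no way to locate the second or third group and would be left with the serial, whole-file decompression described above.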