The present invention generally relates to managing a memory unit, and more specifically, to managing main buffers and overflow buffers in an overflow situation.
A computing system often uses a data conversion operation that changes the size of the data. For example, data compression techniques can be used to reduce the size of large data files, and data decompression techniques can be used to restore the original large data files. For example, a computing system may compress a file of several gigabytes (GB) received from an external source. However, an application performing the compression may only have access to a buffer of a smaller size (e.g., one megabyte (MB)), so that the application can only read or write a small amount of the data in the large data file at a time. In such cases, the application needs to divide the large file into blocks of a smaller, arbitrary size and compress each block individually rather than compressing the whole file at once. The size and alignment of each block are characteristics of the available buffer space and are normally not consistent with the structure of the input, which may represent a single stream many GB in size.
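The block-wise approach described above can be sketched as follows. This is an illustrative sketch only, assuming Python's standard `zlib` module; the function name `compress_in_blocks` and the block size are assumptions, not part of the described system.

```python
import zlib

def compress_in_blocks(data: bytes, block_size: int = 1 << 20) -> bytes:
    """Compress `data` one block at a time, as an application limited to a
    small buffer (e.g., 1 MB) must, rather than compressing all at once."""
    compressor = zlib.compressobj()
    out = bytearray()
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        out += compressor.compress(block)   # emit whatever output is ready
    out += compressor.flush()               # finalize the compressed stream
    return bytes(out)

original = b"example data " * 100000        # roughly 1.3 MB of input
compressed = compress_in_blocks(original, block_size=1 << 16)
assert zlib.decompress(compressed) == original
```

Because the compressor object carries its state across calls, the output is a single valid stream even though the input was fed in arbitrarily sized blocks.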
At a machine-readable level, the content of a data file is a sequence of symbols. As used herein, the term “symbol” refers to an undivided unit of a bit sequence and its counterpart used in the data conversion operation (e.g., a compression/decompression operation). The meaning of “counterpart” will be explained later herein. In general, an individual conversion technique (e.g., a compression/decompression technique) has a table that maps bit sequences to their meanings. For example, if the table defines “10100” as representing the letter “a,” then the bit sequence “10100” is a symbol, since neither of its divided parts “10” and “100” is used to represent the letter “a”; rather, each may be used to represent a different meaning. In this case, the length of the symbol is five bits. However, there are many kinds of symbols, and the length of a symbol varies depending on its kind. For example, in a compressed data file, a compression header that includes a Dynamic Huffman Tree (“DHT”) generated by a Huffman coding algorithm can be a symbol, since if a part of the long sequence representing the DHT is lost, it is not possible to decode the compressed data that follows. The maximum size of this header is around 288 bytes.
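The notion that a symbol is an undivided unit can be illustrated with a small decoder over a hypothetical prefix-code table of the kind a Huffman coder might produce. The table entries below are assumptions chosen for illustration, not codes from any real compression format.

```python
# Hypothetical prefix-code table: no codeword is a prefix of another,
# so each symbol can be recognized unambiguously bit by bit.
CODE_TABLE = {"0": "e", "10": "t", "110": "a", "111": "s"}

def decode(bits: str) -> list[str]:
    """Decode a bit string symbol by symbol; fail if a symbol is cut off."""
    symbols, buf = [], ""
    for b in bits:
        buf += b
        if buf in CODE_TABLE:            # a complete symbol has been read
            symbols.append(CODE_TABLE[buf])
            buf = ""
    if buf:                              # leftover bits: a partial symbol
        raise ValueError(f"truncated symbol: {buf!r}")
    return symbols

print(decode("0101100"))                 # decodes cleanly: e, t, a, e
```

Calling `decode("01011")` raises an error: the trailing bits “11” are only part of a symbol, which mirrors why losing part of a long symbol such as a DHT header makes the data that follows undecodable.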
In the case that the available space of the main buffer is shorter than the size of the next symbol (e.g., 280 bytes remain, but the next symbol is 288 bytes long), there are two options in conventional systems. One option is to fill the remaining space with a part of the symbol (e.g., the first 280 bytes) and discard the remaining part of the symbol (e.g., the last 8 bytes). However, discarding the data may cause a serious error in processing. A second option is to perform the conversion operation only with the previously stored symbols, without filling the remaining space of the main buffer. This can result in wasted memory space. In some cases, application buffer usage might require the available buffer space to be completely filled even if a symbol does not completely fit. In such environments, the computing system needs to manage an overflow situation.
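One way to manage such an overflow situation, sketched below, is to fill the main buffer completely with the head of the oversized symbol and retain the tail in a separate overflow buffer instead of discarding it. This is a minimal illustrative sketch; the function and buffer names are assumptions and do not represent the claimed implementation.

```python
def fill_with_overflow(symbols: list[bytes], main_size: int):
    """Fill a fixed-size main buffer completely; when the next symbol does
    not fit, put its head in the main buffer and its tail in an overflow
    buffer so that no part of the symbol is discarded."""
    main, overflow = bytearray(), bytearray()
    for sym in symbols:
        space = main_size - len(main)
        if len(sym) <= space:
            main += sym                  # symbol fits entirely
        else:
            main += sym[:space]          # head fills the remaining space
            overflow += sym[space:]      # tail is preserved, not discarded
            break                        # main buffer is now full
    return bytes(main), bytes(overflow)

# e.g., 280 bytes remain but the next symbol is 288 bytes long
main, overflow = fill_with_overflow([b"x" * 100, b"y" * 288], main_size=280)
assert len(main) == 280 and len(overflow) == 108
assert main + overflow == b"x" * 100 + b"y" * 288
```

This satisfies the requirement that the available buffer space be completely filled while avoiding the data loss of the first conventional option and the wasted space of the second.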