Generally described, various computing systems exist in which one or more computing devices generate data to be analyzed. For example, a business system may include various computing devices that obtain/generate manufacturing and sales data that can be stored. The stored data can be analyzed for reporting and trend analysis. As the complexity of the computing systems and the amount of data generated by the computing systems increase, a computing system administrator may attempt to mitigate the strain on computing system resources, such as processor load and storage capacity, by incorporating some type of compression algorithm.
For many scenarios, the type of compression algorithm used by the computing system can be influenced by various characteristics of the data being collected. In one typical example, computing systems, such as business computing systems, can often collect data in a database that includes repetitive data entries. For example, in a sales information database, a computing system may generate millions of records corresponding to sales transactions in which multiple records would have the same date identifier, store identifier, register identifier, sales rep identifier, etc. Accordingly, in such scenarios, the computing system could incorporate a compression algorithm to reduce the amount of data required to store the repetitive data.
One conventional algorithm for compressing data in repetitive data embodiments is generally referred to as run length encoding. One skilled in the relevant art will appreciate that run length encoding of a sequential array of data generally relates to a determination of repeating data values in a sequence of data elements. The original data in the array can then be represented in a compressed array in which each data entry in the compressed array includes a data value element and the number of sequential data elements in the original array that share the common value. FIG. 1 is a block diagram illustrative of sequential data array 100 represented in a run length encoded array 150 in accordance with a conventional run length encoding algorithm. As illustrated in FIG. 1, a data array can include a series of array elements 102-124 that include multiple series of repeating values. In accordance with conventional run length encoding, the first three array elements 102-106 can be represented in array 150 at array element 152 by their value “A” and the number of sequential array elements having the value, e.g., “3”. Similarly, array elements 108-112 can be represented in array 150 at array element 154 by their value “B” and the number of sequential array elements having the value, e.g., “3”. With continued reference to FIG. 1, array element 156 corresponds to array elements 114 and 116 and array element 158 corresponds to array elements 118-124. Thus, in the illustrative embodiment, the 12 element array 100 can be represented by a four element compressed array 150.
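The conventional run length encoding described above can be sketched as follows. This is a minimal illustration only, not an implementation taken from any particular system. The function names `run_length_encode` and `run_length_decode` are hypothetical, and because the text does not state the values held by array elements 114-124, the values "C" and "D" are assumed here as placeholders.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse each run of equal adjacent values into a (value, count) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def run_length_decode(pairs):
    """Regenerate the original sequence from (value, count) pairs."""
    original = []
    for value, count in pairs:
        original.extend([value] * count)
    return original


# 12-element array modeled on array 100 of FIG. 1.
# "A" and "B" come from the text; "C" and "D" are assumed placeholder values.
array_100 = ["A"] * 3 + ["B"] * 3 + ["C"] * 2 + ["D"] * 4

# Four-element compressed array modeled on array 150.
array_150 = run_length_encode(array_100)
```

Under these assumptions, `array_150` holds four (value, count) entries, and `run_length_decode(array_150)` regenerates the original 12 elements.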
Although various compression algorithms can minimize the amount of data that is stored by a computing system, conventional compression algorithms typically do not facilitate efficient searching of the compressed array. For example, in a run length encoding algorithm, searching for array elements can be achieved only by a linear search of the compressed array or by a complete regeneration of the original array. Both searching scenarios are inefficient and place greater strain on processing resources. Additionally, the conventional compression algorithms typically do not allow the computing system to manipulate array element values or to add/subtract array elements without requiring a regeneration of the original array. Accordingly, conventional compression algorithm approaches are deficient in that they require array element regeneration in the computing system to analyze and/or process data contained within a compressed data array.
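The linear search noted above can be illustrated with a short sketch. It assumes the compressed array is held as a list of (value, count) pairs; the function name `element_at` and the values "C" and "D" are hypothetical and do not appear in the description.

```python
def element_at(pairs, index):
    """Return the value at position `index` of the original array.

    The run lengths must be accumulated from the start of the compressed
    array, so the cost grows with the number of compressed entries.
    """
    if index < 0:
        raise IndexError("index out of range")
    remaining = index
    for value, count in pairs:
        if remaining < count:
            return value
        remaining -= count
    raise IndexError("index out of range")


# Compressed array modeled on array 150; "C" and "D" are assumed values.
array_150 = [("A", 3), ("B", 3), ("C", 2), ("D", 4)]

seventh_element = element_at(array_150, 6)  # falls in the third run
```

Because a compressed entry does not record where its run begins in the original array, every lookup in this sketch walks the entries from the first one, which is the inefficiency described above.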