The field of the present invention relates to data compression and analysis. Specifically, the invention relates to lossless compression of text and other information that can be represented by symbols.
Previous text compression techniques are usable on only one file or document at a time. Such techniques do not scale easily to enormous data sets (i.e., “Big Data”) or to data spread across many different containers. Also, previous techniques do not track the number of times a particular symbol or string of symbols appears in the uncompressed text. This count is valuable information that is useful for improving space savings, reducing processing time, and conducting contextual analysis.
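As an illustration only, the kind of occurrence count described above might be gathered across several documents as in the following minimal sketch. It is not the method of the invention. The function name `count_symbols_and_pairs`, the sample documents, and the choice to treat whitespace-separated words as symbols are all assumptions made for this example.

```python
from collections import Counter


def count_symbols_and_pairs(documents):
    """Tally single symbols and adjacent symbol pairs across all documents.

    Hypothetical helper for illustration; not taken from the source text.
    """
    symbol_counts = Counter()
    pair_counts = Counter()
    for text in documents:
        # Assumption: each whitespace-separated word is one symbol.
        symbols = text.split()
        symbol_counts.update(symbols)
        # Pair each symbol with its successor; pairs never span two documents.
        pair_counts.update(zip(symbols, symbols[1:]))
    return symbol_counts, pair_counts


# Made-up sample data standing in for multiple containers.
documents = ["the cat sat on the mat", "the cat ran"]
symbol_counts, pair_counts = count_symbols_and_pairs(documents)
```

Because the tallies are shared across every document, a frequently repeated symbol or string can be identified once for the whole collection rather than separately for each file.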