In the last decade, centrally hosted network filesystems with disconnected operation have grown to serve hundreds of millions of users. These services include SugarSync®, Dropbox®, Box®, Google Drive®, Microsoft OneDrive®, and Amazon Cloud Drive®.
Commercially, these systems typically offer users a maximum storage quota in exchange for a flat monthly fee, or no fee at all. The cost to operate such a system, however, increases with the amount of user data actually stored, and these filesystems can rapidly become very large. For example, one of the above-mentioned services currently stores roughly one exabyte of user data. Therefore, operators of centrally hosted network filesystems benefit from techniques that reduce the net amount of user data stored.
Disclosed implementations can utilize various lossless data compression algorithms as a baseline, such as the Brotli compression algorithm. The Brotli compression algorithm is typically deployed to provide lossless data compression for static content such as JavaScript, CSS, HTML, and other static web assets. In some implementations of the present invention, Brotli uses a predefined static dictionary derived from a corpus of text documents, such as HTML documents. Use of the dictionary can increase compression where a file repeats common words that appear in the dictionary. Although Brotli provides good baseline lossless data compression for a wide variety of content, there is a need for general-purpose lossless data compression techniques that can provide further compression savings. Operators of large-scale centrally hosted network filesystems that store large amounts of user data (e.g., a few petabytes or more) would especially appreciate such techniques.
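By way of illustration only, the effect of a predefined dictionary on compression can be sketched as follows. This sketch does not use Brotli. It substitutes the preset-dictionary feature of DEFLATE, as exposed by the Python standard library `zlib` module, as a stand-in for the same general idea. The dictionary contents, the sample input, and the helper names `compress` and `decompress` are hypothetical and chosen only for this example.

```python
import zlib
from typing import Optional

# Hypothetical preset dictionary: byte strings assumed to be common in HTML.
# (Brotli's real static dictionary is far larger and built into the format.)
PRESET_DICT = (
    b'<!DOCTYPE html><html><head><title></title></head>'
    b'<body><div class=""></div><p></p></body></html>'
)


def compress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Compress with DEFLATE, optionally priming it with a preset dictionary."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def decompress(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Decompress; the same dictionary must be supplied if one was used."""
    if zdict is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


# Hypothetical input that repeats many strings found in the dictionary.
sample = (
    b'<!DOCTYPE html><html><head><title>Hi</title></head>'
    b'<body><p>Hello</p></body></html>'
)

plain = compress(sample)
with_dict = compress(sample, PRESET_DICT)

print(len(sample), len(plain), len(with_dict))
```

In this sketch, the output compressed with the dictionary is smaller than the output compressed without it because most of the input can be encoded as back-references into the dictionary. The savings shrink as the input diverges from the dictionary's contents, which is consistent with the observation above that a dictionary helps where a file repeats common words in the dictionary.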