The amount of data that is transferred via the Internet is staggering. Recent estimates project that there will soon be more than one trillion web pages, and that more than half of the world's population has access to the Internet. At the same time, the capacity of commodity storage devices continues to increase while their cost holds steady or even falls. For example, hard disk drives of 2 TB or even 3 TB can be purchased for under $100. Because storage is so inexpensive, there is often little motivation to implement space-efficient storage schemes for most types of data.
This creates a dichotomy. On the one hand, there is an ever-increasing number of users with ever-increasing access to storage. On the other hand, the growth in the number of users and their increased appetite for downloaded content pushes bandwidth capacities constantly to their limits, leading Internet Service Providers to propose implementing tiered services, while users argue for “Net Neutrality.” The problem is exacerbated further for wireless access, as evidenced by more mobile carriers removing their unlimited data plans in favor of tiered data plans.
Data content is transferred over the Internet using one of several packet-based transmission schemes, such as TCP/IP and UDP. This technique is commonly referred to as “streaming,” but in reality the process involves partitioning a stream of bits (i.e., a bitstream) comprising or otherwise derived from an original document into a multitude of packets that are individually sent across the Internet, then assembled and processed at the receiver to extract the same stream of bits. Furthermore, the processing may involve compression operations at the sender and decompression operations at the receiver. Upon completion of the processing, a copy of the original document is obtained.
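The partition-and-reassemble step described above can be illustrated with a minimal sketch. It is not a model of TCP/IP or UDP: the names `packetize` and `reassemble`, the 8-byte payload size, and the use of a bare sequence number as the only header field are all assumptions made for illustration.

```python
import random


def packetize(data, payload_size):
    """Split a byte stream into (sequence_number, payload) packets.

    Hypothetical helper: real protocols carry much richer headers.
    """
    return [
        (seq, data[offset:offset + payload_size])
        for seq, offset in enumerate(range(0, len(data), payload_size))
    ]


def reassemble(packets):
    """Rebuild the original byte stream from packets received in any order."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered)


original = b"A stream of bits derived from an original document."
packets = packetize(original, 8)

# Packets may arrive out of order; shuffling simulates that.
random.shuffle(packets)

# The receiver recovers exactly the same stream of bits.
assert reassemble(packets) == original
```

The sequence number is what allows the receiver to restore the original order regardless of the order of arrival.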
In order to squeeze the most out of available transfer bandwidth, data content is often streamed in a compressed form. Some types of content, such as music and video content, are commonly stored in compressed formats based on well-established standards. Other content, including HTML and general document content, is generally not stored in compressed form. For example, the more recent versions of Microsoft Office products store document content in an XML-based format.
One technique for making better use of available bandwidth is to perform on-the-fly compression at a sending entity and corresponding decompression at a receiving entity. Similar techniques may be used for real-time or batch archival purposes. In the case of document compression or archival, a “lossless” compression algorithm is typically used such that no data is lost when the compressed document content is decompressed. There are various lossless compression techniques employed for such purposes, including entropy encoding schemes such as Huffman coding, as well as run-length encoding and dictionary coders such as Lempel-Ziv (e.g., LZ77) and Lempel-Ziv-Welch (LZW).
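To make the meaning of “lossless” concrete, the following minimal sketch uses run-length encoding, the simplest of the techniques named above. The function names `rle_encode` and `rle_decode` and the representation of runs as (byte value, count) pairs are assumptions chosen for illustration, not a standardized format.

```python
def rle_encode(data):
    """Collapse each run of identical bytes into a (byte_value, count) pair."""
    runs = []
    for byte in data:
        if runs and runs[-1][0] == byte:
            # Same byte as the current run: extend the run by one.
            runs[-1] = (byte, runs[-1][1] + 1)
        else:
            # Different byte: start a new run.
            runs.append((byte, 1))
    return runs


def rle_decode(runs):
    """Expand (byte_value, count) pairs back into the original bytes."""
    return b"".join(bytes([byte]) * count for byte, count in runs)


sample = b"aaabccdddd"
encoded = rle_encode(sample)

# Lossless: decoding recovers the input exactly, byte for byte.
assert rle_decode(encoded) == sample
```

Run-length encoding only helps when the input contains long runs of repeated values, which is one reason practical schemes combine dictionary coding with entropy coding instead.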
One commonly used compression/decompression scheme is called DEFLATE, which is a variation on LZ that uses a combination of the LZ77 algorithm and Huffman coding. DEFLATE is optimized for decompression speed and compression ratio, but compression is computationally expensive. DEFLATE is used by popular compression tools such as PKZIP, which archives data in the ZIP format. DEFLATE is also used by GZIP compressed files and for PNG (Portable Network Graphics) images. In accordance with the use of HTTP Compression defined by RFC 2616, a web server may respond to an HTTP request with content that is compressed with GZIP compression, which the receiver decompresses using DEFLATE.
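As a minimal sketch of the relationship between DEFLATE and GZIP, the following uses Python's standard `zlib` and `gzip` modules, both of which wrap DEFLATE-compressed data in their own container formats. The sample payload and the function names `compress_body` and `decompress_body` are assumptions for illustration; this is not a description of any particular web server's implementation.

```python
import gzip
import zlib


def compress_body(body):
    """Sender side: wrap DEFLATE-compressed data in the GZIP container."""
    return gzip.compress(body)


def decompress_body(compressed):
    """Receiver side: inflate a GZIP-wrapped DEFLATE stream.

    wbits=31 (16 + 15) tells zlib to expect a gzip header and trailer
    around a DEFLATE stream with the maximum 32 KB window.
    """
    return zlib.decompress(compressed, wbits=31)


# Hypothetical, highly repetitive markup standing in for an HTML response.
body = b"<p>hello, hello, hello</p>" * 200

compressed = compress_body(body)

# Lossless round trip through the same underlying DEFLATE algorithm.
assert decompress_body(compressed) == body
assert gzip.decompress(compressed) == body

# Repetitive content compresses to far fewer bytes than the original.
assert len(compressed) < len(body)
```

That `zlib` can decode output produced by `gzip` reflects the point made above: GZIP is a file format, while DEFLATE is the compression algorithm inside it.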
In view of the foregoing, it is projected that the use of lossless compression techniques in combination with content streaming will become ever more prevalent. Accordingly, it would be advantageous to provide enhanced generation and processing of bitstream content.