Businesses are growing increasingly dependent on distributed computing environments and wide area computer networks to accomplish critical tasks. Indeed, a wide variety of business applications are deployed across intranet, extranet and Internet connections to effect essential communications with workers, business partners and customers. As the number of users, applications and external traffic increases, however, network congestion forms, impairing business application performance. Enterprise network managers, therefore, are constantly challenged with determining the volume, origin and nature of network traffic to align network resources with business priorities and applications.
Data compression, caching and other technologies that optimize or reduce the size of network traffic flows can be deployed to improve the efficiency and performance of a computer network, and ease congestion at bottleneck links. For example, implementing data compression and/or caching technology can improve network performance by reducing the amount of bandwidth required to transmit a given block of data between two network devices along a communications path. Data compression technologies can be implemented on routing nodes (or other network devices in a communications path) without alteration of client or server end systems, or software applications executed therein, to reduce bandwidth requirements along particularly congested portions of a communications path. For example, tunnel technologies, like those used in Virtual Private Network (VPN) implementations, establish tunnels through which network traffic is transformed upon entering at a first network device in a communications path and restored to substantially the same state upon leaving a second network device.
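The transform-on-entry, restore-on-exit behavior described above can be sketched in a few lines. This is only an illustration, not a description of any particular tunnel product: the function names `tunnel_ingress` and `tunnel_egress` and the sample payload are hypothetical, and Python's standard `zlib` module (DEFLATE, which combines LZ77-style matching with Huffman coding) stands in for whatever compression a real device would apply.

```python
import zlib

def tunnel_ingress(payload: bytes) -> bytes:
    """Hypothetical first device: compress traffic as it enters the tunnel."""
    return zlib.compress(payload)

def tunnel_egress(wire_data: bytes) -> bytes:
    """Hypothetical second device: restore traffic as it leaves the tunnel."""
    return zlib.decompress(wire_data)

# Repetitive sample traffic (an assumption chosen so that it compresses well).
payload = b"GET /inventory/item?id=1001 HTTP/1.1\r\n" * 50

wire_data = tunnel_ingress(payload)   # what crosses the congested link
restored = tunnel_egress(wire_data)   # what the receiving end system sees

print(len(payload), "bytes in,", len(wire_data), "bytes on the wire")
```

Neither end system changes in this sketch; only the two intermediate functions know that compression took place.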
A variety of compression algorithms and technologies have been developed, such as run-length encoding (RLE), Huffman encoding, Lempel-Ziv compression (e.g., LZ77, LZ78, etc.), Lempel-Ziv-Welch (LZW) compression, fixed library compression, and combinations/variants of the foregoing compression methods. All compression methods have their own advantages and tradeoffs. It is generally understood that no single compression method is superior for all applications and data types. The most beneficial choice of compression tools and libraries for a particular network application depends on the characteristics of the data and application in question: streaming versus file; expected patterns and regularities in the data; relative importance of CPU usage, memory usage, channel demands and storage requirements; and other factors.
In the realm of data compression, there frequently exists a data structure known as a codebook. A codebook is a mapping between an incoming symbol (or bit pattern) and an outgoing symbol or bit pattern. This outbound representation generally consists of a bit pattern having a given length. By assigning shorter length bit patterns to the more frequently-encountered incoming bit patterns, data compression aims to reduce the amount of overall data requiring storage or transmission. The act of analyzing incoming symbol probabilities and determining the optimal outgoing bit patterns is called codebook generation, which can often be a computationally intensive operation.
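As a concrete illustration of codebook generation, the following minimal sketch builds a Huffman codebook from the symbol frequencies of a block of bytes. It is one possible construction under stated assumptions, not a required implementation: the names `build_codebook` and `encoded_bits` are hypothetical, symbols are taken to be single bytes, and the output bit patterns are represented as strings of "0" and "1" for readability.

```python
import heapq
from collections import Counter
from typing import Dict

def build_codebook(data: bytes) -> Dict[int, str]:
    """Map each input symbol (byte value) to an outgoing bit pattern."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit pattern.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial pattern}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, extending their patterns.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def encoded_bits(data: bytes, codebook: Dict[int, str]) -> int:
    """Total number of output bits when data is encoded with the codebook."""
    return sum(len(codebook[b]) for b in data)

sample = b"aaaaaaaabbbc"  # 'a' is frequent, 'c' is rare
codebook = build_codebook(sample)
for symbol, pattern in sorted(codebook.items()):
    print(chr(symbol), pattern)
print(encoded_bits(sample, codebook), "bits versus", 8 * len(sample), "bits raw")
```

The frequent symbol receives the shortest pattern, which is the source of the size reduction; the repeated heap operations are also where the computational cost of generating the codebook arises.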
Data compression mechanisms may use static or dynamic codebooks. For example, to adapt to changing data characteristics, dynamic compression systems regenerate or update the codebook used to compress a data stream in response to the history of input bit patterns or symbols. Due to the CPU-intensive nature of codebook regeneration, however, the effective throughput of a data compression system can be reduced by the amount of time spent maintaining the codebook. For example, Adaptive Huffman compression can update the codebook after every input symbol, after ‘N’ input symbols, or after processing a given block of input data. In the first case, the compression scheme adapts to changing input data very effectively, but at a higher CPU cost. In the second case, selecting larger ‘N’ values reduces the CPU resources spent on codebook regeneration, but sacrifices the adaptability of the compression algorithm, possibly resulting in poor compression performance. The third case is similar to the second: longer blocks of data yield poorer adaptability and potentially reduced compression performance, but lower CPU usage. At the other end of the spectrum, Static Huffman compression relies on a single codebook, either selected from a knowledge base of codebooks or computed once for an incoming data block. As a result, CPU usage for codebook generation is significantly reduced, at the expense, however, of adaptability and possibly compression performance.
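The tradeoff between update frequency and compression performance can be made concrete with a small simulation. The sketch below is illustrative only and rests on several assumptions: the names `code_lengths` and `simulate` are hypothetical, every symbol of a known alphabet starts with a count of one so that it always has a code, and the codebook is rebuilt from scratch with a standard Huffman construction at each update rather than adjusted incrementally as a true Adaptive Huffman coder would do. The number of rebuilds serves as a rough proxy for CPU cost.

```python
import heapq
from typing import Dict, Iterable, Tuple

def code_lengths(counts: Dict[str, int]) -> Dict[str, int]:
    """Huffman code length, in bits, for each symbol given its count."""
    if len(counts) == 1:
        return {s: 1 for s in counts}
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        for s in a + b:
            lengths[s] += 1  # each merge adds one bit to every member
        heapq.heappush(heap, (w1 + w2, next_id, a + b))
        next_id += 1
    return lengths

def simulate(data: Iterable[str], alphabet: str, update_every: int) -> Tuple[int, int]:
    """Return (output bits, codebook builds) when updating every N symbols."""
    counts = {s: 1 for s in alphabet}  # assumption: uniform starting counts
    lengths = code_lengths(counts)
    builds, bits = 1, 0
    for i, sym in enumerate(data, 1):
        bits += lengths[sym]           # encode with the current codebook
        counts[sym] += 1
        if i % update_every == 0:      # regenerate after every N symbols
            lengths = code_lengths(counts)
            builds += 1
    return bits, builds

stream = "a" * 1000  # highly skewed input (an assumption for illustration)
for n in (1, 100, 10_000):
    bits, builds = simulate(stream, "abcd", n)
    print(f"N={n}: {bits} bits, {builds} codebook builds")
```

With N larger than the input, the initial codebook is never replaced, which approximates the static case. Smaller values of N shorten the output for this skewed stream, but at the price of many more codebook builds.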
Prior art compression mechanisms, however, do not balance the effect on throughput of codebook regeneration against the expected gain in compression performance and data throughput. Indeed, an update to a codebook may not yield a significant effect on the compression ratio, and/or ultimately the effective data throughput, achieved by a compression system. Accordingly, CPU resources (and thus throughput during codebook regeneration) may be expended with little to no resulting gain in data throughput or performance. In light of the foregoing, a need in the art exists for methods, apparatuses and systems directed to controlling codebook updates to reduce the effect on data throughput of unnecessary codebook regeneration.