The present invention relates to computer system architectures, and more particularly to a system and method for performing data compression and decompression using a plurality of data compression/decompression engines operating in parallel to reduce system bandwidth and improve efficiency.
Computer system and memory subsystem architectures have remained relatively unchanged for many years. While memory density has increased and the cost per storage bit has decreased over time, there has been no significant improvement in the effective operation of the memory subsystem or of the software that manages it. Most computing systems presently use a software-implemented memory management unit to perform virtual memory functions. In a virtual memory system, non-volatile memory (e.g., a hard disk) serves as secondary memory to give the appearance of a larger system memory. As system memory becomes full, least recently used (LRU) pages are swapped out to the hard disk, and they can be swapped back into system memory when needed.
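The LRU swapping behavior described above can be illustrated with a toy model. `LruPagedMemory`, its 4096-byte zero-filled pages, and the dictionary standing in for the hard disk are hypothetical simplifications for illustration only, not part of the invention or of any real memory manager.

```python
from collections import OrderedDict


class LruPagedMemory:
    """Toy model: `capacity` physical page frames backed by a swap 'disk'.

    Hypothetical illustration of LRU page replacement only.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.frames = OrderedDict()  # page number -> contents, least recently used first
        self.disk = {}               # pages swapped out to secondary storage

    def access(self, page: int) -> bool:
        """Touch a page; return True on a hit, False if it had to be brought in."""
        if page in self.frames:
            self.frames.move_to_end(page)  # mark as most recently used
            return True
        if len(self.frames) >= self.capacity:
            victim, contents = self.frames.popitem(last=False)  # evict the LRU page
            self.disk[victim] = contents                        # swap it out to "disk"
        # Swap the page back in if it was evicted earlier, else allocate a zero page.
        self.frames[page] = self.disk.pop(page, bytes(4096))
        return False
```

For example, with two frames, accessing pages 1, 2, 1, 3 evicts page 2 (the least recently used) to the disk; a later access to page 2 swaps it back in and evicts page 1.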
Software-implemented compression and decompression technologies have also been used to reduce the size of data stored in the disk subsystem or in system memory. Current compressed data storage implementations use the system's CPU, executing a software program, to compress information for storage on disk. However, a software solution typically consumes too many CPU compute cycles and/or adds too much bus traffic to perform both compression and decompression alongside the running application(s). This compute cycle problem grows as applications increase in size and complexity. In addition, there has been no general-purpose use of compression and decompression for in-memory system data; prior art systems have been specific to certain data types. Thus, software compression has been used, but this technique limits CPU performance and has been restricted to certain data types.
Similar problems exist for programs that require multiple software threads to operate in parallel. Software compression does not address heavily loaded or multi-threaded applications, which require high CPU throughput. Other hardware compression solutions have not focused on “in-memory” data (data that resides in the active portion of the memory and software hierarchy). These solutions have typically been I/O data compression devices located away from the system memory or memory subsystem. In general, hardware compression has been restricted to slow input and output devices, such as the hard drive, usually located in the I/O subsystem.
Mainframe computers have used data compression for years to accelerate processing and reduce storage space. These systems require high-cost compression modules located away from system memory and do not compress in-memory data within the same memory subsystem to improve performance. Such high-cost compression subsystems use multiple separate engines running in parallel to achieve compression at supercomputer rates. Multiple separate, serial compression and decompression engines running in parallel are cost-prohibitive for general-purpose servers, workstations, desktops, or mobile units.
Lower-cost semiconductor devices that use compression hardware have been developed. However, these devices do not operate fast enough to run at memory speed and thus lack the performance needed for in-memory data. Such compression hardware devices are limited to serial operation at compression rates suited to slow I/O devices such as tape backup units. A further problem with such I/O compression devices, other than in tape backup units, is that the blocks of data to be compressed are often too small to benefit effectively from compression. This is especially true in disk and network subsystems. Operating hardware compression on in-memory data at memory bus speeds requires over an order of magnitude more speed than present-day state-of-the-art compression hardware provides.
The amount of system memory available for executing processes in prior art computer systems is generally limited by the amount of physical memory installed. It is desirable to provide a method of increasing the effective size of system memory without increasing actual physical memory, thereby allowing processors and/or I/O masters of the system to address more system memory than physically exists. It is also desirable for this method to be applicable to other aspects of computer operation, such as transmitting and receiving data over a network.
Embodiments of a compression/decompression (codec) system may include a plurality of parallel data compression and/or parallel data decompression engines. A codec system may be designed to compress and decompress data in order to reduce data bandwidth and storage requirements. The plurality of compression/decompression engines may each implement a parallel lossless data compression/decompression algorithm. Embodiments may include various combinations of compression engines, decompression engines, and compression/decompression engines. In one embodiment, there may be N compression engines and N decompression engines. Note, however, that the number of compression engines need not equal the number of decompression engines. For example, in one embodiment, a codec system may implement a plurality of compression engines and only one decompression engine.
A codec system for performing parallel data compression and/or decompression may also include logic for splitting incoming uncompressed or compressed data up among the plurality of compression/decompression engines. The codec system may also include logic for merging the portions of compressed or uncompressed data output from the plurality of compression/decompression engines.
The following describes one embodiment of a method of splitting uncompressed data among a plurality of parallel compression engines to produce compressed data. A codec system may include a plurality of compression engines that each implements a parallel lossless data compression algorithm. Uncompressed data may be received and split into a plurality of portions. Assuming there are N parallel compression engines available, the uncompressed data may be split into N portions. Each of the portions of the uncompressed data may then be provided to a different one of the N parallel compression engines. Each of the N parallel compression engines then compresses its portion of the uncompressed data. The N compression engines thus compress the N portions of uncompressed data to produce N portions of compressed data. The N portions of compressed data are then merged to produce the compressed data.
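The split/compress/merge flow above can be modeled in software as follows. This is a minimal sketch, not the hardware engines themselves: Python's `zlib` (DEFLATE) stands in for the parallel lossless algorithm each engine implements, a thread pool stands in for the N engines, and the merge format (a 4-byte big-endian portion count followed by each compressed portion prefixed with its 4-byte length) is a hypothetical choice, since the text does not specify how the N compressed portions are combined. `split_into_portions` and `compress_parallel` are illustrative names.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_into_portions(data: bytes, n: int) -> list[bytes]:
    """Split data into n contiguous portions of near-equal size."""
    size, rem = divmod(len(data), n)
    portions, start = [], 0
    for i in range(n):
        # The first `rem` portions each take one extra byte.
        end = start + size + (1 if i < rem else 0)
        portions.append(data[start:end])
        start = end
    return portions


def compress_parallel(data: bytes, n_engines: int = 4) -> bytes:
    """Split, compress each portion on its own worker, then merge."""
    portions = split_into_portions(data, n_engines)
    # zlib.compress stands in for one compression engine; map keeps portion order.
    with ThreadPoolExecutor(max_workers=n_engines) as pool:
        compressed = list(pool.map(zlib.compress, portions))
    # Hypothetical merge format: portion count, then length-prefixed portions.
    out = [struct.pack(">I", len(compressed))]
    for chunk in compressed:
        out.append(struct.pack(">I", len(chunk)))
        out.append(chunk)
    return b"".join(out)
```

Length-prefixing each compressed portion is one simple way to let a decompressor later split the merged stream back into portions that can be decoded independently of one another.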
The following describes one embodiment of a method of splitting compressed data among a plurality of parallel decompression engines to produce uncompressed data. A codec system may include a plurality of decompression engines that each implements a parallel lossless data decompression algorithm. Compressed data may be received and split into a plurality of portions. Assuming there are M parallel decompression engines available, the compressed data may be split into M portions. Each of the portions of the compressed data may then be provided to a different one of the M parallel decompression engines. Each of the M parallel decompression engines then decompresses its portion of the compressed data. The M decompression engines thus decompress the M portions of compressed data to produce M portions of uncompressed data. The M portions of uncompressed data are then merged to produce the uncompressed data.
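A matching software sketch of the decompression side, under similar assumptions: `zlib` stands in for the parallel lossless decompression algorithm, a thread pool with `m_engines` workers stands in for the M decompression engines, and the compressed stream is assumed to use a hypothetical framing (a 4-byte big-endian portion count, then each compressed portion prefixed by its 4-byte length) so that it can be split into independently decodable portions. The function names are illustrative, not taken from the source.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_compressed(stream: bytes) -> list[bytes]:
    """Parse the hypothetical framing: a big-endian 4-byte portion count,
    then each compressed portion preceded by its 4-byte length."""
    (count,) = struct.unpack_from(">I", stream, 0)
    offset, portions = 4, []
    for _ in range(count):
        (length,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        portions.append(stream[offset:offset + length])
        offset += length
    return portions


def decompress_parallel(stream: bytes, m_engines: int = 4) -> bytes:
    """Split the stream, decompress each portion on a worker, then merge."""
    portions = split_compressed(stream)
    # zlib.decompress stands in for one decompression engine.
    with ThreadPoolExecutor(max_workers=m_engines) as pool:
        uncompressed = list(pool.map(zlib.decompress, portions))
    # Merge: concatenate in original order (pool.map preserves input order).
    return b"".join(uncompressed)
```

Here the number of portions is fixed by whoever produced the stream; when it equals M, each worker receives exactly one portion, matching the embodiment described above.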
A codec system as described herein may be included in any of various devices, including a memory controller; memory modules; a processor or CPU; peripheral devices, such as a network interface card, modem, ISDN terminal adapter, ATM adapter, etc.; and network devices, such as routers, hubs, switches, bridges, etc., among others. Where the codec system is included in a device, data transfers to and from the device can be in either of two formats: compressed or normal (non-compressed). Compressed data from system I/O peripherals such as the non-volatile memory, floppy drive, or local area network (LAN) may thus be decompressed in the device and stored into memory, or saved in memory in compressed format. Data can therefore be saved in either a normal or compressed format, retrieved from memory for CPU usage in a normal or compressed format, or transmitted and stored on a medium in a normal or compressed format.
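The two-format behavior can be modeled with a small tagged-block sketch. The one-byte format tag, its values, `encode_block`, and `read_block` are hypothetical, and `zlib` stands in for the codec; the point is only that a block stored in either format can be returned to a requester in either format.

```python
import zlib

NORMAL, COMPRESSED = 0, 1  # hypothetical one-byte format tags


def encode_block(data: bytes, compress: bool) -> bytes:
    """Store a block in either normal or compressed format, tagged with its format."""
    if compress:
        return bytes([COMPRESSED]) + zlib.compress(data)
    return bytes([NORMAL]) + data


def read_block(stored: bytes, want_compressed: bool = False) -> bytes:
    """Return the block's payload in whichever format the requester asks for."""
    tag, payload = stored[0], stored[1:]
    if tag == COMPRESSED:
        return payload if want_compressed else zlib.decompress(payload)
    return zlib.compress(payload) if want_compressed else payload
```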
Embodiments of the parallel compression/decompression engines as described herein may implement an improved system and method for performing parallel data compression and/or decompression, designed to process stream data more than a single byte or symbol (character) at a time. In one embodiment, the system and method may use a lossless data compression and decompression scheme. The parallel compression and decompression methods may examine a plurality of symbols in parallel, thus providing greatly increased compression and decompression performance. These parallel compression and decompression engines may be referred to herein as parallel codec engines. In one embodiment, a parallel compression and decompression engine may implement a modified single-stream, dictionary-based (or history-table-based) data compression and decompression method, such as that described by Lempel and Ziv, to provide scalable, high-bandwidth compression and decompression.
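For reference, a minimal single-stream, history-window (LZ77-style) compressor and decompressor are sketched below. This is a serial baseline of the kind of Lempel-Ziv method the text says the parallel engines modify, not the parallel method itself: the inner loop over history positions compares input symbols one at a time, which is roughly the comparison work a parallel engine would perform on several symbols at once. The token format (an `int` literal or an `(offset, length)` tuple) and the `window`, `min_match`, and `max_match` parameters are illustrative assumptions.

```python
def lz77_compress(data: bytes, window: int = 4096, min_match: int = 3, max_match: int = 255):
    """Greedy LZ77: emit literal ints or (offset, length) back-references."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Serially compare the input at i against every position in the history window.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            tokens.append((best_off, best_len))  # back-reference into history
            i += best_len
        else:
            tokens.append(data[i])  # literal byte
            i += 1
    return tokens


def lz77_decompress(tokens) -> bytes:
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)
        else:
            offset, length = tok
            # Copy byte by byte so overlapping matches (offset < length) work.
            for _ in range(length):
                out.append(out[-offset])
    return bytes(out)
```

Overlapping matches are permitted, so `b"a" * 20` compresses to one literal followed by the single back-reference `(1, 19)`.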
The integrated data compression and decompression capabilities of the codec system remove system bottlenecks and increase performance. Smaller data storage requirements and reduced bandwidth requirements allow lower-cost systems, and the resulting increase in effective system bandwidth further increases system performance. Thus the present invention provides a significant advance over the operation of current devices, such as memory controllers, memory modules, processors, and network devices, among others.