This application relates generally to data compression and decompression and more particularly to data compression and decompression of firmware used in a disc drive.
There has been a continuous move in the disc drive industry to reduce the size and cost of disc drive systems, including both high- and low-end systems. One area where significant cost and size reductions may be made is the area of non-volatile memory components, which are often the most expensive components of disc drive systems. Non-volatile memory components, such as ROM or flash RAM, are typically incorporated in disc drives for the storage of the operating code of the disc drive. Upon the start of the drive, portions of the disc drive operating code that must be accessed quickly are typically transferred from the ROM or flash RAM to volatile memory (such as standard RAM) in the disc drive. These portions of the operating code may then be accessed quickly for execution by the disc drive microprocessor. The remaining portions of the disc drive operating code are left in the non-volatile memory, from which they are accessed and executed.
Numerous approaches have been proposed and implemented which are directed to reducing the use of non-volatile memory in the disc drive, thereby reducing the cost of the disc drive. One such approach involves compressing the operating code of the disc drive and storing the compressed operating code in the non-volatile memory of the disc drive. Upon start-up of the drive, a small bootstrap program is used to decompress the compressed operating code and to store the decompressed operating code in volatile memory of the disc drive. The disc drive operating code is then accessed from the volatile memory and executed by the disc drive microprocessor. By compressing the disc drive's firmware, a significant reduction in the size of the non-volatile memory used to store the firmware may be achieved, with a corresponding reduction in the cost of the non-volatile memory and the disc drive as a whole. One such approach utilizes a single-mode, Huffman-type compression technique in a two-processor decompression method where one processor performs the decompression of the operating code and loads it into RAM for use by the second processor.
In general, data compression converts data defined in a given format to another format so that the resulting compressed format contains fewer data bits (i.e., the ones and zeros that define digital data) than the original format. Hence, the data is compressed into a smaller representation. When the original data is needed, the compressed data is then decompressed using an algorithm that is complementary to the compression algorithm.
There are two principal types of data compression/decompression schemes (hereinafter referred to simply as compression schemes): lossless compression schemes and lossy compression schemes. Lossy compression refers to schemes in which the decompressed data is not exactly the same as the original data. In a lossy compression scheme, certain elements of the data are intentionally omitted or lost during the processes of compressing and decompressing the data. Lossy compression schemes are typically used in the compression of images or sounds, where the loss of limited or redundant data in the decompressed data is typically unnoticeable and, therefore, acceptable. Lossy compression schemes are not, however, suitable for compressing executable files, such as operating code files, where the loss of even a single bit of information may render the file useless. In applications where the loss of data is unacceptable, lossless compression schemes are preferable if not required.
In the field of lossless data compression there are two general types of compression techniques: (1) dictionary based (or sliding-window) compression; and (2) statistical compression. Dictionary-based compression schemes examine the input data stream and look for groups of symbols or characters that appear in a dictionary that is built using data that has already been compressed. If a match is found, a single pointer or index into the dictionary is output into the compressed data stream instead of the group of symbols. In this way, a commonly-occurring group of symbols can be replaced by a smaller index value. The principal difference between the numerous dictionary-based schemes is how the dictionary is built and maintained, and how matches are found.
One well-known dictionary-based scheme is the LZ77 algorithm. The basic LZ77 scheme is described in Ziv, J. and Lempel, A., “A Universal Algorithm for Sequential Data Compression,” IEEE Transactions on Information Theory, Vol. 23, May 1977, pp. 337-343. The LZ77 scheme uses a single-pass literal/copy algorithm to compress or encode and decompress or decode a data sequence. Simply put, the LZ77 scheme compresses a data stream by replacing reoccurring patterns of data in an incoming data stream with short codes in a compressed output data stream. Typically, LZ77 schemes search for reoccurring strings of data that are three symbols or bytes in length. Symbols in the uncompressed input data stream are either directly incorporated into the compressed data output stream as uncompressed strings (referred to as “root items” or “literals”) or, alternatively, are replaced by pointers to a matching set of root items that has already been incorporated into the compressed data output stream (i.e., as “copy items”). The copy items contain offset and length information that requires fewer or the same number of bytes as the replaced literal data. The offset, or offset value, specifies the offset of the string being encoded relative to its previous occurrence. For example, if a specific literal string of five symbols occurred ten bytes before an identical occurrence that is being encoded, the offset is ten. The length specifies the length of the string being replaced by the copy item. In this example, the length field is five and specifies the length of the matching data sequence in symbols or bytes. Compression is realized by representing as much of the uncompressed data sequence as possible as copy items. Root items are typically incorporated into the compressed data sequence only when a match of three or more symbols cannot be found.
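The literal/copy behavior described above can be illustrated with a minimal sketch. It is not the scheme of any particular product or of this application: the function names, the tuple representation of root and copy items, and the `WINDOW` size are assumptions made for illustration, and the brute-force match search is chosen for clarity rather than speed.

```python
MIN_MATCH = 3   # shortest match worth replacing with a copy item (per the text)
WINDOW = 4096   # assumed sliding-window size; hypothetical value

def lz77_compress(data: bytes) -> list:
    """Return a list of ("root", byte) and ("copy", offset, length) items."""
    items = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Brute-force search of the window for the longest earlier match.
        for j in range(max(0, i - WINDOW), i):
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= MIN_MATCH:
            items.append(("copy", best_off, best_len))
            i += best_len
        else:
            items.append(("root", data[i]))
            i += 1
    return items

def lz77_decompress(items: list) -> bytes:
    """Expand root and copy items back into the original byte string."""
    out = bytearray()
    for item in items:
        if item[0] == "root":
            out.append(item[1])
        else:
            _, offset, length = item
            # Copy byte by byte so that overlapping matches work.
            for _ in range(length):
                out.append(out[-offset])
    return bytes(out)

# The example from the text: a five-symbol string repeated ten bytes later.
sample = b"HELLO12345HELLO"
sample_items = lz77_compress(sample)
```

With this input, the first ten symbols are emitted as root items and the second occurrence of the five-symbol string becomes a single copy item with an offset of ten and a length of five.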
Statistical compression schemes, such as the Huffman scheme, build a statistical model of a data stream by reading and evaluating the entire data stream prior to compressing the stream. For example, during an initial evaluation of a data stream, Huffman encoding creates a statistical “tree” (Huffman tree) or ranking based on the number of occurrences of each symbol. This Huffman tree identifies each symbol in the data stream by frequency of occurrence. After the tree has been created, the symbols in the stream are replaced with variable-length codes that correspond to the positions of those symbols in the Huffman tree, to create a compressed stream of data. For example, the most frequent symbol may be assigned the shortest code in the tree, thus replacing the most frequently occurring symbol with the shortest possible code.
In addition to the use of either of the above-described LZ77 and Huffman encoding schemes alone, it has been found that these schemes can be combined to achieve even greater compression of a data stream than can be accomplished using a single compression scheme. A typical example of this type of “dual-mode” compression scheme involves using Huffman encoding to further compress a data stream that has previously been output by an LZ77 scheme to create a Huffman/LZ77 compressed data stream. Because of their high compression, simplicity, and fast compression speed, the use of dual-mode compression schemes has become prevalent in typical data compression applications.
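As an illustration of the frequency-based code assignment described above, the following minimal sketch builds a conventional Huffman code table with the standard-library `heapq` and `collections.Counter` modules. The function name and the representation of codes as bit strings are assumptions made for illustration. The tree is walked with an explicit stack rather than recursion, which is only one possible design choice.

```python
import heapq
from collections import Counter

def huffman_codes(data: bytes) -> dict:
    """Map each symbol in data to a prefix-free bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    # Heap entries are (weight, tie_breaker, tree); a tree is either a
    # symbol (int) or a (left, right) tuple of subtrees.
    heap = [(count, order, sym)
            for order, (sym, count) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    next_order = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # two least frequent subtrees
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_order, (t1, t2)))
        next_order += 1
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:  # iterative walk from the root to every leaf
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes

text = b"aaaaaaaabbbc"
table = huffman_codes(text)
encoded_bits = sum(len(table[sym]) for sym in text)
```

For this input, the most frequent symbol receives a one-bit code and the two rarer symbols receive two-bit codes, so the twelve input bytes are represented by sixteen bits, not counting the space needed to store the code table itself.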
While typical dual-mode compression schemes or algorithms may be more effective in certain applications than single-mode compression schemes, these typical dual-mode compression schemes have several drawbacks that may severely limit their effectiveness for use in typical disc drives. One drawback that may occur when using a typical dual-mode compression scheme in a disc drive relates to decompression speed. Typical dual-mode compression algorithms are designed to be run on processors, such as those found in personal computers and workstations, that have significant stack space for executing programs. Put another way, dual-mode compression algorithms are typically optimized to be executed on processors that have significant stack space and that allow for low interrupt and context switch latency. This is particularly true in dual-mode compression schemes that include a Huffman mode employing a recursive algorithm to process or navigate a typical Huffman tree.
In contrast to the processors used in personal computers or workstations, a typical disc drive employs a small processor with a very limited stack space. For example, disc drives often utilize a digital signal processor (DSP) as a primary processor. While DSPs typically have high computational speed, they often have very limited stack support and very high interrupt and context switch latency. For this reason, the processors normally used in disc drives are generally not suited for typical dual-mode compression algorithms that rely on significant stack space and low interrupt and context switch latency. In particular, the processors typically used in disc drives are not suited for dual-mode compression algorithms that include a typical Huffman scheme employing recursive algorithms.
Yet another potential problem with employing typical dual-mode compression schemes in a disc drive relates to the small size of the operating code used in disc drives. Dual-mode compression schemes are generally optimized for compressing large data files such as image, database, and audio files. As such, these compression schemes are typically not effective for compressing small data blocks into smaller data blocks, due to the “overhead,” such as dictionaries or Huffman trees, which the dual-mode compression schemes require. For instance, typical Huffman trees are constructed to optimize both the compression of large data files and the speed of the compression step. This is typically achieved by making the Huffman tree quite large. For larger data files such as image files, this level of overhead is more than compensated for by the optimum compression speed and the overall compression achieved. But for small data files, such as the operating code for a disc drive, the overhead due to the size of a typical Huffman tree is unacceptable. In fact, for a small block of data, the size of the compressed data when added to the size of the Huffman tree can be larger than the original data file, thus negating the benefit of compressing the small data file.
It is with respect to these considerations and others that the present invention has been developed.
A dual-mode compression/decompression scheme has been developed for use in compressing operating code for data storage devices. Compressed data is created via a dual-mode compression scheme in a form that is conducive to rapid decompression and is particularly suited to applications where processing resources are limited, such as data storage devices. In one embodiment, a modified LZ77-type compression/decompression scheme is combined with a unique Huffman-type compression/decompression scheme to provide a dual-mode scheme that is optimized for use in compressing and decompressing operating code for disc drives.
Another embodiment relates to a method of compressing a data input file to create a compressed data output file. The method includes sequentially examining the data input file using a sliding window compression scheme to create root items and copy items. The copy items each have a length value and an offset value. The offset values are subdivided into a most significant portion and a least significant portion that are dealt with differently by the method. The method creates an abridged root/copy file that includes the root items and the least significant portions of the offset values of the copy items. The method compresses the most significant portions of the offset values of the copy items to create a compressed offset file. The method also compresses the length values of the copy items to create a compressed length file. As part of the method, the abridged root/copy file, the compressed offset file, and the compressed length file are combined to create the compressed data output file.
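The separation of copy items into distinct streams can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method: the split point `LOW_BITS`, the function name `split_items`, and the tagged-tuple representation of items are hypothetical, since the summary does not state field widths or file layouts. The subsequent compression of the most significant offset portions and of the length values is not shown.

```python
LOW_BITS = 8  # assumed width of the least significant offset portion

def split_items(items: list):
    """Split root/copy items into three streams.

    Returns (abridged, offset_msbs, lengths), where `abridged` keeps the
    root items and only the least significant offset bits of each copy
    item. The other two lists would then be compressed separately.
    """
    low_mask = (1 << LOW_BITS) - 1
    abridged, offset_msbs, lengths = [], [], []
    for item in items:
        if item[0] == "root":
            abridged.append(item)
        else:
            _, offset, length = item
            abridged.append(("copy", offset & low_mask))
            offset_msbs.append(offset >> LOW_BITS)
            lengths.append(length)
    return abridged, offset_msbs, lengths

example_items = [("root", 65), ("copy", 0x1234, 5), ("root", 66), ("copy", 10, 3)]
abridged, offset_msbs, lengths = split_items(example_items)
```

In this example the offset 0x1234 contributes 0x34 to the abridged stream and 0x12 to the stream of most significant portions, while the lengths 5 and 3 are collected into their own stream.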
In complement to the above-mentioned method, another embodiment relates to a method for decompressing a compressed data input file that comprises a compressed length file, a compressed offset file, and an abridged root/copy file. In this method, the abridged root/copy file has root items and compressed copy items. Each of the compressed copy items comprises a least significant portion of an offset value of an uncompressed copy item. Each uncompressed copy item comprises a length value and an offset value.
In an initial step of the decompression method, the compressed data input file is received and the compressed length file is decompressed to obtain length values. Similarly, the compressed offset file is decompressed to obtain the most significant portions of the offset values. Uncompressed copy items are then formed from the least significant portions of the offset values obtained from the abridged root/copy file, the most significant portions of the offset values, and the length values. The method determines matching data strings identified by the uncompressed copy items and outputs a decompressed data output file comprising the root items and matching data strings.
Another embodiment relates to a data storage device with a compressed operating code file stored in non-volatile memory therein, and a decompressor module operable to decompress the compressed operating code file. In this embodiment, the compressed operating code file includes a first compressed file, a second compressed file, and an abridged root/copy file.
These and various other features as well as advantages which characterize the present invention will be apparent from a reading of the following detailed description and a review of the associated drawings.
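The reassembly of copy items and the recovery of matching data strings can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method: it assumes the length values and the most significant offset portions have already been decompressed into plain lists, and the split point `LOW_BITS`, the function name `rebuild_and_expand`, and the tagged-tuple item representation are hypothetical.

```python
LOW_BITS = 8  # assumed width of the least significant offset portion

def rebuild_and_expand(abridged: list, offset_msbs: list, lengths: list) -> bytes:
    """Re-form full copy items and expand them into the output data."""
    out = bytearray()
    msb_iter = iter(offset_msbs)
    len_iter = iter(lengths)
    for item in abridged:
        if item[0] == "root":
            out.append(item[1])  # root items pass straight through
        else:
            # Rejoin the two portions of the offset value.
            offset = (next(msb_iter) << LOW_BITS) | item[1]
            length = next(len_iter)
            # Copy the matching string from already-output data, byte by
            # byte so that overlapping matches are handled.
            for _ in range(length):
                out.append(out[-offset])
    return bytes(out)

short = rebuild_and_expand(
    [("root", ord("a")), ("root", ord("b")), ("root", ord("c")), ("copy", 3)],
    [0],
    [6],
)
```

Here the single copy item has an offset of three and a length of six, so the three root items are repeated twice and the output is `abcabcabc`.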