The ordering of events is fundamental to the study of the dynamic behavior of a system. In a sequential process, it is natural to use strings of symbols over some alphabet to specify the temporal ordering of events. The symbols may, for example, correspond to the states, commands, or messages in a computation. J. Larus, “Whole Program Paths,” ACM SIGPLAN Conf. Prog. Lang. Des. Implem., 259–69 (May 1999), applies a lossless data compression algorithm known as “Sequitur” to the sequence of events or signals that determine the control flow or operations of a program's execution. Sequitur belongs to a family of data compression algorithms known as grammar-based codes, which take a string of discrete symbols and produce a set of hierarchical rules forming a context-free grammar that generates only that string. Unlike many other compression schemes, these codes also expose the hierarchical structure of the original string. Larus demonstrated that the grammar output by Sequitur can be used to identify performance-tuning opportunities by locating heavily executed subsequences of operations.
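Sequitur builds its grammar online while maintaining digram-uniqueness and rule-utility invariants, which takes more code than is appropriate here. The sketch below instead uses Re-Pair, a different and simpler offline grammar-based code, only to illustrate the general idea of rewriting a string as a grammar that generates exactly that string. The function names `repair` and `expand` and the rule names `R0`, `R1`, ... are illustrative assumptions, and the input is assumed to be a `str`, so its single-character terminals cannot collide with the multi-character rule names.

```python
from collections import Counter

def repair(text):
    """Naive Re-Pair sketch: repeatedly replace the most frequent adjacent
    pair of symbols with a new nonterminal until no pair occurs twice."""
    seq = list(text)
    rules = {}          # nonterminal name -> (left symbol, right symbol)
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        nt = f"R{next_id}"          # hypothetical naming scheme
        next_id += 1
        rules[nt] = pair
        # Replace non-overlapping occurrences of the pair, left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def expand(symbol, rules):
    """Expand a symbol back into the terminal string it generates."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return [symbol]
```

For example, `repair("abcabcabc")` yields a start sequence of two symbols plus rules such as `R0 -> a b` and `R1 -> R0 c`; the nested rules show the repeated "abc" block directly, which is the kind of hierarchical structure Larus exploited. This naive version rescans the whole sequence on every round, so it is quadratic; practical Re-Pair implementations use linked lists and priority queues to run in linear time.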
The underlying premise in using lossless data compression for this application is the existence of a well-defined linear ordering of events in time. A partial ordering of events is a more accurate model for concurrent systems, such as multiprocessor configurations, distributed systems and communication networks, which consist of a collection of distinct processes that communicate with one another or synchronize at times but are also partly autonomous. In such systems, some events in the individual processes are independent of one another, while others must occur in a predetermined order. One model of concurrent systems uses a noncommutation graph, whose vertices are the symbols of the alphabet and whose edges join pairs of symbols that may not be reordered; two strings are equivalent relative to the graph if one can be obtained from the other by repeatedly exchanging adjacent symbols that are not joined by an edge. To extend Larus' ideas to concurrent systems, a technique is considered for compressing an input string in such a way that decompression produces a string equivalent to it relative to a noncommutation graph.
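Equivalence relative to a noncommutation graph can be tested without enumerating reorderings. The sketch below relies on the standard projection characterization: two strings are equivalent exactly when they contain the same number of each symbol and, for every edge {a, b} of the graph, deleting all other symbols leaves the same subsequence in both. Representing the graph as a set of two-element `frozenset`s, and the names `project` and `equivalent`, are assumptions made for illustration; each symbol is implicitly treated as not commuting with itself.

```python
from itertools import combinations

def project(word, keep):
    """Return the subsequence of word made of the symbols in keep."""
    return [s for s in word if s in keep]

def equivalent(u, v, noncommuting):
    """True if u and v are equivalent relative to a noncommutation graph.

    noncommuting: hypothetical edge set, one frozenset({a, b}) per pair of
    symbols whose relative order matters.
    """
    alphabet = set(u) | set(v)
    # Each symbol must occur equally often (the a == b case).
    for a in alphabet:
        if project(u, {a}) != project(v, {a}):
            return False
    # Symbols joined by an edge must appear in the same relative order.
    for a, b in combinations(sorted(alphabet), 2):
        if frozenset((a, b)) in noncommuting:
            if project(u, {a, b}) != project(v, {a, b}):
                return False
    return True
```

With the single edge {a, b}, the symbol c commutes with both a and b, so `equivalent("abc", "cab", {frozenset("ab")})` holds while `equivalent("abc", "bac", {frozenset("ab")})` does not, because a and b have been exchanged.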
The compression of program binaries is important for the performance of software delivery platforms. Program binaries are files whose content must be interpreted by a program or hardware processor that knows how the data inside the file is formatted. M. Drinić and D. Kirovski, “PPMexe: PPM for Compressing Software,” Proc. 2002 IEEE Data Comp. Conf., 192–201 (March 2002), discloses a compression mechanism for program binaries that exploits the syntax and semantics of the program to achieve improved compression rates. They also compress data relative to a noncommutation graph. The disclosed compression algorithm employs the generic paradigm of prediction by partial matching (PPM). While the disclosed algorithm performs well for many applications, it introduces certain inefficiencies in compression performance and delay.
A need therefore exists for a more efficient algorithm for compressing an input string given the set of equivalent words (strings) induced by a noncommutation graph. A further need exists for a decompression technique that reproduces a string equivalent to the original string.
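One simple way to satisfy the equivalence requirement, shown here only as an illustration and not as the more efficient algorithm the text calls for, is to map the input to a canonical representative of its equivalence class before applying any lossless compressor. The sketch below computes the lexicographically smallest equivalent string, a standard canonical form for this equivalence. It assumes single-character symbols and a hypothetical edge set of two-element `frozenset`s; the name `lex_normal_form` is likewise an assumption.

```python
def lex_normal_form(word, noncommuting):
    """Return the lexicographically smallest string equivalent to word.

    noncommuting: hypothetical edge set of a noncommutation graph, one
    frozenset({a, b}) per pair whose order matters. A symbol never
    commutes with itself.
    """
    def commute(a, b):
        return a != b and frozenset((a, b)) not in noncommuting

    rest = list(word)
    out = []
    while rest:
        # A symbol can be moved to the front if every symbol before its
        # first occurrence commutes with it. rest[0] always qualifies.
        candidates = {s for i, s in enumerate(rest)
                      if all(commute(s, t) for t in rest[:i])}
        best = min(candidates)
        del rest[rest.index(best)]   # remove that first occurrence
        out.append(best)
    return "".join(out)
```

With the single edge {a, b}, the strings "cba", "bca" and "bac" all map to "bac". Because equivalent inputs map to the same string, compressing the normal form with any lossless compressor and then decompressing returns a string equivalent to the original input. The greedy loop is roughly cubic in the string length, so it demonstrates correctness rather than efficiency.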