This invention relates generally to software development environments, and more particularly to compressing representations of computer executable code.
A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owners have no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserve all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawing hereto: Copyright © 1999, Microsoft Corporation and the Association for Computing Machinery, Inc., All Rights Reserved.
As computer programs have become more complex, the amount of executable code in a typical program has also increased. The increase in the code size results in increased storage requirements of the executable file, and more importantly, increased bandwidth consumption or increased download times when the code is transferred over a network. In addition, many embedded computers have limited amounts of ROM available for program storage. As a result, compression of the executable code is desirable to reduce both storage requirements and network transfer time.
Previous systems have used one or more general-purpose data compression systems to reduce the size of executable code. Many general-purpose data compression systems comprise a statistical modeler followed by a coder. As the input is compressed or decompressed, the modeler tracks some context in the input and associates with each context a probability distribution that the coder (e.g., an arithmetic coder) uses to encode the next token in the input stream. For example, when compressing English text, the letter Q is often followed by the letter U, so a good modeler responds to a Q by switching to a frequency distribution that assigns a high probability to U, which allows the coder to encode the U in fewer bits.
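The modeler/coder split can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the class name `Order1Modeler`, the 27-symbol lowercase alphabet, and the choice to start every count at 1 (so that no token ever receives zero probability) are illustrative only. Instead of running a real arithmetic coder, the sketch sums -log2 p for each token, which is the ideal number of bits such a coder approaches.

```python
import math
from collections import Counter, defaultdict


class Order1Modeler:
    """Hypothetical adaptive order-1 modeler: one frequency table per preceding token."""

    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        # Every context starts with a count of 1 per token, so no token
        # ever receives zero probability.
        self.counts = defaultdict(self._fresh_table)

    def _fresh_table(self):
        return Counter({t: 1 for t in self.alphabet})

    def probability(self, context, token):
        table = self.counts[context]
        return table[token] / sum(table.values())

    def update(self, context, token):
        self.counts[context][token] += 1


def ideal_code_length(text, modeler):
    """Sum of -log2 p over the text: the bits an ideal coder would spend."""
    bits = 0.0
    context = None  # the first token has no predecessor
    for token in text:
        bits += -math.log2(modeler.probability(context, token))
        modeler.update(context, token)  # a decoder can make the same update
        context = token
    return bits


m = Order1Modeler("abcdefghijklmnopqrstuvwxyz ")
bits = ideal_code_length("quick queen quiet quota", m)
print(round(bits, 1), m.probability("q", "u"), m.probability("q", "z"))
```

After the four Q-U pairs in this input, the table for context "q" gives U a probability of 5/31 versus 1/31 for Z, so each later U that follows a Q costs fewer bits.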
Markov models use a number of immediately preceding tokens to help predict and compress the next token. For example, an order-1 model uses the immediately preceding token, an order-2 model uses the two immediately preceding tokens, and so on. For an alphabet A, an order-N model can use up to |A|^N probability distributions, one for each combination of the last N tokens. Thus, for an alphabet of 256 possible values, an order-1 Markov modeler would use 256 probability distributions, an order-2 modeler would use 65,536 probability distributions, and so on.
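The growth in the number of distributions, and the lookup key an order-N model uses, can be checked directly. The helper names are hypothetical. Near the start of a stream this sketch simply uses a shorter key; a real modeler might pad instead, since the |A|^N bound counts only full-length contexts.

```python
def markov_table_count(alphabet_size, order):
    """Upper bound on distributions for an order-N model: |A| ** N."""
    return alphabet_size ** order


def order_n_context(tokens, i, order):
    """Key for predicting tokens[i]: the (up to) `order` tokens before it.

    Near the start of the stream the key is shorter than `order`.
    """
    return tuple(tokens[max(0, i - order):i])


print(markov_table_count(256, 1), markov_table_count(256, 2))
print(order_n_context([7, 8, 9], 2, 2))
```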
Prediction by Partial Matching (PPM) modelers blend or switch on the fly between several Markov models, preferring more history when the recent context has been seen often, and backing off to use less history when it has less experience with the current context.
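The back-off idea can be illustrated with a deliberately simplified model. It is not full PPM: the class `BackoffModel` and its `min_count` threshold are assumptions standing in for "the recent context has been seen often". Real PPM also blends orders and reserves an escape probability so that tokens never seen in a context remain encodable; this sketch omits both and only picks one order.

```python
from collections import Counter, defaultdict


class BackoffModel:
    """Hypothetical simplified PPM-style back-off (not full PPM).

    Uses the longest context of at most `max_order` tokens that has been
    seen at least `min_count` times; otherwise backs off to shorter ones.
    """

    def __init__(self, max_order=2, min_count=3):
        self.max_order = max_order
        self.min_count = min_count
        # context tuple -> Counter of the tokens that followed it
        self.tables = defaultdict(Counter)

    def train(self, tokens):
        for i, tok in enumerate(tokens):
            # record tok under every context length 0..max_order that fits
            for n in range(min(i, self.max_order) + 1):
                self.tables[tuple(tokens[i - n:i])][tok] += 1

    def distribution(self, history):
        """Return (order used, {token: probability}) for the next token."""
        for n in range(min(len(history), self.max_order), -1, -1):
            ctx = tuple(history[len(history) - n:])
            table = self.tables.get(ctx)  # .get does not insert empty tables
            if table and sum(table.values()) >= self.min_count:
                total = sum(table.values())
                return n, {t: c / total for t, c in table.items()}
        return 0, {}  # nothing seen often enough yet


m = BackoffModel(max_order=2, min_count=3)
m.train("abcabcabcabd")
print(m.distribution("ab"))  # well-known context: order 2 is used
print(m.distribution("xb"))  # unseen pair: backs off to order 1
```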
In each case, the modeler's objective is to assign a non-zero probability to every valid message (sequence of tokens), and high probabilities to messages that resemble those in some representative training set. The higher the probability P(M) assigned to a message M comprising tokens m1 m2 . . . mN, the shorter its minimum code length, -log2 P(M) bits, or entropy.
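The link between probability and code length follows from the chain rule. A modeler that predicts each token from the tokens before it assigns the whole message the product of its per-token predictions, and the minimum code length is the negative base-2 logarithm of that product (an order-k Markov modeler simply truncates each condition to the last k tokens):

```latex
P(M) = \prod_{i=1}^{N} P(m_i \mid m_1 \ldots m_{i-1})

\ell(M) = -\log_2 P(M) = -\sum_{i=1}^{N} \log_2 P(m_i \mid m_1 \ldots m_{i-1}) \ \text{bits}
```

As an illustrative figure, if a modeler assigns P(U | Q) = 0.9, coding that U costs about 0.152 bits, versus log2 26 ≈ 4.70 bits under a uniform distribution over 26 letters. The formula also shows why every valid message needs non-zero probability: a single zero factor would make -log2 P(M) infinite.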
Code-specific compression mechanisms have also been used in addition to the general-purpose compression systems described above. In one example of such code-specific compression, the code produced by compilers is reviewed, either manually or programmatically, for instruction combinations that appear frequently. Special composite operation codes (opcodes) are then designed to replace those frequently appearing instruction combinations. A problem with such an approach is that typically only fixed patterns of instructions appearing in the code are discovered, while other context that could supply useful information is ignored.
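A programmatic version of such a review can resemble pair merging: count adjacent instruction pairs and fuse the most frequent one into a composite opcode. This is a minimal sketch with made-up opcode names (`LOAD`, `CONST`, `ADD`, `RET`) and hypothetical helpers. It also exhibits the limitation noted above, since it only ever sees fixed adjacent sequences.

```python
from collections import Counter


def most_frequent_pair(code):
    """Return ((op1, op2), count) for the most common adjacent pair, or None."""
    pairs = Counter(zip(code, code[1:]))
    # Ties are broken by first occurrence (Counter.most_common ordering).
    return pairs.most_common(1)[0] if pairs else None


def fuse_pair(code, pair, composite):
    """Replace non-overlapping occurrences of `pair` with one composite opcode."""
    out, i = [], 0
    while i < len(code):
        if i + 1 < len(code) and (code[i], code[i + 1]) == pair:
            out.append(composite)
            i += 2
        else:
            out.append(code[i])
            i += 1
    return out


code = ["LOAD", "CONST", "ADD", "LOAD", "CONST", "ADD", "RET"]
pair, count = most_frequent_pair(code)
fused = fuse_pair(code, pair, "+".join(pair))
print(pair, count, fused)
```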
While general-purpose data compression systems can successfully compress compiler-generated code, there is a need in the art for systems and methods that take advantage of the characteristics of compiler-generated code to compress such code. In addition, there is a need for such a system that automatically discovers context that can be used to compress the code further than is possible with either general-purpose data compression systems or current code compression systems.
The above-mentioned shortcomings, disadvantages and problems are addressed by the present invention, which will be understood by reading and studying the following specification.
In one system for inferring frequency distribution models for compressing data, the system reads a set of training data. In one aspect of the system, the training data comprises intermediate representation (IR) code; however, code for virtual or real machines can also be used as training data. Tokens are read from the training data, and for each token certain context is saved. The saved context comprises predictors that can be used to predict the token. The predictors include Markov predictors, computed predictors, and reduced predictors.
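Saving predictors for each token can be sketched as building (predictors, token) samples. The assumptions here are loud: the toy stack-machine opcodes, the `STACK_EFFECT` table used to compute a stack-depth predictor, and the `OPCODE_CLASS` table used as a reduced predictor are all hypothetical illustrations of the three predictor kinds, not the predictors the system actually uses.

```python
# Hypothetical toy stack machine: net stack effect and coarse class of each opcode.
# "add" pops two operands and pushes one result, a net effect of -1.
STACK_EFFECT = {"load": +1, "push": +1, "add": -1, "store": -1, "ret": 0}
OPCODE_CLASS = {"load": "mem", "store": "mem", "push": "const", "add": "arith", "ret": "ctrl"}


def extract_samples(tokens):
    """Pair each token with the (hypothetical) predictors saved for it."""
    samples = []
    depth = 0  # simulated stack depth before the current token
    for i, tok in enumerate(tokens):
        prev1 = tokens[i - 1] if i >= 1 else None
        prev2 = tokens[i - 2] if i >= 2 else None
        predictors = {
            "prev1": prev1,                          # Markov predictor
            "prev2": prev2,                          # Markov predictor
            "stack_depth": depth,                    # computed predictor
            "prev1_class": OPCODE_CLASS.get(prev1),  # reduced predictor
        }
        samples.append((predictors, tok))
        depth += STACK_EFFECT[tok]
    return samples


samples = extract_samples(["load", "push", "add", "store", "ret"])
for predictors, tok in samples:
    print(tok, predictors)
```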
In a further aspect of the system, the set of token and predictor values read from the training set is presented to a machine-learning component that applies machine-learning algorithms to create a decision tree. The branch nodes of the decision tree comprise conditions that test the predictor values, while the leaf nodes comprise frequency distribution models that vary depending on the conditions on the path from the root to each leaf node. The decision tree created using the system can be input to a modeler component of a code compression system.
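A greedy learner illustrates how such a tree can arise: at each node it picks the predictor test that most reduces the total empirical code length of the tokens, and it stores a frequency distribution at each leaf. This is a hypothetical sketch (the names `build_tree` and `predict` and the `min_samples` stopping rule are assumptions), not the specific machine-learning algorithm of the system. A practical learner would also penalize tree size so the tree does not overfit the training set.

```python
import math
from collections import Counter


def cost_bits(counts):
    """Bits to code these tokens with their own empirical distribution."""
    n = sum(counts.values())
    return -sum(c * math.log2(c / n) for c in counts.values() if c)


def build_tree(samples, min_samples=2):
    """Greedily split on `predictor == value` tests while code length drops."""
    dist = Counter(tok for _, tok in samples)
    base = cost_bits(dist)
    best = None
    names = samples[0][0].keys() if samples else []
    for name in names:
        for value in sorted({p[name] for p, _ in samples}, key=repr):
            yes = [s for s in samples if s[0][name] == value]
            no = [s for s in samples if s[0][name] != value]
            if len(yes) < min_samples or len(no) < min_samples:
                continue
            cost = cost_bits(Counter(t for _, t in yes)) + cost_bits(Counter(t for _, t in no))
            if cost < base - 1e-9 and (best is None or cost < best[0]):
                best = (cost, name, value, yes, no)
    if best is None:
        return {"leaf": dist}  # leaf node: frequency distribution model
    _, name, value, yes, no = best
    return {"test": (name, value),  # branch node: condition on a predictor
            "yes": build_tree(yes, min_samples),
            "no": build_tree(no, min_samples)}


def predict(tree, predictors):
    """Follow the conditions from the root to a leaf's distribution."""
    while "leaf" not in tree:
        name, value = tree["test"]
        tree = tree["yes"] if predictors[name] == value else tree["no"]
    return tree["leaf"]


samples = [({"prev": "push"}, "add")] * 3 + [({"prev": "load"}, "store")] * 3
tree = build_tree(samples)
print(tree["test"], predict(tree, {"prev": "push"}))
```

Each leaf Counter can be normalized into the probability distribution that a modeler hands to the coder for tokens whose predictors lead to that leaf; smoothing would be needed so that tokens unseen at a leaf keep non-zero probability.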