For many compiled languages, source-level types are erased very early in the compilation process. As a result, later compiler passes may convert type-safe source code into type-unsafe machine code. Type-unsafe idioms in the original source, together with type-unsafe optimizations, leave type information essentially non-existent in a stripped binary. Recovering high-level types by performing type inference over stripped machine code is referred to as type reconstruction; it is a useful capability in support of reverse engineering and decompilation.
Although some conventional techniques are available for determining type information from machine code, improved techniques are desired. Embodiments of the present invention provide an improved, robust and efficient technique for reconstructing type information from machine code.
According to an example embodiment, a computing system comprising at least one memory, a display and at least one processor is provided. The at least one processor is configured to execute functionality including receiving a machine code of a program, generating an intermediate representation of the machine code, generating a plurality of type constraints from the intermediate representation, generating one or more inferred types based at least upon the plurality of type constraints, converting the generated inferred types to C types, updating the intermediate representation by applying the inferred types to the intermediate representation, and outputting said inferred types, said converted C types, and/or at least a portion of the updated intermediate representation.
The converting of the generated inferred types to C types may be performed after the inferred types are generated from the intermediate representation.
The generating of one or more inferred types may include assigning a sketch to each of the inferred types. The converting to C types may include converting the sketch to one or more of said C types, where the sketch includes a record of capabilities of the inferred type to which it is assigned.
The sketch may be represented by a tree data structure, where edges of the tree represent labels corresponding to said capabilities and nodes of the tree represent type variables or type constants.
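As an illustration of such a tree, the following minimal Python sketch stores each type variable as a node whose outgoing edges are keyed by capability labels, with an optional marking naming a type constant. The class name `SketchNode`, the helper `add_path`, and the label spellings (`load`, `store`, and `σ32@0` for "a 32-bit field at offset 0") are assumptions chosen for the example, not names taken from the embodiments.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SketchNode:
    # Optional marking naming a type constant (None for an unconstrained variable).
    marking: Optional[str] = None
    # Outgoing edges keyed by capability label; labels here are hypothetical,
    # e.g. "load", "store", or "σ32@0" for a 32-bit field at offset 0.
    children: Dict[str, "SketchNode"] = field(default_factory=dict)

    def add_path(self, labels, marking=None):
        """Create or reuse the nodes along `labels` and mark the final node."""
        node = self
        for label in labels:
            node = node.children.setdefault(label, SketchNode())
        if marking is not None:
            node.marking = marking
        return node

    def capabilities(self):
        """Labels (capabilities) this type variable is known to support."""
        return sorted(self.children)


# A parameter that is dereferenced, with an int at offset 0 and an
# unconstrained 32-bit value at offset 4.
param = SketchNode()
param.add_path(["load", "σ32@0"], marking="int")
param.add_path(["load", "σ32@4"])
```

Here `param.capabilities()` is `["load"]`, and the node reached through `load` records the two field capabilities.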
Assigned sketches may be arranged in a lattice formed by markings that relate respective sketches to one or more other sketches. A type constraint may be represented in the lattice by a path from the root carrying a label sequence. The markings may be configured to encode higher-level information, including typedef name information.
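One way to picture a constraint as a root path is a minimal sketch that walks a sketch along a label sequence and compares the reached node's marking against a bound in a lattice of markings. The lattice contents (`num`, `int`, `uint`, and `size_t` standing in for a typedef name), the tuple encoding `(marking, children)`, and the function names are all hypothetical choices for the example.

```python
# Hypothetical lattice of markings, given by each element's immediate supertypes.
# "size_t" illustrates a typedef name placed below the marking it refines.
PARENTS = {
    "⊥": {"size_t", "int"},
    "size_t": {"uint"},
    "uint": {"num"},
    "int": {"num"},
    "num": {"⊤"},
    "⊤": set(),
}


def leq(a, b):
    """True if marking `a` lies at or below marking `b` in the lattice."""
    return a == b or any(leq(p, b) for p in PARENTS[a])


def follow(node, labels):
    """Walk a sketch encoded as (marking, children) along labels from the root."""
    for label in labels:
        node = node[1].get(label)
        if node is None:
            return None
    return node


def satisfies(sketch, labels, bound):
    """Check the constraint  root.l1...ln ⊑ bound  by following the path."""
    node = follow(sketch, labels)
    return node is not None and leq(node[0], bound)


# Sketch of a pointer whose pointee has a size_t field at offset 0.
sketch = ("⊤", {"load": ("⊤", {"σ32@0": ("size_t", {})})})
```

With this encoding, `satisfies(sketch, ["load", "σ32@0"], "uint")` holds, while the same path bounded by `int` does not, and a path the sketch lacks (such as `["store"]`) is not satisfied.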
The generating of a plurality of type constraints from the intermediate representation includes at least one of (A) determining inputs/outputs of each procedure, (B) determining a program call graph, and (C) determining per-procedure control flow.
The computing system may further be configured to execute functionality comprising using an abstract interpreter, generating sets of type constraints from concrete TSL semantics, inserting type schemes for externally linked functions, and simplifying each constraint set.
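The constraint-generation step can be sketched over a toy instruction list. This example swaps in a hypothetical tuple-based IR (`mov`, `load`, `store`, `call`) in place of TSL semantics, assumes all memory accesses are 32 bits wide, and uses an invented `EXTERNAL_SCHEMES` table as a stand-in for type schemes of externally linked functions; the direction of the scheme constraints is likewise an assumption.

```python
from typing import List, Tuple

Constraint = Tuple[str, str]  # (lhs, rhs) meaning lhs ⊑ rhs

# Hypothetical type schemes for externally linked functions; "{f}" is
# replaced by the callee's name when the scheme is inserted.
EXTERNAL_SCHEMES = {
    "strlen": [("{f}.in_0", "char*"), ("size_t", "{f}.out")],
}


def generate(instrs) -> List[Constraint]:
    """Emit subtype constraints for each instruction of a toy IR."""
    out: List[Constraint] = []
    for ins in instrs:
        op = ins[0]
        if op == "mov":      # dst := src
            _, dst, src = ins
            out.append((src, dst))
        elif op == "load":   # dst := *(ptr + off), assumed 32-bit
            _, dst, ptr, off = ins
            out.append((f"{ptr}.load.σ32@{off}", dst))
        elif op == "store":  # *(ptr + off) := src, assumed 32-bit
            _, ptr, off, src = ins
            out.append((src, f"{ptr}.store.σ32@{off}"))
        elif op == "call":   # ret := callee(args...)
            _, ret, callee, args = ins
            for i, arg in enumerate(args):
                out.append((arg, f"{callee}.in_{i}"))
            out.append((f"{callee}.out", ret))
            for lhs, rhs in EXTERNAL_SCHEMES.get(callee, []):
                out.append((lhs.format(f=callee), rhs.format(f=callee)))
        else:
            raise ValueError(f"unknown op {op!r}")
    return out


instrs = [
    ("mov", "eax", "edi"),
    ("load", "ecx", "eax", 4),
    ("call", "r", "strlen", ["ecx"]),
]
constraints = generate(instrs)
```

A later simplification pass would then reduce `constraints` before sketches are assigned.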
The computing system may further be configured to execute functionality comprising assigning sketches to type variables, and specializing type schemes based on calling contexts. The computing system may also be configured to execute functionality comprising converting inferred sketches to C types by applying heuristic conversion policies.
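A heuristic sketch-to-C conversion might look like the following minimal example. The policies shown are assumptions for illustration only: a marking (for example a typedef name) wins outright, a node with `load` or `store` capabilities becomes a pointer, field labels of the assumed form `σ<bits>@<offset>` become struct members ordered by offset, and an unconstrained leaf falls back to a sized integer or `void`.

```python
import re

# Assumed field-label format: σ<bits>@<offset>, e.g. "σ32@4".
FIELD = re.compile(r"σ(\d+)@(\d+)")


def to_c(node, bits=None):
    """Convert a sketch node (marking, children) to a C type string."""
    marking, children = node
    if marking:  # policy 1: a marking such as a typedef name wins
        return marking
    if "load" in children or "store" in children:  # policy 2: pointer
        pointee = children.get("load") or children.get("store")
        return to_c(pointee) + " *"
    fields = []
    for label, child in children.items():
        m = FIELD.fullmatch(label)
        if m:
            fields.append((int(m.group(2)), int(m.group(1)), child))
    if fields:  # policy 3: struct with members ordered by offset
        fields.sort()
        body = " ".join(f"{to_c(c, size)} field_{off};" for off, size, c in fields)
        return "struct { " + body + " }"
    # policy 4: unconstrained leaf
    return f"int{bits}_t" if bits else "void"


sketch = (None, {"load": (None, {"σ32@0": ("size_t", {}), "σ32@4": (None, {})})})
```

For this sketch, `to_c(sketch)` yields `struct { size_t field_0; int32_t field_4; } *`. Real conversion policies would also need to handle unions, arrays, and recursive types, which this example omits.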
The generating of the inferred types may be based upon subtyping. The subtyping may be implemented using type scheme specialization of subtype-based constraint sets. The generating of the inferred types may include interpreting recursive data structures.
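A recursive data structure yields an infinite sketch tree, which can still be stored finitely as a graph with a back edge. The minimal sketch below assumes a linked list equivalent to `struct node { int data; struct node *next; }` with 32-bit pointers; the class `CyclicSketch`, its state names, and the label spellings are hypothetical.

```python
class CyclicSketch:
    """A sketch stored as a finite graph; back edges encode recursive types."""

    def __init__(self):
        self.edges = {}    # (state, label) -> successor state
        self.marking = {}  # state -> type constant, for marked states

    def add_edge(self, src, label, dst):
        self.edges[(src, label)] = dst

    def walk(self, start, labels):
        """Follow a label sequence; return the reached state or None."""
        state = start
        for label in labels:
            state = self.edges.get((state, label))
            if state is None:
                return None
        return state

    def marking_at(self, start, labels):
        """Marking of the state reached by `labels`, or None."""
        state = self.walk(start, labels)
        return None if state is None else self.marking.get(state)


# Hypothetical linked list with 32-bit fields at offsets 0 (data) and 4 (next).
lst = CyclicSketch()
lst.add_edge("node_ptr", "load", "node")
lst.add_edge("node", "σ32@0", "data")
lst.add_edge("node", "σ32@4", "node_ptr")  # back edge: next points to a node
lst.marking["data"] = "int"
```

Because of the back edge, a path of any length through `next` fields (for example `load.σ32@4.load.σ32@4.load.σ32@0`) still reaches the `int` data field.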
The computing system may further be configured to provide for an end user to define or adjust an initial type hierarchy at run time.
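A user-adjustable initial hierarchy could be as simple as the following minimal sketch, which assumes a single-inheritance tree of type constants rooted at a top element and offers a join (least common supertype). The class `TypeHierarchy` and the example constants are hypothetical; a general lattice with multiple supertypes would need a more elaborate join.

```python
class TypeHierarchy:
    """A single-inheritance hierarchy of type constants, editable at run time."""

    def __init__(self, top="⊤"):
        self.top = top
        self.parent = {}  # name -> immediate supertype

    def add(self, name, supertype=None):
        supertype = self.top if supertype is None else supertype
        if name == self.top or name in self.parent:
            raise ValueError(f"{name!r} is already defined")
        if supertype != self.top and supertype not in self.parent:
            raise KeyError(f"unknown supertype {supertype!r}")
        self.parent[name] = supertype

    def chain(self, name):
        """`name` followed by all of its supertypes, ending at the top element."""
        out = [name]
        while out[-1] != self.top:
            out.append(self.parent[out[-1]])
        return out

    def leq(self, a, b):
        return b in self.chain(a)

    def join(self, a, b):
        """Least common supertype of `a` and `b`."""
        above_b = set(self.chain(b))
        return next(t for t in self.chain(a) if t in above_b)


# An end user extends the hierarchy at run time.
h = TypeHierarchy()
h.add("num")
h.add("int", "num")
h.add("uint", "num")
h.add("size_t", "uint")
h.add("str")
```

Rejecting redefinitions in `add` keeps the hierarchy acyclic, so `chain` always terminates at the top element.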
The generating of a plurality of type constraints may include splitting out the read and write capabilities of a pointer so that they have separate constraints in the plurality of type constraints. The plurality of type constraints may be generated in a bottom-up fashion over the strongly-connected components of the call graph, with sketches being assigned to type variables while the call graph is traversed bottom-up.
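A bottom-up order over strongly-connected components can be obtained with Tarjan's algorithm, which emits each component only after every component reachable from it, so callees come before their callers and mutually recursive procedures are grouped together. The call graph below is a hypothetical example, and the per-component constraint processing is left out.

```python
from typing import Dict, List


def sccs_bottom_up(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Tarjan's SCC algorithm; components are returned callees-first."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    on_stack = set()
    stack: List[str] = []
    result: List[List[str]] = []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            result.append(sorted(comp))

    for v in sorted(graph):
        if v not in index:
            visit(v)
    return result


# Hypothetical call graph with mutual recursion between parse_expr and parse_term.
call_graph = {
    "main": ["parse", "log"],
    "parse": ["lex", "parse_expr"],
    "parse_expr": ["parse_term"],
    "parse_term": ["parse_expr"],
    "lex": ["log"],
    "log": [],
}
order = sccs_bottom_up(call_graph)
```

Processing `order` front to back visits `log`, `lex`, the recursive pair, `parse`, and finally `main`, so each procedure's callees are summarized before the procedure itself.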
The generating of the plurality of type constraints may include creating a simplified plurality of type constraints by operations including lazily evaluating pointer-derived constraints while eagerly evaluating other constraints.
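The split between lazy and eager evaluation can be illustrated with a reachability query over subtype constraints. In this minimal sketch, ordinary constraints are materialized up front as graph edges, while a pointer-derived edge `t.store.rest ⊑ t.load.rest` is produced only when a search actually visits a `store` term. The class `ConstraintGraph`, the counter `derived_calls`, and the restriction to terms containing a single pointer label followed only by covariant field labels are assumptions for the example.

```python
from collections import defaultdict, deque


class ConstraintGraph:
    def __init__(self, constraints):
        # Ordinary subtype constraints (lhs ⊑ rhs) are materialized eagerly.
        self.succ = defaultdict(set)
        for lhs, rhs in constraints:
            self.succ[lhs].add(rhs)
        self.derived_calls = 0  # number of pointer edges derived on demand

    def _pointer_successors(self, term):
        # Lazily derive  t.store.rest ⊑ t.load.rest  only for a visited term,
        # assuming a single pointer label and covariant labels in `rest`.
        if term.count(".store") == 1:
            self.derived_calls += 1
            yield term.replace(".store", ".load")

    def entails(self, a, b):
        """Is a ⊑ b derivable by transitivity over eager and lazy edges?"""
        seen, queue = {a}, deque([a])
        while queue:
            t = queue.popleft()
            if t == b:
                return True
            for nxt in list(self.succ.get(t, ())) + list(self._pointer_successors(t)):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False


# *(p + 0) := x;  y := *(p + 0)
g = ConstraintGraph([("x", "p.store.σ32@0"), ("p.load.σ32@0", "y")])
forward = g.entails("x", "y")
backward = g.entails("y", "x")
```

Here `forward` is true because the store-to-load edge is derived when the search reaches `p.store.σ32@0`, while `backward` is false and triggers no pointer derivation at all.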
According to another example embodiment, a method performed by at least one processor to infer types from a program is provided. The method includes receiving a machine code of the program, generating an intermediate representation of the machine code, generating a plurality of type constraints from the intermediate representation, generating one or more inferred types based at least upon the plurality of type constraints, converting the generated inferred types to C types, updating the intermediate representation by applying the inferred types to the intermediate representation, and outputting said inferred types, said converted C types, and/or at least a portion of the updated intermediate representation.
According to another example embodiment, a non-transitory computer-readable storage medium is provided. Instructions stored on the storage medium, when executed by a computer, may cause the computer to perform operations including receiving a machine code of a program, generating an intermediate representation of the machine code, generating a plurality of type constraints from the intermediate representation, generating one or more inferred types based at least upon the plurality of type constraints, converting the generated inferred types to C types, updating the intermediate representation by applying the inferred types to the intermediate representation, and outputting said inferred types, said converted C types, and/or at least a portion of the updated intermediate representation.
These aspects, features, and example embodiments may be used separately and/or applied in various combinations to achieve yet further embodiments of this invention.