Computer programs typically execute on computers or equivalent systems that comprise a processor and a memory. A computer program executed by an operating system is represented by one or several processes. Physical memory is typically managed by a computer's operating system in order to provide each process with a virtual memory space. The virtual memory space of a process is accessed by the program to write and read values. Each process has its own virtual memory space. Alternatively, computer memory can be managed directly, without using virtual memory. Memory is organized into locations, each having a unique address. Typically, memory is represented by a contiguous array of cells with byte-level addressing.
A process of reserving memory for the use of an application is called allocation. Memory is allocated in memory blocks, where a memory block refers to a contiguous memory region identified by its start and end addresses.
A virtual memory space of an executing process is typically divided into several memory segments used for different purposes. These segments are represented by disjoint contiguous memory regions.
The stack is a memory segment commonly used to store local variables, which programs allocate and de-allocate automatically at runtime. Stack memory is thus reserved for automatic memory allocation.
Global memory refers to memory allocated by programs statically, at compile-time. Static allocation typically represents global variables used by a program.
Heap memory is usually the largest part of the memory and is reserved for dynamically allocated memory blocks. Typically, a program can dynamically allocate memory by calling a dedicated function. An example of such a function is the malloc function in the C programming language. When the allocated memory is no longer required, the program can call a corresponding deallocation procedure (for example, the free function in C) to release it so that the memory can be reused.
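By way of illustration only, the three kinds of allocation described above can be sketched in C as follows. The names global_counter, copy_to_heap and sum_bytes are hypothetical and are introduced solely for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Static allocation: reserved at compile time in global memory. */
static int global_counter = 0;

/* Returns a heap-allocated copy of the n bytes at src, or NULL on failure.
   The caller owns the returned block and must release it with free(). */
unsigned char *copy_to_heap(const unsigned char *src, size_t n)
{
    assert(src != NULL && n > 0);
    unsigned char *block = malloc(n);   /* dynamic allocation on the heap */
    if (block == NULL)
        return NULL;
    memcpy(block, src, n);
    global_counter++;                   /* counts successful heap copies */
    return block;
}

/* Sums n bytes. The accumulator is a local variable: it is allocated
   automatically on the stack and released when the function returns. */
unsigned sum_bytes(const unsigned char *p, size_t n)
{
    unsigned total = 0;                 /* automatic (stack) allocation */
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}
```

A caller would typically pass a local array to copy_to_heap, use the returned block, and then call free on it once the block is no longer required.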
The invention applies more particularly to stack (automatically allocated) memory but can also apply to dynamically or statically allocated memory.
At the source code level of programming languages, memory can be accessed using pointers, which are special variables containing addresses of memory locations. A pointer p is said to point to a memory block B if p stores an address from B.
A memory access refers to reading a value from a memory location or writing a value to a memory location.
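The "points to" relation defined above can be expressed as a simple predicate. The following is a minimal sketch that assumes a block is described by its start address and byte length; the names mem_block, block_of and points_to are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of a memory block B: its start address and its
   length in bytes (the end address is start + length - 1). */
struct mem_block {
    uintptr_t start;
    size_t length;
};

/* Builds a descriptor for the object of the given size located at start. */
struct mem_block block_of(const void *start, size_t length)
{
    assert(start != NULL && length > 0);
    struct mem_block b = { (uintptr_t)start, length };
    return b;
}

/* Returns true if address p lies inside block b, i.e. p points to b. */
bool points_to(uintptr_t p, struct mem_block b)
{
    return p >= b.start && p - b.start < b.length;
}
```

Under this definition, the address one past the end of an array does not point to the block holding that array, although forming such an address is permitted in C.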
A problem may arise if a program accesses a memory location that was not allocated. Another problem may arise when the program accesses an allocated memory location through a pointer which does not point to a memory block containing that location.
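The two problems can be distinguished as sketched below. This example deliberately uses a plain table of blocks and a linear scan rather than shadow memory, and it operates on addresses represented as integers so that no invalid access is actually performed. All names (mem_block, access_status, classify_access) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mem_block { uintptr_t start; size_t length; };

enum access_status {
    ACCESS_OK,           /* target lies in the block the pointer points to */
    ACCESS_UNALLOCATED,  /* target lies in no allocated block */
    ACCESS_WRONG_BLOCK   /* target is allocated, but in a different block */
};

static bool contains(const struct mem_block *b, uintptr_t a)
{
    return a >= b->start && a - b->start < b->length;
}

/* Classifies an access to address target that was derived from a pointer
   whose original value is base_ptr. */
enum access_status classify_access(const struct mem_block *blocks,
                                   size_t n_blocks,
                                   uintptr_t base_ptr, uintptr_t target)
{
    assert(blocks != NULL || n_blocks == 0);
    const struct mem_block *origin = NULL;
    const struct mem_block *dest = NULL;
    for (size_t i = 0; i < n_blocks; i++) {
        if (contains(&blocks[i], base_ptr))
            origin = &blocks[i];
        if (contains(&blocks[i], target))
            dest = &blocks[i];
    }
    if (dest == NULL)
        return ACCESS_UNALLOCATED;
    if (origin != dest)
        return ACCESS_WRONG_BLOCK;
    return ACCESS_OK;
}
```

For two adjacent 16-byte blocks, an access computed from a pointer into the first block that lands inside the second block is classified as ACCESS_WRONG_BLOCK, which corresponds to the second problem described above.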
The problems mentioned in the above paragraph belong to a broader class of issues often referred to as memory safety issues, which include (but are not limited to) illegal memory accesses, memory leaks, illegal dereferences, double free errors, and reads of uninitialized data. The consequences of such problems differ in severity and range from inconsistent behaviors to program crashes and issues compromising the security of applications. It is therefore important to detect such memory violations.
The general purpose of the invention is to provide a shadow-state encoding mechanism that allows tracking the memory state of an executing program at runtime. Even though the invention is general and potentially applies to heap memory as well as global allocations, its main focus is on tracking memory blocks allocated on a program's stack at runtime.
Memory shadowing is a general technique for tracking properties of an application's data at runtime. In its typical use, memory shadowing associates addresses from the application's memory to shadow values stored in a disjoint memory region (or regions) called shadow memory. During a program's execution shadow values act as metadata that store information about the memory addresses they are mapped to.
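As a minimal sketch of this general technique, the example below models application memory and shadow memory as two disjoint arrays of equal size and maps each application byte to the shadow byte at the same index. The one-to-one mapping, the array size and the names app_memory, shadow_memory, shadow_of and app_store are assumptions chosen for illustration; real tools commonly use other mappings.

```c
#include <assert.h>
#include <stddef.h>

enum { APP_SIZE = 64 };

static unsigned char app_memory[APP_SIZE];     /* stands in for application memory */
static unsigned char shadow_memory[APP_SIZE];  /* disjoint region holding metadata */

/* Maps an application address to the address of its shadow value. */
unsigned char *shadow_of(const unsigned char *addr)
{
    assert(addr >= app_memory && addr < app_memory + APP_SIZE);
    return &shadow_memory[addr - app_memory];
}

/* Writes v to application byte i and records in the shadow value that
   the byte has been written (shadow value 1 = written, 0 = never written). */
void app_store(size_t i, unsigned char v)
{
    assert(i < APP_SIZE);
    app_memory[i] = v;
    *shadow_of(&app_memory[i]) = 1;
}
```

Here the shadow value acts as metadata about the byte it is mapped to: after app_store(5, 42), the shadow value of byte 5 is 1 while the shadow values of its neighbours remain 0.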
Memory shadowing has many applications. One of them is memory analysis, where shadow values are used to track memory and detect safety problems. Examples of such existing mechanisms are described in particular in references [1] and [2].
Shadow state encoding refers to a process of designing the structure of shadow values and their interpretation. The prior art contains shadow state encoding mechanisms that vary across different tools. Some implementations use shadow values to store bit-level states of the memory locations they aim to characterize.
Reference [3] discloses a tool using shadow state encoding focused on detection of information leakage at runtime. The proposed method uses one bit to tag each addressable byte from an application's memory as public or private. Another method disclosed in reference [4] relates to a memory debugger which shadows one byte by two bits which indicate whether that byte is allocated and initialized. Reference [2] introduces a method that uses bit-to-bit shadowing to track initialization status of every bit. Reference [5] proposes to customize memory allocation to ensure that memory blocks are allocated at an 8-byte boundary, and to track aligned 8-byte sequences by one shadow byte. American patent U.S. Pat. No. 8,762,797 also describes the same method as reference [5].
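To make the notion of a byte-level encoding concrete, the sketch below shadows each application byte by two bits, one indicating allocation and one indicating initialization, and packs four such pairs into one shadow byte. It is loosely modeled on the two-bit scheme described above and does not reproduce the actual layout of any cited tool; the packing order and all names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

enum { APP_SIZE = 64 };                           /* number of tracked bytes */
enum { BIT_ALLOCATED = 1, BIT_INITIALIZED = 2 };  /* the two state bits */

/* Two shadow bits per application byte: four application bytes per shadow byte. */
static unsigned char shadow[APP_SIZE / 4];

/* Sets the two-bit state of application byte i. */
void set_state(size_t i, unsigned state)
{
    assert(i < APP_SIZE && state <= 3u);
    unsigned shift = (unsigned)(i % 4) * 2;
    unsigned cleared = shadow[i / 4] & ~(3u << shift);
    shadow[i / 4] = (unsigned char)(cleared | (state << shift));
}

/* Returns the two-bit state of application byte i. */
unsigned get_state(size_t i)
{
    assert(i < APP_SIZE);
    unsigned shift = (unsigned)(i % 4) * 2;
    return (shadow[i / 4] >> shift) & 3u;
}
```

Such an encoding can tell whether a given byte is allocated and initialized, but it stores nothing about which memory block the byte belongs to.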
The shadow state encoding methods of prior art have been proven useful for tracking memory at bit-level and byte-level. These methods, however, are limited in their capacity to identify properties with respect to memory blocks. More particularly, the existing tools using shadow memory do not capture enough metadata to identify the bounds and the length of a memory block a given address belongs to. Therefore, existing methods cannot detect a memory violation concerning an access to an allocated memory location through a pointer which does not point to a memory block the location belongs to.
The present invention is proposed in view of the above problem and relates to the use of shadow memory during runtime memory-safety analysis of computer programs. The invention aims at resolving the limitations of the prior art's shadow state encoding methods with a new method that allows tracking boundaries of allocated memory blocks while still capturing byte-level properties. This is achieved with a particular shadow memory encoding scheme which captures boundaries and lengths of allocated memory blocks. Analyzing the shadow memory state allows detecting memory safety issues.
In particular, for a memory location given by its address a, the proposed invention allows computing the following information: whether a has been allocated, whether a has been initialized, the start (base) address of the memory block a belongs to, the byte-length of the memory block a belongs to, and the byte offset of a within its block. Such information allows for detection of specific memory safety issues at runtime.
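The kind of query listed above can be illustrated with the following sketch. It uses a deliberately naive shadow layout, with one full record per application byte, which is an assumption made only for this example and is not the encoding scheme of the invention. Addresses are modeled as indices into a small address space, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { APP_SIZE = 64 };  /* size of the modeled address space, in bytes */

/* Naive shadow record kept for every byte; length 0 means "not allocated". */
struct shadow_cell { size_t length; size_t offset; bool initialized; };
static struct shadow_cell shadow[APP_SIZE];

/* Information computed for an address a. */
struct addr_info {
    bool allocated;
    bool initialized;
    size_t base;    /* start address of the block containing a */
    size_t length;  /* byte-length of that block */
    size_t offset;  /* byte offset of a within the block */
};

/* Records the allocation of a block of length bytes starting at base. */
void shadow_allocate(size_t base, size_t length)
{
    assert(length > 0 && base <= APP_SIZE && length <= APP_SIZE - base);
    for (size_t i = 0; i < length; i++) {
        shadow[base + i].length = length;
        shadow[base + i].offset = i;
        shadow[base + i].initialized = false;
    }
}

/* Records that the allocated byte at address a has been written. */
void shadow_mark_initialized(size_t a)
{
    assert(a < APP_SIZE && shadow[a].length > 0);
    shadow[a].initialized = true;
}

/* Computes the information listed above for address a. */
struct addr_info shadow_query(size_t a)
{
    assert(a < APP_SIZE);
    struct addr_info info = { false, false, 0, 0, 0 };
    const struct shadow_cell *c = &shadow[a];
    if (c->length == 0)
        return info;                 /* a is not allocated */
    info.allocated = true;
    info.initialized = c->initialized;
    info.base = a - c->offset;
    info.length = c->length;
    info.offset = c->offset;
    return info;
}
```

With this information, an access to address a through a pointer with original value p can be checked by comparing the base addresses returned for a and p: differing bases indicate an access outside the block that p points to.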