In the latter half of the twentieth century, there began a phenomenon known as the information revolution. While the information revolution is a historical development broader in scope than any one event or machine, no single device has come to represent the information revolution more than the digital electronic computer. The development of computer systems has surely been a revolution. Each year, computer systems grow faster, store more data, and provide more applications to their users.
A modern computer system typically comprises one or more central processing units (CPUs) and supporting hardware necessary to store, retrieve and transfer information, such as communication buses and memory. It also includes hardware necessary to communicate with the outside world, such as input/output controllers or storage controllers, and devices attached thereto such as keyboards, monitors, tape drives, disk drives, communication lines coupled to a network, etc. The CPU or CPUs are the heart of the system. They execute the instructions which form a computer program and direct the operation of the other system components.
From the standpoint of the computer's hardware, most systems operate in fundamentally the same manner. Processors are capable of performing a limited set of very simple operations, such as arithmetic, logical comparisons, and movement of data from one location to another. But each operation is performed very quickly. Sophisticated software at multiple levels directs a computer to perform massive numbers of these simple operations, enabling the computer to perform complex tasks. What is perceived by the user as a new or improved capability of a computer system is made possible by performing essentially the same set of very simple operations, but using software with enhanced function, along with faster hardware.
In the very early history of the digital computer, computer programs which instructed the computer to perform some task were written in a form directly executable by the computer's processor. Such programs were very difficult for a human to write, understand and maintain, even when performing relatively simple tasks. As the number and complexity of such programs grew, this method became clearly unworkable. As a result, alternate forms of creating and executing computer software were developed. In particular, a large and varied set of high-level languages was developed for supporting the creation of computer software.
Typically, high-level languages represent instructions, fixed values, variables, and other constructs in a manner readily understandable to the human programmer rather than the computer. Such programs are not directly executable by the computer's processor. In order to run on the computer, the programs must first be transformed from a human-readable form (source code) to something executable by the computer. In general, source code is universal and understandable by anyone trained to use the applicable language, while executable code is specific to a particular computer system environment (model of hardware, operating system, etc.), and can only execute on that computer system or one similarly configured.
Most high-level language programs support some form of memory allocation and re-use during run-time, i.e., during execution. For example, certain state variables needed by the program may be relatively small and fixed in size and number, so that space for these may be allocated before the program is executed. But in the case of much of the data read or generated by the program, the volume of memory required is large and/or it is not known in advance how much memory will be needed. For such data, available free memory is allocated for the data as needed during program execution. Often, the need for such data is very transient, and once a particular block of code has been executed, the data is no longer needed. Certain high-level languages in particular tend to generate a large number of such temporary data structures. If temporary data is allowed to permanently occupy addressable memory space and accumulate in addressable memory, the program will consume far more memory than it actually needs at any one time during execution, and the addressable memory capacity of the system may be taxed. Therefore, such temporarily needed memory space is generally re-used by allocating it to some other data after the original temporary data is no longer needed.
The re-use of memory space for different data in an executing program introduces an issue generally referred to as type-safety. Programming languages support different types of data, i.e., data represented internally in different formats. A data structure for which memory is allocated has a specified internal format according to the programming language semantics and directions of the programmer. Any subsequent reference to the data in the program expects to see data in the specified format. Most high-level languages support some form of pointer to memory locations containing data. Pointers may be stored as variables, just as any other data.
If memory space is initially allocated to data in one format, and subsequently re-used for data in a different format, it is possible that a pre-existing pointer (“dangling pointer”) will erroneously be used to access data in a different format from that which was expected, referred to as a type violation. Results of such an operation are unpredictable. Such type violations can be very difficult for the programmer to diagnose. The issue of type-safety is well-known in the programming community. Some programming languages provide greater protection from type violations, i.e., greater “type-safety”, than others. However, greater type-safety generally comes at a cost in efficiency and/or programming flexibility.
A common approach to type-safety is garbage collection. Various garbage collection techniques are known in the art, but in general garbage collection is a process in which the program state data is first analyzed to find all pointer references, and from these references to determine all dynamically allocated memory which is no longer referenced by any pointer. The unused memory is then made available for re-use. Garbage collection can be performed serially, i.e., by temporarily halting program execution, or in parallel using a separate thread running in parallel with the main application thread(s), or incrementally, periodically pausing the application for a short period of time to do a small piece of the larger garbage collection process. Incremental collection involves greater complexity than serial collection, and parallel garbage collection greater complexity still. Garbage collection manages memory in a type-safe manner, but consumes very substantial processor and other system resources.
Another approach to type-safety is the use of “fat” pointers, i.e. pointers which carry with them additional data, including in particular a “capability” which uniquely identifies the allocated memory region. Whenever an area of memory is reused, it is given a new capability. The pointer's capability is checked each time it is used to reference something, and if it does not match the currently allocated capability for the memory address being referenced, then it is known that the pointer is invalid, and the system can take appropriate action. The capability must be sufficiently large to assure uniqueness. The use of fat pointers also involves substantial overhead, because it increases the size of pointers and requires additional checks when pointers are used to reference data.
Certain programming languages do not use these or similar constructs for assuring type-safety, and are vulnerable to type violations. For example, the C++ programming language allows the programmer to manage memory manually, i.e., the programmer specifies by code instruction when a data structure can be deallocated and its memory reused for other data. This form of memory management is inherently more efficient and flexible than garbage collection or other forms of type-safety, but it also allows the programmer to leave dangling pointers when deallocating a data structure, which can result in a type violation. Despite this vulnerability, C++ continues to be widely used for the advantages it provides.
Various modifications to C++ have been proposed which incorporate garbage collection or other forms of type-safety, but in general these forfeit many of the advantages of flexibility and efficiency for which C++ is known, and for which it is chosen in a variety of applications. It would be desirable to provide improved type-safety for C++ and similar programming languages, while preserving the language semantics and the advantages that such languages provide.