A type system is a system used in programming languages to aid in the detection and prevention of run-time errors. A programming language is “typed” if it contains a set of types that are declared for objects such as variables and functions, and these types are checked against a set of rules during compilation of a program written in the language. If source code written in the typed language violates one of the type rules, the compiler reports an error.
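As an illustration of checking declared types against a set of rules, the following minimal sketch type-checks a tiny expression language. The node classes (`IntLit`, `BoolLit`, `Var`, `Add`), the `type_of` function, and the single addition rule are all hypothetical names chosen for this example; they are not taken from any particular compiler.

```python
from dataclasses import dataclass


class TypeCheckError(Exception):
    """Raised when an expression violates a type rule."""


# Hypothetical expression nodes for a toy language.
@dataclass(frozen=True)
class IntLit:
    value: int


@dataclass(frozen=True)
class BoolLit:
    value: bool


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def type_of(expr, env):
    """Return the type of expr, given env mapping variable names to declared types."""
    if isinstance(expr, IntLit):
        return "int"
    if isinstance(expr, BoolLit):
        return "bool"
    if isinstance(expr, Var):
        if expr.name not in env:
            raise TypeCheckError(f"undeclared variable {expr.name!r}")
        return env[expr.name]
    if isinstance(expr, Add):
        left = type_of(expr.left, env)
        right = type_of(expr.right, env)
        # Assumed rule for this sketch: both operands of Add must be int.
        if left != "int" or right != "int":
            raise TypeCheckError(f"cannot add {left} and {right}")
        return "int"
    raise TypeCheckError(f"unknown expression {expr!r}")
```

With declarations `env = {"x": "int", "flag": "bool"}`, the call `type_of(Add(Var("x"), IntLit(1)), env)` is accepted, whereas `type_of(Add(Var("x"), Var("flag")), env)` raises `TypeCheckError`, which corresponds to the compiler reporting an error.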
Typed intermediate languages for use in compilers have received significant study in the research community over the past few years. They enhance the reliability and robustness of compilers and provide a systematic way to track and check information needed by garbage collectors. The idea is to have an intermediate representation that has types attached to it and that can be type-checked in a manner analogous to type-checking for source programs. However, a typed intermediate language is more difficult to implement because it needs types that represent items made explicit during the compilation process.
A typed intermediate language is even more difficult to implement if it must represent a number of different high-level programming languages. The different languages not only have different primitive operations and types, but they also have different levels of typing. For instance, some languages, such as assembly languages, are generally untyped. In other words, they have no type system. Of the languages that are typed, some are strongly typed while others are more loosely typed. For instance, C++ is generally considered a loosely typed language, whereas ML and Pascal are considered strongly typed languages. Further, some loosely typed languages have smaller subsets that allow a majority of the code sections within a program to be strongly typed, while other code sections are loosely typed. For example, C# and the Microsoft Intermediate Language (MSIL) used in .NET allow this. Therefore, a typed intermediate language used to represent any of these high-level languages must be able to represent different type strengths. Likewise, the type system of such a typed intermediate language must be able to apply different rules depending on characteristics of the code being type-checked.
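One way to picture a checker that applies different rules depending on characteristics of the code is sketched below. Everything here is an assumption made for illustration: the `Conversion` record, the two rule sets, and the idea of selecting a rule set by whether a region is marked type-safe are hypothetical and do not describe how any existing compiler is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conversion:
    """A hypothetical implicit conversion found in a code region."""
    source_type: str
    target_type: str


# Assumed rule sets: the strict set permits no implicit conversions between
# distinct types; the loose set also permits int <-> pointer conversions.
STRICT_RULES = frozenset()
LOOSE_RULES = frozenset({("int", "pointer"), ("pointer", "int")})


def check_conversion(conv, allowed_implicit):
    """Accept identical types, or pairs listed in the active rule set."""
    if conv.source_type == conv.target_type:
        return True
    return (conv.source_type, conv.target_type) in allowed_implicit


def check_region(conversions, is_type_safe_region):
    """Return the conversions rejected under the rules chosen for this region."""
    rules = STRICT_RULES if is_type_safe_region else LOOSE_RULES
    return [c for c in conversions if not check_conversion(c, rules)]
```

Under these assumptions, the same `Conversion("int", "pointer")` is rejected when `is_type_safe_region` is true and accepted when it is false, while a conversion outside both rule sets is rejected in either case.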
Another problem arises when a typed intermediate language is lowered throughout the process of compilation. The lowering of a language refers to the process of changing the form of a language from a higher-level form, such as what a programmer would write, to a lower-level form, such as an intermediate language. The language can then be further lowered from the intermediate language to levels closer to what a computer executes, such as machine-dependent native code. In order to type-check an intermediate language that is lowered to different levels during the compilation process, a different set of rules must be used for each representation.
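The need for a separate rule set at each level can be sketched as follows. The level names (`"high"`, `"mid"`, `"low"`), the type names, and the lowering table are hypothetical placeholders invented for this example; a real compiler would check far richer properties than membership in a set of type names.

```python
# Hypothetical rule sets: the type names each lowering level permits.
ALLOWED_TYPES = {
    "high": {"int", "bool", "object"},
    "mid": {"int32", "pointer"},
    "low": {"reg32", "reg64"},
}

# Hypothetical mapping applied when lowering from one level to the next.
LOWERING = {
    ("high", "mid"): {"int": "int32", "bool": "int32", "object": "pointer"},
    ("mid", "low"): {"int32": "reg32", "pointer": "reg64"},
}


def check(level, types):
    """Return the types that the rule set for the given level rejects."""
    allowed = ALLOWED_TYPES[level]
    return [t for t in types if t not in allowed]


def lower(src_level, dst_level, types):
    """Rewrite each type into its form at the next, lower level."""
    mapping = LOWERING[(src_level, dst_level)]
    return [mapping[t] for t in types]
```

In this sketch, types that pass `check("high", ...)` no longer pass it after `lower("high", "mid", ...)`, because the lowered form uses types that only the `"mid"` rule set recognizes. The checker therefore has to select its rules by the level of the representation it is given.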
Attempts to create typed intermediate languages often fall short of solving the problems discussed above. For instance, Cedilla Systems' Special J compiler uses a typed intermediate language. However, this compiler is specific to the Java source language and therefore does not need to process multiple languages that may, for instance, have non-type-safe code. Additionally, this compiler uses only one set of rules for type-checking and therefore cannot be used for multiple levels of compilation. In the research community, typed intermediate languages tend to be highly specific to the source language, and it is difficult to engineer them, and to design their types, for the multiple stages of compilation.