Computer programs are groups of instructions that describe actions to be performed by a computer or other processor-based device. When a computer program is loaded and executed on computer hardware, the computer will behave in a predetermined manner by following the instructions of the computer program. Accordingly, the computer becomes a specialized machine that performs the tasks prescribed by the instructions.
A programmer utilizing a programming language creates the instructions comprising a computer program. Typically, source code is specified or edited by a programmer manually and/or with the help of an integrated development environment (IDE). By way of example, a programmer may choose to implement code utilizing an object-oriented programming language (e.g., C#, VB, Java . . . ) where programmatic logic is specified as interactions between instances of classes or objects, among other things. Subsequently, the source code can be compiled or otherwise transformed to facilitate execution by a computer or like device.
A compiler produces code for a specific target from source code. For example, some compilers transform source code into native code for execution by a specific machine. Other compilers generate intermediate code from source code. This intermediate code is subsequently interpreted dynamically at runtime or compiled just-in-time (JIT) to facilitate cross-platform execution, for example. Compilers perform lexical, syntactic, and semantic analysis as well as code generation.
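As a concrete illustration of the intermediate-code path, CPython follows this pattern: the built-in `compile()` tokenizes and parses source text and generates a code object holding bytecode, which the CPython virtual machine then interprets at runtime. The sketch below is only an illustration under that assumption; the expression, the variable names `price`, `quantity`, and `shipping`, and the `"<order-total>"` label are hypothetical, and exact bytecode opcode names differ between CPython versions, so they are printed rather than assumed.

```python
import dis
import types

# Hypothetical source text for a small expression.
source = "price * quantity + shipping"

# Front end: compile() tokenizes and parses the source and generates
# CPython bytecode, returned as a code object (intermediate code).
code = compile(source, "<order-total>", "eval")
assert isinstance(code, types.CodeType)

# Inspect the generated intermediate instructions; opcode names vary
# across CPython versions, so they are listed rather than hard-coded.
for instruction in dis.get_instructions(code):
    print(instruction.opname, instruction.argrepr)

# Back end: the CPython virtual machine interprets the bytecode at runtime.
total = eval(code, {"price": 4, "quantity": 3, "shipping": 5})
print(total)  # 17
```

The same code object can be evaluated repeatedly with different inputs, which is the practical benefit of separating compilation into intermediate code from its later execution.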
A lexer performs lexical analysis in accordance with a grammar of regular expressions, for example. Lexical analysis is a process of converting a sequence of characters into tokens based on a programming language specification. The lexer can be organized as a scanner and a tokenizer, although such functional boundaries are often blurred. In fact, a lexer can also be referred to as a scanner or a tokenizer. The scanner, typically a finite state machine, iterates over an input sequence of characters that may include both acceptable and unacceptable characters. The tokenizer classifies portions of the input into tokens, that is, categorized blocks of characters.
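A lexer driven by a grammar of regular expressions can be sketched as follows, assuming a toy arithmetic language whose token categories (`NUMBER`, `IDENT`, `OP`) are hypothetical. Python's `re` module stands in for the finite state machine here; because it tries the alternatives in order rather than computing a longest match, the order of entries in `TOKEN_SPEC` matters. The final `MISMATCH` rule plays the role of detecting unacceptable characters.

```python
import re
from typing import Iterator, NamedTuple

# Hypothetical lexical grammar: each token category is a regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/()=]"),
    ("SKIP", r"[ \t]+"),
    ("MISMATCH", r"."),
]
MASTER_PATTERN = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


class Token(NamedTuple):
    kind: str
    text: str
    pos: int


def tokenize(source: str) -> Iterator[Token]:
    """Scan the character sequence and classify each lexeme as a token."""
    for match in MASTER_PATTERN.finditer(source):
        kind = match.lastgroup  # name of the rule that matched
        text = match.group()
        if kind == "SKIP":
            continue  # whitespace is acceptable but yields no token
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r} at {match.start()}")
        yield Token(kind, text, match.start())


print([(t.kind, t.text) for t in tokenize("rate = 2 * (x + 10)")])
```

Here the scanning (walking the characters) and tokenizing (naming the category) happen in one loop, which mirrors the observation that the two functions are often not separated in practice.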
A parser performs syntactic analysis on a sequence of tokens provided by the lexer, for example, in an attempt to determine structure in accordance with a formal language grammar. Typically, syntactic analysis is accomplished with reference to a grammar that recursively defines expressions. The result of such analysis is a parse tree representing the syntactic structure of a set of tokens.
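A grammar that recursively defines expressions maps naturally onto a top-down, recursive descent parser in which each grammar rule becomes a function. The sketch below assumes a hypothetical grammar of integer arithmetic (shown in the `Parser` docstring) and a deliberately minimal `tokenize` helper that silently ignores characters it does not recognize. The tree it builds uses nested tuples `(operator, left, right)` and omits grouping tokens, so it is closer to an abstract syntax tree than to a full parse tree.

```python
import re
from typing import List, Tuple, Union

# A tree node is an integer leaf or a tuple (operator, left, right).
Tree = Union[int, Tuple[str, "Tree", "Tree"]]


def tokenize(source: str) -> List[str]:
    # Minimal stand-in for a lexer: integers and single-character operators.
    return re.findall(r"\d+|[+\-*/()]", source)


class Parser:
    """Recursive descent parser for the hypothetical grammar:

        expr   := term (("+" | "-") term)*
        term   := factor (("*" | "/") factor)*
        factor := NUMBER | "(" expr ")"
    """

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ""

    def advance(self) -> str:
        token = self.peek()
        if not token:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def parse(self) -> Tree:
        tree = self.expr()
        if self.peek():
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self) -> Tree:
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # loop keeps operators left-associative
        return node

    def term(self) -> Tree:
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self) -> Tree:
        token = self.advance()
        if token.isdigit():
            return int(token)
        if token == "(":
            node = self.expr()  # recursion mirrors the recursive grammar rule
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")


print(Parser(tokenize("1 + 2 * (3 - 4)")).parse())  # ('+', 1, ('*', 2, ('-', 3, 4)))
```

Operator precedence falls out of the grammar itself: because `term` sits below `expr`, multiplication binds more tightly than addition without any explicit precedence table.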
The parse tree can be constructed by way of a top-down approach (e.g., recursive descent parser, LL parser (Left-to-right, Leftmost derivation) . . . ) or a bottom-up approach (e.g., precedence parser, LR parser (Left-to-right, Rightmost derivation in reverse) . . . ). Subsequently, semantic analysis is performed with respect to the parse tree. Semantic analysis involves determining the meaning of the code and performing various checks such as type checks, among other things.
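Two common semantic checks, resolving identifiers against a symbol table and type checking operators, can be sketched as a recursive walk over the tree. Everything here is an assumption for illustration: the node classes, the `symbols` table mapping names to type names, and the `RULES` table of permitted operator and operand-type combinations. The tree is built by hand below and is assumed to have come from a parser.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Literal:
    value: object


@dataclass
class Name:
    ident: str


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Literal, Name, BinOp]


class SemanticError(Exception):
    pass


# Hypothetical typing rules: (operator, left type, right type) -> result type.
RULES = {
    ("+", "int", "int"): "int",
    ("*", "int", "int"): "int",
    ("+", "str", "str"): "str",
    ("<", "int", "int"): "bool",
    ("and", "bool", "bool"): "bool",
}


def type_of_literal(value: object) -> str:
    # bool is tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, str):
        return "str"
    raise SemanticError(f"unsupported literal {value!r}")


def check(node: Node, symbols: Dict[str, str]) -> str:
    """Return the static type of `node`, or raise SemanticError."""
    if isinstance(node, Literal):
        return type_of_literal(node.value)
    if isinstance(node, Name):
        if node.ident not in symbols:
            raise SemanticError(f"undeclared identifier {node.ident!r}")
        return symbols[node.ident]
    left = check(node.left, symbols)  # check operands first (post-order)
    right = check(node.right, symbols)
    result = RULES.get((node.op, left, right))
    if result is None:
        raise SemanticError(f"operator {node.op!r} not defined for {left} and {right}")
    return result


symbols = {"count": "int", "label": "str"}
print(check(BinOp("<", Name("count"), Literal(10)), symbols))  # bool
try:
    check(BinOp("+", Name("label"), Literal(1)), symbols)
except SemanticError as error:
    print("rejected:", error)
```

Note that the rejected expression is syntactically well formed; it is only the semantic analysis that determines `label + 1` has no meaning under these rules.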
A code generator produces code in a target language as a function of the performed analysis. In one instance, the code generator can utilize a source code representation such as an in-memory parse tree or other structure and related metadata to produce code. Generated code can correspond to a sequence of machine language instructions or some intermediate code representation, among other things.
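A code generator that targets an intermediate representation can be as simple as a post-order walk that emits stack-machine instructions. The sketch below assumes a tree of nested tuples `(operator, left, right)` with integer leaves, as a parser might produce, and a hypothetical instruction set (`PUSH`, `ADD`, `SUB`, `MUL`, `DIV`). The `generate` and `run` names are likewise illustrative; `run` stands in for a runtime that interprets the intermediate code rather than JIT-compiling it.

```python
from typing import List, Optional, Tuple, Union

# A tree node is an integer leaf or a tuple (operator, left, right).
Tree = Union[int, Tuple[str, "Tree", "Tree"]]
# An instruction is an opcode plus an optional integer operand.
Instruction = Tuple[str, Optional[int]]

# Hypothetical opcode names for a toy stack machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate(tree: Tree) -> List[Instruction]:
    """Emit stack-machine code by a post-order walk of the tree."""
    if isinstance(tree, int):
        return [("PUSH", tree)]
    op, left, right = tree
    # Operands are emitted before the operator that consumes them.
    return generate(left) + generate(right) + [(OPCODES[op], None)]


def run(code: List[Instruction]) -> float:
    """Interpret the intermediate code using an operand stack."""
    stack: List[float] = []
    for opcode, operand in code:
        if opcode == "PUSH":
            stack.append(operand)
            continue
        right = stack.pop()
        left = stack.pop()
        if opcode == "ADD":
            stack.append(left + right)
        elif opcode == "SUB":
            stack.append(left - right)
        elif opcode == "MUL":
            stack.append(left * right)
        elif opcode == "DIV":
            stack.append(left / right)
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return stack.pop()


code = generate(("+", 1, ("*", 2, ("-", 3, 4))))
for instruction in code:
    print(instruction)
print(run(code))  # -1
```

A stack-based target keeps the generator trivial because no register allocation is needed; a generator producing machine language instructions for a real processor would additionally have to map these operations onto registers and the target's instruction set.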
Compilers can produce managed, unmanaged, and/or native code. Managed code can take advantage of a number of services such as memory management and security provided by a runtime. In other words, the code is managed by the runtime. Often, intermediate language code is managed. Unmanaged code does not receive services from a runtime but rather requires explicit machine calls to afford similar functionality. Native code refers to managed or unmanaged machine code. In some contexts, native code is used as a synonym for unmanaged code that runs natively on a machine. In other contexts, however, the term refers to machine code output from a JIT compiler that executes in a runtime. Here, the code may be managed but it is also machine code rather than simply intermediate language code.