The present invention relates to computing systems, and more specifically, to the generation of source code and executable code from formal descriptions.
For many applications, the structure of data to be processed can be described mathematically using formal languages such as regular expressions and context-free grammars. Such data descriptions may find application in compiler construction, in manipulating or evaluating mathematical expressions, and in programs with general text input such as computer games and search engines. Such descriptions may also be used in cache and network protocols.
Computer programs analyzing such structured data can be generated automatically from the formal description. Such tools are typically referred to as compiler generators. A compiler-compiler or compiler generator is a tool that creates a scanner, parser, interpreter, or compiler from some form of formal description. The earliest and still most common form of compiler-compiler is a parser generator, whose input is a grammar (usually in Backus-Naur Form (BNF)) of a programming language, and whose generated output is the source code of a parser.
The ideal compiler-compiler takes a description of a programming language and a target instruction set architecture and automatically generates a usable compiler from them. In practice, the state of the art has yet to reach this degree of sophistication, and most compiler generators are not capable of handling semantic or target architecture information.
Compiler generators typically include scanner generators and parser generators and have been available, in simple forms, since the late 1960s. A scanner generator typically processes regular expressions, while a parser generator processes context-free grammars.
In computing, regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters. Regular expressions (abbreviated as regex or regexp, with plural forms regexes, regexps, or regexen) are written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form V→w where V is a single non-terminal symbol, and w is a string of terminals and/or non-terminals (possibly empty). The term “context-free” expresses the fact that non-terminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it. Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
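
As an illustration only, the following sketch shows how regular expressions and a context-free grammar divide the work. It is written by hand in Python rather than generated, and the token names (NUM, OP), the function names, and the small arithmetic grammar are assumptions made for this example. The grammar has the productions E→T E', E'→+ T E' | (empty), T→F T', T'→* F T' | (empty), and F→NUM | ( E ), each with a single non-terminal on the left-hand side.

```python
import re

# Hypothetical token classes, each described by a regular expression.
TOKEN_RE = re.compile(r"\s*(?:(?P<NUM>\d+)|(?P<OP>[+*()]))")

def tokenize(text):
    """Scanner: split text into (kind, value) pairs using the regex above."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def peek(tokens, pos):
    """Return the token at pos, or (None, None) at end of input."""
    return tokens[pos] if pos < len(tokens) else (None, None)

def parse_E(tokens, pos):
    # E -> T E'   with   E' -> '+' T E' | empty   (E' written as a loop)
    value, pos = parse_T(tokens, pos)
    while peek(tokens, pos) == ("OP", "+"):
        rhs, pos = parse_T(tokens, pos + 1)
        value += rhs
    return value, pos

def parse_T(tokens, pos):
    # T -> F T'   with   T' -> '*' F T' | empty   (T' written as a loop)
    value, pos = parse_F(tokens, pos)
    while peek(tokens, pos) == ("OP", "*"):
        rhs, pos = parse_F(tokens, pos + 1)
        value *= rhs
    return value, pos

def parse_F(tokens, pos):
    # F -> NUM | '(' E ')'
    kind, val = peek(tokens, pos)
    if kind == "NUM":
        return int(val), pos + 1
    if (kind, val) == ("OP", "("):
        value, pos = parse_E(tokens, pos + 1)
        if peek(tokens, pos) != ("OP", ")"):
            raise SyntaxError("expected ')'")
        return value, pos + 1
    raise SyntaxError("expected number or '('")

def parse(text):
    """Parser entry point: evaluate text if it is derivable from E."""
    tokens = tokenize(text)
    value, pos = parse_E(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return value
```

Under these assumptions, parse("2 + 3 * (4 + 1)") evaluates to 17, while an input such as "2 +" is rejected with a SyntaxError because no production of the grammar derives it.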
The regular expressions and context-free grammars may optionally contain interspersed C code fragments. In sum, the compiler generator typically takes the input of the scanner generator and the parser generator (with the additional C code) and generates source code that is later translated by a compiler into an executable.
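
The generate-then-compile flow described above can be sketched with a toy scanner generator. This is a minimal illustration under stated assumptions, not the behavior of any particular tool: it emits Python source rather than C, the specification format (name, regular expression, action fragment) and the names generate_scanner_source and scan are hypothetical, and each pattern is assumed to match at least one character.

```python
import re

def generate_scanner_source(spec):
    """Emit Python source text for a scanner from (name, regex, action) triples.

    `action` is a source fragment evaluated on the matched text `s`, loosely
    mirroring code fragments interspersed in a specification. An action of
    None means the matched text is skipped (for example, whitespace).
    """
    lines = [
        "import re",
        "",
        "def scan(text):",
        "    out, pos = [], 0",
        "    while pos < len(text):",
    ]
    for name, pattern, action in spec:
        lines.append(f"        m = re.compile({pattern!r}).match(text, pos)")
        lines.append("        if m and m.end() > pos:")
        if action is not None:
            lines.append("            s = m.group(0)")
            lines.append(f"            out.append(({name!r}, {action}))")
        lines.append("            pos = m.end()")
        lines.append("            continue")
    lines.append("        raise SyntaxError(f'bad input at offset {pos}')")
    lines.append("    return out")
    lines.append("")
    return "\n".join(lines)

# Hypothetical specification: regular expressions with action fragments.
SPEC = [
    ("WS", r"\s+", None),
    ("NUM", r"\d+", "int(s)"),
    ("ID", r"[A-Za-z_]\w*", "s"),
]

# Step 1: the generator produces source code as plain text.
source = generate_scanner_source(SPEC)

# Step 2: that source is translated separately, here by Python's own compiler.
namespace = {}
exec(compile(source, "<generated scanner>", "exec"), namespace)
scan = namespace["scan"]
```

With this specification, scan("x1 42") returns [("ID", "x1"), ("NUM", 42)]. The point of the sketch is the two-step structure: the generator only writes text, and a separate translation step turns that text into something that can run.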