Modern “smart” source code editors provide a wide range of features to the software developer based on increased understanding of the underlying programming language. For example, these editors may provide syntax coloring to highlight various components of the language grammar (class definitions, fields, methods, comments, etc.) The editors may also highlight known errors in the code. In general, this increased understanding of the underlying programming language is achieved by adding a language specific lexical analyzer and/or parser to the editor.
Unfortunately, most large-scale development projects include several programming languages targeted at different domains and different classes of developers. For example, it is not uncommon for a modern web application to include Java, Java Server Pages (JSP), JavaScript, the Hypertext Markup Language (HTML) and the eXtensible Markup Language (XML). Therefore, a typical development environment may effectively include several “smart” source code editors, each with an embedded lexical analyzer and/or parser specific to a given language.
Developing and maintaining a separate editor for each language in the development environment is costly and time consuming. Each time a new language is needed, a new editor must be constructed. Each time a new editing feature is added, it must be added to each language module. In addition, keeping the features of the language editors in sync can be a challenge. Minor differences between the editors in a given development environment can result in an inconsistent and confusing experience for the developer.
In addition, it is becoming increasingly useful to embed one language inside another within a single source file. For example, JSP pages include Java and JSP tags embedded within HTML. Emerging languages, such as ECMAScript for XML(E4X) embed XML within JavaScript. Other emerging technologies, such as Java for Web Service (JWS) embed a small annotation language inside Java comments to succinctly describe how that Java class should be exposed as a web service. In some cases, several languages can be nested several layers deep in a single source file.
The simple lexical analyzers and parsers embedded in common source editors are usually not sophisticated enough to recognize and process nested languages. Therefore, in some environments advanced source code editing features are simply not available for nested languages.
In other environments, a new editor might be constructed specially tailored to handle each new language combination even if separate editors already exist for each of the nested languages. For example, a JSP editor might be constructed to handle the combination of HTML, JSP tags and Java, even though separate HTML and Java editors already exist in the development environment. A new E4X editor may be constructed even though separate ECMAScript and XML editors already exist. This may result in duplication of code and will likely result in inconsistent behaviors as the different language editors evolve.
As language nesting becomes more popular, the increase in cost and time required to develop and maintain a comprehensive suite of smart editors using traditional methods becomes combinatorial.
To make matters worse, some nested languages appear in several contexts. For example, XML may be embedded in ECMAScript, Java and JWS annotations. In addition, small expression languages such as those required to understand date and time formats (e.g., YYYY-MM-DDThh:mm:ssTZD from ISO 8601) or time durations (e.g., 15 h 4 m30 s) may be embedded in several different languages. Adding these common sub-languages separately to each editor's lexical analyzer and/or parser again results in increased development and maintenance costs and potentially inconsistent behaviors. Any changes to the way these common sub-expressions are handled should be applied uniformly across all applicable host languages.
In a typical Integrated Development Environment (IDE), there are often two compilers. The first compiler is run from the command line, displaying a list of errors or emitting runable code. The second compiler exists as part of the IDE. Initially, this compiler may only implement lexical analysis of source code in order to support syntax coloring. Then it may implement syntactic analysis in order to support the structure pane and class browser. Eventually, this compiler will contain a nearly complete front-end in order to support code completion.
The trend of moving more and more of the compiler into the IDE is understandable: advanced IDE features are often based on advanced understanding of the language being edited. Unfortunately, it is not normally possible to use the command-line compiler inside the IDE. First, it is normally not componentized in such a way that the information needed by the IDE is easily accessible. Second, it is usually far too slow for interactive use as changes are made, especially if it takes multiple passes over the files. Third, it almost always recovers poorly from errors, which amongst other problems, makes code completion impossible.
These issues force the IDE to create its own compiler. However, supporting two compilers has many disadvantages. First, it is nearly twice the work of implementing a single compiler, particularly where the back-end is a fairly high-level language (i.e. Java bytecodes) and no optimization is performed. Second, the IDE's compiler is typically the second class citizen, and as a result, it is usually of lesser quality. Few IDE's actually implement 100% of the analysis in the command line compiler. Furthermore, the IDE's compiler is often designed in an evolutionary manner as new features are needed, resulting in a poorly organized compiler. Third, two different code bases need to be updated in order to make changes to the language. This makes creating a new language a slow and painful process. These problems get worse as the platform is scaled in the number of languages it supports and in the number and sophistication of IDE features.