Source-to-source transformation techniques are used in compilers, for transforming computer source code written in computer programming languages, typically for program translation and optimization. Similar techniques are used in software maintenance activities (such as porting and migration). Various challenges arise, though, due to the myriad of language dialects and multiple/mixed language contexts that exist in large-scale “real-world” application codes.
Open languages, such as C and C++, present particular challenges due to the open nature of their standards. As a consequence, vendors may (and in many cases do) provide divergent but standard-conformant behaviors for these languages. Open languages may include specifications for particular behaviors, such as: “implementation-defined behavior”, “unspecified behavior”, “undefined behavior”, and “locale-specific behavior”. Such behaviors and related concepts are described in further detail in the C programming language standard that is published as ISO/IEC 9899:1999 C standard (1999) and ISO/IEC 9899:1999 C Technical Corrigendum (2001). These publications are available at http://www.iso.org.
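By way of illustration only, the following C sketch probes a few behaviors that a conforming implementation is free to choose. The function names are hypothetical and are not taken from any standard or cited work. The first three functions report implementation-defined choices; the last reports an unspecified one (the order in which function arguments are evaluated). Undefined behavior is deliberately not exercised.

```c
#include <assert.h>
#include <limits.h>

/* Implementation-defined: whether plain char behaves as a signed type. */
int plain_char_is_signed(void)
{
    return (char)-1 < 0;
}

/* Implementation-defined: the result of right-shifting a negative value. */
int right_shift_keeps_sign(void)
{
    return (-1 >> 1) < 0;
}

/* Implementation-defined: the storage size of int, expressed in bits. */
unsigned int_storage_bits(void)
{
    return (unsigned)(sizeof(int) * CHAR_BIT);
}

static int next_ticket;

static int take_ticket(void)
{
    return next_ticket++;
}

static int first_minus_second(int a, int b)
{
    return a - b;
}

/*
 * Unspecified: the order in which the two arguments are evaluated.
 * Returns -1 if the left argument was evaluated first, +1 if the right
 * argument was evaluated first. Both outcomes are conformant.
 */
int argument_order_probe(void)
{
    int r;
    next_ticket = 0;
    r = first_minus_second(take_ticket(), take_ticket());
    assert(r == -1 || r == 1);
    return r;
}
```

Calling these functions from a small test program built with two different compilers, or for two different targets, may give different answers even though each implementation conforms to the standard. A source-to-source transformer that silently assumes one answer may therefore change the meaning of ported code.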
The C++ programming language is a “superset” of the C programming language, and a C++ programming language standard is published as ISO/IEC 14882:1998 C++ standard (1998), and is also available at http://www.iso.org. The premise of C++, as a “superset” of C, serves to exacerbate problems relating to porting C++ programs to a new computing environment. The evolution of C/C++ itself has given rise to a range of porting problems, such as “quiet changes”. Such issues are described in further detail in a paper entitled “Rationale for International Standard Programming Languages—C Revision 5.10 April 2003”. This paper represents the work of INCITS J11 and SC22 WG14, which are respectively the ANSI Technical Committee and ISO/IEC JTC 1 Working Group charged with revising the International Standard for the C programming language. This paper is available in electronic form from: http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf.
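A “quiet change” is one where the same source text compiles without complaint under two revisions of the language but means something different. A commonly cited case concerns the `//` comment introducer that C99 added. The sketch below is illustrative only, and the function names are hypothetical.

```c
#include <assert.h>

/*
 * Under C90 rules "//" does not introduce a comment, so the assignment
 * below reads as b divided by c, with a block comment in between.
 * Under C99 rules "//" starts a comment that runs to the end of the line,
 * so the assignment is simply a = b.
 */
int quiet_change_probe(int b, int c)
{
    int a;
    assert(c != 0); /* the C90 reading divides by c */
    a = b //* divisor: */ c
        ;
    return a;
}

/* Returns 1 if the compiler applied the C99 reading, 0 for the C90 reading. */
int slash_slash_is_comment(void)
{
    return quiet_change_probe(8, 2) == 8;
}
```

Neither reading produces a diagnostic, which is why changes of this kind are difficult to detect when code is ported between dialects.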
Earlier dialects (prior to standardization) of the C programming language (for example, Johnson pcc, Reiser cpp) have informal roots, which cast ambiguity over their definitions. Existing implementations of such early dialects are, in essence, the sole definers of these dialects.
The popularity of C/C++ has promoted experimentation, resulting in the development of non-conformant dialects for the C/C++ languages, such as Unified Parallel C (UPC). The specification for the UPC language is published as: T. A. El-Ghazawi, W. W. Carlson, and J. M. Draper, “UPC Language Specifications V 1.0”, Feb. 25, 2001.
Large application codes, written in a mix of language dialects, present particular challenges for maintenance and porting activities, despite various existing approaches to the problems associated with multi-language/dialect coverage. These existing approaches may be characterized as formal in nature, and include the Stratego approach and the so-called DMS approach, both of which are briefly described below.
The Stratego language specifies program transformation by traversals over an abstract syntax tree (AST). The Stratego language is described in: Visser, E., “Stratego: A language for program transformation based on rewriting strategies”, in A. Middeldorp (editor), Rewriting Techniques and Applications, 2001 (RTA '01), Springer-Verlag Lecture Notes in Computer Science Vol. 2051, 357-361.
A Stratego specification requires an explicit specification of an AST definition, as well as a traversal strategy. The combination of this explicit specification and the traversal strategy is then automatically converted into a source-to-source program transformer. This approach is, however, suitable only for fully automated program transformation, which is of little if any practical use for “real-world” applications where interactive remediation is typically required.
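The separation between an AST definition, a rewrite rule, and a traversal strategy can be sketched in C as follows. This is an analogue written for illustration under stated assumptions, not Stratego notation and not the output of any Stratego tool. The tiny expression language and the names `Node`, `num`, `bin`, `simplify`, and `bottomup` are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* AST definition for a hypothetical expression language. */
typedef enum { NUM, ADD, MUL } Kind;

typedef struct Node {
    Kind kind;
    int value;              /* used when kind == NUM */
    struct Node *lhs, *rhs; /* used when kind is ADD or MUL */
} Node;

Node *num(int value)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = NUM;
    n->value = value;
    n->lhs = n->rhs = NULL;
    return n;
}

Node *bin(Kind kind, Node *lhs, Node *rhs)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL && lhs != NULL && rhs != NULL);
    n->kind = kind;
    n->value = 0;
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

static int is_num(const Node *n, int value)
{
    return n->kind == NUM && n->value == value;
}

/*
 * Rewrite rule: x + 0 -> x, 0 + x -> x, x * 1 -> x, 1 * x -> x.
 * Returns the node unchanged when no rule matches. Nodes that are
 * rewritten away are not freed in this sketch.
 */
Node *simplify(Node *n)
{
    if (n->kind == ADD && is_num(n->rhs, 0)) return n->lhs;
    if (n->kind == ADD && is_num(n->lhs, 0)) return n->rhs;
    if (n->kind == MUL && is_num(n->rhs, 1)) return n->lhs;
    if (n->kind == MUL && is_num(n->lhs, 1)) return n->rhs;
    return n;
}

/* Traversal strategy: rewrite the children first, then the node itself. */
Node *bottomup(Node *n, Node *(*rule)(Node *))
{
    if (n->kind != NUM) {
        n->lhs = bottomup(n->lhs, rule);
        n->rhs = bottomup(n->rhs, rule);
    }
    return rule(n);
}
```

For example, `bottomup(bin(ADD, bin(MUL, num(7), num(1)), num(0)), simplify)` returns the node for the constant 7. The rule and the traversal are independent of one another, but both depend on a complete and unambiguous definition of `Node`, which is the dependency on a formal language specification that the approaches discussed here share.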
The DMS approach (Baxter, I. D., Pidgeon, C. and Mehlich, M. “DMS: Program Transformations for Practical Scalable Software Evolution”, in Proceedings of the IEEE International Conference on Software Engineering (ICSE '04), Edinburgh, United Kingdom, May 23-28, 2004, pages 625-634) similarly proposes a separate specification of real-world code, its transformation, and the programming language. Accordingly, this related formal approach is also of limited practical application.
These two approaches, described directly above, fail when a formal language specification is either not available, or is ambiguous (that is, is not definitive). Accordingly, absent a formal computer language specification, existing techniques are unsuitable for source-to-source transformation of computer source code. There is thus a need for a way of addressing these and other deficiencies of existing approaches to source-to-source transformation of computer source code.