With the modern emphasis on program portability and the new need to run programs on multiple computers in networks or over the Internet, it would be very useful for C programmers to be able to translate either legacy or newly-written C programs into Java to make them more portable. However, translation by hand is too tedious and time-consuming, while previously available computer algorithms to do so are not very accurate and/or require human intervention.
Both the C programming language and the Java programming language are versatile, powerful, and popular among programmers. C is commonly used when creating operating systems, network interfaces, and other programs which require the ability to manipulate memory usage, binary data, and similar low-level constructs. Java has two major advantages over C, however. The first is its portability: compiled Java code can run on any platform and operating system for which a Java Virtual Machine is available, while C programs are compiled for a specific platform and must be recompiled, or sometimes rewritten, when moved from one computer to another. The second is that coding in Java is easier for the programmer than coding in C, as details such as memory usage and data size are handled not by the programmer but by the Java Virtual Machine. For these reasons, translation of programs from C into Java is most beneficial when programs are required to run under different operating systems or machine specifications, when a less-experienced programmer needs to modify a program originally written in C, or a combination of the two, though these are by no means the only scenarios under which translation would be beneficial.
Shifting from the programmer's perspective to a consideration of program functionality, there are three major groups of programs that benefit from translation from C to Java. First are “legacy” programs that were originally written in C to take advantage of its higher execution speed. However, as modern computers have more memory and run faster than those of even a few years ago, these “legacy” programs would gain more from added portability than they would from remaining in C. Second are programs wherein the majority of the code implements simple algorithms such as string tokenization, data storage and manipulation, and the like. Java already has several implementations of algorithms such as these built into it, so code could be simplified and shortened. Third are programs that will be used either over a network or the Internet. While C has methods for sending and receiving information between different computers, any programs that require a user interface on the other end of transmissions would benefit greatly from Java's portability and its already-implemented applet system.
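As an illustration of the second group, the following minimal sketch shows how tokenization that C code typically performs with a hand-written loop around “strtok” might be expressed with Java's built-in “String.split” method. The class name “TokenizeSketch” and the method name “tokenize” are hypothetical and are used only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; these names do not come from any existing translator.
class TokenizeSketch {
    // Rough Java counterpart of a C loop that calls strtok(line, " \t").
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        // Trim first, then split on runs of spaces or tabs.
        for (String t : line.trim().split("[ \\t]+")) {
            // Splitting an empty string yields one empty token; skip it.
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("int  count = 0;"));
    }
}
```

The loop, delimiter bookkeeping, and buffer handling that the C version would require are absorbed by the library call, which is the kind of shortening described above.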
While the differences among programming languages have been studied extensively in comparative languages courses and otherwise, little progress has been made in the area of automated programming language translation. One company, Jazillian, Inc., provides translations among a limited number of languages for a fee, but significant client involvement is required to tailor the algorithm to the program's intended use. The “Jazillian” conversion software is capable of incorporating C header files into multiple class files, renaming files, and making other alterations when multiple files are involved, but those functions require human intervention to set up naming conventions, alter code used to include methods from other classes, and make minor corrections in the translated code. In addition, some of the more complex translation cases are handled by Jazillian-created classes used by the new code, meaning that the client and anyone else wishing to use the resulting Java code must be able to access those classes. This defeats Java's purpose of being able to run equally on any platform with standard Java specifications, and thus partially defeats the purpose of translation in the first place.
The problems involved with automated translation occur because programming languages are too dissimilar for direct word-for-word translation. For example, Python and Ruby, open-source programming languages by the Python Software Foundation and Yukihiro Matsumoto, respectively, do not require variables to be declared and do not use braces, “{” and “}”, to denote blocks of code (Python uses indentation, while Ruby uses keywords such as “end”), in contrast to the C and Java methods of declaring variables and separating code.
The “C2J” conversion software offered by Novosoft LLC is another C to Java translator, and a very accurate one, but it suffers from two major flaws. First, one of its stated goals is to exactly duplicate the function of the original C code, which causes it to attempt to use precisely the same memory requirements and execute code in precisely the same way in Java as in C, even when Java's native memory handling is superior and when there are already Java methods implemented to perform the same function as the C code. This causes the translated code to be less efficient and more memory-intensive than the same program written in Java originally, and where many procedures could be handled by a single line of Java code, many additional lines are used to duplicate C functionality. Its second flaw is human readability. Because it attempts to function identically to C, it requires a great deal of additional code for memory management and duplication of C procedures, and it also changes many names to fit C2J's naming conventions. The resulting code will execute perfectly but is practically incomprehensible to a human programmer who wishes to modify it, which once again defeats the purpose of translation.
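The contrast between duplicating C functionality and using a single line of Java can be sketched as follows. This is a hypothetical illustration only and is not output from C2J or any other translator; the class name, the method names, and the use of a byte array to stand in for C memory are all assumptions made for this example.

```java
// Hypothetical illustration only; this is not output from any actual translator.
class LiteralVsIdiomaticSketch {
    // Literal style: emulate a NUL-terminated C string stored in a byte buffer
    // and walk it the way C's strlen does.
    static int emulatedStrlen(byte[] memory, int address) {
        int length = 0;
        while (memory[address + length] != 0) {
            length++;
        }
        return length;
    }

    // Idiomatic style: rely on Java's own String, which tracks its length.
    static int idiomaticLength(String s) {
        return s.length();
    }

    public static void main(String[] args) {
        byte[] memory = {'J', 'a', 'v', 'a', 0}; // stand-in for C memory
        System.out.println(emulatedStrlen(memory, 0));
        System.out.println(idiomaticLength("Java"));
    }
}
```

Both methods report the same length, but the literal style carries an emulated memory buffer and an explicit scan that a reader must understand before modifying the code.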
Other examples of programming language translators are described in U.S. Pat. Nos. 6,453,464 and 7,213,216.
U.S. Pat. No. 6,453,464 describes a COBOL to Java translator wherein source language primitive functions are represented by archetypal templates having code that is selectable based upon the applicable case. This basically means that COBOL methods are generalized to a group of templates before conversion, such as several functions that read data from files being collectively described by two or three “generic file input” templates. Then, since it is “selectable based on the application case,” one can assume that the translation algorithm requires human input to determine which of the possible functions or classes representing the COBOL code fits best for the purpose for which the original code was intended. This algorithm would thus have two significant differences from the present invention, and potential weaknesses, were it applied to C to Java translation. First, once the Java templates are assigned, the translator requires human intervention to choose the best one. This implies that the translator does not perform a literal translation, but only a functional one (translating code so it does the same thing, but not necessarily the same way), which can cause problems if the code relies on an idiosyncrasy of C to do its task. Second, the translator has to assign templates. If the C code does not have a readily-discovered purpose (which is very possible, given C's ability to directly manipulate memory without using easily-classified methods) then the algorithm simply would not work.
U.S. Pat. No. 7,213,216 describes a .NET to Java translator that starts with “ . . . a first step of receiving metadata information from a .Net Remoting server on a Java client. Then, Java proxies are generated from said metadata information, using a Java development tool, with the Java proxies generated by a one-to-one mapping of .Net classes to Java classes.” This basically means that the .NET code is not actually being translated, but rather Java classes are being generated that perform the same functions as the .NET classes—metadata is information about a program rather than the source code itself, so what this essentially does is recognize that a given method is tagged with the “file input” tag (for example) and output a Java file input method rather than manipulating the .NET code. In addition, it maps .NET classes to Java classes, meaning that both languages are object-oriented (like Java) rather than .NET being procedural (like C). Thus, this algorithm could not be applied to C to Java translation.
Even syntactically similar languages such as C and Java have differences that make simple search-and-replace difficult. For example, while C “char” arrays have an analog in Java Strings, the two are different data structures, so the methods for accessing them are very different, and this discrepancy must be taken into account. A related difficulty is C's use of pointers. A “string” in C is not handled simply as an array of “chars”; it is handled as a pointer to an array of “chars”, expressed as “char*”, which means that dedicated string comparison methods, string search methods, and the like are required. One cannot simply copy, compare, or otherwise manipulate strings in the same way one may manipulate “ints” or “chars”.
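The following minimal sketch illustrates why a translator cannot carry string operations across unchanged. In C, comparing two “char*” values with “==” compares addresses, so “strcmp” is used to compare contents; Java has an analogous distinction between “==” on String references and the “equals” and “compareTo” methods. The class name “StringCompareSketch” and its method names are hypothetical, and the mapping assumes plain ASCII text, for which “compareTo” and “strcmp” order strings the same way.

```java
// Hypothetical illustration of mapping C string idioms to Java.
class StringCompareSketch {
    // Counterpart of C's strcmp(a, b) == 0: compares contents, not references.
    static boolean sameText(String a, String b) {
        return a.equals(b);
    }

    // Counterpart of the sign of strcmp(a, b): -1, 0, or 1.
    static int order(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    public static void main(String[] args) {
        String a = "hello";
        String b = new String("hello"); // distinct object, same characters
        System.out.println(a == b);         // reference comparison, like comparing two char* values
        System.out.println(sameText(a, b)); // content comparison, like strcmp(a, b) == 0
        System.out.println(order("apple", "banana"));
    }
}
```

A search-and-replace that left “==” in place would compile in Java but would compare references rather than characters, so each C string call has to be mapped to the corresponding String method.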
It would be advantageous to have a translator for converting C programming language to Java without requiring human intervention, that translates literally to preserve both procedure and function in the resulting code, and that is independent of the purpose of the source code.