1. Field of the Invention
This invention relates generally to the field of computer programming tools and more particularly to the field of tools for creating interfaces between programming languages.
2. Background
In many computer systems today, programs can be written in any of a number of languages, and almost all computer systems have some form of assembler language. Most computers also have compilers that allow programmers to use what are called higher level languages. However, when new programs written in one language have to communicate with programs written in another language, it can be time-consuming for the programmer to figure out what data must be communicated between the programs in the two different languages, how to communicate it, and in which format. This is especially so in large conversion projects, in which programs written in one language, such as assembler, are being converted gradually to another language, such as C.
Assembler language uses the basic instruction set of the particular machine in a symbolic "shorthand" that usually includes the ability to make references to addresses and registers. A typical series of assembler instructions might look like:
______________________________________
label          instruction  operand(s)    comments
______________________________________
* Comments
* d0 is slot number
* d4 is counter number
* d4 is counter address
handlecounter:
               move.l       d0,a3
               add.w        #4,d4
               move.w       sr,-(a7)
               move.w       #$2600,sr     make status error code for length
               rts
______________________________________
In this example, each line contains data in assembler language format. Lines starting with an asterisk (*) tell the assembler, the program that translates this text into binary machine language, that they are comment lines. The next 6 lines are actual code, which may have labels (such as "handlecounter") or include instructions such as "move.l". A move instruction such as move.l works on operands, here registers d0 and a3. Finally, line-by-line comments, such as "make status error code for length" on the next-to-last line, are separated from the operands on that line by one or more blank spaces.
Higher level languages usually allow the programmer to solve the same problem as an assembler language program might, but usually with fewer lines of code. Where the hypothetical assembler language program above takes 6 lines of actual code, a programmer using a higher level language might be able to achieve the same results in one line, such as:
Incr address by 4, set status_indicators=0.
In addition, higher level languages are also designed to be easier to read, understand and maintain. Compiler programs take the statements written in the higher level languages and turn them into machine language instructions in binary.
Historically, however, where speed of execution was important, or where available memory for programs was limited, writing a program in assembler language was usually preferable to writing it in a higher level language. This was so because assembler language programs could be specifically written for optimal speed, optimal memory use, or both. Higher level language programs tended to result in machine code that was frequently slower in execution and often much larger in size than machine code resulting from an assembler language program. Consequently, large numbers of systems programs--those which directly control hardware in a computer system, such as the operating system or the file system or the microcode for controllers and disk systems--have been written in assembler language.
Now, however, as processor speeds have increased significantly, and costs of memory have gone down greatly, higher level languages such as C and C++ and others can be used for many systems programming functions and still provide acceptable speed. The C and C++ languages are particularly suited for this purpose, as they have some features that more closely resemble assembler languages. Since programs written in higher level languages are usually easier to write and maintain, many assembler language programs are now being converted to higher level languages. Writing in a higher level language also means the programs can be much more easily "ported" or re-compiled for other machines with different instruction sets.
Most systems programs written in assembler, such as operating systems or file systems, are usually made up of a large number of individual programs which are eventually linked together into a larger, executable form. As new programs written in C are added to incorporate a new feature such as a new device driver, they need to be compatible with the existing assembler programs. In the same way, as existing assembler programs are converted to C, they, too, need to be compatible with the other assembler programs still in the file system. To be compatible, the C programs usually need to be able to receive data from the existing assembler programs and send data back to them. Such information items that are sent between programs are often called parameters or arguments.
When large or complex production or product systems are undergoing conversions such as this, it is usually not practical or desirable to convert all the code at once. Instead, it is preferable to convert the assembler code to the higher level language piece by piece. This keeps the mixed system alive, usable and testable.
Presently, compatibility between programs in two different languages is usually achieved manually by the programmers involved, in order to overcome some of the confusion caused by different conventions in parameter passing between the languages. When a C program written for the Motorola 68000 CPU, for example, calls another program, it is usually done as a function call and parameters are passed on what is known as the stack--an area of memory pointed to by a stack pointer register. The stack contains a last-in-first-out list of items. In the C programming language, the called function usually returns a value to the calling program in a register. Most compilers for a given machine will produce code that returns the value from a function call in the same register. For the Motorola 68000 processor, for example, C compilers will put the return value in register d0, for data, or register a0 for pointers. However, assembler programs may pass parameters in registers and not on the stack, or with a return value in a different register from that which is expected by a C program.
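The stack-based convention described above can be illustrated with a small model written in C. This is only a sketch under stated assumptions: the type Stack and the functions stack_init, push32, peek32, pop32 and push_call_frame are hypothetical names introduced for illustration, and the layout (arguments pushed right to left, followed by a return address, with 4-byte items) is assumed as a common arrangement rather than taken from any particular compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a stack that grows toward lower addresses. */
#define STACK_BYTES 64

typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t sp;               /* models a7: index of the most recently pushed item */
} Stack;

static void stack_init(Stack *s) { s->sp = STACK_BYTES; }

/* Push a 32-bit value; bytes are stored most significant first. */
static void push32(Stack *s, uint32_t v) {
    assert(s->sp >= 4);
    s->sp -= 4;
    s->mem[s->sp]     = (uint8_t)(v >> 24);
    s->mem[s->sp + 1] = (uint8_t)(v >> 16);
    s->mem[s->sp + 2] = (uint8_t)(v >> 8);
    s->mem[s->sp + 3] = (uint8_t)v;
}

/* Read the 32-bit value at a byte offset from the stack pointer. */
static uint32_t peek32(const Stack *s, uint32_t offset) {
    assert(s->sp + offset + 4 <= STACK_BYTES);
    const uint8_t *p = &s->mem[s->sp + offset];
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Remove and return the most recently pushed value (last in, first out). */
static uint32_t pop32(Stack *s) {
    uint32_t v = peek32(s, 0);
    s->sp += 4;
    return v;
}

/* Assumed layout for a call f(first, second): second is pushed first,
   then first, then the return address. */
static void push_call_frame(Stack *s, uint32_t first, uint32_t second,
                            uint32_t return_address) {
    push32(s, second);
    push32(s, first);
    push32(s, return_address);
}
```

Under these assumptions, the called function finds the return address at offset 0 from the stack pointer, the first argument at offset 4 and the second at offset 8.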
For example, an assembler program that calculates a weekly pay amount might expect to receive as inputs a time value representing the number of hours worked, and a rate value, indicating the hourly pay rate. The output from the program would be the weekly pay. A typical assembler program might be written to receive the time value in register d0, the rate value in register d1, and might store the result in register d7. As can be seen in FIG. 2, when the calling C program is compiled, the compiler may have allocated the integers time and rate to stack offsets 04 and 08, with values 0008 and 0010 in decimal. Registers d0-d7 do not contain these values. The stack pointer, in register a7 in this example, points to the values. If this C program calls the assembler program shown in FIG. 2, the assembler program will expect time and rate to be inputs, but it expects the values for those inputs to be in registers d0 and d1, respectively, not at offsets from the stack. Since the computed result from the assembler program is in register d7, it is already using a convention of register usage that differs from that used by many C compilers. If this C program must call this assembler program, the C programmer needs to know what the assembler program expects as inputs and how it returns outputs so that the C programmer can write the function calls and define the variables appropriately.
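The information the C programmer must gather in this example can be written down as data. The following C sketch records, for each item, where the assembler program expects it and where the compiled C caller places it, using the registers and stack offsets given in the example. The names LocKind, ParamLoc, asm_side, c_side, same_location and count_mismatches are hypothetical and introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { LOC_REGISTER, LOC_STACK } LocKind;

/* Hypothetical record of where one parameter or result lives. */
typedef struct {
    const char *name;    /* item name, e.g. "time" */
    LocKind     kind;    /* register or stack */
    const char *reg;     /* register name when kind is LOC_REGISTER, else NULL */
    int         offset;  /* stack offset when kind is LOC_STACK */
} ParamLoc;

/* What the assembler program in the example expects. */
static const ParamLoc asm_side[] = {
    { "time",   LOC_REGISTER, "d0", 0 },
    { "rate",   LOC_REGISTER, "d1", 0 },
    { "result", LOC_REGISTER, "d7", 0 },
};

/* What the compiled C caller in the example provides and expects. */
static const ParamLoc c_side[] = {
    { "time",   LOC_STACK,    NULL, 4 },
    { "rate",   LOC_STACK,    NULL, 8 },
    { "result", LOC_REGISTER, "d0", 0 },
};

/* Two locations agree only if kind and register (or offset) both match. */
static int same_location(const ParamLoc *a, const ParamLoc *b) {
    if (a->kind != b->kind) return 0;
    if (a->kind == LOC_REGISTER) return strcmp(a->reg, b->reg) == 0;
    return a->offset == b->offset;
}

/* Count the items whose locations differ between the two sides. */
static size_t count_mismatches(const ParamLoc *a, const ParamLoc *b, size_t n) {
    size_t mismatches = 0;
    for (size_t i = 0; i < n; i++) {
        if (!same_location(&a[i], &b[i])) mismatches++;
    }
    return mismatches;
}
```

For the weekly pay example all three items differ, which is why the C program cannot call the assembler program directly.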
To do this, the C programmer today would look at the source code and documentation for the assembler program, and check to see how it expects to get and return these parameters. Usually the programmer is helped by the documentation appearing in the source code in the form of comments. For large conversion projects, in which assembler programs that are part of a large system are being converted to C, it can be time-consuming to do this manually, taking anywhere from several minutes to several hours or even several days, depending on the number of programs to be converted at any given time.
In addition, because the new program is being written in C, it is likely that the C compiler will create variables in a format that is different from that expected by one or more of the assembler programs being called, as mentioned in the example of the return value, above. When this occurs, the programmer often has to write what is called a "wrapper" which translates the variables from the C format to the format expected by the assembler program for input and from the assembler program format to C format for output. Most typically a wrapper would be several lines of assembler code that take the variables in C format and transform them to the format expected by the assembler program. It would also include a call or branch to the assembler program and then, upon the return from the assembler program, a manipulation of the registers to put them in the state expected by the C program, and finally a return to the C program. Thus the original assembler code is "wrapped" in more assembler code that converts the parameter formats so the assembler program can be called by a C program.
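The steps a wrapper performs can be modeled in C for the weekly pay example. The sketch below is a simulation, not 68000 assembler code: Machine, weekly_pay_asm, weekly_pay_wrapper and call_weekly_pay_from_c are hypothetical names, the pay calculation is assumed to be time multiplied by rate, and the choice to save and restore register d7 around the call is an assumption about what the C caller expects to be preserved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical machine state: data registers plus the caller's stack frame. */
typedef struct {
    uint32_t d[8];      /* data registers d0-d7 */
    uint32_t stack[3];  /* [0] return address, [1] offset 4, [2] offset 8 */
} Machine;

/* Stand-in for the existing assembler program: inputs in d0 and d1,
   result in d7. The multiplication is an assumed pay calculation. */
static void weekly_pay_asm(Machine *m) {
    m->d[7] = m->d[0] * m->d[1];
}

/* Wrapper: converts from the C convention to the assembler convention
   and back again. */
static void weekly_pay_wrapper(Machine *m) {
    uint32_t saved_d7 = m->d[7];  /* assumed: caller expects d7 preserved */

    m->d[0] = m->stack[1];        /* time: stack offset 4 -> d0 */
    m->d[1] = m->stack[2];        /* rate: stack offset 8 -> d1 */

    weekly_pay_asm(m);            /* call or branch to the assembler program */

    m->d[0] = m->d[7];            /* result: d7 -> d0, where C expects it */
    m->d[7] = saved_d7;           /* restore the saved register */
}

/* Models the C side: arguments go on the stack, the result comes back in d0. */
static uint32_t call_weekly_pay_from_c(uint32_t time_val, uint32_t rate) {
    Machine m = { {0}, {0} };
    m.d[7] = 0xDEADBEEFu;         /* marker to check that d7 is preserved */
    m.stack[1] = time_val;
    m.stack[2] = rate;

    weekly_pay_wrapper(&m);

    assert(m.d[7] == 0xDEADBEEFu);
    return m.d[0];
}
```

In this model the C caller never touches d1 or d7 directly, and the assembler program is left unchanged. All of the conversion is confined to the wrapper.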
The more assembler programming of wrappers the C programmer has to do, the more chances there are for errors to occur in minor details, such as computing offsets from a stack to save and restore registers or values or writing the assembler punctuation and syntax incorrectly. In larger conversion projects, the task of writing wrappers for tens or hundreds of assembler programs also tends to be repetitive and usually not the most effective use of a programmer's time.
It is an object of this invention to reduce the amount of programmer time required to achieve this type of interface by automating more of the process of identifying and creating the interface between a program written in a higher level language and one written in assembler.
It is another object of the present invention to decrease the likelihood of errors in the creation of an interface for a program being called by a higher level language.