A computer can understand and interpret only machine language, which is in binary form and thus very difficult to write. Assembler language is a symbolic programming language that allows the programmer to code instructions symbolically instead of coding directly in machine language. Because the assembler language allows the programmer to use meaningful symbols made up of alphabetic and numeric characters instead of merely the binary digits 0 and 1 used in machine language, it makes code easier to read, understand and change.
FIG. 1 shows an overview of a computer system. A programmer enters code in assembler language at a terminal 20, which is then stored in a source file 30 in memory. When the source file 30 is completed, it is processed by an assembler 40, which produces an object module 50 in machine language. The object code in the object module 50 is used as an input to another processing program, the linkage editor 60, which in turn produces a load module 70. The load module 70 can be loaded into main storage 80 of a computer, which then executes the program.
The assembler language is the symbolic programming language that lies closest to the machine language in form and content. It is made up of statements that represent instructions and/or comments. The instruction statements are the working part of the language and are divided into the following three groups:
1) Machine Instructions. These are symbolic representations of the machine language instructions of the processor instruction set.
2) Assembler Instructions. These are requests to the assembler program to perform certain operations during the assembly of a source module. Examples of such operations are defining data constants, defining the end of a source module or reserving main storage areas. Except for instructions that define constants, the assembler does not translate assembler instructions into object code.
3) Macro Instructions. These are requests to the assembler program to process a predefined sequence of code called a macro definition in a "pre-compile" step. The macro definition may contain variables and occurs once at the beginning of the source code. The macro must be invoked by a "macro call" which may set the value of macro variables following the definition. From the macro definition, the assembler generates machine and assembler instructions which it then processes as if they were part of the original input in the source module. Macro definitions can be programmed by the user or may be pre-programmed into the assembler.
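The macro mechanism described in item 3 can be pictured with a minimal Python sketch. It is an illustration only, not the assembler's actual macro processor: the function name `expand_macro`, the sample macro body and the argument values are hypothetical, and the `&` prefix for macro variables is assumed here as the marker that distinguishes a variable from ordinary text.

```python
def expand_macro(body, parameters, arguments):
    """Return the statements generated by one macro call.

    body       -- model statements of the macro definition (list of str)
    parameters -- names of the macro variables, without the '&' prefix
    arguments  -- values supplied by the macro call, in the same order
    """
    bindings = dict(zip(parameters, arguments))
    # Substitute longer names first so that '&AB' is never mistaken for '&A'.
    names = sorted(bindings, key=len, reverse=True)
    expanded = []
    for line in body:
        for name in names:
            line = line.replace("&" + name, bindings[name])
        expanded.append(line)
    return expanded


# Hypothetical macro definition: add 1 to a fullword in storage.
INCREMENT_BODY = ["L &REG,&AREA", "A &REG,=F'1'", "ST &REG,&AREA"]

generated = expand_macro(INCREMENT_BODY, ["REG", "AREA"], ["1", "COUNT"])
for statement in generated:
    print(statement)
```

The generated statements would then be processed as if they had been written in the source module at the point of the macro call.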
A description of the IBM/370 Assembler Language is found in the IBM Manual GC33-4010, entitled "OS/VS-DOS/VSE-VM/370 Assembler Language" and in the book "Principles of Assembler Language Programming for the IBM 370" by Spotswood D. Stoddard, McGraw Hill, Inc., New York, 1985. The description of the /370 Assembler Language from these publications is incorporated herein by reference.
When compiling computer code, the assembler uses a base register and usually one or more "work" registers. The base register in IBM/370 or IBM/390 architecture can address a range of 4096 bytes. Whilst this is generally sufficient for compilation purposes, it can at times become overloaded. One example of such overloading is when code allowing the diagnosis of software errors (called trace points) needs to be incorporated into the main program code. Incorporating the trace point descriptors occupies valuable main storage within the program code, and in some cases it would not be possible to introduce the descriptors at all, since the range addressed by the base register would not have room for the extra storage required.
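The 4096-byte limit can be illustrated with a short Python sketch. It assumes the usual base-displacement addressing, in which an instruction holds a 12-bit displacement that is added to the contents of the base register. The function names and the sizes used (4000 bytes of code, 200 bytes of trace point descriptors) are hypothetical values chosen for illustration.

```python
DISPLACEMENT_BITS = 12                 # assumed width of the displacement field
RANGE_BYTES = 1 << DISPLACEMENT_BITS   # 4096 bytes reachable from one base


def is_addressable(base_address, target_address):
    """True if the target can be reached from the base with a valid displacement."""
    displacement = target_address - base_address
    return 0 <= displacement < RANGE_BYTES


def bytes_left(base_address, location_counter):
    """Bytes still addressable from the base at the current location counter."""
    return max(0, RANGE_BYTES - (location_counter - base_address))


# Hypothetical program: 4000 bytes already generated, 200 more wanted.
base = 0
code_size = 4000
descriptor_size = 200

print(bytes_left(base, code_size))                             # room remaining
print(is_addressable(base, code_size + descriptor_size - 1))   # last descriptor byte
```

In this hypothetical case only 96 bytes remain within range, so the last byte of the descriptors would fall outside the area covered by the base register.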
One way of solving this problem is by changing the sequence of language statements produced by a language compiler compared to the actual sequence of code generation.
Changing the sequence of language statements can be done for example by causing compiled language code to appear earlier in a program. Such a technique would be used to overlay already existing code data with new data and uses the ORG Assembler Language Control statement. This statement can also be used when, for example, at the time that a main storage location is generated in a program, the actual data to be stored there is not known until some later time when the compiler program generation has proceeded to some new location--one can then cause the compiler to overlay the past location with the final value which is now known. An example of such a program is:
__________________________________________________________________________
LABEL1   DC    F'0'         PLACE HOLDER, TO BE OVERLAYED LATER
LABEL2   EQU   *            LABEL2 LOCATION
         ORG   LABEL1       CHANGE PRESENT ORIGIN TO LABEL1
         DC    F'5'         LABEL1 NOW CONTAINS VALUE '5'
         ORG   ,            RETURN TO LABEL2 LOCATION
__________________________________________________________________________
In this example, the first statement reserves space for a variable LABEL1. Only later in the program is the value of this variable known. Using the ORG and DC statements, one can overlay the reserved space with the value 5. This value is then used during execution for any statements which use the value of variable LABEL1.
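The overlay can be modelled with a small Python sketch of a location counter. This is a simplified illustration, not the assembler's implementation: the class `Section` and its method names are hypothetical, only fullword constants are supported, alignment is ignored, and an ORG without a target is assumed to return to the highest location assembled so far.

```python
class Section:
    """Toy model of a control section with a movable location counter."""

    def __init__(self):
        self.storage = bytearray()   # assembled bytes
        self.location = 0            # current location counter
        self.high_water = 0          # highest location assembled so far
        self.symbols = {}            # label -> location

    def label(self, name):
        """Define a label at the current location (like 'name EQU *')."""
        self.symbols[name] = self.location

    def dc_fullword(self, value):
        """Assemble a 4-byte constant at the current location (like DC F'..')."""
        end = self.location + 4
        if end > len(self.storage):
            self.storage.extend(bytes(end - len(self.storage)))
        self.storage[self.location:end] = value.to_bytes(4, "big", signed=True)
        self.location = end
        self.high_water = max(self.high_water, end)

    def org(self, name=None):
        """Move the location counter to a label, or back to the high-water mark."""
        self.location = self.high_water if name is None else self.symbols[name]


section = Section()
section.label("LABEL1")
section.dc_fullword(0)      # place holder
section.label("LABEL2")
section.org("LABEL1")       # go back to the place holder
section.dc_fullword(5)      # overlay it with the final value
section.org()               # return to the LABEL2 location

print(int.from_bytes(section.storage[0:4], "big"), section.location)
```

After the last step the place holder contains 5 and the location counter is back at LABEL2, so subsequent statements continue where assembly left off.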
The ORG statement also has the advantage that one main storage location can be referenced with a given label (label "a") at the time of language compiling, and later another label (label "b") can be generated and caused to overlay the same location as label "a", so that the same main storage location can be referenced by two different labels ("a" and "b").
Another example of changing the sequence of language statements is when one wishes to cause language code to appear later in a program. This technique uses the LTORG statement to enable the programmer to create and refer to needed data constants in the program that do not yet exist. The statement is used to generate data constants (referred to as "literals", see Stoddard p.30) which one would otherwise have to generate at some location accessible to the program, creating a label for each one and collecting the data constants (called the "literal pool") at some place in main storage (see Stoddard p.43). As an example, without the LTORG statement one might program:
__________________________________________________________________________
         L     REG1,DATAX      LOAD REGISTER 1 WITH ADDRESS OF LABELZ
         S     REG1,DATAY      SUBTRACT 8 FROM REGISTER 1 CONTENTS
DATAX    DC    A(LABELZ)       ADDRESS OF LABEL LABELZ
DATAY    DC    F'08'           DATA VALUE OF 8
__________________________________________________________________________
But using the LTORG statement one might program:
__________________________________________________________________________
         L     REG1,=A(LABELZ)   LOAD REGISTER 1 WITH ADDRESS OF LABELZ
         S     REG1,=F'08'       SUBTRACT 8 FROM REGISTER 1 CONTENTS
         - - -.
         LTORG
__________________________________________________________________________
The LTORG statement when compiled will effectively create the following statements without any effort by the programmer:
______________________________________
         DC    A(LABELZ)
         DC    F'08'
______________________________________
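The collection of literals into a pool can be sketched in Python as follows. The sketch is a simplified model under stated assumptions: statements are represented as hypothetical (opcode, operands) tuples, a literal is recognised by its leading "=" sign, duplicate literals are stored once, and the pool is emitted in order of first appearance. The function name `expand_ltorg` is invented for this illustration.

```python
def expand_ltorg(statements):
    """Replace each LTORG with DC statements for the literals seen so far.

    statements -- list of (opcode, operands) tuples, operands being a list
    Returns a new list in which every LTORG is replaced by its literal pool.
    """
    pending = []   # literals collected since the previous LTORG
    output = []
    for opcode, operands in statements:
        if opcode == "LTORG":
            output.extend(("DC", [literal]) for literal in pending)
            pending = []
            continue
        for operand in operands:
            # A literal operand starts with '='; store each distinct one once.
            if operand.startswith("=") and operand[1:] not in pending:
                pending.append(operand[1:])
        output.append((opcode, operands))
    return output


program = [
    ("L", ["REG1", "=A(LABELZ)"]),
    ("S", ["REG1", "=F'08'"]),
    ("LTORG", []),
]

for opcode, operands in expand_ltorg(program):
    print(opcode, ",".join(operands))
```

The programmer writes only the literal operands and the LTORG statement; the DC statements are produced by the model without further effort, as in the listing above.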
There are several disadvantages with using the LTORG statement. Firstly, the final order of the data within the program can be different from the order intended by the programmer, since the compiler may try to optimize the computer main storage required (e.g. the compiler will juggle the data so that they fit nicely aligned into 4 or 8 byte blocks without wasting space). For example, one codes:
______________________________________
         MVC   LAB1,=CL3'ABC'
         L     REG1,=A(LABELZ)
         MVI   LAB2,=CL1'X'
         . . . .
         LTORG
______________________________________
but the LTORG statement when compiled will effectively create the following statements:
______________________________________
         DC    A(LABELZ)
         DC    CL3'ABC'
         DC    CL1'X'
______________________________________
From this example, it can be seen that the order of the data has been altered from that order in which the data appear within the program. This is a serious limitation to the generalized usage of the LTORG statement.
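One plausible ordering rule that reproduces this result is sketched below in Python. The rule is an assumption made for illustration and is not taken from the text: literals whose length is a multiple of 8 are placed first, then multiples of 4, then multiples of 2, then all others, with order of appearance kept inside each group. The function name `pool_order` and the (text, length) representation are hypothetical.

```python
def pool_order(literals):
    """Return literals in an alignment-friendly pool order.

    literals -- list of (text, length_in_bytes) in order of appearance
    """
    def group(length):
        # Rank 0: multiples of 8, rank 1: multiples of 4, rank 2: multiples of 2.
        for rank, multiple in enumerate((8, 4, 2)):
            if length % multiple == 0:
                return rank
        return 3   # odd lengths go last

    # sorted() is stable, so appearance order is preserved within a group.
    return sorted(literals, key=lambda literal: group(literal[1]))


appearance = [("CL3'ABC'", 3), ("A(LABELZ)", 4), ("CL1'X'", 1)]
for text, _length in pool_order(appearance):
    print("DC", text)
```

Under this assumed rule the 4-byte address constant moves ahead of the two character constants, which is the reordering shown in the listing above.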
The second disadvantage of the LTORG statement is that the data constants cannot be manipulated by the program, i.e. they are "read-only" (see Stoddard, p.114). The third disadvantage is that only special parts of the language can be used as a "literal", the assumption being that these parts are to be used to generate data definitions; for example, one cannot move forward statements that contain the ORG statement.
In principle, the ORG statement might also be used to cause language code to appear forward in a program. In practice, however, this does not occur, for several reasons. Firstly, it is generally difficult to implement, so its use is discouraged. Secondly, there are theoretical limitations to the implementation, as shown by the following example:
__________________________________________________________________________
LABELA   DS    0H                    LABEL A
         ORG   LABELB                RESET LOCATION COUNTER TO LABEL B
LABELC   DS    2XL(LABELB-LABELA)    DATA AREA, LENGTH DEPENDING ON THE
                                     DIFFERENCE BETWEEN LOCATIONS OF
                                     LABELS B AND A.
LABELB   DS    0H                    LABEL B
__________________________________________________________________________
Here the language compiler cannot proceed, because it cannot calculate the location of LABELB without knowing the length of LABELC, whose length depends in turn on the location of LABELB.
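The circular dependency can be made explicit with a short Python sketch that searches a dependency graph for a cycle. The graph below is a hand-written reading of the example, not something produced by an assembler, and the function name `find_cycle` and the node names are hypothetical.

```python
def find_cycle(depends_on):
    """Return a list of names forming a dependency cycle, or None if there is none.

    depends_on -- dict mapping a name to the names it needs to be known first
    """
    visiting = []   # names on the current dependency path
    done = set()    # names already proven cycle-free

    def visit(name):
        if name in done:
            return None
        if name in visiting:
            # The path has come back to a name it already contains.
            return visiting[visiting.index(name):] + [name]
        visiting.append(name)
        for dependency in depends_on.get(name, ()):
            cycle = visit(dependency)
            if cycle:
                return cycle
        visiting.pop()
        done.add(name)
        return None

    for name in depends_on:
        cycle = visit(name)
        if cycle:
            return cycle
    return None


# Dependencies read off the example: LABELB follows LABELC, so its location
# needs the length of LABELC, which is written in terms of LABELB and LABELA.
dependencies = {
    "location(LABELB)": ["length(LABELC)"],
    "length(LABELC)": ["location(LABELB)", "location(LABELA)"],
    "location(LABELA)": [],
}

print(find_cycle(dependencies))
```

The search reports that the location of LABELB depends on the length of LABELC, which depends on the location of LABELB again, so neither value can be evaluated before the other.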
An object of the present invention is to provide a method for compiling computer code which overcomes the above disadvantages.