1. Technical Field
The invention relates to an apparatus and a method for generating an optimization object for providing an execution program module by linking necessary routines etc. to an object program generated by compiling a source program made up of a number of source files and, more specifically, to an apparatus and a method for generating an optimization object which can generate an execution program module in which instruction strings are so located as to be suitable for execution of a computer's cache functions and predictive branch functions.
2. Background Art
In recent computer systems, to speedily execute instructions of a database processing program etc., future execution is predicted before actual execution. For example, a computer speed-up apparatus utilizing such prediction may come in:
I. a translation lookaside buffer (TLB) used for speedily referencing a page table that correlates physical locations in the memory with the logical locations observable from the program,
II. a cache apparatus for using both time-wise and space-wise localities of a memory accessed by instructions to thereby hold necessary data in prediction of future memory accesses, or
III. a branch prediction apparatus for predicting whether a branch instruction branches or not before it is determined, to previously execute the instruction speculatively according to the results of the prediction.
The above-mentioned database has a large memory region of, for example, 8 GB, and its source program is comprised of an aggregate of roughly 1000 through 10000 source files, each typically having more than 100 to less than 10,000 instruction strings, so that if each file is supposed to have an average of 1000 instructions, the program as a whole comprises a vast number of instruction strings, from as many as 100 thousand to 10 million. Whether an execution speed-up apparatus would properly function for such a large program depends largely on the instruction strings, or their execution scheme, of the application program as an execution program module generated by linking necessary routines to the object program. For example, an instruction cache apparatus can be provided for reducing the average memory access time for fetching the instructions required for execution. When the instructions are executed consecutively without executing a branch instruction, those instructions are fetched one after another, so that by expanding the cache block, which is the unit in which the instruction cache apparatus fetches them from the memory at a time, the number of time-consuming memory accesses can be decreased to thereby speed up the execution.
An actual application program, however, contains conditional branch instructions, unconditional branch instructions, and subroutine calling instructions and so its execution efficiency is largely deteriorated because:
I. the instructions are not executed smoothly in that some of the instructions that are not executed in a block especially fetched from the memory, are arranged in the cache memory, thus decreasing its utilization ratio; and
II. when a branch instruction is executed or a subroutine is called, control may be transferred to an instruction string described in a different file, so that the instruction strings actually running in the execution program module may be discontinuous; as a result, the instruction TLB, which is provided for speedily referencing the instruction page table that correlates physical locations in the memory with logical locations observable from the program, must be updated frequently.
Also, there are similar problems in data access; if data accessed and data not accessed are mixed in a certain execution environment or if data pieces accessed proximately time-wise are arranged at space-wise separate locations from each other, the execution speed is largely deteriorated because a data TLB is frequently updated which is used to speedily reference a logical address/physical address translation table for data addresses for use in correlating the physical locations in the memory and the logical locations observable from the program. Those situations can be improved by increasing the capacity of the cache memory and the data TLB. However, this is difficult to carry out in terms of cost and space because it increases the hardware size required.
Further, a branch prediction apparatus actually involves:
I. dynamic branch prediction for recording a hysteresis of whether conditions for a conditional branch instruction have been satisfied hitherto to thereby predict its future branch/non-branch;
II. explicit static branch prediction for recording beforehand branch/non-branch information of an instruction in the instruction itself when it is described to use it later when it is executed; and
III. implicit static branch prediction for observing a location relationship between a branch destination and a branch source to thereby decide branch/non-branch when a relevant instruction is executed;
which are actually used in combination. Dynamic prediction by the branch prediction apparatus uses a hysteresis about how a branch instruction has been executed hitherto in prediction of its future. Therefore, if a relevant program has a large loop configuration such that once an instruction is executed, a large number of other instructions are executed until that instruction is executed next time, not all of the hystereses of executed branch instructions can be recorded due to limitations on the hardware size, so that some of the past hystereses may not be utilized in many cases. Even in such a case where past hystereses cannot be utilized, static branch prediction can be applied. By the implicit static branch prediction that is often used, a branch is predicted according to an experience rule that conditional branch instructions of a loop typically "often have a skip destination instruction with a lower numbered address (i.e., backward branch) and hardly ever have a skip destination instruction with a higher numbered address (i.e., forward branch)". This experience rule, however, does not always work sufficiently.
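As a minimal illustration of the implicit static rule described above, the following Python sketch predicts a branch as taken only when its destination has a lower address than its source. The function names, addresses, and sample traces are hypothetical and are not taken from the invention; they only indicate why the rule suits loop-closing branches and may fail for a frequently taken forward branch.

```python
def predict_taken(branch_addr: int, target_addr: int) -> bool:
    """Implicit static rule: a backward branch is predicted taken,
    a forward branch is predicted not taken."""
    return target_addr < branch_addr


def prediction_accuracy(trace) -> float:
    """trace: list of (branch_addr, target_addr, actually_taken) tuples."""
    hits = sum(predict_taken(src, dst) == taken for src, dst, taken in trace)
    return hits / len(trace)


# Hypothetical loop-closing branch: jumps backward 9 times, then falls through.
loop_trace = [(0x120, 0x100, True)] * 9 + [(0x120, 0x100, False)]

# Hypothetical forward branch that is nevertheless taken 8 times out of 10.
forward_hot_trace = [(0x200, 0x240, True)] * 8 + [(0x200, 0x240, False)] * 2

loop_accuracy = prediction_accuracy(loop_trace)
forward_accuracy = prediction_accuracy(forward_hot_trace)
```

In this sketch the rule is right for most executions of the loop branch and wrong for most executions of the forward branch, which is the kind of case that rearranging branch destinations is meant to address.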
Thus, in order to effectively operate the speed-up mechanism of hardware for executing instructions, when an object program or an execution program module is generated, an execution state is guessed or a thus generated program is executed experimentally. This is done to thereby decide instruction strings highly likely to be executed based on a record etc. of the execution state, or discriminate between execution portions and unexecution portions in a certain operational environment or between portions likely to be executed frequently and portions not likely to be done so. By dividing such a program into execution portions, unexecution portions, and frequently used portions and thus controlling it, the updating frequency of the TLB can be reduced, and also the location relationship between branch sources and branch destinations can be adjusted so that the implicit static branch prediction can function properly.
However, a source program made up as an aggregate of a large number of source files is compiled by the compiler processing each of its files, subroutines, functions, etc. as one unit to thereby generate an object program. Therefore, the division into execution portions, unexecution portions, and frequently used portions is made within each compilation unit, i.e. each object file, object subroutine, or object function, so that a plurality of files, i.e. compilation units, has rarely been processed together. Accordingly, in order to concentrate execution instruction strings over a plurality of compilation units or optimize a system as a whole by adjusting those in front of and behind those strings, a vast source program comprised of a large number of source files must be compiled in a batch by a newly developed compiler, which takes a long time in compilation and is problematic.
Also, when an execution program module (application program) is generated from an object program with a link processing program known as a linker or assembler, module files of the execution program module are generated for each unit of compilation, for example, each object file. As a result, a plurality of object files is seldom processed at a time. Therefore, in order to concentrate execution instruction strings over a plurality of object files or adjust those in front of and behind those strings, it is necessary to create a link processing program for dedicated use in batch processing of a plurality of object files, thus taking a considerably long time in link processing.
In view of the above, it is an object of the invention to provide an optimization object generating apparatus and a method for appropriately concentrating instruction strings or data pieces which are sporadically present over a plurality of regions having a plurality of compilation units, without changing the compilation program unit such as a file, a subroutine, and a function, and also without creating a link processing program required in batch processing of the system as a whole.
(Optimization by Giving Section Name)
An optimization object generating apparatus according to the invention features a section name giving unit newly added to the existing compiler, simulator, and linker. The compiler compiles a source program made up of a plurality of files to generate an object program in a predetermined compilation unit of a file, subroutine, function, etc. The simulator executes the object program thus generated by the compiler in a specific execution environment to generate execution information indicating executed instructions and unexecuted instructions. The section name giving unit featured by the invention uses the execution information thus generated by the simulator to separate a plurality of files of the object program into sections outputting executed instruction strings and sections outputting unexecuted instruction strings and give different section names to them. When generating an execution program from the object program by linking, a link processing unit (link processing program), which is the linker or the assembler, collects the sections with the execution section names and the sections with the unexecution section names of the plurality of files, and separates them into an execution portion and an unexecution portion. Thus, the invention utilizes the "function of aggregating sections with the same name" of the existing link processing unit for generating an execution program from an object program to collect the instructions sporadically present in a plurality of regions over more than one compilation unit and divide them into an instruction portion likely to be executed and an instruction portion unlikely to be executed, without changing the program compilation unit such as a file, a subroutine, or a function, and also without creating a link processing program required to process the whole system in a batch.
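The flow described above can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the section names `.text.executed` and `.text.unexecuted`, the function names, and the representation of an object file as a list of instruction-string labels are all hypothetical, and the linker is modeled merely as something that places sections with the same name together.

```python
from collections import defaultdict

# Hypothetical section names; the text does not prescribe concrete names.
EXEC_SECTION = ".text.executed"
UNEXEC_SECTION = ".text.unexecuted"


def give_section_names(object_files, executed):
    """Tag each instruction string with a section name.

    object_files: {file name: [instruction-string labels]} (one compilation unit each)
    executed:     set of (file name, label) pairs reported by the simulator
    """
    named = []
    for file_name, blocks in object_files.items():
        for block in blocks:
            hit = (file_name, block) in executed
            named.append((EXEC_SECTION if hit else UNEXEC_SECTION, file_name, block))
    return named


def link(named_blocks, section_order):
    """Model of 'aggregating sections with the same name' across files."""
    by_section = defaultdict(list)
    for section, file_name, block in named_blocks:
        by_section[section].append((file_name, block))
    layout = []
    for section in section_order:
        layout.extend(by_section[section])
    return layout


object_files = {"a.o": ["a1", "a2"], "b.o": ["b1", "b2"]}
executed = {("a.o", "a1"), ("b.o", "b2")}
layout = link(give_section_names(object_files, executed),
              [EXEC_SECTION, UNEXEC_SECTION])
```

Note that the compilation units themselves are untouched in this sketch; only the section tag attached to each instruction string changes, and the grouping across files happens at link time.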
In the case of a database processing program, for example, even if it has a large size of 5 MB, its instruction portion likely to be executed has a small size of about 200 KB. By collecting the execution instruction portion at one position, it is possible to largely reduce the frequency that the TLB for the instructions is updated in the cache apparatus and speed up the execution of the program as compared to the case where the 200 KB execution instruction portion is scattered over the whole size of 5 MB.
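To make the sizes in this example concrete, the following Python sketch counts how many memory pages each region would occupy. The page size of 8 KB is an assumption chosen for illustration, as the text does not state one, and the helper name is hypothetical.

```python
def pages_needed(size_bytes: int, page_bytes: int) -> int:
    """Smallest number of pages that can hold size_bytes of contiguous code."""
    return -(-size_bytes // page_bytes)  # ceiling division


KB = 1024
MB = 1024 * KB
PAGE = 8 * KB  # assumed page size, not given in the text

whole_program_pages = pages_needed(5 * MB, PAGE)   # 640 pages
hot_pages = pages_needed(200 * KB, PAGE)           # 25 pages
```

Under this assumption, the executed portion fits in 25 pages once it is collected at one position, whereas the same 200 KB scattered over the whole program could touch a far larger share of its 640 pages, each needing its own TLB entry.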
By the optimization object generating apparatus according to the invention, an object program generated by the compiler is executed by a simulator in a specific execution environment to generate execution frequency information of the executed instructions. Based on this information, the section name giving unit gives section names to sections outputting the executed instruction strings of a plurality of files in the object program according to the execution frequency. Finally, the link processing unit, when generating an execution program module from the object program by linking, collects the sections having the same section name to one position, thus gathering the instruction strings with a high execution frequency into a small aggregate. In this case, the "function of aggregating sections with the same name" of the link processing unit is utilized to collect into a small aggregate the high execution frequency instruction strings sporadically present in a plurality of regions over more than one compilation unit without changing the program compilation unit such as a file, a subroutine, or a function. This is also done without creating a link processing program for batch processing of the system as a whole, thus largely reducing the updating frequency of the TLB for the instructions and speeding up program execution.
Also, by the optimization object generating apparatus according to the invention, an object program generated by the compiler is executed by a simulator in a specific execution environment to generate execution time information of thus executed instructions. Based on this information, the section name giving unit gives the same section name to sections outputting time-wise proximately executed instruction strings of a plurality of files of the object program. After this, the link processing unit, when generating an execution program module from the object program by linking, collects the sections with the same section names into one aggregate, thus placing proximately space-wise the instruction strings executed proximately time-wise over a plurality of files. In this case, the "function of aggregating sections with the same name" of the link processing unit is utilized to arrange the time-wise proximately executed instruction strings sporadically present in a plurality of regions over more than one compilation unit in a memory space proximately with each other without changing the program compilation unit such as a file, subroutine, or function and also without creating a link processing program for batch processing of the system as a whole, thus largely reducing the updating frequency of the TLB for the instructions and speeding up program execution.
Also, by the optimization object generating apparatus according to the invention, an object program generated by the compiler is executed by a simulator in a specific execution environment to generate the execution frequency information of thus executed instructions. The section name giving unit gives, based on the execution frequency information from the simulator, different section names to sections outputting branch destination instruction strings having a higher execution frequency and sections outputting branch destination instruction strings having a lower execution frequency so as to correctly conduct static branch prediction for deciding branching or non-branching predictably by observing a location relationship between branch destinations and branch sources. Finally, when generating an execution program module from the object program by linking, the link processing unit arranges based on the section names the above-mentioned branch destination instruction strings in a descending order of the execution frequency so as to conduct static branch prediction correctly. In this case, the "function of aggregating sections with the same name" of the link processing unit is utilized to arrange the front-and-rear relationship of the sporadically present branch instructions so as to bring about right static branch prediction without changing the program compilation unit such as a file, subroutine, or function. This is also done without creating a link processing program for batch processing of the system as a whole, thus increasing the probability of the successful static branch prediction to thereby speed up program execution.
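A minimal Python sketch of naming sections by execution frequency is given below. The three frequency classes, their thresholds, and the section names are assumptions made for illustration only; the text states merely that section names are given according to the execution frequency.

```python
# Hypothetical section names, listed in the order the linker would place them.
SECTION_ORDER = [".text.freq_high", ".text.freq_mid", ".text.freq_low"]


def frequency_section(count: int, high: int = 1000, mid: int = 10) -> str:
    """Map an execution count from the simulator to a section name.
    The thresholds are illustrative assumptions."""
    if count >= high:
        return SECTION_ORDER[0]
    if count >= mid:
        return SECTION_ORDER[1]
    return SECTION_ORDER[2]


def layout_by_frequency(counts):
    """counts: {(file name, instruction-string label): execution count}.
    Returns the link-time order after same-named sections are aggregated."""
    groups = {name: [] for name in SECTION_ORDER}
    for key, count in counts.items():
        groups[frequency_section(count)].append(key)
    return [key for name in SECTION_ORDER for key in groups[name]]


counts = {
    ("a.o", "parse"): 5000,
    ("b.o", "log"): 3,
    ("c.o", "scan"): 2500,
    ("a.o", "init"): 40,
}
freq_layout = layout_by_frequency(counts)
```

Here the two frequently executed instruction strings, which come from different files, end up adjacent at the front of the layout.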
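One possible way to decide which instruction strings count as executed proximately time-wise is sketched below in Python. The gap-based grouping, the `max_gap` parameter, the timestamps, and the section name pattern `.text.time<N>` are all assumptions for illustration; the text does not specify how time-wise proximity is judged.

```python
def name_sections_by_time(first_exec_time, max_gap):
    """Give the same section name to instruction strings executed close in time.

    first_exec_time: {(file name, label): time of first execution in the simulator}
    max_gap:         largest time difference still treated as 'proximate' (assumed)
    """
    ordered = sorted(first_exec_time.items(), key=lambda item: item[1])
    sections = {}
    group = 0
    previous_time = None
    for key, time in ordered:
        # Start a new section whenever the gap to the previous string is too large.
        if previous_time is not None and time - previous_time > max_gap:
            group += 1
        sections[key] = f".text.time{group}"
        previous_time = time
    return sections


times = {
    ("a.o", "open"): 0,
    ("b.o", "read"): 5,
    ("a.o", "report"): 100,
    ("c.o", "close"): 103,
}
time_sections = name_sections_by_time(times, max_gap=10)
```

Instruction strings from different files that receive the same name would then be placed together by the link processing unit, as described above.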
Also, the optimization object generating apparatus may be provided with all functions of execution/unexecution discrimination based on giving of section names, execution frequency, translation of time-wise approximate execution to space-wise approximate execution, and front-and-rear relationship adjustment for successful static branch prediction. That is, in this case, the optimization object generating apparatus comprises a compiler which compiles a source program made up of a plurality of files to generate an object program in a predetermined compilation unit.
A simulator executes the object program generated by the compiler in a specific execution environment to thereby generate execution information indicating executed instructions and unexecuted instructions, frequency information of the executed instructions, and execution time information of the executed instructions.
A first section name giving unit gives section names to the sections outputting the executed instruction strings and the sections outputting unexecuted instruction strings of a plurality of files in the object program based on the execution information from the simulator.
A second section name giving unit gives section names to the sections outputting the executed instruction strings of the plurality of files in the object program according to the execution frequency based on the execution frequency information from the simulator.
A third section name giving unit gives the same section name to the sections outputting the instruction string executed proximately time-wise of the plurality of files of the object program based on the execution time information from the simulator.
A fourth section name giving unit gives different section names to the sections outputting branch destination instruction strings having a higher execution frequency and the sections outputting branch destination instruction strings having a lower execution frequency based on the execution frequency information from the simulator, so as to bring about right static branch prediction for deciding branching or non-branching predictably by observing a location relationship between branch destinations and branch sources when a conditional branch instruction is executed.
A link processing unit collects the sections having the same section name in the plurality of files into one aggregate when generating an execution program module from the object program by linking, to thereby arrange them in an execution portion and an unexecution portion separately, collect the instruction strings having a higher execution frequency into a small range, translate the time-wise proximately executed instruction strings into space-wise proximately executed instruction strings over a plurality of files, and further place the above-mentioned branch destination instruction strings in a descending order of the execution frequency for correct static branch prediction.
The optimization object generating apparatus also includes modifications of an arbitrary combination of the functions provided by giving of the section names such as:
I. execution/unexecution discrimination;
II. execution frequency;
III. translation of time-wise approximate execution to space-wise approximate execution; and
IV. adjustment of front-and-rear relationship for successful static branch prediction.
(Optimization by Giving of External Reference Names)
Another embodiment of the invention utilizes the "function of controlling the jointing order in the modules according to the external reference name list" of the existing link processing unit to thereby optimize the system without changing the compilation unit of the program such as a file, subroutine, or function. This is also done without creating a link processing program for batch processing of the system as a whole. To this end, the optimization object generating apparatus according to the invention comprises a compiler, a simulator, and a link processing unit and further an external reference name giving unit which is a feature of the invention. In this case, an object program generated by the compiler is executed by the simulator in a specific execution environment to thereby generate execution information indicating executed instructions and unexecuted instructions. Based on this execution information from the simulator, the external reference name giving unit divides a plurality of files in the object program into sections outputting the executed instruction strings and sections outputting unexecuted instruction strings to give those sections different external reference names. At the same time, an external reference name list is created that separates from each other the external reference names of the sections to which the executed instruction strings belong and the external reference names of the sections to which the unexecuted instruction strings belong. Finally, based on this external reference name list, the link processing unit collects, when generating an execution program module from the object program by linking, the executed sections and the unexecuted sections in the plurality of files to respective aggregates and separates them into execution portions and unexecution portions.
Thus, the invention can utilize the "function of controlling the jointing order in the modules according to the external reference name list" of the existing link processing unit which generates the execution program module from the object program to thereby collect instructions sporadically present in a plurality of regions over more than one compilation unit into an instruction portion likely to be executed and an instruction portion unlikely to be executed. This is done without changing the program compilation unit such as a file, subroutine, or function and also without creating a link processing program for batch processing of the system as a whole. By thus aggregating the executed instructions, the frequency of updating the instruction TLB in the cache memory apparatus can be largely reduced to speed up program execution.
By the optimization object generating apparatus according to the invention also, an object program generated by the compiler is executed by the simulator in a specific execution environment to generate execution frequency information of the executed instructions. Based on this execution frequency information from the simulator, the external reference name giving unit gives sections outputting the executed instruction strings of a plurality of files in the object program external reference names according to the execution frequencies and, at the same time, creates an external reference name list in which they are sorted in a descending order of the execution frequency. Finally, based on the external reference name list, the link processing unit collects, when generating an execution program module from the object program by linking, the executed sections in the plurality of files to thereby arrange the instruction strings with higher execution frequencies in a small range. In this case, the "function of controlling the jointing order in the modules according to the external reference name list" of the link processing unit can be utilized to collect the instruction strings with higher execution frequencies sporadically present in a plurality of regions over more than one compilation unit into a small range. This is done without changing the program compilation unit such as a file, subroutine, or function and also without creating a link processing program for batch processing of the system as a whole, thus largely reducing the frequency of updating the instruction TLB and speeding up program execution.
Also, the optimization object generating apparatus according to the invention executes an object program generated by the compiler in a specific execution environment to create execution time information of the executed instructions. Based on the execution time information from the simulator, the external reference name giving unit gives the same external reference name to sections outputting instruction strings executed proximately time-wise of a plurality of files in the object program. At the same time, an external reference name list is created that lists the external reference names of the sections outputting the time-wise proximately executed instruction strings. Based on the external reference name list, the link processing unit, when generating an execution program module from the object program by linking, translates in arrangement the instruction strings executed proximately time-wise over a plurality of files into those approximate space-wise. In this case, the "function of controlling the jointing order in the modules according to the external reference name list" of the link processing unit can be utilized to translate in arrangement the time-wise proximately executed instruction strings sporadically present in a plurality of regions over more than one compilation unit into those approximate space-wise in the memory space without changing the program compilation unit such as a file, subroutine, or function and also without creating a link processing program for batch processing of the system as a whole, thus largely reducing the frequency of updating the instruction TLB and speeding up program execution.
Also, by the optimization object generating apparatus according to the invention, an object program generated by the compiler is executed by the simulator in a specific execution environment to thereby generate execution frequency information of the executed instructions. Based on the execution frequency information from the simulator, the external reference name giving unit gives different external reference names to sections outputting branch-destination instruction strings having a higher execution frequency and sections outputting branch-destination instruction strings having a lower execution frequency for successful static branch prediction for deciding branching or non-branching predictably by observing the location relationship between branch destinations and branch sources when conditional branch instructions are executed. At the same time, an external reference name list that lists the external reference names is generated in a descending order of the execution frequency. Based on the external reference name list, the link processing unit arranges, when generating an execution program module from the object program by linking, the branch-destination instruction strings in a descending order of the execution frequency so that the static branch prediction comes true. In this case, the "function of controlling the jointing order in the modules according to the external reference name list" of the link processing unit can be utilized to adjust the front-and-rear relationship of the branch-destination instructions sporadically present for successful static branch prediction without changing the program compilation unit such as a file, subroutine, or function. This is also done without creating a link processing program for batch processing of the system as a whole, to enhance the probability of successful static branch prediction, thus speeding up program execution.
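The creation of an external reference name list sorted by execution frequency, and its use to control the jointing order, might be modeled as in the following Python sketch. The naming scheme for external reference names, the function names, and the treatment of unlisted sections (placed after all listed ones) are assumptions for illustration only.

```python
def build_reference_list(counts):
    """Give each section an external reference name and list the executed ones.

    counts: {(file name, label): execution count from the simulator}
    Returns (names, reference_list), where reference_list holds the names of
    executed sections in descending order of execution frequency.
    """
    # Hypothetical naming scheme: file name and label joined into one symbol.
    names = {key: f"{key[0].replace('.', '_')}__{key[1]}" for key in counts}
    executed = [key for key, count in counts.items() if count > 0]
    executed.sort(key=lambda key: counts[key], reverse=True)
    return names, [names[key] for key in executed]


def link_in_list_order(names, reference_list):
    """Model of a linker that joins sections in the order given by the list;
    sections not on the list are assumed to follow in their original order."""
    rank = {name: position for position, name in enumerate(reference_list)}
    listed = sorted((n for n in names.values() if n in rank), key=rank.get)
    unlisted = [n for n in names.values() if n not in rank]
    return listed + unlisted


counts = {
    ("a.o", "parse"): 5000,
    ("b.o", "log"): 0,
    ("c.o", "scan"): 2500,
    ("a.o", "init"): 40,
}
names, reference_list = build_reference_list(counts)
ordered_sections = link_in_list_order(names, reference_list)
```

Unlike the section name approach, the ordering information in this sketch lives in a separate list handed to the linker rather than in the names themselves.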
Also, the optimization object generating apparatus may be provided with such functions as execution/unexecution discrimination by giving of external reference names, execution frequency, translation of time-wise approximate execution into space-wise approximate execution, and adjustment of front-and-rear relationship for successful static branch prediction. That is, in this case the optimization object generating apparatus comprises a compiler for generating in a predetermined compilation unit an object program from a source program made up of a plurality of files.
A simulator executes the above-mentioned object program generated by the compiler in a specific execution environment to generate execution information indicating executed instructions and unexecuted instructions, frequency information of the executed instructions, and execution time information of the executed instructions.
A first external reference name giving unit divides, based on the execution information from the simulator, a plurality of files in the object program into sections outputting the executed instruction strings and sections outputting the unexecuted instruction strings to thereby give them different external reference names and, at the same time, generate an external reference name list that separates from each other the external reference names of the sections to which the executed instruction strings belong and the external reference names of the sections to which the unexecuted instruction strings belong.
A second external reference name giving unit gives, based on the execution frequency information from the simulator, external reference names to the sections outputting the executed instruction strings of the plurality of files in the object program according to the execution frequency and, at the same time, generates an external reference name list that sorts them in a descending order of the execution frequency.
A third external reference name giving unit gives, based on the execution time information from the simulator, the same external reference name to the sections outputting the instruction strings executed proximately time-wise of the plurality of files in the object program and, at the same time, generates an external reference name list that lists the external reference names of the sections outputting the instruction strings executed proximately time-wise.
A fourth external reference name giving unit gives, based on the execution frequency information from the simulator, different external reference names to sections outputting branch-destination instruction strings having a higher execution frequency and sections outputting branch-destination instruction strings having a lower execution frequency for successful static branch prediction for deciding branching or non-branching by observing the location relationship between the branch destinations and the branch sources when conditional branch instructions are executed. At the same time, an external reference name list is generated, which lists them in a descending order of the execution frequencies.
A link processing unit, when generating an execution program module from the object program by linking, collects based on the external reference name list the executed sections and the unexecuted sections in the plurality of files into respective aggregates to separate them into execution portions and unexecution portions. The executed sections in the plurality of files are gathered into one aggregate in order to arrange the instruction strings with a higher execution frequency into a small range, thus translating the instruction strings executed proximately time-wise over the plurality of files into those proximate space-wise and also arranging the above-mentioned branch-destination instruction strings in a descending order of the execution frequency for successful static branch prediction.
(Optimization of Data Arrangement)
The optimization object generating apparatus according to the invention utilizes the xe2x80x9cfunction of aggregating sections with the same namexe2x80x9d of the existing link processing unit for generating an execution program module from an object program to thereby divide data sporadically present in a plurality of regions over more than one compilation unit into data pieces likely to be referenced and data unlikely to be referenced during execution. This is done without changing the program compilation unit such as a file, subroutine, or function, and also without creating a link processing program for batch processing of the system as a whole.
To this end, the optimization object generating apparatus permits the simulator to execute the object program generated by the compiler in a specific execution environment to thereby generate data reference information indicating data referenced and data not referenced during execution. Based on the data reference information from the simulator, the section name giving unit gives different section names to the data referenced and the data not referenced during execution of a plurality of files in the object program. The link processing unit, when generating an execution program module from the object program by linking, collects the data having the same section name into one aggregate to thereby adjacently arrange the referenced data and the unreferenced data separately from each other. In this case, the xe2x80x9cfunction of aggregating sections with the same namexe2x80x9d of the existing link processing unit for generating an execution program from the object program cam be utilized to divide data sporadically present in a plurality of regions over more than one compilation unit into data likely to be referenced and data unlikely to be referenced without changing the program compilation unit such as a file, subroutine, or function. This is also done without creating a link processing program for batch processing of the system as a whole, thus largely reducing the frequency of updating the data TLB to speed up program execution.
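The effect of separating referenced data from unreferenced data can be illustrated with the Python sketch below, which compares the address range covered by the referenced data before and after regrouping. The data names, sizes, and helper functions are hypothetical and serve only as an example of the arrangement described above.

```python
def assign_offsets(items):
    """items: list of (name, size in bytes) in layout order.
    Returns {name: (start offset, end offset)}."""
    offsets, cursor = {}, 0
    for name, size in items:
        offsets[name] = (cursor, cursor + size)
        cursor += size
    return offsets


def referenced_span(items, referenced):
    """Bytes between the first and last referenced data piece in the layout."""
    offsets = assign_offsets(items)
    start = min(offsets[name][0] for name in referenced)
    end = max(offsets[name][1] for name in referenced)
    return end - start


def regroup(items, referenced):
    """Place referenced data first and unreferenced data after it, modeling
    the aggregation of two differently named data sections."""
    hot = [item for item in items if item[0] in referenced]
    cold = [item for item in items if item[0] not in referenced]
    return hot + cold


# Hypothetical data pieces from three files, in their original layout order.
data_items = [("a_tab", 4096), ("a_log", 65536), ("b_cfg", 4096),
              ("b_dump", 65536), ("c_idx", 4096)]
referenced = {"a_tab", "b_cfg", "c_idx"}

span_before = referenced_span(data_items, referenced)
span_after = referenced_span(regroup(data_items, referenced), referenced)
```

In this example the referenced data shrinks from a range of 143360 bytes to a range of 12288 bytes, so fewer pages, and hence fewer data TLB entries, are needed to cover it.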
Also, the optimization object generating apparatus according to the invention comprises a simulator for executing an object program generated by the compiler under a specific execution environment to generate reference time information of the referenced data. A section name giving unit, based on the reference time information from the simulator, gives the same section name to sections outputting the data referenced proximately time-wise of a plurality of files in the object program. A link processing unit, when generating an execution program module from the object program by linking, collects and arranges the time-wise proximately referenced data pieces so that they may be proximate to each other space-wise. In this case, the "function of aggregating sections with the same name" of the link processing unit can be utilized to arrange the time-wise proximately referenced data pieces sporadically present in a plurality of regions over more than one compilation unit, so that they may be proximate to each other space-wise without changing the program compilation unit such as a file, subroutine, or function and also without creating a link processing program for batch processing of the system as a whole, thus largely reducing the frequency of updating the TLB to speed up program execution.
Also, the optimization object generating apparatus includes modifications formed by an arbitrary combination of the following, each provided by giving of external reference names:
I. separating execution and unexecution from each other;
II. classification based on execution frequency;
III. translation of time-wise approximate execution to space-wise approximate execution; and
IV. adjustment of front-and-rear relationship for successful static branch prediction.
(Method of Optimization by Giving of Section Names)
The invention provides also a method for generating an optimization object and, when execution and unexecution are discriminated from each other in giving of section names, comprises performing a compilation by generating an object program in a predetermined compilation unit from a source program made up of a plurality of files.
A simulation is performed by executing the object program generated by the compilation in a specific execution environment to thereby generate execution information indicating executed instructions and unexecuted instructions.
A section name giving is performed, based on the execution information from the simulation, by dividing a plurality of files in the object program into sections outputting executed instruction strings and sections outputting unexecuted instruction strings to thereby give them different section names.
A link processing is performed, when generating an execution program module from the object program by linking, by collecting the sections having the executed section names and the sections having the unexecuted section names in the plurality of files into one aggregate respectively to thereby discriminate between execution portions and unexecution portions.
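The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of the invention: the section names `.text.executed` and `.text.unexecuted`, the function names, and the dictionary representation of object files are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical section names; the source does not fix concrete names.
EXECUTED = ".text.executed"
UNEXECUTED = ".text.unexecuted"


def give_section_names(object_files, executed):
    """Tag every instruction string with a section name.

    object_files: {file name: [instruction-string label, ...]}
    executed: set of labels the simulator reported as executed
    """
    return [
        (EXECUTED if label in executed else UNEXECUTED, label)
        for labels in object_files.values()
        for label in labels
    ]


def link(tagged):
    """Imitate the linker's aggregation of sections with the same name."""
    aggregates = defaultdict(list)
    for section, label in tagged:
        aggregates[section].append(label)
    # Executed portion first, unexecuted portion after it.
    return aggregates[EXECUTED] + aggregates[UNEXECUTED]


object_files = {"a.o": ["a_init", "a_error"], "b.o": ["b_main", "b_debug"]}
executed = {"a_init", "b_main"}
module = link(give_section_names(object_files, executed))
```

In this sketch the executed instruction strings of both files end up adjacent to each other, although neither compilation unit was changed.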
The sections are arranged in a descending order of the execution frequency in giving of the section names, which comprises performing a simulation by executing the object program generated by the compilation in a specific execution environment to thereby generate execution frequency information of the executed instructions.
A section name giving is performed based on the execution frequency information from the simulation step, by giving section names of the sections outputting executed instruction strings of the plurality of files in the object program according to the execution frequencies.
A link processing is performed, when generating an execution program module from the object program by linking, by collecting the sections having the same section name into one aggregate to thereby gather the instruction strings having a higher execution frequency into a small range.
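One way to picture frequency-based section names is to derive the name from the execution count so that sorting the names orders the sections by frequency. The sketch below assumes a hypothetical naming scheme `.text.freq0` to `.text.freq3` (level 0 being the most frequently executed) and at most ten levels so that the names sort lexicographically; neither detail comes from the source.

```python
def frequency_section_name(count, max_count, levels=4):
    """Map an execution count to a hypothetical name such as '.text.freq0'.

    Level 0 holds the most frequently executed instruction strings.
    Integer arithmetic avoids floating-point edge cases.
    """
    level = (max_count - count) * levels // max_count
    return f".text.freq{min(level, levels - 1)}"


def link_by_frequency(counts):
    """Name each executed instruction string, then aggregate by name.

    counts: {label: execution count from the simulator}, all counts > 0
    """
    max_count = max(counts.values())
    named = [
        (frequency_section_name(count, max_count), label)
        for label, count in counts.items()
    ]
    # A stable sort on the name imitates same-name aggregation by the linker.
    named.sort(key=lambda item: item[0])
    return named


counts = {"parse": 600, "log_error": 1, "dispatch": 1000, "commit": 800}
layout = [label for _name, label in link_by_frequency(counts)]
```

Here `dispatch` and `commit` share the highest-frequency name and are therefore placed next to each other at the front of the layout.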
Also, when the time-wise approximate instructions are to be re-arranged so that they may be approximate space-wise in giving of the section names, it comprises performing a compilation by generating an object program in a predetermined compilation unit from a source program made up of a plurality of files.
A simulation is performed by executing the above-mentioned object program generated by the compilation in a specific execution environment to thereby generate execution time information of the executed instructions.
A section name giving is performed, based on the execution time information from the simulation, by giving the same section name to sections outputting instruction strings executed proximately time-wise of the plurality of files in the object program.
A link processing is performed when generating an execution program module from the object program by linking, by collecting the sections having the same section name into one aggregate to thereby arrange the instruction strings executed proximately time-wise over a plurality of files so that they may be approximate with each other space-wise.
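As a rough illustration of how execution time information might be turned into section names, the following sketch groups instruction strings whose time stamps lie close together. Using the time of first execution, the gap threshold `window`, and the names `.text.time0000`, `.text.time0001`, and so on are assumptions of this example, not details given in the source.

```python
def give_time_section_names(first_exec_time, window):
    """Give the same section name to labels executed close together in time.

    first_exec_time: {label: time stamp of first execution from the simulator}
    window: largest gap between consecutive time stamps within one group
    """
    names = {}
    group = -1
    previous = None
    for label, stamp in sorted(first_exec_time.items(), key=lambda kv: kv[1]):
        # Start a new group when the gap to the previous stamp is too large.
        if previous is None or stamp - previous > window:
            group += 1
        names[label] = f".text.time{group:04d}"
        previous = stamp
    return names


times = {"a_open": 10, "b_flush": 500, "c_read": 12, "a_close": 505, "b_seek": 14}
names = give_time_section_names(times, window=50)
```

Instruction strings from files `a`, `b`, and `c` that run within the same time window receive one section name, so the linker's same-name aggregation would place them side by side.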
Also, when optimization of arrangement is made based on static branch prediction, it comprises performing a compilation by generating an object program in a predetermined compilation unit from a source program made up of a plurality of files.
A simulation is performed by executing the object program generated by the compilation in a specific execution environment to thereby generate execution frequency information of the executed instructions.
A section name giving is performed, based on the execution frequency information from the simulation, by giving different section names to sections outputting branch-destination instruction strings having a higher execution frequency and sections outputting branch-destination instruction strings having a lower execution frequency for successful static branch prediction for deciding branching or non-branching predictably by observing the location relationship between branch destinations and branch sources when conditional branch instructions are executed.
A link processing is performed, when generating an execution program module from the object program, by arranging the above-mentioned branch-destination instruction strings in a descending order of the execution frequency based on the section names for successful static branch prediction.
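The effect of the arrangement on static branch prediction can be checked with a small model. The sketch below assumes one common static rule, namely that a branch to a lower address (backward) is predicted taken and a branch to a higher address (forward) is predicted not taken; the source only says that the prediction depends on the location relationship between branch source and branch destination, so this rule and all names in the code are assumptions.

```python
def layout_descending(frequency):
    """Order instruction-string labels from most to least frequently executed."""
    return sorted(frequency, key=lambda label: -frequency[label])


def mispredictions(layout, branches):
    """Count mispredictions under the assumed backward-taken rule.

    branches: (source, destination, taken count, not-taken count) tuples
    """
    position = {label: index for index, label in enumerate(layout)}
    total = 0
    for source, destination, taken, not_taken in branches:
        predict_taken = position[destination] < position[source]
        total += not_taken if predict_taken else taken
    return total


frequency = {"error_path": 1, "loop_body": 100}
branches = [("loop_body", "error_path", 1, 99)]

before = mispredictions(["error_path", "loop_body"], branches)
after = mispredictions(layout_descending(frequency), branches)
```

With the rarely executed destination placed after its branch source, the branch becomes a forward branch that is predicted not taken, which matches its observed behavior in this example.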
Also, when all of a plurality of optimization functions are combined, it comprises performing a compilation by generating an object program in a predetermined compilation unit from a source program made up of a plurality of files.
A simulation is performed by executing the above-mentioned object program generated by the compilation in a specific execution environment to thereby generate execution information indicating executed instructions and unexecuted instructions, frequency information of executed instructions, and execution time information of the executed instructions.
A first section name giving is performed, based on the execution information from the simulation, by giving section names to sections outputting executed instructions and sections outputting unexecuted instructions of a plurality of files in the object program.
A second section name giving is performed based on the execution frequency information from the simulation, by giving section names to the sections outputting the executed instruction strings of the plurality of files in the object program according to the execution frequencies.
A third section name giving is performed based on the execution time information from the simulation, by giving the same section names to the sections outputting the instruction strings executed proximately time-wise of the plurality of files in the object program.
A fourth section name giving is performed based on the execution frequency information from the simulation, by giving different section names to the sections outputting branch-destination instruction strings having a higher execution frequency and the sections outputting branch-destination instruction strings having a lower execution frequency for successful static branch prediction for deciding branching or non-branching predictably by observing the location relationship between branch destination and branch sources when conditional branch instructions are executed.
A link processing is performed when generating an execution program module from the object program, by collecting the sections having the same section name of the plurality of files into one aggregate respectively to thereby divide them into execution portions and unexecution portions. The instruction strings having a higher execution frequency are gathered into a small range in order to arrange the time-wise proximately executed instruction strings over a plurality of files so that they may be approximate with each other space-wise. The above-mentioned branch-destination instruction strings are arranged in a descending order of the execution frequency so that the above-mentioned static branch prediction may come true.
(Optimization Method Based on External Reference Names)
When execution and unexecution are discriminated from each other in giving of external reference names, it comprises performing a compilation by generating an object program in a predetermined compilation unit from a source program made up of a plurality of source files.
A simulation is performed by executing the object program generated by the compilation in a specific execution environment to thereby generate execution information indicating executed instructions and unexecuted instructions.
A reference name giving is performed, based on the execution information from the simulation, by dividing a plurality of files in the object program into sections outputting executed instructions and sections outputting unexecuted instructions to thereby give them different external reference names. At the same time, an external reference name list is generated, that separates from each other the external reference names of the sections to which the executed instructions belong and the external reference names of the sections to which the unexecuted instruction strings belong.
A link processing is performed, when generating an execution program module from the object program by linking, by collecting the executed sections and the unexecuted sections in the plurality of files into one aggregate respectively based on the external reference name list to thereby divide them into execution portions and unexecution portions.
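Unlike the section-name variant, this variant leaves the placement decision to a list that the link processing reads. A minimal sketch, assuming the list is simply an ordered sequence of external reference names with the executed ones first (the function names and data layout are hypothetical):

```python
def make_external_reference_name_list(symbols, executed):
    """Return external reference names with executed sections first.

    symbols: external reference names in their original link order
    executed: names whose sections the simulator reported as executed
    """
    hot = [name for name in symbols if name in executed]
    cold = [name for name in symbols if name not in executed]
    return hot + cold


def link_in_list_order(sections, name_list):
    """Place section contents in the order given by the name list.

    sections: {external reference name: section contents}
    """
    return [sections[name] for name in name_list]


sections = {"f_a": "code A", "f_b": "code B", "f_c": "code C"}
name_list = make_external_reference_name_list(["f_a", "f_b", "f_c"], {"f_a", "f_c"})
module = link_in_list_order(sections, name_list)
```

The same two functions would also serve the frequency-ordered and time-ordered variants if the list were sorted by a different key.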
Also, to arrange the external reference names in a descending order when giving them, the invention comprises performing a compilation by generating an object program in a predetermined compilation unit from a source program made up of a plurality of files.
A simulation is performed by executing the object program generated by the compilation in a specific execution environment to thereby generate execution frequency information of executed instructions.
An external reference name giving is performed, based on the execution frequency information from the simulation by giving external reference names of a plurality of files in the object program according to the execution frequencies of the sections outputting the executed instruction strings and, at the same time, generating an external reference name list that sorts the names in a descending order of the execution frequency.
A link processing is performed when generating an execution program module from the object program, by collecting the executed sections of the plurality of files into one aggregate to thereby gather the instruction strings with a higher execution frequency into a small range.
Also, to re-arrange the time-wise approximate instructions so that they may be approximate space-wise in giving of external reference names, the invention comprises performing a compilation by generating an object program in a predetermined compilation unit from a source program made up of a plurality of files.
A simulation is performed by executing the object program generated by the compilation in a specific execution environment to thereby generate execution time information of executed instructions.
An external reference name giving is performed based on the execution time information from the simulation, by giving the same external reference name to sections outputting instruction strings executed proximately time-wise of a plurality of files in the object program and, at the same time, generating an external reference name list that lists the external reference names of the sections outputting the instruction strings executed proximately time-wise.
A link processing is performed when generating an execution program module from the object program, by arranging the instruction strings executed proximately time-wise over a plurality of files so that they may be approximate space-wise based on the external reference name list.
Further, to arrange them based on static branch prediction by giving of external reference names, the invention comprises performing a compilation by generating an object program in a predetermined compilation unit from a source program made up of a plurality of files.
A simulation is performed by executing the above-mentioned object program generated by the compilation in a specific execution environment to thereby generate execution frequency information of executed instructions.
An external reference name giving is performed based on the execution frequency information from the simulation, by giving different external reference names to sections outputting branch-destination instructions having a higher execution frequency and sections outputting branch-destination instructions having a lower execution frequency for successful static branch prediction for deciding branching and non-branching by observing the location relationship between branch destinations and branch sources when conditional branch instructions are executed and, at the same time, generating an external reference name list that lists said external reference names in a descending order of execution frequency.
A link processing is performed, when generating an execution program module from the object program, by arranging the above-mentioned branch-destination instruction strings in a descending order of the execution frequency so that the above-mentioned static branch prediction may come true based on the above-mentioned external reference name list.
Further, when all of the optimization functions are combined by giving of external reference names, the invention comprises performing a compilation by generating an object program in a predetermined compilation unit from a source program made up of a plurality of files.
A simulation is performed by executing the object program generated by the compilation in a specific execution environment to thereby generate execution information indicating executed instructions and unexecuted instructions, frequency information of the executed instructions, and execution time information of the executed instructions.
A first reference name giving is performed based on the execution information from the simulation, by dividing a plurality of files in the object program into sections outputting executed instruction strings and sections outputting unexecuted instruction strings. At the same time, an external reference name list is generated that separates from each other the external reference names of the sections to which the executed instruction strings belong and the external reference names of the sections to which the unexecuted instruction strings belong.
A second external reference name giving is performed based on the execution frequency information from the simulation, by giving external reference names to the sections outputting the executed instruction strings according to the execution frequencies of the plurality of files in the object program. At the same time, an external reference name list is generated that sorts the names in a descending order of the execution frequency.
A third external reference name giving is performed based on the execution time information from the simulation, by giving the same external reference name to the sections outputting the instruction strings executed proximately time-wise. At the same time, an external reference name list is generated that lists the external reference names of the sections outputting the instruction strings executed proximately time-wise.
A fourth external reference name giving is performed based on the execution frequency information from the simulation, by giving different external reference names to the sections outputting branch-destination instruction strings having a higher execution frequency and the sections outputting branch-destination instruction strings having a lower execution frequency. At the same time, an external reference name list is generated that lists the names in a descending order of the execution frequency.
A link processing is performed, when generating an execution program module from the object program, by collecting the executed sections and the unexecuted sections in a plurality of files into one aggregate respectively, based on the external reference name list, to thereby divide them into execution portions and unexecution portions. The instruction strings having a higher execution frequency are gathered into a small range, the instruction strings executed proximately time-wise over a plurality of files are arranged so that they may be proximate to each other space-wise, and the above-mentioned branch-destination instruction strings are arranged in a descending order of the execution frequency so that static branch prediction may come true.
(Optimization Method by Discrimination Between Referenced Data and Unreferenced Data)
In this case, the optimization method comprises performing a compilation by generating an object program in a predetermined compilation unit from a source program made up of a plurality of source files.
A simulation is performed by executing the object program generated by the compilation in a specific execution environment to thereby generate data reference information indicating data referenced and data not referenced during execution.
A section name giving is performed based on the data reference information from the simulation, by giving different section names to data referenced and data not referenced during execution of a plurality of files in said object program.
A link processing is performed when generating an execution program module from the object program, by collecting the data having the same section names into one aggregate to thereby adjacently arrange the referenced data and the unreferenced data separately from each other space-wise.
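Why this separation reduces data TLB updates can be illustrated by counting the pages that hold referenced data before and after the re-arrangement. The sketch assumes fixed-size data items of 256 bytes, a 4096-byte page, and a layout that starts on a page boundary; these figures are chosen only for the example and do not come from the source.

```python
def pages_touched(layout, referenced, item_size=256, page_size=4096):
    """Count distinct pages that hold at least one referenced data item.

    layout: data labels in memory order, each assumed to occupy item_size bytes
    referenced: set of labels the simulator reported as referenced
    """
    return len({
        index * item_size // page_size
        for index, label in enumerate(layout)
        if label in referenced
    })


# 64 items spread over 4 pages, with one referenced item on each page.
layout = [f"d{i}" for i in range(64)]
referenced = {"d0", "d16", "d32", "d48"}

# Re-arranged layout: referenced data first, unreferenced data after it.
rearranged = [d for d in layout if d in referenced] + [
    d for d in layout if d not in referenced
]

pages_before = pages_touched(layout, referenced)
pages_after = pages_touched(rearranged, referenced)
```

In this example the referenced data occupies four pages in the original layout and a single page once it is gathered, so fewer page-table entries need to be held in the TLB during execution.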
(Optimization Method by Re-arrangement of Time-wise Approximate Data Into Space-wise Approximate Data)
In this case, the optimization object generating method comprises performing a compilation by generating an object program in a predetermined compilation unit from a source program made up of a plurality of files.
A simulation is performed by executing the object program generated by the compilation in a specific execution environment to thereby generate reference time information of referenced data.
A section name giving is performed, based on the reference time information from the simulation, by giving the same section name to sections outputting the data referenced proximately time-wise of a plurality of files in the object program.
A link processing is performed, when generating an execution program module from the object program by linking, by collecting the sections having the same section name into one aggregate to thereby arrange the time-wise proximately referenced data so that it may be proximate space-wise.
The details of this optimization object generating method correspond to the configuration of the apparatus.