1. Field of the Invention
The present invention relates to an optimization system, an optimization method, and a computer-readable compiler program for implementing the method, which perform optimization processing on a program described in a dynamic language so as to increase the access speed of methods or variables (symbols) frequently accessed during execution of the program, while guaranteeing that the program operates in the same manner as before the optimization.
2. Description of Related Art
Programming languages are used to control machines or computers, and are categorized as static or dynamic languages. A static language, such as C, is a procedural language and is designed as a compiler language. Other examples of static languages are C++, which was developed as an extension of C, and Java (registered trademark). Dynamic languages include Ruby (registered trademark), Python (registered trademark) and JavaScript (registered trademark).
A program in a static language is compiled by checking the types of all the variables therein. Compilation is aborted with an error if a type conflict is found in any of the variables, and completes successfully if no conflict is found in any of them. After compilation, the program is executed in one of its executable forms. As part of execution, a program in an executable form is compiled and converted into a machine language (binary) which is compatible with the computer using the program. The machine language is a language directly interpretable and executable by a CPU. In the static language, data to be processed needs to be declared, meaning the data must be classified into types such as numeric, text, and composite types. If text-type (string) data is used as a numeric type without declaration, an error results at the time of compiling.
In contrast, the dynamic language is a language which allows a method or a variable (a symbol) to be dynamically added to or deleted from a class at the time of execution. Since data types do not need to be declared in a program in the dynamic language, the program is not checked in terms of data types, and therefore no type-conflict errors are raised before execution. As described above, the dynamic language does not require a specific type to be set for data to be processed, and thus can respond to changes more flexibly than the static language. Moreover, the dynamic language has advantages over the static language in that code in the dynamic language is smaller in volume and easier to read. For these reasons, the dynamic language exhibits high development efficiency and has been used for large-scale systems in recent years.
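The run-time behavior described above can be illustrated with a minimal sketch in Python, one of the dynamic languages named earlier. The class name Greeter and the methods hello and bye are hypothetical and chosen only for illustration.

```python
class Greeter:
    def hello(self):
        return "hello"

g = Greeter()

# Add a method to the class while the program is running.
def bye(self):
    return "bye"

Greeter.bye = bye
added_result = g.bye()          # the existing instance sees the new method

# Delete a method from the class while the program is running.
del Greeter.hello
hello_still_exists = hasattr(g, "hello")

# No type check happens before execution: the conflict below
# surfaces only when the offending expression is actually evaluated.
try:
    "1" + 1
    type_error_seen = False
except TypeError:
    type_error_seen = True
```

Because the set of symbols in a class can change in this way at any point during execution, an implementation cannot in general fix the location of a symbol once and for all before the program runs.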
As a programming technique for description in such a programming language, object-oriented programming has been increasingly employed, in which data and the procedures (methods) for operating on the data are grouped into units called objects, and a program is described by combining the objects. During the execution of a program described by this object-oriented programming, accesses are frequently made to the methods that make up objects and to variables (symbols), which are assigned unique names so that data can be stored and made available for a certain period.
In recent years, the processing speed of a program has come to depend largely on how quickly the values stored in symbols can be acquired, and faster acquisition is therefore desired. However, the wide use of dynamic languages, in which this acquisition is slower than in static languages, stands in the way of such speeding-up. This is because the static language allows the acquisition with a single memory access instruction, whereas the dynamic language in its major implementation forms requires a hash table access for the acquisition and cannot avoid an increase in memory accesses.
In the static language, a symbol name is converted into an offset value on a memory (offset_x) at the time of compiling or first execution of a program. Accordingly, in the static language, a program as shown in FIG. 1(a) is described in the code shown in FIG. 1(b). The code includes only an instruction to: read 64 bits from an address obtained by adding a constant offset_x to the content of the register r10; and then to assign the 64 bits to r31. Here, the 64 bits means that int (integer type) has a 64 bit value. As described above, the static language is capable of getting a value stored in the symbol by using only one instruction. Note that the 64 bits are read in the above description, but 32 bits can be read and assigned in a case of a variable indicating a 32-bit value.
In the dynamic language, symbol names as “key” and their values as “value” are stored in a symbol table (an open hash table, for example), and the symbol table is used for accessing each symbol. Accordingly, in the dynamic language, the program shown in FIG. 1(a) is described in the code shown in FIG. 1(c). The code includes: an instruction to assign an integer value indicating a symbol x to r3; and an instruction to call a method get_symbol( ) to get the value stored in the symbol in accordance with r3, and then to assign the value to r31. In the dynamic language, the value stored in the symbol is thus obtained by executing multiple instructions. For this reason, memory accesses are increased, and the acquisition of the value stored in the symbol takes longer than in the static language. Note that ld and li in FIGS. 1(b) and 1(c), respectively, are PowerPC assembly instructions.
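The contrast between the two access paths can be modeled with a minimal sketch in Python. This is a simulation rather than the code of FIG. 1: the list named memory stands in for the object addressed by r10, and the values of OFFSET_X and SYM_X are hypothetical.

```python
# Static-language model: the name "x" was already converted into a
# constant offset at compile time (the value 2 is hypothetical).
OFFSET_X = 2
memory = [0, 0, 42, 0]          # stands in for the object addressed by r10

def load_static(base):
    # One indexed read, analogous to a single load from base + offset_x.
    return base[OFFSET_X]

# Dynamic-language model: the symbol is identified by an integer and
# resolved through a symbol table each time it is accessed.
SYM_X = 7                        # hypothetical integer indicating symbol x
symbol_table = {SYM_X: 42}

def get_symbol(table, symbol_id):
    # The table hashes the key, locates a slot, and compares keys
    # before the value can be returned.
    return table[symbol_id]

static_value = load_static(memory)
dynamic_value = get_symbol(symbol_table, SYM_X)
```

Both paths return the same value, but the second one hides several memory accesses behind the table lookup, which is the overhead discussed above.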
For such hash table access, some techniques for speeding up access to symbols have been proposed. For example, in a LuaJIT User's Group post of November 2009, M. Pall (herein “Pall”) discloses that speeding-up is achieved through hash slot specialization using an HREFK instruction in the following manner: if a hash key does not match “key,” processing is terminated; if the hash key matches “key,” a value corresponding to the hash key is loaded, or the value is stored.
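The guard-then-access pattern described above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not LuaJIT's implementation: the SlotTable class, the GuardFailure exception, and the load_specialized function are all hypothetical names, and termination of processing is modeled by raising the exception.

```python
class GuardFailure(Exception):
    """Hypothetical signal that the specialized path must be abandoned."""

class SlotTable:
    def __init__(self, size=8):
        self.slots = [None] * size       # each slot is None or [key, value]

    def find_slot(self, key):
        # Generic path: hash the key, then probe linearly until the key
        # or a free slot is found.
        i = hash(key) % len(self.slots)
        for _ in range(len(self.slots)):
            entry = self.slots[i]
            if entry is None or entry[0] == key:
                return i
            i = (i + 1) % len(self.slots)
        raise KeyError("table full")

    def set(self, key, value):
        i = self.find_slot(key)
        self.slots[i] = [key, value]
        return i

def load_specialized(table, slot_index, key):
    # Specialized path: no hashing or probing, only a check that the
    # predicted slot still holds the expected key.
    entry = table.slots[slot_index]
    if entry is None or entry[0] != key:
        raise GuardFailure(key)          # key mismatch: processing terminates
    return entry[1]                      # key match: load the value

table = SlotTable()
slot_of_x = table.set("x", 42)           # slot recorded when specializing
fast_value = load_specialized(table, slot_of_x, "x")

table.slots[slot_of_x] = None            # simulate the symbol disappearing
try:
    load_specialized(table, slot_of_x, "x")
    guard_failed = False
except GuardFailure:
    guard_failed = True
```

In this model the fast path costs one slot read and one comparison, while any access that has not been specialized still goes through find_slot.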
In “A Practical Solution for Scripting Language Compilers” by Biggar et al. (herein “Biggar”), speeding-up is achieved by calculating a hash value at the time of compiling rather than at the time of execution. In addition, if it is statically ensured that a specific function does not generate code at the time of execution, the symbol table is deleted, and a value is directly accessed in the generated code.
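The idea of moving the hash calculation from execution time to compile time can be sketched in Python as follows. This is a hedged illustration, not the implementation described in Biggar: the PrehashedTable class and the constant HASH_X are hypothetical, and the 32-bit FNV-1a function is used only as a deterministic stand-in for whatever hash function a real compiler would use.

```python
def fnv1a(name):
    # 32-bit FNV-1a hash, used here only as a deterministic stand-in.
    h = 0x811C9DC5
    for byte in name.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h

class PrehashedTable:
    """Symbol table whose callers supply the hash value themselves."""

    def __init__(self, size=16):
        self.buckets = [[] for _ in range(size)]

    def store(self, name, hash_value, value):
        bucket = self.buckets[hash_value % len(self.buckets)]
        for entry in bucket:
            if entry[0] == name:
                entry[1] = value
                return
        bucket.append([name, value])

    def load(self, name, hash_value):
        for key, value in self.buckets[hash_value % len(self.buckets)]:
            if key == name:
                return value
        raise KeyError(name)

# "Compile time": the hash of each symbol name in the source is computed once.
HASH_X = fnv1a("x")

# "Run time": every access reuses the constant instead of rehashing the name.
symbols = PrehashedTable()
symbols.store("x", HASH_X, 42)
prehashed_value = symbols.load("x", HASH_X)
```

The lookup still walks a bucket and compares keys, so this sketch removes only the hashing cost; it does not model the deletion of the symbol table that the text also mentions.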
In Japanese Patent Application Publication No. 2002-132521, a program loader decides allocated addresses of symbols in a re-locatable object file transferred to a RAM from a host system, based on load address information of each of the symbols, and performs inter-symbol hook processing based on hook information. Furthermore, by providing the program loader with a function for creating default symbols, fine symbol allocations of all the symbols are performed on a target system, and debugging efficiency is improved.
Japanese Patent Application Publication No. 2004-62552 proposes changing the execution order of instructions in order to optimize a program and achieve high-speed processing. For example, a first instruction that may raise an exception is detected; a second instruction, which is to be executed prior to the first instruction and which assures that no exception of the first instruction occurs, is detected; and the position of the first instruction in the execution order is changed so that the first instruction can be executed after the second instruction but before a conditional branch instruction for selectively executing the first instruction. This enables a compiler to start in advance the execution of a memory access (or another operation requiring a substantially long processing time), thereby accelerating the processing of the optimization target program.
The technique described in Pall above has a problem that, when no specialization is made by the HREFK instruction, the access is performed in the conventional slow manner.
In the technique described in Biggar, since the hash value is calculated in advance, the processing speed can be increased, but the technique cannot cope with a case where the symbol is deleted at the time of execution. In addition, since the symbol table is deleted, the technique cannot cope with a case where only part of the program does not need access to the symbol table.
Without using a special system, the system and the method described in JP 2002-132521 are capable of allocating a symbol at an address on a target system specified by the user, and of preventing the system from running out of control even if addresses have not been resolved for all the symbols at the time of execution. However, this technique is not intended to increase the speed of accessing the symbol.
Furthermore, the system and the method described in JP 2004-62520 are capable of optimization by controlling the execution order of instructions with use of a dummy instruction, by deleting a redundant instruction, and by moving a memory access instruction for reading an invariable value in a loop to the outside of the loop, as well as by changing the execution order of an instruction which may raise an exception. However, this technique is not intended to increase the symbol access speed either.
For these reasons, there is a need for a system and a method capable of increasing the processing speed of a program described in the dynamic language by increasing the speed of the frequent accesses made to symbols during the execution of the program.