1. Field of the Invention
The invention relates to the field of code conversion. More particularly, the present invention relates to a method for increasing the execution speed of iterator processing code in a programming language by converting code written in one programming language into another programming language.
2. Description of the Related Art
As computer hardware, such as processors and disk devices, has become less expensive, mechanisms have been proposed that process large amounts of data at high speed by storing the data on disk and interconnecting a plurality of computer systems. One such mechanism is MapReduce, developed by Google®.
By using MapReduce and leaving the difficult parts of parallel processing to it, a developer can implement large-scale processing with minimal effort.
Later, the Hadoop® project was launched as an open source implementation of MapReduce. Jaql, Pig, Hive, and the like were developed as programming languages for use on Hadoop® MapReduce. These programming languages rely on a mechanism called an iterator. When an iterator is used, large amounts of data can be processed without being stored in memory. The data being processed is accessed through the iterator and then converted. This conversion of the data is referred to as iterator conversion. For information concerning Jaql, Pig, and Hive, refer to http://code.google.com/p/jaql/wiki/l0; http://pig.apache.org/; and http://hive.apache.org/, respectively.
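As a rough illustration of how an iterator allows data to be processed without being held in memory, the following Python sketch chains a data source and a conversion step lazily. The names read_records and convert are hypothetical and are not taken from Jaql, Pig, or Hive.

```python
from typing import Iterable, Iterator


def read_records(n: int) -> Iterator[int]:
    """Hypothetical data source that yields one record at a time."""
    for i in range(n):
        yield i


def convert(source: Iterable[int]) -> Iterator[int]:
    """Iterator conversion: wraps one iterator in another without buffering."""
    for record in source:
        yield record * 2


# Only one record is alive at any moment; no list of a million items is built.
total = sum(convert(read_records(1_000_000)))
```

Each stage pulls a single element from the stage before it, so memory use stays constant regardless of the input size.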
However, an iterator obtained by this type of conversion can have complicated control and state, which makes it difficult to optimize with an existing compiler, such as a JIT compiler.
In other words, an existing compiler analyzes and optimizes data flow by following the transitions of the program on a control flow graph, but the transitions of an iterator are determined by its internal state, so an existing compiler has difficulty determining the data flow accurately.
For example, the expression iter -> expand e computes an internal iterator sequence e(x0), e(x1), . . . for the elements x0, x1, . . . obtained from the external iterator iter, and returns a result iterator that concatenates those sequences. This result iterator must keep track of the state of the external iterator as well as which internal iterator is currently being read.
The conditional expression if cond then iter1 else iter2 has the evaluation result of cond as a state.
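The two pieces of state described for the expand expression can be made explicit in a hand-written pull-style iterator. The following Python class is a minimal sketch under that reading; ExpandIterator and its fields are hypothetical names and do not reflect how Jaql is actually implemented.

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class ExpandIterator(Iterator[U]):
    """Hypothetical pull-style result iterator for `iter -> expand e`."""

    def __init__(self, outer: Iterable[T], e: Callable[[T], Iterable[U]]) -> None:
        self._outer = iter(outer)                   # state 1: position in the external iterator
        self._inner: Optional[Iterator[U]] = None   # state 2: internal iterator being read
        self._e = e

    def __next__(self) -> U:
        while True:
            if self._inner is None:
                # Raises StopIteration once the external iterator is exhausted.
                x = next(self._outer)
                self._inner = iter(self._e(x))
            try:
                return next(self._inner)
            except StopIteration:
                # Current internal sequence is finished; advance the external one.
                self._inner = None


result = list(ExpandIterator([1, 2, 3], lambda x: [x] * x))
```

Which branch a given call to next takes depends on the stored fields rather than on the position in the caller's code, which is the property that makes such iterators hard to analyze statically.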
The iterator generated from the constant size array expression [e0, e1, . . . ] has the index of the internal array as a state. This leads to complex controls and states inside the result iterator.
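The states mentioned for the conditional expression and the array expression can be sketched in the same way. In the following Python illustration, ConditionalIterator caches the evaluation result of cond, and ArrayIterator keeps an index into its elements. Both class names, and the use of zero-argument functions to stand in for the expressions e0, e1, . . . , are assumptions made for the example.

```python
from typing import Callable, Iterable, Iterator, Optional, Sequence


class ArrayIterator:
    """Hypothetical iterator for [e0, e1, ...]; its state is the array index."""

    def __init__(self, elements: Sequence[Callable[[], int]]) -> None:
        self._elements = elements
        self._index = 0  # state: next element to evaluate

    def __iter__(self) -> "ArrayIterator":
        return self

    def __next__(self) -> int:
        if self._index >= len(self._elements):
            raise StopIteration
        value = self._elements[self._index]()
        self._index += 1
        return value


class ConditionalIterator:
    """Hypothetical iterator for `if cond then iter1 else iter2`."""

    def __init__(self, cond: Callable[[], bool],
                 iter1: Iterable[int], iter2: Iterable[int]) -> None:
        self._cond, self._iter1, self._iter2 = cond, iter1, iter2
        self._chosen: Optional[Iterator[int]] = None  # state: result of evaluating cond

    def __iter__(self) -> "ConditionalIterator":
        return self

    def __next__(self) -> int:
        if self._chosen is None:
            # cond is evaluated once; every later call follows the same branch.
            self._chosen = iter(self._iter1 if self._cond() else self._iter2)
        return next(self._chosen)


arr = list(ArrayIterator([lambda: 10, lambda: 20, lambda: 30]))
picked = list(ConditionalIterator(lambda: False, [1, 2], arr))
```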
Specifically, consider processing with the Jaql iterator statement:
 $out = $in -> expand [$.a, $.b, $.c, $.d] -> filter ($ > 0)
The result iterator $out is expressed by a state transition diagram as shown in FIG. 1. If, for example, the previous call to $out.next() ended after reading $.b, then the next call begins reading from $.c. This type of processing is difficult for a conventional compiler to analyze because the call site of $out.next() is the location where the control flows join.
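The resumption behavior described for $out can be illustrated with a hand-written iterator in Python. This is a sketch only: it assumes that each element of $in is a record with numeric fields a, b, c, and d, the class name OutIterator is hypothetical, and the code is not claimed to match the state transition diagram of FIG. 1 state for state.

```python
from typing import Dict, Iterable, Optional


class OutIterator:
    """Hypothetical pull-style iterator for expand [a, b, c, d] followed by filter (> 0)."""

    FIELDS = ("a", "b", "c", "d")

    def __init__(self, source: Iterable[Dict[str, int]]) -> None:
        self._source = iter(source)
        self._record: Optional[Dict[str, int]] = None  # state: current input record
        self._field = 0                                # state: next field to read

    def __iter__(self) -> "OutIterator":
        return self

    def __next__(self) -> int:
        while True:
            if self._record is None:
                self._record = next(self._source)  # StopIteration ends the stream
                self._field = 0
            while self._field < len(self.FIELDS):
                value = self._record[self.FIELDS[self._field]]
                self._field += 1
                if value > 0:
                    return value  # control leaves here and re-enters on the next call
            self._record = None


records = [{"a": 1, "b": 2, "c": -3, "d": 4}, {"a": -1, "b": 0, "c": 5, "d": -6}]
out = OutIterator(records)
first, second = next(out), next(out)   # reads a, then b
resume_field = OutIterator.FIELDS[out._field]
rest = list(out)
```

After the second call returns the value of b, the stored field index points at c, so the third call resumes there. Every call enters at the same place in the code, and only the stored state distinguishes one entry from another.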
With the XML SAX API described in http://docs.oracle.com/javase/1.4.2/docs/api/org/xml/sax/package-summary.html, input obtained as a sequence of events is processed by an event handler. However, this method cannot easily be applied to Jaql.
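For comparison, event-driven processing in the SAX style can be sketched with Python's standard xml.sax module, used here in place of the Java API cited above. The element name item, the handler class PositiveItemHandler, and the sample document are assumptions made for the example.

```python
import xml.sax


class PositiveItemHandler(xml.sax.ContentHandler):
    """Hypothetical handler: the parser pushes events, the handler reacts to them."""

    def __init__(self) -> None:
        super().__init__()
        self.values = []
        self._in_item = False
        self._text = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._text = []

    def characters(self, content):
        if self._in_item:
            self._text.append(content)

    def endElement(self, name):
        if name == "item":
            value = int("".join(self._text))
            if value > 0:          # filtering happens as each event arrives
                self.values.append(value)
            self._in_item = False


handler = PositiveItemHandler()
xml.sax.parseString(b"<items><item>3</item><item>-1</item><item>7</item></items>", handler)
```

Here the parser drives the control flow and calls the handler for each event, whereas with an iterator the consumer drives the control flow by calling next.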
Furthermore, the XQuery compiler Galax, described at http://galax.sourceforge.net/compiler/, is known to access non-materialized data through iterators, and thus processing is complex. More specifically, complex iterator processing code is required for accessing the data, and optimization by a compiler can be difficult.