Since computers and computing systems were first developed, there has been a demand for increased performance. To satisfy this demand, advances in both hardware and software technologies have been required. In the hardware arena, one technique used to increase performance is to provide for greater instruction level parallelism. This means that instruction sequences that were formerly performed in serial fashion can now be performed at the same time, i.e., in parallel. In other words, multiple instructions are executed during a functional cycle. One method of increasing parallelism is to provide multiple functional units within a processing system. Typically, these functional units perform tasks such as memory management, integer arithmetic processing, floating point number processing and instruction branching. Parallel processing attempts to exploit as many of the available functional units as possible at any given moment.
In the software arena, compilers have been developed to take advantage of the opportunities for instruction level parallelism offered by today's hardware architectures. The compilers of previous systems have included two types of schedulers: trace-oriented and region-oriented. Trace-oriented scheduling optimizes the frequently visited paths at the expense of infrequently visited code. Trace-oriented scheduling requires an accurate weighted control flow graph (from profile feedback, static branch prediction, or user keywords) to choose the main traces, i.e., the heavily visited paths. Trace-oriented approaches have at least two drawbacks. First, developers often do not take the time or incur the expense to profile their code. Second, some control flow has no obvious major traces.
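As an illustration of how a main trace might be chosen from a weighted control flow graph, the following minimal sketch greedily follows the heaviest outgoing edge from each block. The function name `pick_main_trace`, the graph layout, and the edge weights are hypothetical and chosen only for illustration; the sketch is not the selection method of any particular compiler.

```python
def pick_main_trace(cfg, entry):
    """Greedily follow the heaviest outgoing edge, starting at `entry`.

    `cfg` maps each block name to a dict {successor: estimated weight}.
    This layout is an assumption made for the sketch.
    """
    trace = [entry]
    seen = {entry}
    block = entry
    while cfg.get(block):
        # Pick the successor whose edge carries the largest weight.
        succ, _weight = max(cfg[block].items(), key=lambda kv: kv[1])
        if succ in seen:
            break  # stop rather than follow a path back into the trace
        trace.append(succ)
        seen.add(succ)
        block = succ
    return trace


# Hypothetical weights, e.g. execution counts from profile feedback.
cfg = {
    "entry": {"hot": 90, "cold": 10},
    "hot": {"exit": 90},
    "cold": {"exit": 10},
    "exit": {},
}
main_trace = pick_main_trace(cfg, "entry")
```

With these weights the selected trace runs through the `hot` block. If the two edges leaving `entry` had equal weights, the choice between them would be arbitrary, which corresponds to the second drawback noted above.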
Region-oriented scheduling, unlike trace-oriented scheduling, can work well without profile information. However, when profile information is available and the main traces are clearly detected, region-oriented scheduling is not aggressive enough to optimize the code.
A further problem is that the global schedulers of previous compilers, whether trace-oriented or region-oriented, typically schedule a single instruction at a time. Redundant instructions are typically not detected and removed. Redundant memory loads that are separated by ambiguous memory stores (stores that may or may not write to the loaded address) usually cannot be removed by traditional compilers because the run-time behavior of those stores is unpredictable.
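The difficulty with ambiguous stores can be seen in the following small sketch, in which one-element Python lists stand in for memory locations. The function `load_store_load` and the values used are hypothetical. Whether the second load is redundant depends on whether `p` and `q` refer to the same location, which is not known until run time.

```python
def load_store_load(p, q):
    """Load from p, store through q, then load from p again."""
    first = p[0]    # first load
    q[0] = 99       # ambiguous store: q may or may not alias p
    second = p[0]   # redundant only if q does not alias p
    return first, second


# Distinct locations: the second load returns the same value as the first.
a, b = [1], [2]
no_alias = load_store_load(a, b)

# Same location: the store changes the value seen by the second load.
x = [1]
alias = load_store_load(x, x)
```

In the first call the second load could safely have been replaced by the value of the first. In the second call that replacement would produce a wrong result, so a compiler that cannot rule out the aliasing case must keep both loads.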
Finally, the global scheduling performed by current compilers can only deal with acyclic regions. If a cycle appears in a region, the region cannot be globally scheduled. Cycles often appear because of the looping constructs used in many computer programs. Thus, a large subset of code cannot be optimized to achieve instruction level parallelism.
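For illustration, a region can be tested for cycles with a depth-first search that looks for an edge leading back to a block still on the current search path. The sketch below assumes a region represented as a mapping from each block to a list of its successors; the function name `region_has_cycle` and the example regions are hypothetical.

```python
def region_has_cycle(succs):
    """Return True if the region's control flow graph contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {block: WHITE for block in succs}

    def visit(block):
        color[block] = GRAY
        for s in succs.get(block, ()):
            state = color.get(s, WHITE)
            if state == GRAY:
                return True  # back edge: s is on the current path
            if state == WHITE and visit(s):
                return True
        color[block] = BLACK
        return False

    return any(color[b] == WHITE and visit(b) for b in succs)


# An if/else diamond is acyclic; a loop body that branches back is not.
diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
loop = {"A": ["B"], "B": ["C"], "C": ["B", "D"], "D": []}
diamond_cyclic = region_has_cycle(diamond)
loop_cyclic = region_has_cycle(loop)
```

Under the limitation described above, a scheduler restricted to acyclic regions could globally schedule the `diamond` region but would have to reject the `loop` region.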
Therefore, there is a need in the art for a system that can perform effective optimization both with and without trace information.