The present invention relates generally to decomposing CLIs and CNIs from basic blocks in a compiled program before look-ahead prefetching, fetching, and branch prediction of the CLIs and CNIs in a sequential and/or parallel manner via a single or plurality of caches that operate twice as fast as the microprocessor clock. More specifically, it relates to a compiler-assisted look-ahead instruction-fetch and branch-prediction (CLIB) system apparatus and method to reduce the latency of branch prediction and to increase instruction-fetch bandwidth, which is the number of instructions fetched per microprocessor clock cycle. The invention also relates to identifying the CLIs and CNIs. More specifically, a CLI represents a single or plurality of basic blocks and/or other CLIs and contains information for predicting the branch operation, obtaining the branch target location of the CLI, and accessing the location of the associated CNIs.
The CLIB system apparatus and method is designed to enhance the fetch bandwidth of both the CLIs and CNIs, to reduce the latencies of instruction cache access, and to improve the overall performance of the microprocessors. The invented CLIB system uses a compiler-assisted look-ahead instruction prefetching (CLIP) system and fetching (CLIF) system integrated with a single or plurality of concurrently accessible hierarchical CLIM systems.
The invented CLIP/CLIF system prefetches and/or fetches a single or plurality of CLIs concurrently and delivers them to the microprocessors for branch prediction and/or instruction decode, while delivering a single or plurality of CNIs to the microprocessors in their compatible fetching order for instruction decode. The CLIP/CLIF system prefetches and fetches the CLIs and the associated CNIs from the single or plurality of concurrently accessible main CLI and CNI memories via a single or plurality of levels of concurrently accessible CLI and CNI caches, and delivers the CLIs and CNIs to the microprocessors.
The invented CLIP/CLIF system is capable of look-ahead prefetching the single or plurality of CNIs from the locations of the main CNI memories via the single or plurality of levels of CNI caches by obtaining a single or plurality of addresses from the CLIs to a single or plurality of locations in the main CLI memories and/or CLI caches. The CLIP/CLIF system prefetches the next prospective CLIs and CNIs from both the taken and not-taken branch paths and continuously prefetches CLIs/CNIs from a single or plurality of next paths while fetching both the CLIs and CNIs to the microprocessors.
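The both-path look-ahead prefetching described above can be illustrated with a minimal Python sketch. It is not the claimed apparatus: the `Cli` record, its field names, the `depth` parameter, and the dictionary standing in for the main CLI memory are all hypothetical and chosen only for illustration. The sketch assumes each CLI carries the address of its first CNI, a CNI count, and the addresses of the CLIs on its taken and not-taken paths.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Cli:
    """Hypothetical CLI record holding only the fields this sketch needs."""
    cni_addr: int              # address of the first associated CNI
    cni_count: int             # number of associated CNIs
    taken: Optional[int]       # address of the CLI on the taken path
    not_taken: Optional[int]   # address of the CLI on the not-taken path


def prefetch_both_paths(
    cli_memory: Dict[int, Cli], current: int, depth: int = 1
) -> Tuple[List[int], List[int]]:
    """Return (CLI addresses, CNI addresses) to prefetch while `current` is fetched.

    Both successors of each CLI are followed for `depth` levels, so the
    prefetcher does not wait for the branch outcome.
    """
    seen = {current}
    cli_addrs: List[int] = []
    cni_addrs: List[int] = []
    frontier = [current]
    for _ in range(depth):
        next_frontier: List[int] = []
        for addr in frontier:
            cli = cli_memory[addr]
            for succ in (cli.taken, cli.not_taken):
                if succ is None or succ in seen or succ not in cli_memory:
                    continue
                seen.add(succ)
                cli_addrs.append(succ)
                nxt = cli_memory[succ]
                # The CNI addresses come from the prefetched CLI itself.
                cni_addrs.extend(range(nxt.cni_addr, nxt.cni_addr + nxt.cni_count))
                next_frontier.append(succ)
        frontier = next_frontier
    return cli_addrs, cni_addrs


# Toy CLI memory: CLI 0 branches to CLI 2 when taken and falls through to CLI 1.
cli_memory = {
    0: Cli(cni_addr=100, cni_count=3, taken=2, not_taken=1),
    1: Cli(cni_addr=103, cni_count=2, taken=0, not_taken=2),
    2: Cli(cni_addr=105, cni_count=4, taken=None, not_taken=None),
}
clis, cnis = prefetch_both_paths(cli_memory, current=0)
print(clis)  # [2, 1]
print(cnis)  # [105, 106, 107, 108, 103, 104]
```

In this toy model, raising `depth` corresponds to continuing the prefetch along further next paths; how many levels a real implementation would follow is not stated in the text above.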
The invented CLIB system includes a compiler-assisted look-ahead compiler (CLC) to decompose basic blocks, which generally contain pairs of branch and branch target instructions, and vice versa. In addition, the CLC creates a CLI to represent a basic block as a single instruction if necessary. In particular, the CLI contains a single or plurality of branch instructions and/or other CLIs. The CLI comprises the branch instructions together with the non-branch instructions in a loop or a subroutine, or all of the non-branch or branch target instructions. A CLI also comprises a single or plurality of CNIs and/or other CLIs.
The invented CLC generates CLIs and the associated CNIs from the compiled program, such as the assembly program. The CLIs and CNIs are sequentially and/or concurrently prefetched and/or fetched through separate paths of the CLIM systems if necessary. In general, a CLI accesses a single or plurality of CNIs when the CNIs are the bodies of the basic blocks. Thus, the CNIs are fetched only after prefetching both the CLIs, as the basic block callers, and the CNIs and/or other CLIs, as the bodies of the basic blocks. This results in look-ahead CLI prefetching and fetching for look-ahead branch prediction and in sequential and/or concurrent CNI prefetching and fetching.
The CLC composes a CLI comprising an associated opcode to identify it as a branch and/or a type of branch, such as a conditional or unconditional branch, and other information including the first and/or the last associated CNIs, the number of native instructions included in the CLI, the information of the branch target location, and so on, for prefetching and fetching the next CLIs after decoding the CLI.
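The kinds of information a CLI carries can be made concrete with a short Python sketch of one possible encoding. The text above does not specify an instruction format, so the field widths (`KIND_BITS`, `COUNT_BITS`, `ADDR_BITS`), the `BranchKind` values, and the names `CliFields`, `pack`, and `unpack` are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class BranchKind(IntEnum):
    NONE = 0            # CLI that carries no branch
    CONDITIONAL = 1
    UNCONDITIONAL = 2


# Hypothetical field widths; the source text does not define an encoding.
KIND_BITS, COUNT_BITS, ADDR_BITS = 2, 6, 12


@dataclass(frozen=True)
class CliFields:
    kind: BranchKind    # opcode field: whether and how the CLI branches
    count: int          # number of native instructions covered by the CLI
    first_cni: int      # location of the first associated CNI
    target: int         # branch target location

    @property
    def last_cni(self) -> int:
        """Location of the last associated CNI, derived from first_cni and count."""
        return self.first_cni + self.count - 1

    def pack(self) -> int:
        """Pack the fields into one integer word, most significant field first."""
        if not 1 <= self.count < (1 << COUNT_BITS):
            raise ValueError("count does not fit the assumed field width")
        for addr in (self.first_cni, self.target):
            if not 0 <= addr < (1 << ADDR_BITS):
                raise ValueError("address does not fit the assumed field width")
        word = int(self.kind)
        word = (word << COUNT_BITS) | self.count
        word = (word << ADDR_BITS) | self.first_cni
        word = (word << ADDR_BITS) | self.target
        return word

    @staticmethod
    def unpack(word: int) -> "CliFields":
        """Recover the fields from a word produced by pack()."""
        addr_mask = (1 << ADDR_BITS) - 1
        target = word & addr_mask
        word >>= ADDR_BITS
        first_cni = word & addr_mask
        word >>= ADDR_BITS
        count = word & ((1 << COUNT_BITS) - 1)
        word >>= COUNT_BITS
        kind = BranchKind(word & ((1 << KIND_BITS) - 1))
        return CliFields(kind, count, first_cni, target)


cli = CliFields(BranchKind.CONDITIONAL, count=5, first_cni=0x040, target=0x100)
decoded = CliFields.unpack(cli.pack())
print(decoded == cli)         # True
print(hex(decoded.last_cni))  # 0x44
```

After decoding such a word, a fetch unit would know both the range of CNIs to deliver and the candidate location of the next CLI, which is the role the paragraph above assigns to the CLI fields.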
The CLIB system apparatus and method for microprocessors permits reducing the number of branch instructions while providing compatible native instruction prefetching and fetching. In addition, the CLIB system apparatus and method for microprocessors allows scaling a basic block to a single or plurality of CLIs in order to prefetch and fetch the native instructions in the same basic block quickly and in parallel while continuously providing code compatibility. Alternatively, the CLC directly produces the CLIs and CNIs from programs written in a high-level language.
The CLIB system apparatus and method for microprocessors effectively utilizes the available instruction caches in terms of cache size, power consumption, and operational speed by employing simple, small caches with an access speed twice the microprocessors' clock speed. The invention also prefetches, in a look-ahead manner, the CLIs and CNIs on both of the prospective paths in the program flow concurrently or sequentially, before fetching the CLIs and predicting their branches in a look-ahead manner and fetching the CNIs concurrently or sequentially. Furthermore, the invention fetches CLIs and CNIs in an accurate manner by fetching CLIs and/or CNIs only from the CLI and/or CNI caches. More specifically, the number of native instructions encapsulated in CNIs is reduced by decomposing all flow control native instructions, including conditional and unconditional branches, subroutine callers, and subroutine returners, as CLIs and the other native instructions as CNIs. Since the flow control instructions do not change any operation results, the CNIs provide compatibility if they are fetched and executed in the right order. Therefore, the CLIs contain important information regarding the order of the CNIs and are fetched to a branch predictor for predicting which CLI to fetch next.
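The decomposition of flow control instructions into CLIs and of the remaining native instructions into CNIs can be sketched as follows. This is a simplified illustration rather than the CLC itself: the toy mnemonics in `FLOW_CONTROL`, the tuple representation of instructions, and the `Cli` fields are assumptions, and the sketch splits blocks only at flow control instructions, omitting the splitting at branch target locations that a full basic-block analysis would also perform.

```python
from typing import List, NamedTuple, Optional, Tuple

# Toy flow control mnemonics (assumed): branches, a subroutine caller, a returner.
FLOW_CONTROL = {"beq", "bne", "jmp", "call", "ret"}

Instr = Tuple[str, str]  # (mnemonic, operand text)


class Cli(NamedTuple):
    kind: Optional[str]    # flow control mnemonic, or None for a fall-through block
    target: Optional[str]  # branch target label, if any
    first_cni: int         # index of the block's first CNI in the CNI stream
    count: int             # number of CNIs in the block


def decompose(program: List[Instr]) -> Tuple[List[Cli], List[Instr]]:
    """Split a native instruction list into a CLI stream and a CNI stream."""
    clis: List[Cli] = []
    cnis: List[Instr] = []
    start = 0  # index in cnis where the current block began
    for mnemonic, operand in program:
        if mnemonic in FLOW_CONTROL:
            # The flow control instruction is absorbed into a CLI and is not
            # emitted as a CNI.
            clis.append(Cli(mnemonic, operand or None, start, len(cnis) - start))
            start = len(cnis)
        else:
            cnis.append((mnemonic, operand))
    if len(cnis) > start:
        # Trailing instructions with no closing flow control instruction.
        clis.append(Cli(None, None, start, len(cnis) - start))
    return clis, cnis


program = [
    ("add", "r1, r2, r3"),
    ("sub", "r4, r1, r5"),
    ("beq", "loop"),
    ("mul", "r6, r4, r4"),
    ("ret", ""),
    ("add", "r7, r7, r1"),
]
clis, cnis = decompose(program)
for cli in clis:
    print(cli)
print(len(program), "native instructions ->", len(cnis), "CNIs")
```

In this example the six native instructions yield four CNIs, and each CLI records which slice of the CNI stream belongs to it, so the original order can be recovered by following the CLIs.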
Through this invention, one can decompose one's own compatible and ciphered instructions into CLIs and CNIs and prefetch and fetch the CLIs/CNIs sequentially and/or concurrently from the main CLI/CNI memories via the levels of the CLI/CNI caches. More specifically, a single or plurality of branch prediction results is obtained by look-ahead prefetching and/or fetching of the next CLIs and the associated CNIs to the microprocessors.