The present invention relates generally to instruction memory systems and, more specifically, to an adaptive instruction prefetching and fetching memory system apparatus and method for microprocessor system to reduce code fraction, to scale code for packing native instructions at software compilation time, and to adaptively and concurrently prefetch PINIs and NIPIs and fetch purely native instructions at runtime.
The adaptive instruction prefetching and fetching memory system apparatus and method for microprocessor system is designed for reducing instruction cache memory area and operating energy, enhancing access time, resolving or lightening the cost of instruction cache miss penalty, and improving the overall performance of the microprocessor system. The adaptive instruction prefetching and fetching memory system apparatus and method for microprocessor system uses an adaptive instruction prefetching and fetching system integrated with a concurrently accessible hierarchical memory system consisting of a single or plurality of caches and main memories to achieve the same or a similar effect of concurrent instruction prefetching and fetching of a single or plurality of native instructions.
The invented adaptive instruction prefetching and fetching memory system adaptively prefetches and/or fetches a single or plurality of PINIs concurrently while delivering a single or plurality of fetched native instructions to the single or plurality of microprocessors in their programming order. The adaptive instruction prefetching and fetching memory system distinguishes the prefetched and fetched PANIs and non-packed native instructions in PINIs from the single or plurality of main instruction memories via a single or plurality of levels of instruction cache memories before delivering the purely native instructions to the single or plurality of microprocessors.
The invented adaptive instruction prefetching and fetching memory system is capable of prefetching the single or plurality of NIPIs from the single or plurality of associative locations of the single or plurality of main instruction memories via the single or plurality of levels of instruction cache memories by passing a single or plurality of addresses of the PANIs in the PINIs to a single or plurality of locations in the main and/or cache memories accordingly. The invention prefetches the next prospective single or plurality of PINIs while delivering the NIPIs and/or the non-packed native instructions in the PINIs to the single or plurality of microprocessors.
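The delivery behavior described above can be illustrated with a minimal software sketch. It is not the claimed hardware: the names `deliver`, `pini_stream`, and `nipi_memory` are hypothetical, a PANI is modeled simply as a (start address, instruction count) pair, nested PANIs are not handled, and the concurrent prefetching is not modeled (the walk is sequential).

```python
from typing import Dict, Iterator, List, Tuple, Union

# Hypothetical model: a PANI is a (start address, count) pair pointing into a
# separate NIPI region; a non-packed native instruction is a plain string.
PaniRef = Tuple[int, int]
Pini = Union[str, PaniRef]


def deliver(pini_stream: List[Pini], nipi_memory: Dict[int, str]) -> Iterator[str]:
    """Yield purely native instructions in program order."""
    for entry in pini_stream:
        if isinstance(entry, tuple):
            # PANI: pass its address to the NIPI region and read the segment.
            start, count = entry
            for addr in range(start, start + count):
                yield nipi_memory[addr]
        else:
            # Non-packed native instruction: deliver as-is.
            yield entry


pini_stream = ["i0", (100, 3), "i4"]
nipi_memory = {100: "i1", 101: "i2", 102: "i3"}
delivered = list(deliver(pini_stream, nipi_memory))
```

In this sketch the consumer sees only native instructions; the PANI entry never reaches it, which mirrors the statement that purely native instructions are delivered to the microprocessor.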
In order to reduce code fraction, the invention generally packs basic blocks, which are segments of native instructions bounded by a branch instruction and a branch target instruction, in either order. The branch instructions, both conditional and unconditional, provide their branch target locations if possible. A non-deterministic branch target location that cannot be obtained at compilation time can be obtained after a single or plurality of program simulations if possible.
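As an illustration of how such basic blocks might be located at compilation time, the following sketch applies the conventional leader-based partitioning. It is an assumption about one possible approach, not the method of the invention. The instruction encoding (a mnemonic paired with an optional branch target index) and the name `basic_blocks` are hypothetical, and branches whose targets are not statically known are not modeled.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: (mnemonic, branch target index or None for non-branches).
Instr = Tuple[str, Optional[int]]


def basic_blocks(program: List[Instr]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges of the basic blocks."""
    if not program:
        return []
    leaders = {0}  # the first instruction always starts a block
    for i, (_, target) in enumerate(program):
        if target is not None:
            leaders.add(target)          # a branch target starts a block
            if i + 1 < len(program):
                leaders.add(i + 1)       # so does the instruction after a branch
    starts = sorted(leaders)
    ends = starts[1:] + [len(program)]
    return list(zip(starts, ends))


program = [
    ("add", None),  # 0
    ("cmp", None),  # 1
    ("beq", 5),     # 2: conditional branch to index 5
    ("sub", None),  # 3
    ("jmp", 0),     # 4: unconditional branch to index 0
    ("mul", None),  # 5
    ("st", None),   # 6
]
blocks = basic_blocks(program)
```

Each resulting range is a candidate segment for packing; which segments are actually packed is a separate decision.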
In addition, the branch target locations can be obtained from the associated branch target buffers or similar components if necessary. A branch target instruction can be any instruction at a branch target location. The PANI is a native instruction segment packed as a single nonnative instruction. In particular, a PANI contains a single or plurality of native instructions and/or PANIs in a single instruction form as a PINI statically composed at compilation time. A PANI consists of the branch instructions with non-branch instructions, all of the instructions in a loop or a subroutine, or all of the non-branch or branch target instructions. A PANI comprises multiple non-packed native instructions and/or PANIs.
Two types of code are obtained from the software compiled program, such as the assembly program, after another round of compilation. The first type of code contains PINIs. The second type of code contains NIPIs. Alternatively, this static code conversion can be integrated into the software compilation. Instructions of both the first and second types of code are concurrently prefetched and/or fetched through separate paths of the invented memory system if necessary. Therefore, the PANIs can be regarded as a kind of subroutine caller, and the NIPIs as the bodies of the subroutines. These bodies of the subroutines are only fetched after prefetching both the subroutine callers and the bodies of the subroutines. This results in a better match between the high-level programming style, which comprises a main program including a plurality of function callers and the associative functions, and the assembly programming style, which causes a plurality of code fractions in the prior art.
A non-packed native instruction is a native instruction of the target microprocessor that is not packed with any other native instruction and/or PANI. On the other hand, a PANI represents a segment of the native instructions appearing in the program.
The PANIs are composed by converting segments of native instructions in basic blocks, including loops and subroutines, and parts of the sequential instructions in any non-fractional parts of the program, and by assigning the segments of native instructions to individual PANIs. A non-fractional part of the program must contain a segment of a single or plurality of instructions including only non-branch and non-branch target instructions. The PANIs contain associative opcodes and/or other information, such as the start and/or end locations of the native instruction segments of the PANIs, the number of instructions packed in each PANI, and so on, for the PINI prefetching and fetching memory system to distinguish PANIs from a PINI as well as to identify the native flow control instructions, such as branch instructions.
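The composition step can be sketched as follows, purely for illustration. The `Pani` record carries two of the pieces of information named above (a start location and the number of packed instructions); its layout, the `pack` function, and the string representation of native instructions are all assumptions, and opcode assignment and nested PANIs are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Pani:
    """Hypothetical PANI record; field layout is an assumption."""
    start: int  # start location of the segment in the NIPI code
    count: int  # number of native instructions packed


def pack(program: List[str],
         segments: List[Tuple[int, int]]) -> Tuple[List[Union[str, Pani]], List[str]]:
    """Split a native program into PINI code and NIPI code.

    `segments` are non-overlapping half-open (start, end) ranges to pack.
    """
    pini: List[Union[str, Pani]] = []
    nipi: List[str] = []
    pos = 0
    for start, end in sorted(segments):
        if start < pos or end <= start or end > len(program):
            raise ValueError(f"invalid or overlapping segment: {(start, end)}")
        pini.extend(program[pos:start])                      # non-packed instructions
        pini.append(Pani(start=len(nipi), count=end - start))
        nipi.extend(program[start:end])                      # packed segment body
        pos = end
    pini.extend(program[pos:])
    return pini, nipi


program = ["i0", "i1", "i2", "i3", "i4", "i5", "i6"]
pini_code, nipi_code = pack(program, [(1, 4), (5, 7)])
```

Here seven native instructions become four PINI entries plus a separate five-instruction NIPI region, so the first type of code is shorter than the original program while the original instructions remain recoverable from the second.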
The adaptive instruction prefetching and fetching memory system apparatus and method for microprocessor system distinguishes PANIs for adaptively and concurrently accessing a single or plurality of native instructions of the single or plurality of PANIs from the dedicated, separate regions of distinct addresses in instruction cache and/or main memories if necessary.
The adaptive instruction prefetching and fetching memory system apparatus and method for microprocessor system reduces the number of branch instructions appearing in the code while providing the same instruction fetching capability for the native instructions in PINIs. In addition, the adaptive instruction prefetching and fetching memory system apparatus and method for microprocessor system allows the adaptive usage of different sizes of instruction cache memory by scaling PANIs to achieve functional compatibility and performance enhancements.
Therefore, the adaptive instruction prefetching and fetching memory system apparatus and method for microprocessor system adaptively handles the instructions converted from the original native instruction segments in the program to the PINIs and NIPIs. Alternatively, the invention directly generates the PINIs and NIPIs from software written in a high-level programming language such as C/C++.
The adaptive instruction prefetching and fetching memory system apparatus and method for microprocessor system effectively utilizes available instruction cache memories in terms of cache memory size, power consumption, and operational speed. In addition, the invention permits considerable savings of instruction cache memory or simplification of the cache organization in the hierarchical instruction memory system. The invention also concurrently prefetches the PINIs at the prospective locations in the program flow to enhance the cache hit rate. Furthermore, the invention prevents the instruction cache memories from wasting energy by accurately prefetching and fetching the instructions that are highly used once they have been accessed and stored in the instruction cache memories. Since more operations, including branches, subroutine callers, and subroutine returns, are reduced and/or packed into PANIs, which are stored in and accessed from small, simple, and low-power cache memories, such as direct mapped cache memories, the invention is useful for low-power and performance-aware microprocessors. Through this invention, developers can: (1) compose their own compatible and ciphered instructions before runtime; and (2) prefetch and fetch purely native instructions concurrently from the single or plurality of main memories via the single or plurality of levels of cache memories.