The present invention relates generally to microprocessors, and more particularly, to an instruction fetch unit aligner.
A microprocessor typically includes a cache memory for storing copies of the most recently used memory locations. The cache memory is generally smaller and faster than main memory. A microprocessor also typically includes an instruction prefetch unit that is responsible for prefetching instructions for a CPU (Central Processing Unit). In particular, an instruction cache unit is typically organized in a way that reduces the amount of time spent transferring instructions having a power of two size into the prefetch unit. For example, a 256-bit bus (256 bits = 4 × 8 bytes = 32 bytes) connecting the instruction cache unit and the prefetch unit allows a 32-byte instruction prefetch unit to fetch 32 bytes of instruction data in a single cycle of the microprocessor.
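The bus-width arithmetic above can be illustrated with a short sketch. This is not part of the described apparatus; the function name and the ceiling-division model are assumptions used only to show why a 32-byte prefetch block transfers in a single cycle over a 256-bit bus, while wider blocks need additional cycles:

```python
# Hypothetical model: cycles needed to move a fetch block over a 256-bit bus,
# assuming the full bus width is usable every cycle.
BUS_BITS = 256
BUS_BYTES = BUS_BITS // 8  # 256 bits = 32 bytes per cycle

def fetch_cycles(block_bytes: int) -> int:
    """Cycles to transfer block_bytes from the instruction cache
    to the prefetch unit over the 32-byte-wide bus."""
    return -(-block_bytes // BUS_BYTES)  # ceiling division

print(fetch_cycles(32))  # a 32-byte prefetch block fills in one cycle
print(fetch_cycles(64))  # a 64-byte block would take two cycles
```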
The present invention provides an instruction fetch unit aligner. For example, the present invention provides a cost-effective and high performance apparatus for an instruction fetch unit of a microprocessor that executes instructions having a non-power of two size.
In one embodiment, an apparatus for an instruction fetch unit aligner includes selection logic of an instruction aligner that extracts and aligns a non-power of two size instruction (e.g., 5, 10, 15, or 20 bytes of instruction data) from power of two size instruction data (e.g., 64 bytes of instruction data), and control logic of the instruction aligner for controlling the selection logic. The selection logic is implemented as multiplexer logic for selecting the non-power of two size instruction from the power of two size instruction data. The extraction and alignment of the non-power of two size instruction from the power of two size instruction data is performed within one clock cycle of the microprocessor. For example, four 2:1 multiplexers that each select 8 bytes of the power of two size instruction data can be used to select 32 bytes of instruction data from 64 bytes of instruction data, such that the non-power of two size instruction is within the selected 32 bytes of instruction data. The multiplexer logic then provides 32:1 mux functionality, using eight 4:1 multiplexers and four 8:1 multiplexers for every 4 bits of the power of two size instruction data. A reorder channel that appropriately reorders the bits output from the multiplexer logic is also provided.
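The two-stage selection described above can be modeled at the byte level. The following sketch is an illustrative software analogue, not the patented circuit: the contiguous-window simplification, the group-clamping rule, and the function name are assumptions. Stage 1 stands in for the four 2:1 byte-group multiplexers that narrow 64 bytes to a 32-byte window containing the instruction; stage 2 stands in for the 32:1 mux functionality that extracts and left-aligns the non-power of two size instruction within that window:

```python
# Illustrative model of the aligner's two-stage selection (assumption: the
# stage-1 multiplexers select a contiguous 32-byte window of 8-byte groups).

def align_instruction(data: bytes, offset: int, length: int) -> bytes:
    """Extract and left-align a `length`-byte instruction (e.g., 5, 10, 15,
    or 20 bytes) starting at byte `offset` of a 64-byte instruction block."""
    assert len(data) == 64 and offset + length <= 64
    # Stage 1 (models the four 2:1 muxes): pick four 8-byte groups,
    # clamped so the 32-byte window stays inside the 64-byte block.
    lo_group = min(offset // 8, 4)
    window = data[lo_group * 8:(lo_group + 4) * 8]  # 32-byte window
    # Stage 2 (models the 32:1 mux functionality): shift within the
    # window so the instruction comes out left-aligned.
    start = offset - lo_group * 8
    return window[start:start + length]

block = bytes(range(64))
print(align_instruction(block, 13, 5))   # yields bytes 13..17
print(align_instruction(block, 45, 15))  # yields bytes 45..59
```

In hardware, both stages resolve in parallel within one clock cycle under mux-select control signals; the byte-slicing here is only a functional stand-in for that datapath.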
Other aspects and advantages of the present invention will become apparent from the following detailed description and accompanying drawings.