The present invention is related to a cache system, and in particular to a cache system including a branch target address cache (BTAC).
In conventional computer systems, each instruction is processed only after the previous instruction has been completely processed. Improvements in computing technologies have led to pipelined and superscalar architectures. Such architectures increase the design complexity, and the price, of a computer processor, but provide enhanced processing efficiency.
A cache architecture enhances the performance of a computer system. The cache stores information such as data and instructions, and provides required information within several clock cycles without the need for accessing main memory. Recent cache systems include a branch target address cache that stores branch target addresses, enhancing the performance of a processor employing a pipeline architecture.
FIG. 1 illustrates a typical computer system employing a branch prediction mechanism. The computer system 100 is comprised of a processor 110, a main memory 150, and an input/output unit 160. The processor 110 is implemented on a single chip together with a cache system that is constructed of an instruction fetch unit 120, a cache controller 130, and a cache memory 140. Although the instruction fetch unit 120 may be regarded as an independent circuit block embedded in the processor 110 and separate from the cache system, it is treated here as part of the cache system. The instruction fetch unit 120 predicts a proper instruction sequence for the processor 110 and accordingly fetches an instruction from the cache memory 140 or the main memory 150.
FIG. 2 illustrates a conventional architecture of an instruction cache embedded in the cache memory 140 and a branch target address cache (BTAC) embedded in the instruction fetch unit 120.
The instruction fetch unit 120 includes a BTAC composed of a BTAC tag RAM 121 and a BTAC data RAM 122, a saturation counter 123, and a comparator 125. The instruction cache 140 includes an instruction cache tag RAM 141 and an instruction cache data RAM 142. The BTAC determines whether an instruction stored in a currently accessed cache line of the instruction cache is a branch instruction, and provides a predicted branch address.
FIG. 3 shows an instruction pipeline in a typical computer system. The instruction pipeline is comprised of five sequential states, i.e., an instruction cache fetch state FE, a decoding state DE, an issue state ISS, an execution state EX, and a writing state WB. The execution state EX performs an address generation, an operand fetch, and an instruction execution.
The instruction fetch unit 120 operates as follows. At the fetch state FE, when a specific cache line of the instruction cache is accessed, a fetch address PfuAddr:[6:0]=PCFE[6:0] is provided as an index to the BTAC tag RAM 121. The BTAC outputs a tag address BtacTLAddr[31:7] from the BTAC tag RAM 121 entry accessed with the index, and a predicted branch address BtacTLAddr[31:0] from the BTAC data RAM 122. The comparator 125 compares the tag address BtacTLAddr[31:7], which is provided by the BTAC tag RAM 121, with the next fetch address PCFE[31:7], which is a tag address provided by the processor 110. If the two addresses are identical to each other, the address of the instruction to be fetched after the branch instruction is the address BtacTLAddr[31:0] read out from the BTAC data RAM 122. If the two addresses are not identical to each other, no branch prediction is performed.
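The lookup described above can be illustrated with a minimal software sketch. It is not the circuit of FIG. 2; it only models the index, tag compare, and data readout. The names btac_lookup, tag_ram, and data_ram are hypothetical, and the use of None to mark an entry that has never been written is an assumption, since the description does not state how empty entries are represented.

```python
INDEX_BITS = 7                       # fetch address bits [6:0] index the BTAC
NUM_ENTRIES = 1 << INDEX_BITS        # 128 entries
INDEX_MASK = NUM_ENTRIES - 1
TAG_MASK = (1 << (32 - INDEX_BITS)) - 1   # 25 tag bits, address bits [31:7]


def btac_lookup(tag_ram, data_ram, pc_fe):
    """Return the predicted branch address for pc_fe, or None on a tag mismatch.

    tag_ram models the BTAC tag RAM 121 (stores address bits [31:7]).
    data_ram models the BTAC data RAM 122 (stores a 32-bit target address).
    """
    index = pc_fe & INDEX_MASK                  # PCFE[6:0]
    tag = (pc_fe >> INDEX_BITS) & TAG_MASK      # PCFE[31:7]
    if tag_ram[index] == tag:                   # role of the comparator 125
        return data_ram[index]                  # predicted branch address
    return None                                 # no branch prediction


# Usage: one entry is preloaded by hand, then looked up.
tag_ram = [None] * NUM_ENTRIES       # assumption: None marks an unused entry
data_ram = [0] * NUM_ENTRIES
pc = 0x00001234
tag_ram[pc & INDEX_MASK] = pc >> INDEX_BITS
data_ram[pc & INDEX_MASK] = 0x00002000
prediction = btac_lookup(tag_ram, data_ram, pc)
```

In this sketch an address that shares the low seven bits with a stored entry but differs in bits [31:7] yields no prediction, which corresponds to the case in which the comparator 125 finds the two tag addresses not identical.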
Based on the result of the comparator 125, if the two addresses BtacTLAddr[31:7] and PCFE[31:7] are not identical to each other and the decoding of an instruction B1 determines that it is a branch instruction, then, after the operation of the writing state WB is performed, PCWB[31:0] is written into the BTAC tag RAM 121 at the index PfuAddr:[6:0]=PCWB[6:0], and the address branched to after performing the instruction B1 is written into the BTAC data RAM 122. The value PCWB[31:0] represents the address of the instruction B1 stored in the main memory 150.
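The update on a tag mismatch can likewise be sketched in software. This is an illustration under stated assumptions rather than the described hardware: the function name btac_update and its parameters are hypothetical, and the sketch stores only address bits [31:7] in the modeled tag RAM, because those are the bits the comparator 125 compares, whereas the description above states that PCWB[31:0] is written.

```python
INDEX_BITS = 7                       # address bits [6:0] index the BTAC
NUM_ENTRIES = 1 << INDEX_BITS        # 128 entries
INDEX_MASK = NUM_ENTRIES - 1
TAG_MASK = (1 << (32 - INDEX_BITS)) - 1   # address bits [31:7]


def btac_update(tag_ram, data_ram, pc_wb, is_branch, target):
    """Write a BTAC entry at the writing state WB; return True if written.

    An entry is written only when the instruction at pc_wb was decoded as a
    branch and the stored tag did not match (no prediction had been made).
    """
    index = pc_wb & INDEX_MASK                  # PCWB[6:0]
    tag = (pc_wb >> INDEX_BITS) & TAG_MASK      # assumption: only [31:7] kept
    hit = tag_ram[index] == tag
    if is_branch and not hit:
        tag_ram[index] = tag                    # models BTAC tag RAM 121
        data_ram[index] = target & 0xFFFFFFFF   # models BTAC data RAM 122
        return True
    return False


# Usage: a branch B1 at a hypothetical address 0x1234 jumps to 0x2000.
tag_ram = [None] * NUM_ENTRIES       # assumption: None marks an unused entry
data_ram = [0] * NUM_ENTRIES
first = btac_update(tag_ram, data_ram, 0x1234, True, 0x2000)    # entry written
second = btac_update(tag_ram, data_ram, 0x1234, True, 0x2000)   # tag now matches
other = btac_update(tag_ram, data_ram, 0x1238, False, 0x0)      # not a branch
```

After the first call the entry is present, so a later fetch of the same address would find matching tags and read the stored target as the predicted branch address.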
In the conventional cache system described above, the BTAC needs to be accessed whenever the instruction cache is accessed in order to predict a branch address. FIGS. 4A and 4B provide sequence illustrations of accessing the BTAC when instructions are being propagated through the instruction pipeline shown in FIG. 3. It can be seen in this example that the BTAC is accessed whenever the instruction cache is accessed, whether the tag addresses stored in the instruction cache tag RAM 141 are generated in sequence (FIG. 4A) or out of sequence (FIG. 4B).
Such a BTAC accessing method is capable of enhancing the operating speed of the cache system, because predicting the branch address minimizes the occurrence of pipeline stalls. However, it does little to reduce power consumption, because the BTAC must be accessed at every cycle.