A Level-1 (L1) cache of a microprocessor is commonly implemented as a tightly coupled unit of tag and data arrays. Whether the L1 cache is implemented as a single-port cache array or a dual-port cache array, the processes of fetching address and data are tightly coupled. If there is an address or data conflict between at least two consecutive memory access instructions, then one of the memory access instructions needs to stall in the pipeline. This in turn stalls one or more following instructions, which degrades performance of the L1 cache as well as of the microprocessor in which the L1 cache is incorporated. Stalling of instructions due to address and/or data conflicts is further undesirable because it substantially reduces the effective clock frequency at which the L1 cache operates. In addition, there is a negative performance impact if the tag array is not accessed as early as possible.
A data array of the L1 cache is typically much larger than a tag array. The data array may be, for example, a large compiler memory that typically requires multiple clock cycles (e.g., two clock cycles) for a memory access, such as a data load or store. Because of the multi-cycle memory access, the data array is typically implemented using multiple memory banks that can be accessed simultaneously. If, for example, two consecutive memory access instructions request access to two different memory banks, no bank conflict exists and the addresses in the separate memory banks can be accessed simultaneously. On the other hand, if two consecutive memory access instructions request access to the same memory bank (for either the same or different data), then a bank conflict exists (e.g., a data or address conflict), and one of the memory access instructions (e.g., the later of the two consecutive instructions) needs to be stalled. Since instructions are executed in order, one or more instructions following the stalled instruction can also be stalled, which is undesirable since it negatively affects performance of the L1 cache as well as performance of a microprocessor incorporating the L1 cache. Generally, the multi-cycle access of a data array can cause more bank conflicts than access of a smaller tag array.
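The bank-conflict behavior described above can be illustrated with a minimal sketch. The following model is hypothetical and is not taken from any particular implementation: the function names `bank_of` and `count_stall_cycles`, the number of banks, the bank width, and the low-order address interleaving are all assumptions chosen for illustration. Only the two-cycle access time and the in-order stalling follow the description above.

```python
def bank_of(addr, num_banks=4, bank_width=8):
    """Map a byte address to a bank index.

    Assumption: banks are interleaved on consecutive bank_width-byte chunks.
    """
    return (addr // bank_width) % num_banks


def count_stall_cycles(addresses, num_banks=4, bank_width=8, access_cycles=2):
    """Count stall cycles for in-order accesses, one issue attempt per cycle.

    Each access occupies its bank for access_cycles cycles (two by default,
    matching the example in the text). An access to a busy bank stalls, and
    because issue is in order, every later access is delayed as well.
    """
    busy_until = [0] * num_banks  # first cycle at which each bank is free
    cycle = 0
    stalls = 0
    for addr in addresses:
        bank = bank_of(addr, num_banks, bank_width)
        if busy_until[bank] > cycle:
            # Bank conflict: wait until the earlier access has finished.
            stalls += busy_until[bank] - cycle
            cycle = busy_until[bank]
        busy_until[bank] = cycle + access_cycles
        cycle += 1  # the next instruction issues no earlier than the next cycle
    return stalls


# Four consecutive accesses that fall in four different banks: no conflict.
no_conflict = count_stall_cycles([0, 8, 16, 24])

# Two consecutive accesses that fall in the same bank: the later one stalls.
same_bank = count_stall_cycles([0, 32])
```

Under these assumptions, `no_conflict` is 0 and `same_bank` is 1: the second access to the same bank waits one cycle for the two-cycle access ahead of it to complete, and any instruction behind it would be delayed by the same amount.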