A common goal in the design of computer microarchitecture is to increase the speed of execution of a given set of instructions.
To improve processor performance, it is common to interleave a cache by its basic operand block (e.g., 64 bits, or one doubleword, in the L1 cache) and provide fetch or store accesses without incurring the area penalty of a cache array that supports read and write operations at the same time.
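As an illustration of the interleaving described above, the following minimal sketch maps a byte address to an interleave. The 8-byte doubleword comes from the text; the interleave count of 4 and the names `interleave_of` and `can_issue_together` are assumptions chosen only for this example. It also reflects one reading of the benefit: a fetch and a store aimed at different interleaves need not contend for the same single-ported array.

```python
DOUBLEWORD_BYTES = 8   # 64-bit basic operand block, as in the text
NUM_INTERLEAVES = 4    # assumed for illustration; not stated in the text


def interleave_of(addr: int) -> int:
    """Return the interleave holding the doubleword at byte address addr."""
    return (addr // DOUBLEWORD_BYTES) % NUM_INTERLEAVES


def can_issue_together(fetch_addr: int, store_addr: int) -> bool:
    """True when a fetch and a store target different interleaves.

    Assumption: each interleave is a single-ported array, so two accesses
    conflict only when they map to the same interleave.
    """
    return interleave_of(fetch_addr) != interleave_of(store_addr)
```

Under these assumptions, consecutive doublewords fall in consecutive interleaves, so `can_issue_together(0x00, 0x08)` is true, while `can_issue_together(0x00, 0x20)` is false because both addresses map to interleave 0.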
It is also common to have parallel processing pipelines within a cache to support multiple fetch accesses, either to allow superscalar bandwidth in the processor, as in the L1 cache, or to process multiple L1 cache miss requests, as in the L2 cache. For example, to provide 2 pieces of operand data in each processor cycle, the L1 D-cache structure will generally consist of 2 processing pipelines that handle 2 fetches concurrently. Each processing pipeline requires a lookup into the directory to check the ownership status of the data to be fetched, as required in a multiprocessor system to maintain memory coherency.
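The per-pipeline directory lookup can be pictured with the following simplified sketch. It is not a description of any particular hardware: the directory is modeled as a plain dictionary keyed by line address rather than a set-associative tag array, and the 128-byte line size, the `Ownership` states, and the function names are all assumptions made for illustration.

```python
from enum import Enum


class Ownership(Enum):
    """Hypothetical coherency states; the text does not name them."""
    INVALID = 0
    READ_ONLY = 1
    EXCLUSIVE = 2


LINE_BYTES = 128  # assumed line size; not stated in the text


def directory_lookup(directory: dict, addr: int) -> Ownership:
    """Return the ownership state recorded for the line containing addr."""
    return directory.get(addr // LINE_BYTES, Ownership.INVALID)


def lookup_both_pipes(directory: dict, fetch_a: int, fetch_b: int) -> tuple:
    """Model one cycle in which each of the 2 pipes performs its own lookup."""
    return directory_lookup(directory, fetch_a), directory_lookup(directory, fetch_b)
```

Each call to `lookup_both_pipes` stands for one cycle in which both pipes read the directory, which is why the hardware needs two read ports' worth of directory bandwidth.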
A typical L1 D-cache implementation will have either two 1-read-port physical directory arrays (one for each processing pipe) or one 2-read-port physical directory array (providing simultaneous access from both pipes). The L1 D-cache array will usually have 2 read ports to provide the required data access bandwidth. A typical L2 implementation will have its directories and caches split equally (usually by line) between the 2 processing pipes, such that each pipe can access only half of the existing L2 cache data.
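The by-line split of the L2 between its 2 pipes can be sketched as follows. The 128-byte line size, the use of the low-order line-address bit as the pipe selector, and the names `l2_pipe_of` and `split_requests` are assumptions for illustration only; an actual design may select the pipe differently.

```python
L2_LINE_BYTES = 128  # assumed line size; not stated in the text
NUM_PIPES = 2        # the text describes 2 processing pipes


def l2_pipe_of(addr: int) -> int:
    """Return the pipe that owns the line containing byte address addr.

    Assumption: alternating lines belong to alternating pipes.
    """
    return (addr // L2_LINE_BYTES) % NUM_PIPES


def split_requests(addrs: list) -> list:
    """Group miss-request addresses into one queue per pipe, keeping order."""
    queues = [[] for _ in range(NUM_PIPES)]
    for addr in addrs:
        queues[l2_pipe_of(addr)].append(addr)
    return queues
```

With this mapping, every byte of a given line is reachable through exactly one pipe, so two requests to the same line always queue behind each other, whereas requests to adjacent lines can proceed in parallel.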