In general, a data cache copies part of the contents of a lower-level memory into itself so that a higher-level memory or the processor core can access those contents faster, which keeps the pipeline operating without interruption. Current high-performance computers and microprocessors use separate instruction and data caches to avoid the von Neumann bottleneck, in which instructions and data share the same channel. With the widespread use of multiple instruction issue, a single data cache has often become the bottleneck that limits further improvement in processor performance.
In the existing technology, a data cache system comprises a data memory and a tag memory, whose rows correspond one-to-one. The data memory stores data; the tag memory stores the tags of the data addresses, that is, the higher-order bits of those addresses. A current data cache is addressed as follows: the index field of the address selects a row of the tag memory, and the tag stored in that row is compared with the tag field of the address; if they match, the index together with the offset within the block is used to locate and read out the content in the data memory. In the existing technology, adjacent memory rows have contiguous index addresses, but the tags stored in adjacent rows of the tag memory can be non-contiguous.
From the compiler's point of view, the target program runs in its own logical address space, and every program value has its own address in that space. The run-time image of a target program in the logical address space contains a code region and a data region. The data region usually comprises a static region, a heap region, unused memory, and a stack region. The stack region stores activation records, which are usually generated during function calls. Usually the static region and the heap region grow from the bottom of the memory space toward the top, while the stack grows from the top toward the bottom. The two do not overlap and are separated by the unused memory region.
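By way of illustration only, the conventional tag/index/offset lookup described above can be sketched in Python as follows. The sketch assumes a direct-mapped organization with 16-byte blocks and 64 rows; these sizes, and the names `split_address`, `DirectMappedCache`, `fill`, and `read_byte`, are hypothetical and are not taken from any particular existing design.

```python
OFFSET_BITS = 4                 # assumed: 16-byte blocks
INDEX_BITS = 6                  # assumed: 64 rows
BLOCK_SIZE = 1 << OFFSET_BITS
NUM_ROWS = 1 << INDEX_BITS


def split_address(addr):
    """Split an address into its (tag, index, offset) fields."""
    offset = addr & (BLOCK_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_ROWS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class DirectMappedCache:
    """Tag memory and data memory whose rows correspond one-to-one."""

    def __init__(self):
        self.tags = [None] * NUM_ROWS
        self.data = [bytearray(BLOCK_SIZE) for _ in range(NUM_ROWS)]

    def fill(self, addr, block):
        """Store one block and record the tag of its address."""
        if len(block) != BLOCK_SIZE:
            raise ValueError("block must be exactly BLOCK_SIZE bytes")
        tag, index, _ = split_address(addr)
        self.tags[index] = tag
        self.data[index][:] = block

    def read_byte(self, addr):
        """Return the cached byte on a hit, or None on a miss."""
        tag, index, offset = split_address(addr)
        if self.tags[index] != tag:   # the index selects the row, the tag must match
            return None
        return self.data[index][offset]


cache = DirectMappedCache()
cache.fill(0x12345, bytes(range(BLOCK_SIZE)))
hit = cache.read_byte(0x12345)            # same tag and index: hit
miss = cache.read_byte(0x12345 + 0x400)   # same index, different tag: miss
```

In this sketch, nothing constrains the tags held in neighboring rows: row 52 may hold tag 72 while row 53 holds an unrelated tag, which is the non-contiguity noted above.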
Because the register resources within a processor are usually limited, it is often necessary to move the contents of registers to memory during a procedure call. At the end of the procedure call, any register used by the caller must be restored to the value it held before the call; that is, the contents that were swapped out to memory are written back into the registers.
In the existing technology, the ideal data structure for swapping register contents out to memory and writing them back is the stack. A stack is a last-in, first-out (LIFO) data structure. A stack needs a pointer to the most recently allocated address; this pointer indicates the memory location at which the register contents swapped out by the next procedure call will be stored, or the memory location holding the registers' old values. Storing data into the stack is called a push, and removing data from the stack is called a pop. By common practice, the stack “grows” from high addresses toward low addresses. That is, on a push the stack length increases and the stack pointer value decreases; on a pop the stack length decreases and the stack pointer value increases. In other words, the addresses in the stack are contiguous.
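As a minimal sketch of the push and pop behavior just described, the following Python model saves registers on a downward-growing stack around a procedure call and restores them afterward. The 4-byte word size, the convention that the stack pointer addresses the most recently pushed word, and the names `DescendingStack` and `call_with_saved_registers` are assumptions made for illustration only.

```python
WORD_SIZE = 4  # assumed word size in bytes


class DescendingStack:
    """Stack that grows from high addresses toward low addresses."""

    def __init__(self, top_address):
        self.sp = top_address   # address of the most recently pushed word
        self.memory = {}        # address -> stored word

    def push(self, value):
        self.sp -= WORD_SIZE    # a push decreases the stack pointer
        self.memory[self.sp] = value

    def pop(self):
        value = self.memory.pop(self.sp)
        self.sp += WORD_SIZE    # a pop increases the stack pointer
        return value


def call_with_saved_registers(stack, registers, names, callee):
    """Swap the named registers out to the stack, run callee, then restore them."""
    for name in names:
        stack.push(registers[name])
    callee(registers)
    for name in reversed(names):    # last in, first out
        registers[name] = stack.pop()


def clobber(regs):
    """Hypothetical callee that overwrites the caller's registers."""
    regs["r4"] = 0
    regs["r5"] = 0


stack = DescendingStack(0x1000)
registers = {"r4": 11, "r5": 22}
call_with_saved_registers(stack, registers, ["r4", "r5"], clobber)
```

After the call, `registers` again holds the values 11 and 22 and the stack pointer is back at `0x1000`. During the call the two saved words occupy the adjacent addresses `0xFFC` and `0xFF8`, which illustrates the contiguity of stack addresses.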
Because the addresses in the stack are contiguous while the tags stored in the tag memory are not, the cache incurs considerable overhead when searching for and replacing entries, and this has become the most serious bottleneck restricting modern processors.
The method and system apparatus disclosed in this invention can solve one or more of the problems described above.