The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent the work is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
In the description that follows, “byte” refers to an octet, that is, to 8 binary digits (bits). The term “kilobyte” and the abbreviation “KB” both refer to 1024 bytes.
In a computer processor, a cache memory circuit (hereinafter, a cache) may be used to store data corresponding to a location in a main memory. The cache is typically smaller, has lower latency, and has higher bandwidth than the main memory.
The cache may include a plurality of tags respectively associated with a plurality of memory elements that store data (hereinafter, cache lines). The cache compares an address of a memory operation with an address stored in a tag to determine whether the cache line associated with the tag corresponds to the location indicated by the address of the memory operation, that is, whether the location indicated by the address of the memory operation is cached.
A set-associative cache may check only one of a plurality of sets of the cache when determining whether a location is cached. The set-associative cache determines which set to check using bits of the address of the memory operation. For example, in a set-associative cache having 256-byte cache lines and 256 sets, the set to check may be determined using bits 15 through 8 of the address of the memory operation (with bits 7 through 0 indicating particular bytes within the 256-byte cache lines). In this example, bits 15 through 8 correspond to a set address, referred to herein as an index.
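By way of illustration only, the address decomposition in the example above may be sketched as follows. The function name `split_address` and the constant names are hypothetical and are introduced here solely for illustration; the sketch assumes the 256-byte cache lines and 256 sets of the example.

```python
# Illustrative sketch only: 256-byte cache lines and 256 sets, as in the example.
LINE_BYTES = 256
NUM_SETS = 256
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 8 bits: address bits 7 through 0
INDEX_BITS = NUM_SETS.bit_length() - 1      # 8 bits: address bits 15 through 8


def split_address(addr):
    """Split an address into (tag, index, byte offset within the cache line)."""
    offset = addr & (LINE_BYTES - 1)                 # bits 7 through 0
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)   # bits 15 through 8
    tag = addr >> (OFFSET_BITS + INDEX_BITS)         # remaining upper bits
    return tag, index, offset


tag, index, offset = split_address(0x12345678)
print(hex(tag), hex(index), hex(offset))
```

For the address 0x12345678, the byte offset is 0x78, the index is 0x56, and the remaining upper bits 0x1234 would be compared against the tags of set 0x56.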
A set-associative cache may have a plurality of ways. A cache having a plurality of ways may be referred to as a multi-way cache. The number of ways indicates the number of distinct locations in the cache that may correspond to any one memory location.
Caches may be used in processors having virtual memory architectures. In a virtual memory architecture, virtual addresses are generated by the processor and are then translated by a Memory Management Unit (MMU) into physical addresses. In a typical MMU, memory addresses are translated in pages. For example, in an MMU using 4 KB pages, each 4 KB page in the virtual memory space may be mapped to a 4 KB page in the physical address space. A location at an offset within the virtual memory page will be located at the same offset within the corresponding physical address page. The MMU may include a Translation Look-aside Buffer (TLB) to decrease the amount of time required to perform the address translation.
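By way of illustration only, the page-based translation described above may be sketched as follows for 4 KB pages. The dictionary `page_table`, the function name `translate`, and the example page numbers are hypothetical stand-ins for the mapping maintained by an MMU; they are assumptions made for this sketch and do not describe any particular MMU or TLB.

```python
# Illustrative sketch only: translation with 4 KB pages.
PAGE_BYTES = 4 * 1024                         # 4 KB = 4096 bytes
PAGE_OFFSET_BITS = PAGE_BYTES.bit_length() - 1  # 12 bits


def translate(vaddr, page_table):
    """Translate a virtual address using a hypothetical page-number mapping."""
    vpn = vaddr >> PAGE_OFFSET_BITS      # virtual page number
    offset = vaddr & (PAGE_BYTES - 1)    # offset within the page, unchanged
    ppn = page_table[vpn]                # physical page number
    return (ppn << PAGE_OFFSET_BITS) | offset


# Hypothetical mapping: virtual page 0x12345 maps to physical page 0x00ABC.
page_table = {0x12345: 0x00ABC}
paddr = translate(0x12345678, page_table)
print(hex(paddr))
```

In this sketch the low 12 bits of the virtual address and of the physical address are identical, reflecting that a location keeps the same offset within its page after translation.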
A Physically Indexed and Physically Tagged (PIPT) cache may determine the set address using a physical address and compare the physical address to the one or more tags in the set indicated by the set address. As a result, in a PIPT cache the physical address must be determined before the cache can be accessed.
To reduce a latency of load operations, the cache may begin a process of retrieving data before the physical address of the data is fully known. In particular, a virtually indexed, physically tagged (VIPT) cache may begin the process of retrieving data before the MMU completes an address translation between a virtual address and a physical address.
The VIPT cache is a set-associative cache that uses a plurality of bits of the virtual address as the set address, that is, the VIPT cache uses a virtual set address (VSA) as an index of the cache. Once the index has been determined, the VIPT cache compares a plurality of bits of the physical address against the tags in the set corresponding to the index to determine whether the VIPT cache includes a cache line corresponding to the memory location specified by the physical address. Because only the page-offset bits of an address are unchanged by translation, virtual indexing is limited by the minimum translation page size supported by the instruction set architecture. Increasing the capacity of the VIPT cache beyond that limit may therefore require adding ways rather than sets, so the VIPT cache may require more associativity than a PIPT cache of the same capacity.
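By way of illustration only, a VIPT lookup may be sketched as follows. The geometry (64-byte cache lines, 64 sets, 8 ways), the function names, and the example addresses are assumptions chosen for this sketch so that the index bits fall within the offset of a 4 KB page; they are not taken from the disclosure. The helper `min_ways` likewise assumes that the index is drawn only from page-offset bits.

```python
# Illustrative sketch only: hypothetical VIPT geometry.
LINE_BYTES = 64      # assumed line size
NUM_SETS = 64        # assumed set count: 64 sets * 64 bytes = 4 KB per way
NUM_WAYS = 8         # assumed associativity
OFFSET_BITS = 6      # log2(LINE_BYTES)
INDEX_BITS = 6       # log2(NUM_SETS)


def virtual_index(vaddr):
    """Index taken from virtual address bits, available before translation."""
    return (vaddr >> OFFSET_BITS) & (NUM_SETS - 1)


def physical_tag(paddr):
    """Tag taken from physical address bits, available after translation."""
    return paddr >> (OFFSET_BITS + INDEX_BITS)


def vipt_lookup(tags, vaddr, paddr):
    """Return the hitting way, or None on a miss.

    tags[s][w] holds the physical tag stored for set s, way w (None if empty).
    """
    index = virtual_index(vaddr)
    tag = physical_tag(paddr)
    for way, stored in enumerate(tags[index]):
        if stored == tag:
            return way
    return None


def min_ways(cache_bytes, page_bytes):
    """Ways needed if each way may be no larger than one page (assumption)."""
    return max(1, cache_bytes // page_bytes)


tags = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
vaddr, paddr = 0x12345678, 0x00ABC678   # hypothetical translated pair
tags[virtual_index(vaddr)][3] = physical_tag(paddr)
print(vipt_lookup(tags, vaddr, paddr), min_ways(32 * 1024, 4 * 1024))
```

Under these assumptions, a 32 KB cache used with 4 KB pages would need at least 8 ways, which is one way to read the associativity constraint noted above.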
A VIPT cache may perform serial access of its tag and data SRAMs. In such a serial-access VIPT cache, a virtual address is used to generate the index used to access both the tag SRAM banks and the data SRAM banks, with the data SRAM banks being accessed after the tag SRAM banks.
Another VIPT cache may perform parallel access of its tag and data SRAMs. Such a parallel-access VIPT cache is similar to the serial-access VIPT cache, except that the data and tag SRAMs are accessed in parallel, which can result in lower data latency than the serial-access VIPT cache.
The use of a parallel-access VIPT cache may reduce data access latency but may also increase power dissipation. The power dissipation may increase because the parallel-access VIPT cache reads data from all the ways of the data SRAM, as a result of not knowing which way includes the requested data at the time the data SRAM is read.
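By way of illustration only, the difference in data SRAM activity between the two access styles may be sketched with a simple count of the data ways read per lookup. The function name `data_ways_read` is hypothetical, and the sketch assumes that a serial-access cache reads only the way identified by the preceding tag comparison and reads no data way on a miss; an actual implementation may behave differently.

```python
# Illustrative sketch only: a simplified count of data ways read per lookup.
def data_ways_read(num_ways, parallel, hit):
    """Return the number of data SRAM ways read for one lookup.

    Assumption: a parallel-access cache reads every way because the hitting
    way is not yet known; a serial-access cache reads only the hitting way
    after the tag comparison, and reads nothing on a miss.
    """
    if parallel:
        return num_ways
    return 1 if hit else 0


for parallel in (False, True):
    label = "parallel" if parallel else "serial"
    print(label, data_ways_read(8, parallel, hit=True))
```

In this simplified model, an 8-way parallel-access cache reads eight data ways on a hit while a serial-access cache reads one, which corresponds to the increased power dissipation described above.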