This section introduces aspects that may help facilitate a better understanding of the disclosure. Accordingly, these statements are to be read in this light and are not to be understood as admissions about what is or is not prior art.
Growth in the amounts of data processed by computing platforms, from mobile devices to data centers, together with the need to bridge the widening processor-memory gap to feed increasing numbers of cores in a computing system, has led to an incessant demand for more fast-access memory. Memory is divided into off-chip (i.e., off the processor) and on-chip memory. Data that is accessed frequently (often referred to as data with temporal locality) or data with nearby addresses (often referred to as data with spatial locality) is often a good candidate for storage in on-chip cache memory. Caches are divided into a data array and a tag array.
Cache memory architecture is well established. One such architecture is direct mapping. In direct mapping, cache memory is divided into a data array (e.g., a table of n rows and one column, where each cell of the table holds a block of several bytes of data) and a tag array with a similar layout. An example will further illustrate this architecture. Suppose a cache of 128B is used, with each row holding 8 bytes. The cache then has 16 rows of 8B data. To access each of the 8 bytes in a row, the address from the processor is divided into three segments: offset bits, index bits, and tag bits. The three least significant bits are the offset bits (or "b" bits). These bits identify which of the 8 bytes in the row of interest is being addressed. In this example, the cache system is byte-addressable, i.e., the smallest accessible chunk of data is a byte (8 bits). Accordingly, if each row instead held 16B of data, b would be 4. The next four least significant bits (or "c" bits) identify which row of the cache memory is being addressed. Since there are 16 rows, 4 bits are needed to distinguish among them. These bits are the index bits. The remaining bits are the tag bits. Where a main memory of 2^d bytes is addressed by d bits, the number of tag bits equals d − c − b. Because many memory addresses map to the same cache row and cache rows are constantly rewritten, the tag bits indicate whether the row currently holds the data for the requested address. Therefore, when the processor fetches data associated with a particular address, the tag portion of the address (i.e., the most significant d − c − b bits) is compared with the tag stored in the tag array at the row selected by the c bits. If the two match, the access is considered a "hit." If, however, they differ, the access is considered a "miss," in which case the data associated with that address is fetched from main memory.
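The direct-mapped lookup described above can be sketched as a small software model. This is a minimal illustration, not a hardware design. It uses the 128B cache with 8B rows from the example, so b = 3 and c = 4. It assumes a hypothetical 16-bit byte address (d = 16, giving 9 tag bits), a per-row valid flag, and a `bytearray` standing in for main memory. The names `split_address` and `DirectMappedCache` are illustrative only.

```python
# Illustrative direct-mapped cache model (a sketch, not a hardware design).
# Assumptions: 16-bit byte addresses (hypothetical d), a per-row valid flag,
# and main memory modeled as a bytearray.

CACHE_SIZE = 128   # bytes, as in the example
ROW_SIZE = 8       # bytes per row
ADDR_BITS = 16     # d: hypothetical address width

NUM_ROWS = CACHE_SIZE // ROW_SIZE          # 16 rows
B = ROW_SIZE.bit_length() - 1              # offset bits: log2(8) = 3
C = NUM_ROWS.bit_length() - 1              # index bits: log2(16) = 4
TAG_BITS = ADDR_BITS - C - B               # d - c - b = 9


def split_address(addr):
    """Split a byte address into (tag, index, offset) fields."""
    offset = addr & ((1 << B) - 1)         # low b bits
    index = (addr >> B) & ((1 << C) - 1)   # next c bits select the row
    tag = addr >> (B + C)                  # remaining d - c - b bits
    return tag, index, offset


class DirectMappedCache:
    def __init__(self, memory):
        self.memory = memory
        self.valid = [False] * NUM_ROWS
        self.tags = [0] * NUM_ROWS
        self.data = [bytes(ROW_SIZE)] * NUM_ROWS

    def read_byte(self, addr):
        """Return (byte value, hit flag) for a byte-addressable read."""
        tag, index, offset = split_address(addr)
        # Hit only if the selected row is valid and its stored tag matches.
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            # Miss: fetch the whole row from main memory, overwriting the row.
            base = addr & ~((1 << B) - 1)
            self.data[index] = bytes(self.memory[base:base + ROW_SIZE])
            self.tags[index] = tag
            self.valid[index] = True
        return self.data[index][offset], hit


memory = bytearray(i % 256 for i in range(1 << ADDR_BITS))
cache = DirectMappedCache(memory)
print(split_address(0x1234))          # (36, 6, 4)
print(cache.read_byte(0x1234))        # (52, False): cold miss
print(cache.read_byte(0x1235))        # (53, True): same row, hit
print(cache.read_byte(0x1234 + 128))  # (180, False): same index, new tag
print(cache.read_byte(0x1234))        # (52, False): evicted by the conflict
```

The last two reads show the weakness of direct mapping. Addresses 128 bytes apart share the same index bits, so they repeatedly evict each other from the single row they both map to.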
Another cache architecture is the set associative architecture. The purpose of this architecture is to reduce collisions between addresses that map to the same cache location. In this architecture, the data array of the cache memory is divided into multiple columns (n columns), each of which is called a "way." Each block in a row is a data block associated with a corresponding tag array entry, and the blocks sharing a row (one per way) are selected together by the index bits. Suppose the data array is divided into two ways. In a direct-mapped cache, two data blocks whose addresses have identical index bits but different tag bits map to the same cache location, so loading one evicts the other. If, for example, two such data blocks are always needed together, the two-way cache can place them in the same row, each in a different way, so that both remain cached. For hit/miss detection, the tag bits of the address are compared with both tag entries in the selected row. If either matches, the access is considered a hit; if neither matches, it is considered a miss.
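A two-way set-associative lookup can be sketched in the same style. This is a minimal model under several assumptions not stated in the text. It uses the same 128B capacity and 8B blocks, which gives 8 rows (sets) of 2 ways, so c = 3. It assumes a hypothetical 16-bit address, so 10 tag bits remain. It tracks only tags, not data. The replacement policy on a miss is least-recently-used (LRU), chosen here purely for illustration. `None` in a way stands in for a cleared valid bit. The names `split_address` and `SetAssociativeCache` are hypothetical.

```python
# Illustrative 2-way set-associative cache model (tags only, a sketch).
# Assumptions: 128-byte cache, 8-byte blocks, hypothetical 16-bit addresses,
# and LRU replacement (the replacement policy is an illustrative choice).

CACHE_SIZE = 128
BLOCK_SIZE = 8
WAYS = 2
ADDR_BITS = 16

NUM_SETS = CACHE_SIZE // (BLOCK_SIZE * WAYS)   # 8 rows (sets)
B = BLOCK_SIZE.bit_length() - 1                # 3 offset bits
C = NUM_SETS.bit_length() - 1                  # 3 index bits
TAG_BITS = ADDR_BITS - C - B                   # 10 tag bits


def split_address(addr):
    """Split a byte address into (tag, index, offset) fields."""
    offset = addr & ((1 << B) - 1)
    index = (addr >> B) & ((1 << C) - 1)
    tag = addr >> (B + C)
    return tag, index, offset


class SetAssociativeCache:
    def __init__(self):
        # One tag entry per way in each row; None means the way is invalid.
        self.tags = [[None] * WAYS for _ in range(NUM_SETS)]
        # Per-row way numbers ordered least recently used first.
        self.lru = [list(range(WAYS)) for _ in range(NUM_SETS)]

    def access(self, addr):
        """Return True on a hit, False on a miss (filling a way on a miss)."""
        tag, index, _ = split_address(addr)
        ways = self.tags[index]
        # Hit/miss detection: compare the address tag with every way's tag.
        for way, stored in enumerate(ways):
            if stored == tag:
                self._touch(index, way)
                return True
        # Miss: fill an invalid way if one exists, else evict the LRU way.
        victim = ways.index(None) if None in ways else self.lru[index][0]
        ways[victim] = tag
        self._touch(index, victim)
        return False

    def _touch(self, index, way):
        """Mark a way as most recently used within its row."""
        order = self.lru[index]
        order.remove(way)
        order.append(way)


cache = SetAssociativeCache()
a = 0x1234
# a and a + 64 share index bits but differ in tag bits; both stay cached.
print([cache.access(x) for x in (a, a + 64, a, a + 64)])  # [False, False, True, True]
```

Unlike the direct-mapped case, the two conflicting blocks coexist in different ways of the same row. A third block with the same index would still force an eviction, and the LRU assumption decides which way is replaced.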
In each of these architectures, a single valid bit may also be stored with each tag array entry to indicate whether the corresponding data block holds valid data for that index (i.e., the c bits) and the stored tag bits.
Regardless of which architecture is used, caches in computing systems have grown over the years, and computing systems have seen increased energy consumption, a significant part of which is due to caches. Complementary metal-oxide-semiconductor (CMOS) based memories face challenges with technology scaling due to increased leakage and process variations. These challenges, coupled with an increased demand for on-chip memory, have led to an active exploration of alternative on-chip memory technologies.
One such alternative technology is spin transfer torque magnetic random access memory (STT-MRAM), which has gained significant interest in recent years as a potential post-CMOS memory technology. STT-MRAMs offer high density and near-zero leakage, making them promising candidates for on-chip memories. However, their overall energy efficiency is still limited by the energy required for spin transfer torque (STT) switching during writes and for reliable single-ended sensing during reads.
Several emerging applications that have fueled the demand for larger on-chip memories (including multimedia, recognition, data mining, search, and machine learning, among others) also exhibit intrinsic resilience to errors, i.e., the ability to produce results of acceptable quality even with approximations to their computations or data. Approximate computing exploits this characteristic of applications to derive energy or performance benefits using techniques at the software, architecture, and circuit levels. Most previous work in approximate computing focuses on processing or logic circuits. Previous efforts on approximate storage can be classified by the level of the memory hierarchy they target. Some focus on application-specific memory designs. A few efforts explore approximate cache architectures with CMOS memories, using techniques such as skipping cache loads on misses. However, in all of these past works, the energy consumption of the cache remains a substantial challenge.
Therefore, there is an unmet need for a novel architecture to reduce energy usage in cache memories, particularly in spintronic-based cache memories.