In current computers, when the processor wishes to read from or write to a location in the main memory, the processor first checks whether that memory location is in a cache. This is accomplished by comparing the address of the memory location to all tags in the cache that might contain that address. If the processor finds that the memory location is in the cache, then a “cache hit” has occurred, and the processor immediately reads or writes the data in the cache line. If the memory location is not in the cache, then a “cache miss” has occurred. Cache misses lead to memory latency because they require the data to be transferred from the main memory. This transfer incurs a delay since the main memory is much slower than the cache memory, and also incurs the overhead of recording the new data in the cache before the new data is delivered to the processor. The proportion of accesses that result in a cache hit is known as the “hit rate”, and is a measure of the effectiveness of the cache. As CPUs become faster, stalls due to cache misses (i.e., memory latency) displace more potential computation.
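The tag comparison and hit-rate bookkeeping described above can be illustrated with a small software model. The following is only a minimal sketch of a direct-mapped cache, not a description of any particular processor: the names ToyCache, CacheLine, kLineBytes and kNumLines, as well as the 64-byte line size and the eight-line capacity, are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative parameters (assumptions, not taken from any real CPU).
constexpr std::uint64_t kLineBytes = 64;  // bytes covered by one cache line
constexpr std::size_t kNumLines = 8;      // number of lines in the toy cache

// One cache line: a valid bit and the tag of the memory block it holds.
struct CacheLine {
    bool valid = false;
    std::uint64_t tag = 0;
};

// Hypothetical direct-mapped cache model that tracks hits and accesses only.
class ToyCache {
public:
    ToyCache() : lines_(kNumLines) {}

    // Returns true on a cache hit. On a miss, the line is refilled by
    // recording the new tag, which models fetching from main memory.
    bool access(std::uint64_t address) {
        const std::uint64_t block = address / kLineBytes;
        const std::size_t index = static_cast<std::size_t>(block % kNumLines);
        const std::uint64_t tag = block / kNumLines;
        assert(index < lines_.size());
        ++accesses_;
        CacheLine& line = lines_[index];
        if (line.valid && line.tag == tag) {
            ++hits_;
            return true;
        }
        line.valid = true;
        line.tag = tag;
        return false;
    }

    // Proportion of accesses that resulted in a hit.
    double hit_rate() const {
        if (accesses_ == 0) {
            return 0.0;
        }
        return static_cast<double>(hits_) / static_cast<double>(accesses_);
    }

private:
    std::vector<CacheLine> lines_;
    std::uint64_t hits_ = 0;
    std::uint64_t accesses_ = 0;
};
```

In this model, accessing addresses 0, 8, 64, 0, 512 and 0 in that order yields two hits out of six accesses: address 8 shares a line with address 0, while address 512 maps to the same line index as address 0 under a different tag and therefore evicts it.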
Linked lists have been used in attempts to reduce memory latency. However, singly-linked lists require one link per data element (i.e., list node), and list nodes may be widely scattered in memory. Also, cache misses occur in a sequential manner because the data elements in the linked list are accessed sequentially: the address of each node is known only after the preceding node has been read. These sequential cache misses contribute to the memory latency problem. Together, these factors result in poor cache performance.
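The sequential dependence between list nodes can be seen in a plain traversal. The sketch below is illustrative only; the names Node, sum_list and value_at are hypothetical. Each step must load the current node before the address of the next node is available, so a miss on one node delays every node after it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical singly-linked node: one link (meta-data) per data element.
struct Node {
    int value;
    Node* next;
};

// Sums all values. The next address comes from the node just loaded,
// so the loads form a dependent chain.
long sum_list(const Node* head) {
    long total = 0;
    for (const Node* n = head; n != nullptr; n = n->next) {
        total += n->value;
    }
    return total;
}

// Reaching element `index` requires following `index` links in sequence.
int value_at(const Node* head, std::size_t index) {
    const Node* n = head;
    for (std::size_t i = 0; i < index; ++i) {
        assert(n != nullptr);  // index must lie within the list
        n = n->next;
    }
    assert(n != nullptr);
    return n->value;
}
```

On a typical 64-bit platform the link occupies as many bytes as, or more bytes than, a small data element such as an int, which is one way to see the per-node overhead.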
Arrays have also been used. They may provide excellent cache density and permit traversal via simple pointer arithmetic. However, insertion and deletion operations are very expensive for arrays, making them unsuitable for many queue and stack implementations. A further problem with arrays is their static allocation: it is extremely difficult and costly to grow or shrink an array. This makes arrays impractical for long-lived unbounded lists.
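Both properties of arrays, cheap traversal and costly insertion, can be sketched in a few lines. The functions sum_array and insert_at below are hypothetical helpers written for illustration, and a fixed-capacity buffer is assumed.

```cpp
#include <cassert>
#include <cstddef>

// Traverses a contiguous array by stepping a pointer. Consecutive elements
// are adjacent in memory, so several of them share each cache line.
long sum_array(const int* data, std::size_t count) {
    long total = 0;
    for (const int* p = data; p != data + count; ++p) {
        total += *p;
    }
    return total;
}

// Inserts `value` at position `pos` in a buffer that holds `count` elements
// and has room for `capacity`. Every element from `pos` onward is shifted
// by one slot, so the cost grows with the number of elements after `pos`.
// Returns the new element count.
std::size_t insert_at(int* data, std::size_t count, std::size_t capacity,
                      std::size_t pos, int value) {
    assert(pos <= count && count < capacity);  // no room to grow beyond capacity
    for (std::size_t i = count; i > pos; --i) {
        data[i] = data[i - 1];
    }
    data[pos] = value;
    return count + 1;
}
```

The capacity check reflects the static-allocation limitation: once the buffer is full, the only option in this sketch is to allocate a larger buffer and copy every element.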
Another solution is described in “Virtual Cache Line: A New Technique to Improve Cache Exploitation for Recursive Data Structures” (IBM Research Lab in Haifa and Israel Institute of Technology, 1999) by Shai Rubin, David Bernstein, and Michael Rodeh. This solution uses a Virtual Cache Line (VCL) allocation of linked data nodes in clusters to improve cache performance. Although this solution may improve data locality (data proximity in memory), it does not reduce the high meta-data overhead within the data nodes. The high memory footprint of data allocated with this VCL technique would result in poorer cache performance because less cache space is available for data storage. Furthermore, this VCL technique requires sophisticated insertion and deletion algorithms to fill the cache lines efficiently.
The C++ standard library typically uses a chunked-list implementation of double-ended queues (deques), combining the fast traversal advantages of arrays with the dynamic-update advantages of linked data structures. However, this deque supports efficient insertions and deletions only at the ends of the list; an update in the middle of the list requires shifting the elements on one side of the update position. Also, this C++ deque does not optimize its chunks for cache performance. A data element and its meta-data may reside in two different cache lines, increasing memory pressure. Finally, this C++ deque does not provide for cache prefetching.
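For reference, the end-oriented interface of the standard deque can be exercised as follows. This is only a usage sketch; the helper names build_example and sum_deque are hypothetical, and nothing here depends on how a given library lays out its chunks.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Builds a small deque using only operations at the two ends.
std::deque<int> build_example() {
    std::deque<int> d;
    d.push_back(2);   // append at the back
    d.push_back(3);
    d.push_front(1);  // prepend at the front
    assert(d.size() == 3);
    return d;
}

// Array-style traversal by index, as permitted by std::deque.
long sum_deque(const std::deque<int>& d) {
    long total = 0;
    for (std::size_t i = 0; i < d.size(); ++i) {
        total += d[i];
    }
    return total;
}
```

The container exposes no control over chunk size or alignment relative to cache lines, and no prefetch hints, which is consistent with the limitations noted above.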
Therefore, the current technology is limited in its capabilities and suffers from at least the above constraints and deficiencies.