The present invention relates generally to communication networks, and, more particularly, to a system for pre-fetching data frames using hints from a work queue scheduler.
A communication network typically includes multiple digital systems including gateways, switches, access points and base stations. These digital systems exchange data packets (also referred to as data frames) and manage data transmissions across multiple digital systems. A digital system includes at least one processor or hardware accelerator that performs logical and mathematical operations on the data frames.
The hardware accelerator or the processor performs the logical and mathematical operations by executing instructions that are stored in memories of the digital system. The memories are also used to store the data frames. The time required by the processor (or the hardware accelerator) to access the data frames stored in the memories is referred to as access time, and is measured as a count of machine cycles required by the processor to access the data frames from the memories. The access time increases with the distance of a memory from the processor and also depends on the memory type. The lower the access time, the faster the processor can process the data frames, thereby resulting in increased throughput and improved system performance. However, memories with low access time are more expensive than memories with high access time.
The digital system includes two types of memories, cache memory and system memory. For example, the cache memory may comprise one of a static random-access memory (SRAM) and a flash memory, while the system memory comprises dynamic random-access memory (DRAM). The cache is located closer to the processor (or the hardware accelerator) than the system memory, and typically is smaller and has a faster read time and, hence, a lower access time. However, since cache memory is smaller and more expensive than system memory, the cache typically is not big enough to store all the data frames received from other digital systems in the network. Thus, the system memory is used for storing the data frames received from the communication network. However, fetching the data frames from the system memory leads to processing delays. To improve performance, conventional digital systems pre-fetch some data frames from the system memory and store them in the cache.
When the processor requires a data frame, the processor checks for availability of the required data frame in the cache. If the required data frame is available in the cache, it is referred to as a cache hit. For cache hits, the processor reads the required data frame from the cache. However, if the required data frame is not available in the cache, it is referred to as a cache miss. For a cache miss, the processor reads the required data frame from the system memory, thereby resulting in a delay. Thus, it is necessary to identify the data frames required for subsequent processing and pre-fetch the identified data frames from the system memory to the cache to reduce the number of cache misses.
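The hit and miss paths described above can be illustrated with a minimal sketch. It is not taken from any particular digital system: the class `FrameCache`, the least-recently-used replacement policy, and the cycle counts `CACHE_CYCLES` and `MEMORY_CYCLES` are all hypothetical values chosen for illustration, and the miss path is simplified to a single system-memory cost.

```python
from collections import OrderedDict

CACHE_CYCLES = 4      # assumed cost of a cache hit, in machine cycles
MEMORY_CYCLES = 100   # assumed cost of a system-memory read, in machine cycles


class FrameCache:
    """Tiny LRU cache keyed by frame address (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> data frame
        self.hits = 0
        self.misses = 0

    def prefetch(self, addr, system_memory):
        """Copy a frame from system memory into the cache ahead of use."""
        self._install(addr, system_memory[addr])

    def read(self, addr, system_memory):
        """Return (frame, cycles spent) and record the hit or miss."""
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)  # mark as most recently used
            return self.lines[addr], CACHE_CYCLES
        self.misses += 1
        frame = system_memory[addr]       # slow path: fetch from system memory
        self._install(addr, frame)
        return frame, MEMORY_CYCLES

    def _install(self, addr, frame):
        self.lines[addr] = frame
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used frame


system_memory = {addr: f"frame-{addr}" for addr in range(8)}
cache = FrameCache(capacity=2)
cache.prefetch(3, system_memory)
_, hit_cycles = cache.read(3, system_memory)    # pre-fetched earlier: cache hit
_, miss_cycles = cache.read(5, system_memory)   # never cached: cache miss
print(hit_cycles, miss_cycles, cache.hits, cache.misses)
```

Under these assumed costs, the pre-fetched frame is read in 4 cycles while the frame that was not pre-fetched costs 100 cycles, which is the delay that pre-fetching aims to avoid.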
One pre-fetching technique includes stashing of a subset of the data frames in the cache. The subset of the data frames is concurrently stored in the system memory. When the processor requests access to a first data frame of the subset, the first data frame is available in the cache. However, if the time from when the subset of the data frames is stashed to when the processor requests the first data frame exceeds a cache-flush time period, the cache will evict the subset of the data frames. The subset of the data frames stashed in the cache memory also may be overwritten. Therefore, the processor will not find the first data frame in the cache and must retrieve it from the system memory.
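The timing weakness of stashing can be sketched as follows. The class `StashCache`, the parameter `flush_period`, and the use of an integer cycle counter as the clock are assumptions made for illustration; real caches evict according to their own replacement and flush policies.

```python
class StashCache:
    """Cache that drops stashed frames older than a flush period (illustrative)."""

    def __init__(self, flush_period):
        self.flush_period = flush_period  # assumed cache-flush time, in cycles
        self.entries = {}                 # address -> (frame, time stashed)

    def stash(self, addr, frame, now):
        """Place a frame in the cache at time `now`."""
        self.entries[addr] = (frame, now)

    def lookup(self, addr, now):
        """Return the frame if still cached at time `now`, else None (a miss)."""
        entry = self.entries.get(addr)
        if entry is None:
            return None
        frame, stashed_at = entry
        if now - stashed_at > self.flush_period:
            del self.entries[addr]        # flushed before the processor asked
            return None
        return frame


cache = StashCache(flush_period=10)
cache.stash(0x100, "frame-A", now=0)
cache.stash(0x200, "frame-B", now=0)
early = cache.lookup(0x100, now=5)    # requested within the flush period
late = cache.lookup(0x200, now=11)    # requested after the flush period
print(early, late)
```

In this sketch the frame requested at cycle 5 is still available, while the frame requested at cycle 11 has been evicted and would have to be fetched from the system memory.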
Another pre-fetching technique includes predicting a subset of data frames that will be required by the processor and pre-fetching this subset from the system memory to the cache. A prediction algorithm monitors cache misses that occur when requested data frames are absent from the cache. The prediction algorithm identifies the addresses of the requested data frames that resulted in the cache misses and, based on those addresses, identifies a predicted set of addresses. The subset of data frames associated with the predicted set of addresses is then pre-fetched and stored in the cache. However, the processor may require only a few addresses of the predicted set for processing. The remaining predicted addresses are not used, and the machine cycles spent pre-fetching the corresponding data frames are wasted. Further, since the prediction algorithm requires initial cache misses, this technique delays the processing of the data frames and degrades the performance of the digital system.
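The two drawbacks of miss-driven prediction, unavoidable initial misses and unused pre-fetches, can be seen in a small simulation. The text does not specify the prediction algorithm, so this sketch substitutes a simple stand-in: on each miss it predicts the next `degree` sequential addresses. The function name, the sequential predictor, and the unbounded cache are all assumptions made for illustration.

```python
def run_miss_driven_prefetch(requests, degree):
    """Simulate a prefetcher that predicts only after a cache miss.

    Returns (number of misses, number of pre-fetched frames never used).
    """
    cache = set()              # addresses currently cached (unbounded here)
    prefetched_unused = set()  # pre-fetched addresses not yet requested
    misses = 0
    for addr in requests:
        if addr in cache:
            prefetched_unused.discard(addr)  # a prediction paid off
            continue
        misses += 1                          # prediction needs this miss first
        cache.add(addr)
        # Assumed predictor: the next `degree` sequential addresses.
        for predicted in range(addr + 1, addr + 1 + degree):
            if predicted not in cache:
                cache.add(predicted)
                prefetched_unused.add(predicted)
    return misses, len(prefetched_unused)


misses, wasted = run_miss_driven_prefetch([10, 11, 20, 21], degree=3)
print(misses, wasted)
```

For this request sequence the simulation records two misses, one at the start of each address run, and four pre-fetched frames that the processor never requests.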
Therefore, it would be advantageous to have a system and method for pre-fetching data frames to the cache that reduces the number of cache misses.