1. Field of the Invention
This invention relates generally to the field of computer processors. More particularly, the invention relates to an apparatus and method for efficient handling of critical chunks.
2. Description of the Related Art
System-on-Chip (SoC) has become the de facto hardware architecture across a spectrum of computing platforms including handheld mobile devices, personal computing devices, and even micro-servers. SoC architectures comprise several heterogeneous functional units, such as cores, caches, memory controllers, encoders/decoders, cryptographic engines, cameras, display interfaces, etc., that are communicatively coupled together.
In order to have a single point of coherence that is visible to all such agents, there usually exists a mediator, sometimes referred to as a “system agent,” that caches memory requests internally and, once all desired data retirement checks are completed (e.g., coherence is resolved, decryption, etc.), forwards the data to the requesting agent. For example, if a core makes a 64 Byte request (i.e., two 32 Byte “chunks”), when the data comes back to the system agent from the memory controller, it is stored in the internal cache of the system agent. Once the data is cleared for forwarding (i.e., globally visible in terms of coherence), the system agent prioritizes the forwarding of the critical chunk over the non-critical chunk within a single request.
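The within-request ordering described above can be sketched as follows. This is only an illustrative model, not the system agent's actual logic: it assumes that the critical chunk is the 32 Byte chunk containing the requested address, and the names critical_chunk_index and forwarding_order are hypothetical.

```python
from typing import List

CHUNK_BYTES = 32   # chunk size from the example above
BLOCK_BYTES = 64   # request size from the example above (two chunks)


def critical_chunk_index(address: int) -> int:
    """Return 0 or 1: which chunk of the 64 Byte block holds the address.

    Assumption (not stated in the text): the critical chunk is the one
    that contains the byte the requesting agent actually asked for.
    """
    return (address % BLOCK_BYTES) // CHUNK_BYTES


def forwarding_order(address: int) -> List[int]:
    """Chunk indices in forwarding order: critical chunk first."""
    critical = critical_chunk_index(address)
    return [critical, 1 - critical]
```

Under these assumptions, a request whose address falls in the upper half of its block, such as forwarding_order(0x1028), has chunk 1 forwarded before chunk 0.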
If multiple blocks are cached in the system agent, there is no existing mechanism to prioritize critical chunks across multiple requests (i.e., the system agent might send a non-critical chunk of an old request over the critical chunk of a new request). This can lead to suboptimal performance for non-streaming workloads and for workloads that access only partial blocks, because the critical chunks of some trailing blocks have to wait for the non-critical chunks of preceding blocks.
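The cross-request delay can be illustrated with a small model. It is a sketch under stated assumptions: the system agent is assumed to forward one chunk per slot and to drain requests strictly oldest first, and the function name baseline_forwarding and the request labels "A" and "B" are hypothetical.

```python
from typing import List, Tuple


def baseline_forwarding(requests: List[str]) -> List[Tuple[str, str]]:
    """Model forwarding with prioritization only within each request.

    `requests` lists request labels oldest first. Each request sends its
    critical chunk, then its non-critical chunk, before the next request
    is considered (assumed behavior, one chunk per slot).
    """
    order = []
    for name in requests:
        order.append((name, "critical"))
        order.append((name, "non-critical"))
    return order


order = baseline_forwarding(["A", "B"])
# Slot in which the younger request's critical chunk is forwarded.
b_critical_slot = order.index(("B", "critical"))
# Slot in which the older request's non-critical chunk is forwarded.
a_noncritical_slot = order.index(("A", "non-critical"))
```

In this model the critical chunk of the younger request B is forwarded only after the non-critical chunk of the older request A, which is the waiting behavior described above.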
Another situation in which this poses a problem is memory scheduling, where the system agent schedules requests to a memory controller that may have a width of one chunk (e.g., 32 Bytes). In such a case, the system agent would traditionally schedule the critical chunk over the non-critical chunk within a single request. However, this could likewise lead to suboptimal performance if there are many younger critical chunks awaiting scheduling.