Firmware of a central processing unit (CPU) is often called upon to copy small amounts of data from one area of memory to another. The mechanisms available to firmware for copying data are suited to large amounts of data, but their setup overhead is unacceptable when only a small amount of data is to be moved.
When firmware carries out a copy operation itself, it reads the data from the source address into local memory and then writes it to the destination address. The firmware reads and writes data in blocks of 4 bytes, which is processor intensive. This could be sped up by using the processor's cache, if available, so that the copy takes place in reads and writes that are a cache line long rather than only 4 bytes.
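The word-at-a-time copy described above can be sketched as follows. This is a minimal illustration, not the firmware's actual routine: the function name `fw_copy_words` and the use of `volatile` 32-bit pointers are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical firmware copy loop: moves n_words 4-byte words from src
 * to dst. Each iteration performs one read into a local variable and
 * one write, so the processor is occupied for the whole transfer.
 */
static void fw_copy_words(volatile uint32_t *dst,
                          const volatile uint32_t *src,
                          size_t n_words)
{
    for (size_t i = 0; i < n_words; i++) {
        uint32_t tmp = src[i]; /* the read must complete before the loop continues */
        dst[i] = tmp;          /* the write can be queued and completed later */
    }
}
```

Copying 64 bytes with this loop takes 16 reads and 16 writes, each issued by the processor, which is why the text describes the approach as processor intensive.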
The disadvantage of using the processor's cache is that the copy can be very slow if a cache miss occurs on the data being copied, because the miss stalls the processor for many cycles.
Using a general purpose CPU to copy data around memory, where that CPU has no interest in examining the data apart from copying it, is wasteful, particularly when the data comes from a bottom tier of memory (for example, L3). Such memory has very high access latency, and the cost to the processor in lost execution cycles is disproportionate to the function achieved. Avoiding read accesses to this memory can give a very significant performance boost.
Write operations are less costly to the processor, since they can be executed from a posted write queue, which releases the processor quickly and allows the write to take place after the processor has started executing the next instructions.
Using the processor's cache has the further disadvantage that the copy data, which the processor is not actually going to use, may displace important data from the cache. This could have a serious effect on overall system performance.
Additionally, in many applications firmware is called upon to maintain large lists in memory. These lists may be, for example, lists of addresses used to manage data. One such case is where the addresses point to areas of memory, sometimes called “pages”, which are allocated to an operation. At the end of the operation the areas of memory are freed for use in a future operation. This allows firmware to manage memory.
Traditionally, in this use, the firmware maintains two lists of addresses: those which are in use and those which are free for use. When firmware allocates one or more of these addresses to an operation, it copies the addresses from the list of free addresses to the list of those in use. When the operation completes, firmware copies the addresses from the list of those in use back to the list of free addresses. Firmware also has to maintain the addresses of these lists. This can be fairly CPU intensive and, as system performance is increasingly important, anything that can be done to aid the firmware is worthwhile.
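The two-list bookkeeping described above might look like the following sketch. All names here (`addr_list`, `LIST_CAPACITY`, `page_alloc`, `page_release`) and the fixed-size array representation are assumptions chosen for illustration; the text does not specify how the lists are stored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LIST_CAPACITY 8u /* hypothetical maximum number of page addresses */

/* A simple array-backed list of page addresses. */
typedef struct {
    uint32_t addr[LIST_CAPACITY];
    size_t   count;
} addr_list;

static bool list_push(addr_list *l, uint32_t a)
{
    if (l->count == LIST_CAPACITY)
        return false;
    l->addr[l->count++] = a;
    return true;
}

static bool list_pop(addr_list *l, uint32_t *out)
{
    if (l->count == 0)
        return false;
    *out = l->addr[--l->count];
    return true;
}

/* Allocate: move one address from the free list to the in-use list. */
static bool page_alloc(addr_list *free_l, addr_list *used_l, uint32_t *out)
{
    uint32_t a;
    if (used_l->count == LIST_CAPACITY)
        return false;              /* no room to record the allocation */
    if (!list_pop(free_l, &a))
        return false;              /* no free pages left */
    list_push(used_l, a);
    *out = a;
    return true;
}

/* Release: find the address in the in-use list and move it back. */
static bool page_release(addr_list *free_l, addr_list *used_l, uint32_t a)
{
    for (size_t i = 0; i < used_l->count; i++) {
        if (used_l->addr[i] == a) {
            if (!list_push(free_l, a))
                return false;      /* free list full: leave both lists unchanged */
            /* Fill the gap with the last in-use entry. */
            used_l->addr[i] = used_l->addr[--used_l->count];
            return true;
        }
    }
    return false;                  /* address was not in use */
}
```

Every allocation and release costs the processor several memory reads and writes on the lists themselves, in addition to the search on release, which illustrates why this bookkeeping is described as CPU intensive.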