This invention relates to computer systems, and more specifically, to the enhancement of data flow between peripheral devices and the cache of a processor through use of a direct data transfer.
The performance of processor architectures with large memories is generally limited by the slowness of accessing the large memory. To help minimize this limitation, smaller memories known as caches are used. In many common architectures, one cache is dedicated to instructions and another is dedicated to data. As the processor fetches instructions from the larger main memory, for instance, these instructions will be replicated in an instruction cache. In this way, if the processor later uses the same instruction, the processor can fetch this instruction from the cache instead of the main memory, resulting in much faster access times due to the cache's smaller size. As an instruction or small set of instructions is often repeated many times in close proximity, use of an instruction cache can markedly improve performance. In the same way, data caches can enhance processor performance in situations where data is reused.
In the sort of processor architecture described above, peripheral logic blocks are traditionally placed alongside the main memory. The processor may then read one or more words from a given peripheral logic block in order to perform computations upon them. When these same words are required for later computations, the processor stores them in the data cache; or if the processor is engaged in a current task, it may read a word or words from a peripheral block and store them in cache for a later computation. In either of these cases, the processor has to perform a "read from peripheral and store into cache" process.
A common technique for increasing the efficiency of data transfer between a peripheral logic block and the main memory is Direct Memory Access (DMA). In a DMA arrangement, data is transferred directly between the peripheral logic and the main memory under the management of a DMA controller, with the processor removed from the path. In this way, the rate of transfer is no longer limited by the speed of the processor. Additionally, as the processor no longer has to manage the transfer directly, supplying only occasional oversight to the DMA controller, it is free to perform other tasks while the transfer takes place.
There are many variations on the DMA technique, but these all supply the data from the peripheral to the main memory. To further move the data to the cache memory requires that the processor read the data from the main memory and write it into the cache. For tasks which require a large number of operations upon data coming from a peripheral block, such as streaming audio data for digital signal processing, this required transfer of data from the main memory into cache memory greatly reduces processor efficiency. This problem could be overcome if a transfer of data could be performed directly between the peripheral logic and the cache, except that such a transfer would undermine the integrity of the cache memory.
The reason for this lies in how the cache is constructed. There are several variations on how to structure a cache, but generically when a processor needs data or an instruction from memory, this information is both copied into the cache and maintained in memory. In this way there is a mapping between each element in the cache memory and an element in the main memory. As the processor operates on an element in, say, the data cache, the result of the operation must also be transmitted back to the corresponding element of the main memory at some time. In order to maintain this correspondence, an entry in the cache will contain not only the actual element, but also some way to identify the corresponding address in the main memory and an indication that the cache element is valid. This is true whether discussing an instruction cache, data cache, or unified cache for both instructions and data. Specific examples of how this mapping can be achieved are given below in the discussion of the preferred embodiments. In any case, in order for the processor to function properly, the integrity of the cache must be maintained.
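The mapping between a cache element and a main memory element can be illustrated with a minimal sketch. It assumes a direct-mapped organization, which is only one of the several possible cache structures mentioned above and is not necessarily the one used in the preferred embodiments. The names CacheLine, DirectMappedCache, NUM_LINES, fill, lookup, and memory_address are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NUM_LINES = 8  # hypothetical cache size, in lines (an assumption)


@dataclass
class CacheLine:
    valid: bool = False  # marks whether this entry holds usable contents
    tag: int = 0         # identifies which main memory address is cached
    data: int = 0        # the cached element itself


class DirectMappedCache:
    """Illustrative direct-mapped cache: each address maps to one line."""

    def __init__(self, num_lines: int = NUM_LINES) -> None:
        self.num_lines = num_lines
        self.lines: List[CacheLine] = [CacheLine() for _ in range(num_lines)]

    def _split(self, address: int) -> Tuple[int, int]:
        # Returns (tag, index): the index selects the line, the tag
        # records which of the addresses sharing that line is present.
        return divmod(address, self.num_lines)

    def fill(self, address: int, data: int) -> None:
        # Copy an element fetched from main memory into the cache.
        tag, index = self._split(address)
        self.lines[index] = CacheLine(valid=True, tag=tag, data=data)

    def lookup(self, address: int) -> Optional[int]:
        # The processor searches by memory address; a hit needs both a
        # valid line and a matching tag.
        tag, index = self._split(address)
        line = self.lines[index]
        if line.valid and line.tag == tag:
            return line.data
        return None

    def memory_address(self, index: int) -> Optional[int]:
        # Recover the main memory address a line must be written back to.
        line = self.lines[index]
        if not line.valid:
            return None
        return line.tag * self.num_lines + index
```

In this sketch, data placed into a line without a meaningful tag could neither be found by lookup nor written back through memory_address, which is the loss of integrity discussed in this section.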
A direct transfer of data between a peripheral logic block and the cache would destroy this integrity as it would break this mapping between the cache element and a memory element. The efficiency of placing the element in the cache would be increased, but as the processor searches for these elements based upon memory address, it could neither properly identify the cache element for which it was searching nor write the element back to a memory location when required.
In this discussion, the term cache is used to mean the standard cache memory of a processor, not a specialized or single purpose structure. In a multilevel cache architecture, this would be the primary or "top-level" cache. There are instances in the prior art, such as U.S. Pat. Nos. 5,261,072, 5,263,142, or U.S. Pat. No. 5,745,707, which perform a DMA-type of transfer to a structure referred to as a "cache", but these are either specialized structures or temporary buffers which subsequently require the data to be written into main memory before it can be sent to the standard cache memory of the processor. As such, they avoid the complications of a direct data transfer between the peripheral logic and this standard cache, but also lack the benefits such an arrangement could provide.
Therefore, although many processor applications, such as the example of streaming audio data given above or the compression and decompression of data, could benefit greatly from being able to transfer data directly between a peripheral logic device and cache memory, this must be done in a way that preserves the integrity of the cache.
The present invention provides for data flow enhancement in processor architectures having one or more caches by allowing DMA-type transfers to and from these caches. Specific examples allow such direct transfers between a peripheral logic device and the cache memory, or between either the main memory or a special memory and the cache memory. This is done by the processor reserving a portion of cache for the direct transfer, which is then carried out by a DMA-type controller. While this transfer is occurring, the processor is able to carry out other tasks and access the unreserved portion of cache in the normal manner. In the preferred embodiment, the transfer is performed by a cycle stealing technique. Once the transfer is complete, the reserved portion of the cache may be accessed by the processor. The size of the reservable portion may either be fixed or dynamically determined by the operating system based on factors such as task flow management and data transfer rates. In a preferred embodiment, the operating system works in concert with the cache organized into cache lines, assigning each cache line an address tag field, with particular values of the address tag indicating that a given cache line is part of the reserved portion of the cache.
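The reservation scheme summarized above can be sketched as follows. This is an illustrative model under stated assumptions, not the implementation of the preferred embodiment: the sentinel value RESERVED_TAG, the direct-mapped layout, and the names Line, Cache, reserve, dma_write, dma_transfer, and read_reserved are all hypothetical. The only element taken from the text is that particular address tag values mark a cache line as part of the reserved portion.

```python
RESERVED_TAG = 0xFFFF  # hypothetical tag value marking a reserved line


class Line:
    def __init__(self):
        self.valid = False
        self.tag = 0
        self.data = None


class Cache:
    """Illustrative cache whose lines can be reserved for direct transfer."""

    def __init__(self, num_lines=8):
        self.num_lines = num_lines
        self.lines = [Line() for _ in range(num_lines)]

    def reserve(self, first, count):
        # Processor side: set lines aside by writing the reserved tag.
        for index in range(first, first + count):
            line = self.lines[index]
            line.valid = False
            line.tag = RESERVED_TAG
            line.data = None

    def is_reserved(self, index):
        return self.lines[index].tag == RESERVED_TAG

    def fill(self, address, data):
        # Normal fill from main memory; reserved lines are left untouched.
        tag, index = divmod(address, self.num_lines)
        if self.is_reserved(index):
            return False
        line = self.lines[index]
        line.valid, line.tag, line.data = True, tag, data
        return True

    def normal_lookup(self, address):
        # Normal access by memory address; a reserved line never hits.
        tag, index = divmod(address, self.num_lines)
        line = self.lines[index]
        if self.is_reserved(index):
            return None
        return line.data if line.valid and line.tag == tag else None

    def dma_write(self, index, word):
        # Controller side: only reserved lines may be written directly.
        if not self.is_reserved(index):
            raise ValueError("line %d is not reserved" % index)
        self.lines[index].data = word

    def read_reserved(self, index):
        # Processor side: read a reserved line once the transfer is done.
        if not self.is_reserved(index):
            raise ValueError("line %d is not reserved" % index)
        return self.lines[index].data


def dma_transfer(cache, first, words):
    # Stand-in for the DMA-type controller moving words from a peripheral.
    for offset, word in enumerate(words):
        cache.dma_write(first + offset, word)
```

For example, after `cache.reserve(4, 2)` and `dma_transfer(cache, 4, [10, 20])`, the processor can read the transferred words with `cache.read_reserved(4)` and `cache.read_reserved(5)`, while fills and lookups on the unreserved lines proceed in the normal manner. The sketch does not model cycle stealing or dynamic sizing of the reserved portion.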
Additional objects, advantages, and features of the present invention will become apparent from the following description of its preferred embodiments, which description should be taken in conjunction with the accompanying drawings.