1. Field of the Invention
The invention generally relates to data movement in a computer, and more particularly to a system and method of moving data to and from portions of memory with cacheability being controllable on an individual operational basis.
2. Description of Related Art
Reference is made to FIG. 1, which depicts a typical personal computer (PC) system with an x86 architecture for displaying graphics. A central processing unit (CPU) 50 having multiple registers (e.g. CS, DS, ES . . . ECX, EDI, ESI) is coupled through a CPU bus 52 to a memory controller 54. The memory controller 54 is coupled to system memory 56, typically DRAM, and to a relatively fast local or "mezzanine" bus 58, typically having a protocol in accordance with the Video Electronics Standards Association VL-bus or with the Peripheral Component Interconnect (PCI) bus. The local bus 58 is coupled to a relatively slow Industry Standard Architecture (ISA) bus 60 through a bus converter 62.
The local bus 58 couples a graphics adapter card 64 to the memory controller 54 and to the bus converter 62. The location and color for each pixel displayed on display 66 is stored in a frame buffer memory 68 on the graphics adapter card 64. A RAMDAC 70 on the graphics adapter card 64 converts the data stored in the frame buffer memory 68 to analog signals to drive the display 66 which is typically a cathode ray tube (CRT) or a liquid crystal display (LCD). Each time a change is made in the graphics on display 66, the location and color for each pixel must be recalculated and stored in the frame buffer memory 68.
The CPU 50 typically calculates the location and color definition of each changed pixel and sends the resulting information across the local bus 58 to the frame buffer memory 68 on the graphics adapter card 64. Alternatively, a graphics accelerator 72 reduces the burden on the CPU 50 by receiving certain graphic calls (e.g. fills and line draws) through a graphics driver executed by the CPU 50, calculating the changes in the pixels, and filling the frame buffer memory 68 with updated graphics data.
The so-called BitBlt graphic call ("bit blit") transfers blocks of graphics data from system memory 56 to frame buffer memory 68, from frame buffer memory 68 to system memory 56, and between different portions within the frame buffer memory 68. The graphics accelerator 72 can effectively handle the BitBlt operation to the extent that the data is already stored in the frame buffer memory 68 and the destination is also in the frame buffer memory 68. The CPU 50, however, must still be involved to provide privilege and protection checks if the BitBlt operation requires bitmapped images to be moved from external system memory 56 to the frame buffer memory 68 or from the frame buffer memory 68 to the external system memory 56. The CPU 50 typically handles this through repeated steps which, in x86 architecture parlance, often take the form of a repeat move string instruction:
REP MOVS [ESI (source address), EDI (destination address)]
wherein a number of bytes, words, or Dwords of data specified by the ECX register, starting at an address pointed to by ESI, are moved to a block of memory pointed to by EDI.
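By way of illustration only, the register-level behavior of the repeat move string instruction described above may be sketched in C as follows. The sketch assumes a flat address space and ascending addresses (direction flag clear); the names movs_regs and rep_movs are hypothetical and are used solely for this illustration. It models the element-by-element copy only and does not model the privilege and protection checks or the bus traffic discussed in this background.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the three registers used by REP MOVS.
 * Assumption: flat address space, direction flag clear (addresses ascend). */
typedef struct {
    const uint8_t *esi; /* source pointer (ESI) */
    uint8_t *edi;       /* destination pointer (EDI) */
    uint32_t ecx;       /* number of elements still to be moved (ECX) */
} movs_regs;

/* Moves ECX elements of elem_size bytes (1 = byte, 2 = word, 4 = Dword)
 * from the block at ESI to the block at EDI, advancing both pointers
 * and decrementing ECX after each element. */
static void rep_movs(movs_regs *r, size_t elem_size)
{
    assert(elem_size == 1 || elem_size == 2 || elem_size == 4);
    while (r->ecx != 0) {
        for (size_t i = 0; i < elem_size; i++) {
            r->edi[i] = r->esi[i];
        }
        r->esi += elem_size;
        r->edi += elem_size;
        r->ecx--;
    }
}
```

In this sketch, a call with ECX equal to 2 and an element size of 4 moves eight bytes and leaves ESI and EDI pointing just past the source and destination blocks, with ECX equal to zero.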
The required intervention by the CPU 50 has a large latency associated with it since data must be read from the system memory 56 through the memory controller 54 over the CPU bus 52 into the internal registers of the CPU 50. The CPU 50 must then turn around and write the data from its registers over the CPU bus 52 through the memory controller 54 onto the local bus 58 to the frame buffer memory 68 on the graphics adapter card 64. Likewise, for a transfer in the opposite direction, data must be read from frame buffer memory 68 on the graphics adapter card 64 through the memory controller 54 over the CPU bus 52 into the internal registers of the CPU 50. The CPU 50 must then turn around and write the data from its registers over the CPU bus 52 through the memory controller 54 to the system memory 56.
The process just described is further complicated by the use of a cache 74. By way of background, a cache 74, simply put, is a relatively small but fast-access buffer area wherein a copy of previously accessed data, typically spatially or temporally related, is held in the hope that subsequent accesses will benefit from the spatial or temporal locality. In other words, the intent of the cache 74 is to reduce the latency associated with data accesses normally made to slow memory by keeping a copy of the most recent data readily available. However, in the case of reading bitmapped data from system memory 56 to update the display 66, a cache 74 is not significantly advantageous and, in fact, can actually hinder performance. The amount of display information which updates the display is overwhelming compared to the size of the cache 74, so caching the display information has little, if any, positive impact on performance. More importantly, however, by caching the display information, valuable instructions and data are evicted from the cache 74, requiring longer access times to retrieve them from secondary cache or main memory.
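The eviction effect described above may be illustrated with a toy model, offered as a sketch under stated assumptions rather than as a description of any actual cache. The model assumes a direct-mapped cache of 64 lines of 16 bytes each, a 1 KB working set, and a 64 KB bitmap streamed from an arbitrary address; the names toy_cache, cache_access, and hits_after_stream, and all of the sizes and addresses, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_LINES 64u /* assumed number of cache lines */
#define LINE_SIZE 16u /* assumed line size in bytes */

/* Toy direct-mapped cache that tracks tags only (no data). */
typedef struct {
    uint32_t tag[NUM_LINES];
    int valid[NUM_LINES];
} toy_cache;

/* Returns 1 on a hit and 0 on a miss. On a miss, the line is
 * allocated only if the access is marked cacheable. */
static int cache_access(toy_cache *c, uint32_t addr, int cacheable)
{
    uint32_t block = addr / LINE_SIZE;
    uint32_t index = block % NUM_LINES;
    uint32_t tag = block / NUM_LINES;

    if (c->valid[index] && c->tag[index] == tag) {
        return 1;
    }
    if (cacheable) {
        c->valid[index] = 1;
        c->tag[index] = tag;
    }
    return 0;
}

/* Warms the cache with a 1 KB working set, streams a 64 KB bitmap
 * through it, then counts how many working-set lines still hit. */
static unsigned hits_after_stream(int stream_cacheable)
{
    toy_cache c;
    unsigned hits = 0;
    uint32_t a;

    memset(&c, 0, sizeof c);
    for (a = 0; a < NUM_LINES * LINE_SIZE; a += LINE_SIZE) {
        cache_access(&c, a, 1);
    }
    for (a = 0x100000u; a < 0x100000u + 64u * 1024u; a += LINE_SIZE) {
        cache_access(&c, a, stream_cacheable);
    }
    for (a = 0; a < NUM_LINES * LINE_SIZE; a += LINE_SIZE) {
        hits += (unsigned)cache_access(&c, a, 1);
    }
    return hits;
}
```

Under these assumptions, hits_after_stream(1) returns 0 because the cacheable stream displaces every line of the working set, whereas hits_after_stream(0) returns 64 because the non-cacheable stream leaves the working set resident.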
By way of further background, known ways under the x86 architecture to designate data as non-cacheable include non-assertion of the cache enable pin (KEN#) by chipset logic circuitry and setting a page cache disable (PCD) bit in the directory and page table entries (DTE and PTE). A drawback with using the KEN# pin is that it requires external chipset logic circuitry to determine cacheability. A drawback with using the PCD bit is that the finest gradation of cacheability is a single page, so cacheability cannot be controlled for an individual operation.
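The page-level granularity of the PCD bit may be sketched as follows. The sketch assumes 32-bit page table entries and 4 KB pages, with PCD located at bit 4 of the entry as in conventional x86 paging; the helper names pte_set_uncacheable, pte_is_uncacheable, and page_number are hypothetical and are introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PCD    (1u << 4) /* page cache disable bit in a 32-bit x86 PTE */
#define PAGE_SHIFT 12        /* assumption: 4 KB pages */

/* Returns the entry with caching disabled for the whole page it maps. */
static uint32_t pte_set_uncacheable(uint32_t pte)
{
    return pte | PTE_PCD;
}

/* Returns nonzero if the entry marks its page as non-cacheable. */
static int pte_is_uncacheable(uint32_t pte)
{
    return (pte & PTE_PCD) != 0;
}

/* Returns the page number of a linear address. Every address with the
 * same page number shares one PTE, and therefore one PCD setting. */
static uint32_t page_number(uint32_t linear_addr)
{
    return linear_addr >> PAGE_SHIFT;
}
```

Because linear addresses 0x00401000 through 0x00401FFF share one page number in this sketch, setting PCD for any one of them makes all 4096 bytes of that page non-cacheable, regardless of which instruction accesses them.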
In a related, but not entirely relevant technique, direct memory access (DMA) transfers are known which can move the contents of one memory block directly to the contents of another memory block without substantial intervention by the CPU 50. However, these DMA techniques are ineffective, inter alia, for systems having protection or privilege check mechanisms.
Accordingly, there is a need for a system and a method of cacheability control on an individual operational basis, for moving data from a first block of memory to a second block of memory, in a system having protection and privilege check mechanisms, without substantial CPU intervention, without long bus turnaround time, and without polluting the cache.
To overcome the limitations of the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, in a processing system having a cache, of transferring blocks of data from a first block of memory to a second block of memory, employing signaling from a CPU core responsive to execution of a predetermined instruction, so that data is transferred directly from the first block of memory to the second block of memory without polluting the cache. The second block of memory is typically scratchpad memory which is preferably, although not exclusively, a partitionable area of the cache. While a destination address is preferably generated from a programmable address register provided as part of control circuitry in the scratchpad memory, it is contemplated that an instruction in accordance with the present invention could also directly specify a destination address.
A feature of the present invention is transferring data from system memory to scratchpad memory without substantial CPU intervention while maintaining protection and privilege check mechanisms for memory address calculations.
Another feature of the present invention is transferring data from system memory to a scratchpad memory in large blocks to reduce the number of bus turnarounds while maintaining byte granularity addressability.
Another feature of the present invention is transferring data from system memory to scratchpad memory in a system having a cache without polluting the cache.
Another feature of the present invention is effective communication between a CPU core and a graphics pipeline by employing scratchpad memory control circuitry containing data pointers used by both the CPU core and the graphics pipeline to address data in the scratchpad memory.
These and various other objects, features, and advantages of novelty which characterize the invention are pointed out with particularity in the claims annexed hereto and forming a part hereof. However, for a better understanding of the invention, its advantages, and the objects obtained by its use, reference should be made to the drawings which form a further part hereof, and to the accompanying descriptive matter, in which there is illustrated and described a specific example of a system and method of data transfer with cacheability control, practiced in accordance with the present invention.