A computing system generally contains a processor, memory, and a data bus. The processor performs the operations of the computer: it executes code and moves data to and from memory locations. The data bus carries data to and from main memory. A conventional computer additionally contains input and output ports and interfaces that carry data between the processor and external devices such as a keyboard or display.
Cache memory holds a copy of the portions of main memory that are used repeatedly, in order to reduce processing time. Generally, whenever memory is accessed, whether to retrieve or to store information, the processor must traverse the data bus, request the data, and wait for a response. This consumes considerable processing time. Cache memory is used to reduce the time spent processing and manipulating memory over the data bus. Because the cache is inside the processor, the processor can use it without traversing the bus and waiting for a response, which reduces bus calls. The cache memory can be accessed much more quickly than main memory can be repeatedly read and written. The processor then manipulates the cache memory directly. When the processor is done with the memory, or when additional cache memory space is needed for another application, the cache memory is written back to the main memory, and data from a different memory location is written into the cache memory.
Generally, cache controllers operate in a manner that is invisible to the software being executed. However, especially in embedded systems, there are reasons to give software the ability to manipulate cache entries, either by direct access to cache entries or by indirect access via commands that reference desired memory addresses. These commands often include the ability to invalidate a cache entry and/or clean an entry. Cleaning an entry writes its data out to main memory when that data has changed since main memory was last written. Often, these instructions operate on the real memory address as opposed to the cache entry. That is, the specified address is looked up in the cache and, if present, the entry where the block is found is cleaned and/or invalidated.
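The address-based clean and invalidate commands described above can be illustrated with a minimal software model in C. This is a sketch under stated assumptions, not the behavior of any particular cache controller: it assumes a direct-mapped, write-back cache, and every name and size in it (`toy_cache`, `store_byte`, `clean_by_address`, `invalidate_by_address`, `LINE_SIZE`, `NUM_LINES`, `MEM_SIZE`) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u    /* bytes per cache block (assumed) */
#define NUM_LINES 8u     /* entries in the toy cache (assumed) */
#define MEM_SIZE  1024u  /* bytes of simulated main memory (assumed) */

typedef struct {
    bool     valid;            /* entry holds a copy of some memory block */
    bool     dirty;            /* copy changed since memory was last written */
    uint32_t tag;              /* identifies which block the entry holds */
    uint8_t  data[LINE_SIZE];
} cache_line;

typedef struct {
    cache_line lines[NUM_LINES];
    uint8_t    memory[MEM_SIZE];
    unsigned   writebacks;     /* blocks written out to main memory */
} toy_cache;

static uint32_t block_number(uint32_t addr) { return addr / LINE_SIZE; }

/* Return the entry holding addr, or NULL when the block is not cached. */
static cache_line *lookup(toy_cache *c, uint32_t addr) {
    assert(addr < MEM_SIZE);
    cache_line *l = &c->lines[block_number(addr) % NUM_LINES];
    return (l->valid && l->tag == block_number(addr) / NUM_LINES) ? l : NULL;
}

/* Store one byte, filling the entry from memory first on a miss. */
void store_byte(toy_cache *c, uint32_t addr, uint8_t value) {
    cache_line *l = lookup(c, addr);
    if (l == NULL) {
        uint32_t index = block_number(addr) % NUM_LINES;
        l = &c->lines[index];
        if (l->valid && l->dirty) {  /* evict the previous occupant */
            uint32_t old_base = (l->tag * NUM_LINES + index) * LINE_SIZE;
            memcpy(&c->memory[old_base], l->data, LINE_SIZE);
            c->writebacks++;
        }
        memcpy(l->data, &c->memory[block_number(addr) * LINE_SIZE], LINE_SIZE);
        l->tag = block_number(addr) / NUM_LINES;
        l->valid = true;
        l->dirty = false;
    }
    l->data[addr % LINE_SIZE] = value;
    l->dirty = true;
}

/* Clean by address: write the block out only if it is dirty.
   Returns false when the address is not present in the cache. */
bool clean_by_address(toy_cache *c, uint32_t addr) {
    cache_line *l = lookup(c, addr);
    if (l == NULL) return false;
    if (l->dirty) {
        memcpy(&c->memory[block_number(addr) * LINE_SIZE], l->data, LINE_SIZE);
        l->dirty = false;
        c->writebacks++;
    }
    return true;
}

/* Invalidate by address: discard the entry without writing it out.
   Returns false when the address is not present in the cache. */
bool invalidate_by_address(toy_cache *c, uint32_t addr) {
    cache_line *l = lookup(c, addr);
    if (l == NULL) return false;
    l->valid = false;
    l->dirty = false;
    return true;
}
```

In this model, a caller that stores to an address and then calls `clean_by_address` on it sees the change reach `memory`, while a caller that calls `invalidate_by_address` instead loses any change that was not cleaned first.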
Memory blocks are often automatically inserted into a cache entry upon a read access to the memory. That is, the cache controller will find an empty cache entry, or free an existing one by evicting an entry in the index mapped by the specified address. As the data is read from main memory, a whole block is read and copied into the empty cache entry. There are times, however, when the software references memory whose existing data is going to be ignored. That is, the software wants an area of memory in cache but is going to rewrite the data immediately, without using the existing information in that memory location. Writing the data before it is in cache can be expensive from a performance standpoint, as can waiting for data to be read from memory when that data is going to be ignored anyway.
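The cost described above can be made concrete with a small counting model in C. It is only a sketch: it assumes a direct-mapped cache in which any miss fetches a whole block from main memory, it tracks tags but not data, and it does not model write-back of evicted blocks. The names `alloc_model`, `touch`, and `overwrite_region`, along with the sizes, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 32u  /* bytes per cache block (assumed) */
#define NUM_LINES 64u  /* entries in the modeled cache (assumed) */

typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
    unsigned block_reads;  /* whole blocks fetched from main memory */
} alloc_model;

/* Model one access under the assumed allocate-on-miss policy.
   Returns true on a hit; a miss fetches the whole block. */
bool touch(alloc_model *m, uint32_t addr) {
    assert(m != NULL);
    uint32_t block = addr / LINE_SIZE;
    uint32_t index = block % NUM_LINES;
    uint32_t tag   = block / NUM_LINES;
    if (m->valid[index] && m->tag[index] == tag) {
        return true;
    }
    m->valid[index] = true;  /* replaces whatever block was here */
    m->tag[index]   = tag;
    m->block_reads++;
    return false;
}

/* Overwrite len bytes starting at base, one byte at a time.
   Returns the number of blocks fetched along the way; the software
   never uses the fetched contents, because it replaces every byte. */
unsigned overwrite_region(alloc_model *m, uint32_t base, uint32_t len) {
    assert(m != NULL);
    unsigned before = m->block_reads;
    for (uint32_t i = 0; i < len; i++) {
        touch(m, base + i);
    }
    return m->block_reads - before;
}
```

Under these assumptions, overwriting a 256-byte buffer that is not yet cached triggers eight block fetches whose contents are discarded, which is the wasted bus traffic the paragraph above describes.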
FIG. 1 illustrates an exemplary conventional computing system 100 with processor 102, cache memory 104, data bus 106, and input/output interfacing ports. The processor 102 controls the data bus 106 to read from or write to main memory 114. The main memory 114 may additionally be accessed over a server 112, as part of a storage area network. The input/output interface 108 permits a user to interact with the computing system 100 through input and output devices 110 such as a keyboard or display. The processor 102 will request data from the main memory 114 through the data bus 106. The data in the requested main memory location 120 is then retrieved by the data bus 106 and copied into the cache memory location 122. The processor 102 may then manipulate the data directly. Once the memory space is needed by another main memory allocation, or the processor 102 is finished manipulating the data, the data bus 106 will write the cache memory data back to the main memory location 120. The cache memory location 122 links to the main memory location 120 by using part of the available memory as an index to the main memory location.
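One common way to link a cache location to a main memory location is to split the memory address into fields. The C sketch below shows that general technique only; it is not taken from FIG. 1, and the field widths, the `addr_fields` type, and the functions `split_address` and `join_address` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS 5u  /* 32-byte blocks (assumed) */
#define INDEX_BITS  7u  /* 128 cache entries (assumed) */

typedef struct {
    uint32_t tag;     /* stored with the entry to identify the block */
    uint32_t index;   /* selects the cache entry */
    uint32_t offset;  /* selects a byte within the block */
} addr_fields;

/* Split a 32-bit address into tag, index, and offset fields. */
addr_fields split_address(uint32_t addr) {
    addr_fields f;
    f.offset = addr & ((1u << OFFSET_BITS) - 1u);
    f.index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1u);
    f.tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    return f;
}

/* Rebuild the main memory address from the stored tag and the entry
   index, as is needed when an entry is written back. */
uint32_t join_address(addr_fields f) {
    assert(f.offset < (1u << OFFSET_BITS));
    assert(f.index < (1u << INDEX_BITS));
    return (f.tag << (OFFSET_BITS + INDEX_BITS))
         | (f.index << OFFSET_BITS)
         | f.offset;
}
```

With these widths, two addresses that are 4096 bytes apart select the same entry and differ only in their tag, which is why the tag must be kept alongside the cached data.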
Though the cache memory is an attempt to reduce the processing time in manipulating memory by moving a part of that memory into the processor, there are still delays in retrieving the data from memory. Before the cache memory is used, the processor still traverses the data bus, requests the data, and waits for a response. The main memory is copied into the cache memory before it is manipulated. Once the memory is in cache, the processor can manipulate it relatively quickly. However, waiting for the processor to retrieve the memory and store it in cache is still an expensive operation in terms of processing time.
One solution to the delay time in accessing memory is a preload instruction. A user issues the preload instruction for a memory location in advance of when the processor needs it. The preload instruction thus permits a user to load memory into cache before it is actually needed, so the system hides the copy time if the request is sent sufficiently far ahead of its use. However, the preload instruction takes up cache memory before it is needed by the processor and also occupies the data bus, which may slow down other operations unnecessarily. A cache memory block is written to main memory and its cache memory is allocated to the new memory. This allocation potentially occurs before the processor is finished with a previous memory manipulation. If the instruction is issued too early, the memory fetched may have to be written back to memory without ever being accessed by the processor, because the cache may evict the preloaded memory location to manipulate other memory needed by the processor before the desired memory manipulation is performed. In addition, a programmer must foresee the need for the memory space before it is actually required by the processor. In contrast, if the instruction is issued too late, the processor may still stall because insufficient time was given to traverse the data bus for the required memory allocation.
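The timing trade-off described above can be illustrated with a minimal sketch in C. It assumes a GCC- or Clang-compatible compiler, whose `__builtin_prefetch` extension issues a non-binding preload hint that the compiler may lower to a target preload instruction where one exists. The function name `sum_with_preload` and the constant `PREFETCH_DISTANCE` are hypothetical, and the distance is a tuning guess rather than a recommended value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* How many elements ahead of the current one to request (assumed).
   Too small and the data may not arrive in time; too large and the
   block may be evicted before it is used. */
#define PREFETCH_DISTANCE 16u

/* Sum an array while hinting that elements further ahead will be
   read soon. The result is the same with or without the hints. */
uint64_t sum_with_preload(const uint32_t *data, size_t n) {
    assert(data != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Second argument 0 = prefetch for read; third argument 1 =
               low temporal locality. Both must be compile-time constants. */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 1);
        }
        total += data[i];
    }
    return total;
}
```

The sketch issues a hint on every iteration for simplicity; since a hint brings in a whole block, one hint per block would occupy the bus less, which reflects the concern above about unnecessary bus traffic.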
An example of a conventional processor, including the input/output interface, data cache, and data bus, is produced by ARM. Specifically, the ARM 1156T2 provides the preload instruction to retrieve main memory before it is required by the processor. Information on the ARM processors is generally available at http://infocenter.arm.com/help/index.jsp, the contents of which are incorporated by reference herein. Specifically, the documentation for the ARM 1156 CPU is available at http://infocenter.arm.com/help/topic/com.arm.doc.ddi0338g/DDI0338G_arm1156t2s_r0p4_trm.pdf, the contents of which are incorporated by reference herein.
An example of when the preload instruction may be used is when an area of memory is needed to control an input/output operation. A block of cache memory is allocated to hold everything the processor needs for the operation. The data is copied into cache early enough that the memory is available when the processor is ready to manipulate the data; therefore, the processor does not have to wait for the memory to be read. However, if the preload is used too often, the bus is slowed down. Generally, the cache and the bus can handle only a limited number of calls before they are no longer efficient. Also, if another application needs to retrieve information, it may be slowed down by the preload retrieval, so the preload interferes with another application or user before the preload user even needs the retrieved information.
Generally, the cache is a working copy of main memory, which is more expensive to access. When the memory is about to be rewritten, reading it into cache is a wasteful operation on the memory bus. Not only can it result in processing delays while waiting for the memory read to complete, but it can also add to congestion in the memory interface.