A data processing apparatus is commonly provided with a data cache in which copies of a subset of data items present in an external memory may be stored, such that these copies may be accessed by the processor unit faster than if the original data items were accessed in the external memory. For data items which are needed frequently by the processor, the provision of this data cache can result in significant time saving for the data processing apparatus.
A known data processing apparatus is illustrated in FIG. 1 and the steps of the operation of its cache are shown in FIG. 2. FIG. 1 schematically illustrates a data processing apparatus 10 including a processor 20, a cache 30, a bus 40, an external memory device 50 and a peripheral device 60. The processor 20 performs calculations and manipulations on data items stored in the memory 50. However, rather than reading a data item from the memory 50 and re-writing it there every time a calculation is performed on that particular data item, a cache 30 is provided to store temporary copies of data items that the processor needs. Because the cache 30 is located in close proximity to the processor 20, access times to the cache 30 are significantly shorter than via the bus 40 to the external memory 50. When the processor 20 issues a memory access request, it is first checked whether a copy of the required data item is currently stored in the cache 30. If such a copy exists in the cache 30, this is known as a cache “hit” and the copy of the data item may be read or altered by the processor memory access request much faster than if the data item had to be accessed in the external memory 50. However, the access time advantages gained by the provision of a cache 30 must be balanced against other considerations, in particular the size of the cache 30.
Because the cache 30 is physically located close to the processor 20 (indeed they are commonly both accommodated on the same chip), the space available for the cache 30 is rather limited. Furthermore there is a known trade-off between cache size and cache speed: smaller caches tend to operate faster. Consequently the cache 30 will typically only store a small subset of the data items stored in the external memory 50. Therefore, when a copy of a new data item is to be stored in the cache 30, a copy of a data item already stored in the cache 30 must be selected to be overwritten. Several algorithms for selecting this “victim” are known in the art and will not be discussed further here. An entry in the cache 30 which is selected to be overwritten or “victimized” must be examined before being overwritten to establish whether the data item has been altered since being copied from the external memory 50. This is typically tracked by means of a status bit associated with the entry in the cache called a “dirty” bit, “dirty” indicating that the data item has changed since being copied. If this is the case then the victim must be “evicted”, that is, passed to the external memory 50 to update the original data item therein.
The steps involved in the functionality of the cache 30 described above are illustrated in FIG. 2. It should be noted that in a typical data processor not all memory mappings are cacheable, e.g. some peripheral addresses are not. However, for the remainder of this description it is assumed that the exemplary memory addresses are indeed cacheable. Firstly, at step 100, the processor issues a memory access request. It is then checked, at step 110, whether a copy of the requested data item is stored in the cache, i.e. whether or not there is a “cache hit”. This is typically performed by comparing the address of the memory access request with the data addresses (or portions thereof called TAGs) associated with the entries in the cache. If there is a cache hit then the memory access request may complete in the cache (step 120). If the memory access request is a write operation then the cache line containing this data item is marked as “dirty”, since the data item has now been altered since being copied from the external memory 50. If the memory access request misses in the cache then the flow proceeds to step 130, where a cache line is selected to be victimized, i.e. overwritten. The selected line is then examined at step 140 to see if it is dirty. If it is, then the copy of this data item must be returned to the external memory 50 to update the original therein, and at step 150 this victim is evicted. Thereafter the flow proceeds to step 160 (or directly from step 140 if the victim line was not dirty) and a bus access request is issued to the external memory 50 to retrieve the data item requested by the memory access request issued by the processor at step 100.
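By way of illustration only, the flow of FIG. 2 may be sketched in software as follows. The sketch is not taken from any particular implementation. It assumes a direct-mapped organisation, so that the victim is simply the line indexed by the low-order address bits, and it assumes that a write which misses is completed in the newly filled line. The names `ToyCache`, `access` and `evictions` are hypothetical.

```python
class ToyCache:
    """Illustrative write-back cache following steps 100 to 160 of FIG. 2.

    Assumptions (not from the description): direct-mapped victim selection,
    one data item per line, and write-allocate behaviour on a write miss.
    """

    def __init__(self, memory, num_lines=4):
        self.memory = memory              # external memory: address -> value
        self.num_lines = num_lines
        self.lines = [None] * num_lines   # each line: {"tag", "data", "dirty"}
        self.evictions = 0                # count of dirty victims written back

    def access(self, address, write_value=None):
        """Step 100: handle a read (write_value is None) or a write."""
        index = address % self.num_lines
        tag = address // self.num_lines
        line = self.lines[index]

        # Step 110: compare the request's TAG with the stored TAG.
        hit = line is not None and line["tag"] == tag

        if not hit:
            # Step 130: the line at this index is the victim.
            # Step 140: examine its dirty bit.
            if line is not None and line["dirty"]:
                # Step 150: evict, updating the original in external memory.
                victim_address = line["tag"] * self.num_lines + index
                self.memory[victim_address] = line["data"]
                self.evictions += 1
            # Step 160: fetch the requested item over the bus.
            line = {"tag": tag, "data": self.memory.get(address, 0), "dirty": False}
            self.lines[index] = line

        # Step 120: complete the request in the cache.
        if write_value is not None:
            line["data"] = write_value
            line["dirty"] = True          # altered since being copied
        return line["data"], hit
```

In this sketch a write that hits leaves the external memory unchanged until the line is later victimized, at which point the dirty copy is written back before the new item is fetched.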
In addition there arise many instances where the data items used by at least one of the applications running on the processor are sensitive data items that should not be accessible by other applications that can be run on the processor. An example would be where the data processing apparatus is a smart card, and one of the applications is a security application which uses sensitive data, such as for example secure keys, to perform validation, authentication, decryption and the like. It is clearly important in such situations to ensure that such sensitive data are kept secure so that they cannot be accessed by other applications that may be loaded onto the data processing apparatus, for example hacking applications that have been loaded onto the data processing apparatus with the purpose of seeking to access those secure data.
In known systems, it has typically been the job of the operating system developer to ensure that the operating system provides sufficient security to ensure that the secure data of one application cannot be accessed by other applications running under the control of the operating system. However, as systems become more complex, the general trend is for operating systems to become larger and more complex, and in such situations it becomes increasingly difficult to ensure sufficient security within the operating system itself.
Examples of systems seeking to provide secure storage of sensitive data and to provide protection against malicious program code are those described in U.S. patent application Ser. No. 2002/0007456 A1, U.S. Pat. No. 6,282,657 B and U.S. Pat. No. 6,292,874 B. In such systems the data processing apparatus is provided with separate domains, these domains providing a mechanism for handling security at the hardware level, by having a secure domain and a non-secure domain. The non-secure and secure domains in effect establish separate worlds, with the secure domain providing a trusted execution space separated by hardware enforced boundaries from other execution spaces, and likewise the non-secure domain providing a non-trusted execution space. A program executing in a specified non-secure domain does not have access to data identified as secure. Each access request then has a domain security signal associated therewith identifying whether the access is a secure access or a non-secure access.
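The role of the domain security signal may be illustrated by the following minimal sketch, in which each access request carries a flag indicating whether it is a secure access. The address range `SECURE_REGION` and the names `AccessRequest` and `is_permitted` are hypothetical. The only rule taken from the description above is that a non-secure access may not reach data identified as secure; permitting every other combination is an assumption made for the purposes of the example.

```python
from dataclasses import dataclass

# Hypothetical address range holding data identified as secure.
SECURE_REGION = range(0x1000, 0x2000)


@dataclass(frozen=True)
class AccessRequest:
    address: int
    secure: bool   # domain security signal accompanying the request


def is_permitted(request: AccessRequest) -> bool:
    """Refuse a non-secure access to data identified as secure.

    All other combinations are allowed here, which is an assumption of
    this sketch rather than a statement about any particular system.
    """
    data_is_secure = request.address in SECURE_REGION
    return request.secure or not data_is_secure
```

For example, a request to address 0x1800 would be refused under this sketch when its security signal indicates a non-secure access and allowed when it indicates a secure access.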
However, whilst some data must be kept secure as described above, there may also be other data items which need to be accessed by both secure and non-secure applications. One example of this would be when a secure and a non-secure application need to exchange information. Accordingly, it would be desirable to provide an improved technique which enables a data cache to manage the correct security level of, and access to, the copies of data items it stores.
In systems having multiple bus masters additional problems of coherency arise. All bus masters in a multi-master system, particularly CPUs in a multi-processing system, need a consistent, up-to-date view of all memory resources they share. Hence it follows that those systems in which one or more caches are capable of making local copies of shared memory resources must provide at least one mechanism to ensure that a consistent, up-to-date view of shared memory resources is still provided to all the bus masters sharing those resources. The property of maintaining consistency between one or more cached copies of shared memory resources between multiple bus masters is referred to as cache coherency.
Software components executing in secure and non-secure domains are often required to communicate via a shared memory mechanism. The maximum security classification which can be afforded to such shared memory, which can be manipulated by non-secure bus masters, is non-secure. As such, data shared between secure and non-secure domains is usually established in the memory map of the non-secure domain, and so may always be accessed by a non-secure bus master.
In known systems, if such shared data is cached, and is required to be accessed by at least one bus master which is only able to issue secure access requests, cache coherency must be managed by software. The cache coherency software requires that shared data be cleaned from the cache (by issuing cache maintenance operations) each time that data is to be accessed by a bus master in the alternate security domain from that which last had control of it, and hence had the opportunity to establish the shared data in the cache. For performance and power reasons, it would be advantageous to implement a hardware mechanism to maintain cache coherency of data which needs to be shared between secure and non-secure domains, where such systems contain at least one secure bus master, with which data is to be shared, that is only able to issue secure access requests.
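The software-managed approach described above may be sketched as follows, purely for illustration. The sketch assumes a simple write-back cache model in which a clean operation writes back any dirty copy and removes the line from the cache, and it assumes that the other bus master reads the external memory directly. The names `WriteBackCache`, `SharedBuffer` and `hand_over` are hypothetical.

```python
class WriteBackCache:
    """Minimal write-back cache model: address -> (value, dirty)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def write(self, address, value):
        self.lines[address] = (value, True)    # cached copy is now dirty

    def clean(self, addresses):
        """Cache maintenance operation (assumed): write back and drop lines."""
        for address in addresses:
            if address in self.lines:
                value, dirty = self.lines.pop(address)
                if dirty:
                    self.memory[address] = value


class SharedBuffer:
    """Shared data whose coherency is maintained by software."""

    def __init__(self, cache, addresses, owner="non-secure"):
        self.cache = cache
        self.addresses = list(addresses)
        self.owner = owner       # domain that last had control of the data
        self.cleans = 0          # number of maintenance passes issued

    def hand_over(self, new_domain):
        # Clean each time control passes to the alternate security domain.
        if new_domain != self.owner:
            self.cache.clean(self.addresses)
            self.cleans += 1
            self.owner = new_domain
```

Every change of controlling domain costs a maintenance pass over the shared addresses in this sketch, which illustrates the performance and power overhead that a hardware coherency mechanism would seek to avoid.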