This disclosure relates generally to the field of cache coherency, and more particularly to monitoring locations that are accessed by a co-processor during transactional execution of a processor and that are not in the read-set or write-set of the transaction.
The number of central processing unit (CPU) cores on a chip and the number of CPU cores connected to a shared memory continue to grow significantly to support growing workload capacity demand. The increasing number of CPUs cooperating to process the same workloads puts a significant burden on software scalability; for example, shared queues or data structures protected by traditional semaphores become hot spots and lead to sub-linear n-way scaling curves. Traditionally this has been countered by implementing finer-grained locking in software, and with lower-latency, higher-bandwidth interconnects in hardware. Implementing fine-grained locking to improve software scalability can be very complicated and error-prone, and at today's CPU frequencies, the latencies of hardware interconnects are limited by the physical dimensions of the chips and systems, and by the speed of light.
Implementations of hardware Transactional Memory (HTM, or in this discussion, simply TM) have been introduced, wherein a group of instructions, called a transaction, operates in an atomic manner on a data structure in memory, as viewed by other central processing units (CPUs) and the I/O subsystem (atomic operation is also known as “block concurrent” or “serialized” in other literature). The transaction executes optimistically without obtaining a lock, but may need to abort and retry if an operation of the executing transaction on a memory location conflicts with another operation on the same memory location. Software Transactional Memory implementations have also been proposed previously. However, hardware TM can provide improved performance and ease of use over software TM.
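The optimistic execute, abort, and retry behavior described above can be illustrated with a small software model. The following is a minimal single-threaded sketch and not a description of any particular hardware implementation: the names `Memory`, `Transaction`, `Conflict`, and `run_atomically` are hypothetical, conflicts are detected by comparing per-location version counters at commit time (an assumption made for simplicity; hardware TM would typically detect conflicts through the cache coherency protocol while the transaction is still running), and the conflicting store from "another CPU" is simulated inside the transaction body.

```python
class Conflict(Exception):
    """Raised when a location in the read-set changed before commit."""


class Memory:
    """Toy shared memory: every location has a value and a version counter."""

    def __init__(self):
        self.values = {}
        self.versions = {}

    def load(self, addr):
        return self.values.get(addr, 0), self.versions.get(addr, 0)

    def store(self, addr, value):
        self.values[addr] = value
        self.versions[addr] = self.versions.get(addr, 0) + 1


class Transaction:
    """Tracks a read-set and buffers stores in a write-set until commit."""

    def __init__(self, mem):
        self.mem = mem
        self.read_set = {}   # addr -> version observed at first read
        self.write_set = {}  # addr -> buffered (not yet visible) value

    def read(self, addr):
        if addr in self.write_set:           # read our own buffered store
            return self.write_set[addr]
        value, version = self.mem.load(addr)
        self.read_set.setdefault(addr, version)
        return value

    def write(self, addr, value):
        self.write_set[addr] = value         # invisible to others until commit

    def commit(self):
        # Abort if any location we read was modified in the meantime.
        for addr, seen in self.read_set.items():
            if self.mem.load(addr)[1] != seen:
                raise Conflict(addr)
        # No conflict: make all buffered stores visible.
        for addr, value in self.write_set.items():
            self.mem.store(addr, value)


def run_atomically(mem, body, max_retries=8):
    """Run body optimistically; on conflict, discard its stores and retry."""
    for attempt in range(1, max_retries + 1):
        tx = Transaction(mem)
        try:
            result = body(tx)
            tx.commit()
            return result, attempt
        except Conflict:
            continue                          # buffered stores are dropped
    raise RuntimeError("transaction kept aborting")


# Demonstration: a simulated store from another CPU forces one abort.
mem = Memory()
mem.store("counter", 10)
interfered = []


def increment(tx):
    current = tx.read("counter")
    if not interfered:                        # interfere on the first attempt only
        interfered.append(True)
        mem.store("counter", 100)             # conflicting store by "another CPU"
    tx.write("counter", current + 1)
    return current + 1


result, attempts = run_atomically(mem, increment)
```

In this sketch the first attempt reads 10, is invalidated by the simulated store, and is discarded without its buffered store ever becoming visible; the second attempt reads 100 and commits 101. No lock is taken at any point, which is the property that distinguishes the optimistic approach from the semaphore-protected structures discussed earlier.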
A co-processor is a computer processor used to supplement the functions of the primary processor, the CPU. A co-processor offloads specialized processing operations, thereby reducing the burden on the basic CPU circuitry and allowing it to work at optimum speed. By offloading processor-intensive tasks from the main processor, co-processors can accelerate system performance. Operations performed by a co-processor may include floating point arithmetic, graphics, signal processing, string processing, encryption, compression, or I/O interfacing with peripheral devices.
An encryption co-processor may be a dedicated computer on a chip or microprocessor for carrying out cryptographic operations, embedded in packaging with multiple physical security measures, which give it a degree of tamper resistance. Unlike encryption co-processors that output decrypted data onto a bus in a secure environment, a secure encryption co-processor does not output decrypted data or decrypted program instructions in an environment where security cannot always be maintained. Secure encryption co-processors may output decrypted data or decrypted program instructions into memory or cache locations within the secure encryption co-processor, which may be fetched by the CPU via direct memory accesses.