1. Technical Field
The present invention generally relates to processors and, in particular, to a technique for enhancing operations within a processor.
2. Description of the Related Art
A processor is a digital device that executes instructions specified by a computer program. A typical computer system includes a processor coupled to a system memory that stores program instructions and the data to be processed by those instructions. High-level processor instruction execution may be broken down into three main tasks: (1) loading data into the upper-level cache from memory or an input/output (I/O) device; (2) performing arithmetic operations on the data loaded from memory; and (3) storing the results out to memory or to an I/O device.
Of the three main tasks for processor instruction execution, storing (i.e., writing data to memory or to an I/O device) is the most tolerant of latency. Therefore, when a load and a store simultaneously request access to the upper-level cache, the load is typically allowed to proceed before the store. If multiple load requests are made, a store request may be presented on consecutive processor execution cycles without success. The most common method of handling stores that are waiting for cache access is to utilize a store queue (STQ). An STQ holds the data to be stored while waiting to access the cache.
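By way of illustration only, the load-over-store priority described above may be sketched as a small cycle-by-cycle simulation. The sketch assumes a single cache port and one request of each kind per cycle; the function name `simulate_cache_port` and its parameters are hypothetical and are not taken from any particular processor design.

```python
from collections import deque


def simulate_cache_port(load_cycles, store_requests):
    """Simulate one cache port on which loads always win over stores.

    load_cycles: set of cycle numbers in which a load requests the port.
    store_requests: dict mapping cycle number -> (address, data) for a
        store issued in that cycle (hypothetical input format).
    Returns a list of (cycle, address, data) showing when each store
    finally wrote to the cache.
    """
    stq = deque()   # store queue: holds stores waiting for the port
    retired = []    # stores that have written to the cache
    last_cycle = max(list(load_cycles) + list(store_requests), default=-1)

    cycle = 0
    while cycle <= last_cycle or stq:
        # A store issued this cycle first enters the store queue.
        if cycle in store_requests:
            stq.append(store_requests[cycle])

        if cycle in load_cycles:
            # The load takes the port; queued stores keep waiting.
            pass
        elif stq:
            # Port is free: the oldest queued store writes to the cache.
            address, data = stq.popleft()
            retired.append((cycle, address, data))

        cycle += 1
    return retired


# Loads occupy the port in cycles 0-2, so both stores wait in the queue.
result = simulate_cache_port({0, 1, 2}, {0: (0x10, "A"), 1: (0x20, "B")})
```

In this example the two stores are issued in cycles 0 and 1 but cannot write to the cache until cycles 3 and 4, after the loads have finished. The queue in this sketch retires strictly in order, which is one possible policy.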
Some STQs allow more recently processed data to be written (or stored) to the cache before data that has been waiting longer. Retiring younger data (i.e., writing the data into the cache) before older data is known as out-of-order (OoO) operation. OoO STQs may introduce data integrity problems, also known as store ordering hazards. For example, in a store ordering hazard, a younger store to a given address may be retired before an older store to the same address, so that the older data later overwrites the younger data. The data integrity problems resulting from an OoO STQ may violate the sequential execution model that is standard in processor architecture.
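The store ordering hazard described above may be illustrated with a minimal sketch in which the cache is modeled as a dictionary and the retirement order is supplied explicitly. The helper `retire` and the example addresses and values are hypothetical and serve only to show the effect.

```python
def retire(stores, order):
    """Write stores into a cache (modeled as a dict) in a given order.

    stores: list of (address, data) in program order; index 0 is oldest.
    order: list of indices into `stores` giving the retirement order.
    """
    cache = {}
    for index in order:
        address, data = stores[index]
        cache[address] = data   # a later write overwrites an earlier one
    return cache


# Program order: two stores to address 0x40 with an unrelated store between.
stores = [(0x40, "old"), (0x80, "other"), (0x40, "new")]

in_order = retire(stores, [0, 1, 2])   # sequential execution model
hazard = retire(stores, [2, 1, 0])     # younger 0x40 store retires first
safe_ooo = retire(stores, [1, 0, 2])   # reorders different addresses only
```

With in-order retirement, address 0x40 ends up holding the younger value. In the hazard case the older store retires last and leaves stale data at 0x40, whereas reordering only stores to different addresses yields the same final cache contents as in-order retirement.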
Current methods of processing data stores address the problems of OoO STQs through operations utilizing dependency vectors or synchronization identification (SID). Although dependency vectors are able to fully handle multiple synchronizing operations within an OoO STQ concurrently, use of these vectors does not scale well to larger STQs (e.g., greater than sixteen entries). Although SID operations address the problem of processing synchronized entries from a particular thread, SID operations do not permit multiple all-thread synchronization operations to coexist within an STQ. Dependency vectors and SIDs are each effective in some aspects of STQ operation. However, the restrictions of SIDs decrease the efficiency of processor instruction execution, thereby decreasing the efficiency of the processor, and the lack of scalability of dependency vectors in large STQs raises the area and power costs of the processor more than is desired.