1. Technical Field
The present disclosure is directed to systems and methods for establishing and utilizing an error-tolerant multithreaded register file that employs dynamic multithreading redundancy (DMR) for error correction. More particularly, the non-overlapped register access patterns associated with the disclosed systems and methods dynamically create hardware redundancy that is exploited for error control. Thus, the present disclosure relates generally to memory structures in computers and, more particularly, to error control for memory components and functions in multithreaded computing systems.
2. Background Art
Research in computer architecture has focused on exploiting higher levels of parallelism in instruction processing. Chip multithreaded computing, such as simultaneous multithreading (SMT) and chip multiprocessors (CMP), generally allows effective resource utilization and thus has the potential to achieve higher instruction throughput. Indeed, SMT- and/or CMP-based architectures offer the potential for long-term scalability.
In multithreaded memory systems, each physical address is mapped to multiple sets of adjacent memory cells for access by concurrent threads. FIG. 1 illustrates a conventional bitcell in a dual-threaded register. To support dual-threaded execution, each register bitcell integrates two identical memory cells. Four transmission gates perform thread selection, so that each memory cell is exclusively accessible by one thread. A thread switch is performed by flipping the thread control signal.
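The thread-selection behavior of the dual-threaded bitcell can be illustrated with a small behavioral model. This is a minimal sketch, not a circuit description: the class name `DualThreadedRegister` is hypothetical, and the four transmission gates are abstracted into a single word-wide select driven by a `thread` attribute standing in for the thread control signal.

```python
class DualThreadedRegister:
    """Hypothetical behavioral model of one register entry built from
    dual-threaded bitcells: every bit position holds two identical cells,
    and the thread control signal decides which cell the ports reach."""

    def __init__(self, width: int = 64) -> None:
        self.width = width
        self.mask = (1 << width) - 1
        self.cells = [0, 0]  # cells[t] is the word owned by thread t
        self.thread = 0      # stands in for the thread control signal

    def switch_thread(self) -> None:
        # A thread switch only flips the select signal; no data moves.
        self.thread ^= 1

    def write(self, value: int) -> None:
        # Only the cell owned by the active thread is updated.
        self.cells[self.thread] = value & self.mask

    def read(self) -> int:
        return self.cells[self.thread]

    def storage_bits(self) -> int:
        # Two cells per bit: the data array is twice a single-threaded one.
        return 2 * self.width


reg = DualThreadedRegister(width=8)
reg.write(0xAB)          # thread 0 writes
reg.switch_thread()
reg.write(0x12)          # thread 1 writes its own copy
reg.switch_thread()
print(hex(reg.read()))   # 0xab
```

Because each thread owns its own cell, a write by one thread never disturbs the other thread's value, which is also why the storage doubles.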
A dual-threaded register file essentially doubles the number of memory cells and thus the physical size of the data array. However, chip multithreaded computing raises a set of new challenges. For example, new memory components, such as multithreaded register files, become critical to computing effectiveness and reliability. As register files continue to grow in size to facilitate multithreaded computing, they become vulnerable to transient (soft) errors caused by particle strikes. Soft errors occur randomly and cause no permanent damage; hence, they are difficult to detect, track and/or control. These unpredictable errors raise serious concerns about register file reliability. Register files are performance-critical and directly affect the integrity and efficiency of instruction execution.
A known technique for addressing transient (soft) errors embeds low-complexity parity-checking logic into each register entry to provide simple but effective error detection. [See E. S. Fetzer, L. Wang and J. Jones, “The multi-threaded parity protected 128 word register files on a dual-core Itanium Architecture Processor,” Proc. International Solid-State Circuits Conference (ISSCC), pp. 382-383, February 2005.] The parity-checking logic computes the parity bit of each register entry. A parity upset caused by a soft error is detected and reported on a read operation. The parity-checking logic remains active and computes the parity bit continually; a content change due to a write operation or a thread switch triggers it to recompute the parity bit. Because register accesses must complete with single-cycle latency, run-time error correction is hard to achieve, and a single parity bit only indicates that an error occurred without identifying the flipped bit. If a parity error is reported, the pipeline is flushed and a cache access may be initiated to retrieve the correct data, thereby degrading system performance.
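The detect-but-not-correct nature of per-entry parity can be shown in a few lines. This is a minimal sketch assuming even parity (the parity bit is the XOR of all data bits); the names `parity`, `ParityProtectedEntry` and `upset` are hypothetical, and `upset` simply flips a stored bit to mimic a particle strike without updating the stored parity.

```python
def parity(word: int) -> int:
    """Even-parity bit: XOR of all bits of a non-negative word."""
    return bin(word).count("1") & 1


class ParityProtectedEntry:
    """Hypothetical register entry with one stored parity bit."""

    def __init__(self, width: int = 64) -> None:
        self.mask = (1 << width) - 1
        self.data = 0
        self.stored_parity = 0

    def write(self, value: int) -> None:
        # A write changes the contents, so the parity bit is recomputed.
        self.data = value & self.mask
        self.stored_parity = parity(self.data)

    def upset(self, bit: int) -> None:
        # Soft error: one stored bit flips, the stored parity does not.
        self.data ^= 1 << bit

    def read(self):
        # Data is returned together with a parity-error flag.
        return self.data, parity(self.data) != self.stored_parity


e = ParityProtectedEntry(width=16)
e.write(0x00F0)
print(e.read())  # (240, False)
e.upset(3)
print(e.read())  # (248, True): a single flip is detected
e.upset(5)
print(e.read())  # (216, False): two flips cancel and go unnoticed
```

The flag says only that some odd number of bits changed, not which ones, so recovery has to fall back on refetching the data rather than correcting it in place.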
More particularly, and with reference to FIG. 2, a high-speed XOR tree performs parity computation on the stored register data. The final parity and parity-valid bits are delivered to latches for parity comparison. Parity computation takes a few clock cycles to complete (four for the floating-point register file; three for the integer register file). A write operation or a thread switch event triggers control logic to clear the ParityComp signal and start the parity computation. The StoredParity signal is updated after four (or three) clock cycles, when the new parity becomes valid. Thus, frequently written registers receive less parity protection due to the parity computation latency. However, because these registers are updated frequently, they are less susceptible to soft errors. The XOR tree remains active and computes the parity signal continuously. A parity upset caused by a soft error sets the ParityError bit, which is returned along with the register data on read operations. If a parity error is reported, the pipeline is flushed and a cache access may be initiated to retrieve the correct data.
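The effect of multi-cycle parity latency on protection can be sketched with a cycle-level model. This is a simplified illustration under stated assumptions, not the FIG. 2 circuit: the names `xor_tree_parity` and `PipelinedParityEntry` are hypothetical, the StoredParity value is assumed to be the parity of the written value, delivered `latency` cycles after the write (three cycles, as for the integer register file), and a read issued before ParityComp is set is treated as unchecked.

```python
def xor_tree_parity(word: int, width: int) -> int:
    """Reduce the bits pairwise, level by level, like a balanced XOR tree."""
    level = [(word >> i) & 1 for i in range(width)]
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad an odd level with a neutral 0 input
        level = [level[i] ^ level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


class PipelinedParityEntry:
    """Hypothetical entry whose StoredParity becomes valid after a delay."""

    def __init__(self, width: int = 64, latency: int = 3) -> None:
        self.width = width
        self.mask = (1 << width) - 1
        self.latency = latency
        self.data = 0
        self.stored_parity = 0
        self.parity_comp = True  # models ParityComp: stored parity is valid
        self._pending = 0
        self._ready_at = 0

    def write(self, value: int, cycle: int) -> None:
        # A write (a thread switch would take the same path) clears
        # ParityComp and starts a new parity computation.
        self.data = value & self.mask
        self.parity_comp = False
        self._pending = xor_tree_parity(self.data, self.width)
        self._ready_at = cycle + self.latency

    def upset(self, bit: int) -> None:
        self.data ^= 1 << bit  # soft error in a stored bit

    def read(self, cycle: int):
        # StoredParity is latched once the computation latency has elapsed.
        if not self.parity_comp and cycle >= self._ready_at:
            self.stored_parity = self._pending
            self.parity_comp = True
        current = xor_tree_parity(self.data, self.width)
        parity_error = self.parity_comp and current != self.stored_parity
        return self.data, parity_error, self.parity_comp


e = PipelinedParityEntry(width=8, latency=3)
e.write(0x0F, cycle=0)
e.upset(0)               # strike during the computation window
print(e.read(cycle=2))   # (14, False, False): read is unchecked
print(e.read(cycle=3))   # (14, True, True): now reported as ParityError
```

A register rewritten every cycle or two would spend most of its life in the unchecked window, which matches the observation that frequently written registers receive less parity protection.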
Existing error-control solutions also include radiation-hardened memory structures, double or triple memory redundancy, and code-checking logic. These solutions have been shown to be effective for caches and DRAM chips, where design overheads are manageable because of the long access latency and out-of-path error correction. However, integrating existing error-control techniques into register files presents a significant challenge due to severe constraints on area and timing margins. Building and integrating full-fledged error-control schemes in register files is practically impossible because, inter alia, such schemes would hurt performance.
Thus, despite efforts to date, a need remains for systems/methods that provide efficient error control to ensure robust operation of multithreaded register files. In addition, a need remains for systems/methods that reduce the potential for error accumulation effects in multithreaded memory systems. These and other needs are satisfied and/or addressed by the systems and methods disclosed herein.