The register file and cache memory are part of the microprocessor complex and are built on the same chip as the microprocessor. Register file transfer operations are controlled directly by the microprocessor after instructions are decoded. Register file transfers are conducted at microprocessor speed, usually in one clock cycle, so high performance microprocessors require high performance register files.
The advent of superscalar microprocessor architectures has created the need for register files with many ports. Such multiporting, however, conflicts with the universal goals of high density, high performance and ease of testing. High density multiporting favors single-ended reading and writing, but single-ended operation makes high performance more difficult to achieve. Furthermore, simultaneous write operations through different ports pose additional design challenges.
As computer system performance requirements have increased, high performance microprocessor chips have made increasing use of dynamic design approaches. Dynamic logic systems can be classified as synchronous or asynchronous. See J. Hennessy and D. A. Patterson, "Computer Architecture: A Quantitative Approach", Morgan Kaufmann, 1996; and N. Weste and K. Eshraghian, "Principles of CMOS VLSI Design: A Systems Perspective", 2nd edition, Reading, Mass.: Addison-Wesley, 1993.
Systems such as the one depicted in FIG. 1a, which are synchronized to a global clock, are called synchronous. In synchronous systems, the setting of latches must be synchronized to the global clock. The information stored in latches and registers is updated (a new state replacing the present state) in a controlled and predictable manner by triggering from a periodic global clock signal distributed throughout the digital stages. The global clock ensures that all memory elements change state at approximately the same time.
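The lockstep update described above can be illustrated with a small behavioral model. The sketch below is hypothetical: the class name `SyncPipeline` and its three-stage depth are illustrative assumptions, not part of any described circuit. Every register's next state is first computed from the present states, and all registers then commit together on the same global clock edge, so data advances exactly one stage per clock.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class SyncPipeline:
    """Hypothetical behavioral model of registers driven by one global clock."""

    # Illustrative assumption: a three-stage pipeline, all registers initially empty.
    stages: List[Any] = field(default_factory=lambda: [None, None, None])

    def clock_edge(self, new_input: Any) -> List[Any]:
        # Compute every next state from the *present* states first...
        next_states = [new_input] + self.stages[:-1]
        # ...then commit them together, so all registers change state on the same edge.
        self.stages = next_states
        return list(self.stages)


if __name__ == "__main__":
    pipe = SyncPipeline()
    print(pipe.clock_edge("a"))  # ['a', None, None]
    print(pipe.clock_edge("b"))  # ['b', 'a', None]
```

Computing all next states before committing any of them is what models the simultaneous state change; updating the registers one at a time would let a value race through several stages within a single clock.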
Asynchronous systems, such as the one shown in FIG. 1b, have no global clock distribution and rely on either handshake or interlock circuit techniques. See, e.g., J. P. Uyemura, "Circuit Design for CMOS VLSI", Boston, Mass.: Kluwer Academic Publishers, 1992; and I. Sutherland, "Micropipelines", Communications of the ACM, Vol. 32, pp. 720-738, 1989.
FIG. 1c depicts a self-resetting asynchronous system without feedback between macros. Such systems are designed completely independently of any system clock. In self-resetting techniques, the reset is derived locally, either by feedback from downstream evaluation logic or from a local timing chain triggered by an upstream event. When the cycle time is too long, the reset circuits can be broken up into additional self-resetting pipeline macros with overlapping resets. The self-resetting macros shown in FIG. 1c are thus self-contained.
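The local-timing-chain variant of self-resetting operation can be sketched as a simple event-timing model. This is a hypothetical illustration only: the function name `self_resetting_chain` and the fixed per-macro evaluation and reset delays are assumptions, not parameters of the circuits in FIG. 1c. Each macro fires a fixed delay after the event arriving from upstream, and its own timing chain resets it a fixed delay later; no global clock appears anywhere in the computation.

```python
from typing import List, Tuple


def self_resetting_chain(
    t_upstream: float,
    eval_delays: List[float],
    reset_delays: List[float],
) -> List[Tuple[float, float]]:
    """Return the (set_time, reset_time) of each macro's output pulse.

    Hypothetical timing model: each macro evaluates eval_delay after the event
    arriving from upstream, and its own local timing chain resets it reset_delay
    later. No global clock is used.
    """
    if len(eval_delays) != len(reset_delays):
        raise ValueError("each macro needs one evaluation delay and one reset delay")
    pulses = []
    t_trigger = t_upstream
    for d_eval, d_reset in zip(eval_delays, reset_delays):
        t_set = t_trigger + d_eval   # evaluation fired by the upstream event
        t_reset = t_set + d_reset    # reset derived locally from the timing chain
        pulses.append((t_set, t_reset))
        t_trigger = t_set            # this macro's output triggers the next macro
    return pulses


if __name__ == "__main__":
    # Two macros; the delays are arbitrary illustrative units.
    print(self_resetting_chain(0.0, [1.0, 2.0], [3.0, 3.0]))
    # [(1.0, 4.0), (3.0, 6.0)]
```

In the example, the second macro evaluates at t = 3.0 while the first is still active until t = 4.0, so the active pulses of adjacent macros overlap in time, with each reset timed only by its own macro's local chain.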
Each macro in FIG. 1c consists of its own evaluation path and reset chain, and each macro operates locally and asynchronously with respect to the global system clock. For high performance, self-resetting circuits are preferable to synchronous domino logic because they have no clocked precharge devices in the logic trees, which correspondingly reduces the loading on the system clock distribution. This alleviates clock skew and power problems. The self-resetting approach is difficult to design at the system level because it lacks the global synchronization provided by a global clock, although there has been substantial work in this area. See, e.g., W. Henkels, W. Hwang and T. I. Chappell, "Cells and Read-Circuits for High Performance Register Files", U.S. Pat. No. 5,481,495; and W. Henkels, W. Hwang and R. V. Joshi, "Reset-and-Pulse-Width-Control Circuits for High Performance Multiport Register Files", U.S. Pat. No. 5,617,047.
The present invention enables the realization of high performance multiport register files for the WRITE operation and provides alternative approaches for use in advanced register file design.