1. Field of the Invention
The present invention generally relates to cache memory used in computer systems, and more particularly to a pipeline cache, independent of processor cycle time, for loading instructions and data in a computer system.
2. Description of Related Art
The basic structure of a conventional computer system includes one or more processing units connected to various input/output devices for the user interface (such as a display monitor, keyboard and graphical pointing device), a permanent memory device (such as a hard disk or a floppy diskette) for storing the computer's operating system and user programs, and a temporary memory device (such as random access memory or RAM) that is used by the processor(s) in carrying out program instructions. Computer processor architectures have evolved from the now widely accepted reduced instruction set computing (RISC) configurations to so-called superscalar architectures, in which multiple, concurrently operable execution units within the processor are integrated through a plurality of registers and control mechanisms.
The objective of superscalar architecture is to employ parallelism to maximize or substantially increase the number of program instructions (or "micro-operations") simultaneously processed by the multiple execution units during each interval of time (processor cycle), while ensuring that the order of instruction execution as defined by the programmer is reflected in the output. For example, the control mechanism must manage dependencies among the data being concurrently processed by the multiple execution units, and the control mechanism must ensure the integrity of data that may be operated on by multiple processes on multiple processors and potentially contained in multiple cache units. It is desirable to satisfy these objectives consistent with the further commercial objectives of increasing processing throughput, minimizing electronic device area and reducing complexity.
Both multiprocessor and uniprocessor systems usually use multi-level cache memories in which each higher level is typically smaller and has a shorter access time. The cache accessed directly by the processor, which in present systems is usually contained within the processor component, is typically the smallest cache. As a result, entries in the highest-level cache are frequently reallocated as new data is requested and space must be made to store it. As new reads are performed up the levels of cache, the data being brought up from a lower level is typically written to the cache through a secondary port connected to the bus to which the lower level of the memory hierarchy is coupled. The data from this port can be made available to the higher-level port (to which the next higher level of the memory hierarchy, or, at the highest level of the hierarchy, the processor, is coupled) in a "bypass" or "lookaside" mode, so that the data being written to the particular level is available without an additional read cycle.
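A minimal behavioral sketch of the bypass (lookaside) path, assuming a simple two-level hierarchy modeled with dictionaries. The class names `LowerLevel` and `UpperCache` and the `array_reads` counter are hypothetical, introduced only to show that with bypass enabled the fill data reaches the requester without an additional read of the storage array.

```python
class LowerLevel:
    """Hypothetical lower level of the memory hierarchy (backing store)."""

    def __init__(self, contents):
        self.contents = dict(contents)

    def fetch(self, addr):
        return self.contents[addr]


class UpperCache:
    """Hypothetical higher-level cache with an optional bypass path on fills."""

    def __init__(self, lower, bypass=True):
        self.lower = lower
        self.array = {}        # storage array: address -> value
        self.bypass = bypass
        self.array_reads = 0   # number of reads performed on the storage array

    def read(self, addr):
        if addr in self.array:
            # Hit: the value is read out of the storage array.
            self.array_reads += 1
            return self.array[addr]
        # Miss: the value arrives on the secondary (lower-level) port
        # and is written into the storage array.
        value = self.lower.fetch(addr)
        self.array[addr] = value
        if self.bypass:
            # Bypass mode: forward the fill data to the upper port while it
            # is being written, so no extra array read is needed.
            return value
        # Without bypass, the requester must read the array after the write.
        self.array_reads += 1
        return self.array[addr]
```

On a miss with `bypass=True`, `read` returns the fetched value with `array_reads` still at 0; with `bypass=False` the same miss costs one extra array read.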
Pipeline instruction and operand data caching systems have been used to improve the performance of data processing systems for many years. The object of a pipeline system is to perform loading of instructions or data, core execution, and other functions performed by a core simultaneously, rather than having load operations delay the operation of the core. Also, multiple instruction or data loads can be queued by load store units or instruction sequencers. Therefore, it is desirable to reduce any delays associated with retrieving these values from cache or other memory.
When reading data from a cache level, the latency (the time required for the data to become available from the read operation) is not constant. Processor frequency and the type of operation may affect when the data is available. For example, when a relatively slow processor clock frequency is used, data may be available from a cache in one processor cycle, whereas at a higher processor clock frequency data from the same cache may not be available for two processor cycles. In addition, when a particular value is not present in the cache, a cache "miss" occurs and the value must be read from a lower level of the memory hierarchy into the higher level. By taking advantage of the fact that data is present while being written into the higher level of the memory hierarchy, the bypass mode can be implemented such that the data is read immediately at the end of the fetch from the lower-level cache. Present systems accomplish this by multiplexing data from the lower-level port and the storage array in the cache. These multiplexers require circuit area, and the signal lines to the multiplexers complicate the circuit interconnect layers and occupy interconnect space.
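The frequency dependence described above amounts to a ceiling division: with a fixed array access time, a shorter clock period means more whole cycles elapse before the data is valid. The function name and the 1.5 ns access time with 500 MHz and 1 GHz clocks used below are assumed illustrative values, not figures from the source.

```python
def read_latency_cycles(access_time_ps: int, clock_period_ps: int) -> int:
    """Whole processor cycles that elapse before a cache read's data is valid.

    Times are integer picoseconds so the ceiling is computed exactly,
    without floating-point rounding.
    """
    if clock_period_ps <= 0:
        raise ValueError("clock period must be positive")
    # Ceiling division: -(-a // b) == ceil(a / b) for positive integers.
    return -(-access_time_ps // clock_period_ps)
```

With a 1500 ps access time, `read_latency_cycles(1500, 2000)` (500 MHz) gives 1 cycle, while `read_latency_cycles(1500, 1000)` (1 GHz) gives 2 cycles, matching the one-cycle versus two-cycle example in the text.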
Present pipeline systems synchronize the instruction and data pipelines (which operate synchronously with the processor) with the cache by latching values, keeping the cache read latency synchronized with the pipeline regardless of the processor clock frequency. In addition, systems that either read data from the cache or select, in bypass mode, data that is concurrently being written to the cache use a multiplexing scheme that selects either the write input to the cache (bypass) or the latched read data from the cache. These multiplexers add delay to the overall circuit path. Furthermore, many present systems perform a "read after write" operation, in which the data is written to the cache from the lower level before the processor is allowed to read the cache entry, so any advantage of the bypass-mode data being available early is lost to the need to synchronize that data with the pipeline.
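The conventional arrangement can be sketched as two pieces: latches that hold early data so it reaches the pipeline at a fixed stage, and a 2:1 multiplexer choosing between the fill (bypass) bus and the latched array output. The fixed two-cycle read schedule `PIPELINE_READ_STAGES` and both function names are hypothetical, chosen only for illustration.

```python
# Hypothetical fixed read latency (in cycles) that the pipeline is built around.
PIPELINE_READ_STAGES = 2


def cycles_data_held_in_latches(available_after_cycles: int,
                                pipeline_stages: int = PIPELINE_READ_STAGES) -> int:
    """Extra cycles a read value must sit in latches to arrive on schedule."""
    if available_after_cycles > pipeline_stages:
        raise ValueError("cache is slower than the pipeline schedule allows")
    return pipeline_stages - available_after_cycles


def select_read_data(bypass_valid: bool, bypass_data: int,
                     latched_array_data: int) -> int:
    """2:1 multiplexer: fill (bypass) data, else latched storage-array data."""
    return bypass_data if bypass_valid else latched_array_data
```

At a slow clock, where data is available after one cycle, `cycles_data_held_in_latches(1)` returns 1: the value waits in a latch for a cycle it did not need, which is the overhead attributed above to present systems. Every read also passes through `select_read_data`, standing in for the multiplexer delay.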
Therefore, it would be desirable to implement a pipeline cache such that data does not have to be latched as cycle time is lengthened, for example when processor clock frequency is lowered. In addition, it would be desirable to implement a pipeline cache wherein the early availability of bypass mode data can be used advantageously. Further, it would be desirable to eliminate multiplexer delays when implementing a system that can select between the bypass port and the cache storage array.
It is therefore one object of the present invention to provide an improved pipeline cache wherein data does not have to be latched when pipeline clock frequency is decreased.
It is a further object of the invention to provide an improved pipeline cache wherein data in bypass mode and data provided from the storage array are not selected via a multiplexer from independent buses.
It is still another object of the invention to provide an improved pipeline cache wherein bypass mode data can be used as soon as it is available.
The foregoing objects are achieved in a dynamic data pipeline having a bus input for providing a data value, clock means for indicating when the data value is valid, preset means for presetting a summing node when the clock means indicates that the data is invalid, and data input means coupled to the bus input for setting the summing node to a value in conformance with the data value. The data input means may further couple a plurality of data inputs to the summing node through a plurality of devices connected to the summing node, and at least one of the devices may be coupled to a reload port. The preset means may be provided by a PMOS transistor having channel connections coupled to a supply rail and to the summing node, and a gate connection coupled to an output of the clock means. The data input means may be provided by an NMOS transistor having a gate coupled to the bus input and a channel connection coupled to the summing node. The above objects are also accomplished in a method for pipelining data and in a cache memory incorporating the dynamic data pipeline described above.
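A logic-level sketch of the precharged summing node, under several stated assumptions: the PMOS preset device conducts while the clock indicates invalid data (`clock_valid` is False), pulling the node to the supply rail (logic 1); each NMOS input device discharges the node to 0 when its gate input is 1 while data is valid; and the node retains its charge perfectly otherwise. The class name `DynamicSummingNode` and the input ordering `[bus_bit, reload_bit]` are hypothetical, and this is a behavioral abstraction, not the circuit of the invention.

```python
from typing import Sequence


class DynamicSummingNode:
    """Behavioral model of a precharged (dynamic) summing node."""

    def __init__(self) -> None:
        self.node = True  # assume the node starts precharged high

    def step(self, clock_valid: bool, inputs: Sequence[bool]) -> bool:
        if not clock_valid:
            # Preset phase: the PMOS device ties the node to the supply rail.
            self.node = True
        elif any(inputs):
            # Evaluate phase: any conducting NMOS device discharges the node.
            self.node = False
        # Otherwise the node keeps its stored charge (dynamic hold).
        return self.node
```

With inputs `[bus_bit, reload_bit]`, once the clock indicates valid data the node reads back as the complement of `bus_bit or reload_bit`, so in this model both sources drive one node directly rather than passing through a separate 2:1 multiplexer.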
The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.