The present invention generally relates to computer systems. More particularly, the present invention relates to a method and apparatus for improving performance in computer systems by arranging cache modules in several interconnected operational modes.
A cache, or cache module (the terms are used interchangeably throughout this specification), is intended to enhance the speed at which information and data are retrieved. A main memory typically stores a large amount of data, which is time-consuming to retrieve. The cache module contains a copy of portions of the main memory. When a processor attempts to read a word of memory, a check is made to determine if the word is in the cache module. If so, the word is delivered to the processor. If not, a block of main memory, consisting of some fixed number of words, is read into the cache module and then the word is delivered to the processor.
The main memory consists of up to 2^n addressable words, with each word having a unique n-bit address. For mapping purposes, this memory is considered to consist of a number of fixed-length blocks of K words each. That is, there are M = 2^n/K blocks. The cache module consists of C lines of K words each, and the number of lines is considerably less than the number of main memory blocks.
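As a worked illustration of the arithmetic above, the following Python sketch computes the number of blocks M and splits a word address into a block number and a word offset. The values n = 16, K = 4 and C = 128, and the function names, are assumptions chosen for illustration only; they do not come from this specification.

```python
from typing import Tuple


def block_count(n: int, k: int) -> int:
    """Number of K-word blocks in a memory of 2**n addressable words."""
    total_words = 2 ** n
    if total_words % k != 0:
        raise ValueError("K must divide 2**n evenly")
    return total_words // k


def split_address(address: int, k: int) -> Tuple[int, int]:
    """Return (block number, word offset within the block) for a word address."""
    return address // k, address % k


# Illustrative values only (assumptions, not taken from the specification).
N_BITS = 16    # n: address width in bits
K_WORDS = 4    # K: words per block
C_LINES = 128  # C: number of cache lines, far fewer than M

M_BLOCKS = block_count(N_BITS, K_WORDS)  # M = 2^n / K
```

With these assumed values, main memory holds many more blocks than the cache has lines, which is why a mapping between blocks and lines is needed at all.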
FIG. 1 is a block diagram illustrating a simplified picture of a network involving a processor 12 with cache module 40 connected via address, control and data lines 43, 44, and 45, respectively. Address and data lines 43 and 45 are also attached to address and data buffers 41 and 42, respectively, which are attached to system bus 20, from which main memory (not shown) is reached.
Typically, processor 12 generates an address of a word to be read. If a "hit" occurs, (the word is contained in cache module 40), the word is delivered to processor 12. When this cache hit occurs, the data and address buffers 42 and 41, respectively, are disabled and communication is only between the processor 12 and the cache module 40, with no system bus traffic. When a cache "miss" occurs, (the word is not contained in cache module 40), the desired address is loaded from main memory (not shown) onto system bus 20 and the data is returned through data buffer 42 to both the cache module 40 and the main memory. With a cache miss, a line in the cache may be overwritten or copied out of cache module 40 when new data is stored in the cache module. This overwritten line is referred to as a "victim block" or a "victim line."
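The hit, miss and victim-line behavior described above can be sketched as follows. This is a minimal illustration, not the circuit of FIG. 1: the class name SimpleCache is hypothetical, main memory is modeled as a Python list, and first-in, first-out replacement is an assumption, since the text does not state how the victim line is chosen.

```python
from collections import OrderedDict


class SimpleCache:
    """Toy cache of num_lines lines, each holding one block of k words."""

    def __init__(self, main_memory, num_lines, k):
        self.memory = main_memory    # list of words standing in for main memory
        self.num_lines = num_lines   # C: number of cache lines
        self.k = k                   # K: words per block
        self.lines = OrderedDict()   # block number -> list of k words
        self.last_victim = None      # block number most recently evicted

    def read(self, address):
        """Return (word, hit) for a word address."""
        block, offset = divmod(address, self.k)
        if block in self.lines:
            # Cache hit: the word comes straight from the cache.
            return self.lines[block][offset], True
        # Cache miss: if every line is occupied, pick a victim line.
        if len(self.lines) >= self.num_lines:
            # FIFO replacement is an assumption made for this sketch.
            self.last_victim, _ = self.lines.popitem(last=False)
        # Read the whole block from main memory into the cache.
        start = block * self.k
        self.lines[block] = self.memory[start:start + self.k]
        return self.lines[block][offset], False
```

For example, with a two-line cache of four-word blocks, reading addresses 5, 6, 9 and 17 in that order gives a miss, a hit, a miss, and then a miss that evicts the block loaded by the first read.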
The basic structure of a conventional multi-processor computer system 10 employing several cache modules is shown in FIG. 2. Computer system 10 includes processors 12, 120 and 220 as shown which are connected to various peripheral devices including input/output (I/O) devices 14 (such as a display monitor, keyboard, graphical pointer (mouse) and a permanent storage device (hard disk)), memory 16 (such as random access memory or RAM) that is used by processors 12, 120 and 220 to carry out program instructions, and firmware 18 whose primary purpose is to seek out and load an operating system from one of the peripherals (usually the permanent memory device) whenever computer system 10 is first turned on. Processors 12, 120 and 220 communicate with the peripheral devices by various means, including a generalized interconnect or system bus 20, or direct-memory-access channels (not shown).
Processor 12, as well as each of the other processors 120 and 220, includes a processor core 22 having a plurality of registers and execution units, which carry out program instructions 13 in order to operate the computer system 10. As shown, processor 12 further includes one or more cache modules, such as an instruction cache 24 and a data cache 26, which are implemented using high-speed memory devices. As described above, cache modules are commonly used to temporarily store values that might be repeatedly accessed by the processor, in order to speed up processing by avoiding the longer step of loading the values from memory 16. These cache modules are referred to as "on-board" when they are integrally packaged with the processor core on a single integrated chip 28. Each cache module is associated with a cache controller (not shown) that manages the transfer of data and instructions between the processor core 22 and the cache.
Processor 12 can include additional cache modules, such as cache module 30, which is referred to as a level 2 (L2) cache since it supports the on-board (level 1) caches 24 and 26. In other words, cache module 30 acts as an intermediary between memory 16 and the on-board caches, and can store a much larger amount of information (instructions and data) than the on-board caches can, but at a longer access penalty. Cache module 30 is connected to system bus 20, and all loading of information from memory 16 into processor core 22 comes through cache module 30.
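The intermediary role of the level 2 cache can be sketched as a lookup that tries the on-board cache first, then the L2 cache, and only then memory. This is a simplified illustration under stated assumptions: the function name load is hypothetical, each cache is modeled as an unbounded Python dictionary keyed by address, and capacity limits, replacement and access latency are ignored.

```python
def load(address, l1, l2, memory):
    """Return (value, source), where source names the level that supplied it."""
    if address in l1:
        # Found in the on-board (level 1) cache.
        return l1[address], "L1"
    if address in l2:
        # Found in the level 2 cache: copy it into L1 for later accesses.
        l1[address] = l2[address]
        return l1[address], "L2"
    # Not cached anywhere: fetch from memory. The value passes through L2
    # on its way to L1, mirroring the arrangement described above.
    value = memory[address]
    l2[address] = value
    l1[address] = value
    return value, "memory"
```

A first load of an address is served from memory, and a repeat load is served from L1. If the entry has been dropped from L1 in the meantime, the load is served from L2 instead.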
One drawback to the conventional cache module arrangement as shown is that the cache modules do not benefit from being interconnected. Without the cache modules being interconnected, it is inefficient to retrieve data since each cache must be searched individually if data is not found in the first cache that is searched.
Accordingly, what is needed is an effective and efficient method for directly connecting cache modules for retrieval of information.
In accordance with an embodiment of the present invention, a computer system having cache modules interconnected in series includes a first and a second cache module directly coupled to an address generating line for parallel lookup of data, and data conversion logic coupled between the first cache module and the second cache module.