In order to improve the performance of computers having a single central processing unit, computer designers have developed computers which have many central processing units. Often, the central processing units in such multiprocessing computers are connected to each other and to the system's main memory over a single common bus. Recently, however, central processing unit performance has been improving at a faster rate than bus performance. Faster internal central processor performance results in the need for more external bandwidth. That is, the amount of data transmitted on a common bus must increase to support increased central processing performance. Consequently, the number of central processors which can be connected to a common bus is limited by the bandwidth needed to support the central processors and the total bandwidth of the common bus.
One approach for reducing the bus bandwidth required by each processor in a multiprocessing system has been to place a cache unit between each processor and the common bus. Once data is loaded into a processor's associated cache unit, the processor can access the data in the cache unit without using the common bus. Typically, when a processor obtains data from its cache unit, less data is transmitted over the limited bandwidth of the common bus.
In many cases, a processor will modify a particular data value many times which, in turn, necessitates rewriting the data value back to main memory each time the data value is modified. Rewriting modified data values back to main memory, however, increases the amount of bus bandwidth needed to support a processor. Therefore, if the number of write operations can be reduced, the bus bandwidth required to support a processor can be reduced.
One type of cache unit which reduces the number of write operations is called a "write-back" cache. A write-back cache temporarily stores the modified data values and thus reduces the number of bus transactions needed to write the data values back to main memory. For example, a processor may modify a data value many times in the write-back cache without writing the data back to main memory. The write-back cache ensures that the modified data is eventually written back to main memory.
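The saving in bus traffic can be illustrated with a minimal sketch. The class names, the single address 0x40 and the count of 100 stores are hypothetical and chosen only for illustration; a write-through cache is used as the point of comparison for the behavior described above.

```python
class WriteThroughCache:
    """Every store is forwarded to main memory over the bus."""

    def __init__(self):
        self.lines = {}
        self.bus_writes = 0

    def store(self, address, value, memory):
        self.lines[address] = value
        memory[address] = value
        self.bus_writes += 1  # one bus write per store


class WriteBackCache:
    """Stores stay in the cache; memory is updated only on eviction."""

    def __init__(self):
        self.lines = {}
        self.dirty = set()
        self.bus_writes = 0

    def store(self, address, value):
        self.lines[address] = value
        self.dirty.add(address)  # main memory is now stale for this address

    def evict(self, address, memory):
        if address in self.dirty:
            memory[address] = self.lines[address]  # single write-back
            self.dirty.discard(address)
            self.bus_writes += 1
        self.lines.pop(address, None)


def demo_write_counts(n_stores=100):
    wt_memory, wb_memory = {}, {}
    wt, wb = WriteThroughCache(), WriteBackCache()
    for i in range(n_stores):
        wt.store(0x40, i, wt_memory)
        wb.store(0x40, i)
    wb.evict(0x40, wb_memory)
    return wt.bus_writes, wb.bus_writes, wb_memory[0x40]
```

In this sketch the write-back cache issues one bus write for the whole sequence of stores, while main memory still ends up holding the last value written.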
While write-back caches can be very efficient at reducing the total bus bandwidth required by a multiprocessing system, write-back caches unfortunately create memory coherency problems. For example, each write-back cache contains its own copy of a data value. In such situations, if more than one processor can independently modify a data value, then different versions of the same data value could exist in more than one write-back cache. This would result in erroneous operations. Consequently, some mechanism must ensure that all the processors have a consistent view of all data values at all times.
For example, when a processor modifies a data value, the modified data value exists in the write-back cache before it is written back to main memory. In this example, until the write-back cache writes the modified data value back to main memory, the main memory and the other cache units will contain a stale copy of the data value. In order to maintain data integrity, however, the other processors which request the data value must obtain the up-to-date version of the data value, not the stale data value.
The process of ensuring that all the processors have a consistent view of all data values is called cache coherency. One popular and successful set of methods for achieving cache coherency relies on what are called "snooping operations." While a wide variety of snooping operations exist, basically, the snooping operations in a cache unit monitor the bus transactions on the common bus. The snooping operations identify which transactions affect the contents of a cache unit or which transactions relate to modified data existing in a cache unit. Snooping operations typically require that all the processors and their associated cache units share a common bus. Sharing a common bus allows the cache units to monitor the bus transactions and potentially interfere with a bus transaction when a particular cache unit contains a modified data value.
Cache coherency methods also typically utilize coherency status information which indicates whether a particular data value in a cache unit is invalid, modified, shared, exclusively owned, etc. While many cache coherency methods exist, two popular versions include the MESI cache coherency protocol and the MOESI cache coherency protocol. The MESI acronym stands for the Modified, Exclusive, Shared and Invalid states while the MOESI acronym stands for the Modified, Owned, Exclusive, Shared and Invalid states.
The meanings of the states vary from one implementation to another. Broadly speaking, the modified state usually means that a particular cache unit has modified a particular data value. The exclusive state and owned state usually mean that a particular cache unit may modify a copy of the data value. The shared state usually means that copies of a data value may exist in different cache units, while the invalid state means that the data value in a cache unit is invalid.
In operation, the cache units snoop the bus operations and use the coherency status information to ensure cache coherency. For example, assume that a first processor having a first cache unit desires to obtain a particular data value. Furthermore, assume that a second processor having a second cache unit contains a modified version of the data value (the coherency status information indicates that the data value in the second cache unit is in the modified state).
In this example, the first processor initiates a read bus request to obtain the data value. The second cache unit snoops the read bus request and determines that it contains the modified version of the data value. The second cache unit then intervenes and delivers the modified data value to the first processor via the common bus. Depending on the system, the modified data value may or may not be simultaneously written to the main memory.
In another example, assume that the first processor desires to exclusively own a particular data value. Furthermore, assume that a second cache unit contains an unmodified, shared copy of the data value (the coherency status information indicates that the data value in the second cache unit is in the shared state). In this example, the first processor initiates a read bus request which requests data for exclusive use.
The second cache unit snoops the read bus request and determines that it contains a shared copy of the data value. The second cache unit then invalidates its shared data value by changing the data value's coherency status information to the invalid state. Changing the data value's coherency status to the invalid state invalidates the data value within the second cache unit. The first processor then completes the read bus request and obtains a copy of the data value from main memory for exclusive use.
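The two snooping examples above can be sketched as a small simulation. This is a simplified model under stated assumptions, not a description of any particular processor: the names SnoopingCache and CommonBus are hypothetical, and the sketch assumes that main memory is updated at the same time a modified value is delivered, which the text notes is system dependent.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    EXCLUSIVE = "E"
    MODIFIED = "M"


class SnoopingCache:
    def __init__(self):
        self.lines = {}  # address -> (State, value)

    def snoop(self, address, exclusive):
        """React to another processor's read; return data if intervening."""
        state, value = self.lines.get(address, (State.INVALID, None))
        if state is State.INVALID:
            return None
        supplied = value if state is State.MODIFIED else None
        if exclusive:
            self.lines[address] = (State.INVALID, None)  # give up the copy
        else:
            self.lines[address] = (State.SHARED, value)
        return supplied


class CommonBus:
    def __init__(self, memory, caches):
        self.memory = memory
        self.caches = caches

    def read(self, requester, address, exclusive=False):
        data = None
        for cache in self.caches:
            if cache is requester:
                continue
            supplied = cache.snoop(address, exclusive)
            if supplied is not None:
                data = supplied
                self.memory[address] = supplied  # assumed simultaneous update
        if data is None:
            data = self.memory[address]
        new_state = State.EXCLUSIVE if exclusive else State.SHARED
        requester.lines[address] = (new_state, data)
        return data


memory = {0x100: 1, 0x200: 7}
first, second = SnoopingCache(), SnoopingCache()
bus = CommonBus(memory, [first, second])

# First example: the second cache holds a modified copy and intervenes.
second.lines[0x100] = (State.MODIFIED, 42)
value_a = bus.read(first, 0x100)

# Second example: the second cache holds a shared copy and invalidates it.
second.lines[0x200] = (State.SHARED, 7)
value_b = bus.read(first, 0x200, exclusive=True)
```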
While snooping operations maintain cache coherency on multiprocessing systems with a single common bus, more powerful computers contain more than one bus such that each bus interconnects main memory with multiple processors. Because a common bus is limited in the number of processors it can support, a multiple-bus system might be necessary to achieve a desired level of performance. A problem associated with multiple buses is that the processors on one bus cannot monitor the transactions initiated by the processors on the other buses. Consequently, the snooping operations cannot maintain memory coherency in multiple-bus computers.
One way to maintain cache coherency in multiple-bus systems is to broadcast the bus transactions initiated on each bus to all the other buses. Unfortunately, this approach results in having the combined bus bandwidth load of all buses transmitted to each bus. As can be expected, this can significantly reduce system performance and obviate the benefit of multiple buses.
A second approach is based on what are called directory-based cache coherency methods. The IEEE Scalable Coherent Interface is an example of a multiple-bus, directory-based cache coherency system. In directory schemes, the processors do not snoop the bus transactions. Rather, the main memory subsystem maintains memory coherency by storing extra information with the actual data.
The extra information in the main memory subsystem typically indicates 1) which processor or processors have obtained a copy of a data value and 2) the coherency status of the data values. For example, the extra information may indicate that more than one processor shares the same data value. In yet another example, the extra information may indicate that only a single processor has the right to modify a particular data value.
When a processor requests a data value, the main memory subsystem determines whether it has an up-to-date version of the data value. If not, the main memory subsystem transfers the up-to-date data value from the processor with the up-to-date data value to the requesting processor. Alternatively, the main memory can indicate to the requesting processor which other processor has the up-to-date data value.
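A minimal sketch of the first variant, in which the memory subsystem itself forwards the up-to-date value, might look as follows. The Directory class, its field names and the use of plain dictionaries for the processor caches are all assumptions made for illustration.

```python
class Directory:
    """Extra per-address information kept by the main memory subsystem."""

    def __init__(self, memory):
        self.memory = memory
        self.sharers = {}  # address -> set of processor ids holding a copy
        self.owner = {}    # address -> processor id with the right to modify

    def write(self, requester, address, value, caches):
        # Invalidate every other copy before granting modification rights.
        for pid in self.sharers.pop(address, set()):
            if pid != requester:
                caches[pid].pop(address, None)
        previous = self.owner.get(address)
        if previous is not None and previous != requester:
            caches[previous].pop(address, None)
        self.owner[address] = requester
        caches[requester][address] = value

    def read(self, requester, address, caches):
        owner = self.owner.get(address)
        if owner == requester:
            return caches[requester][address]
        if owner is not None:
            # Main memory is stale: pull the value from the owning processor.
            self.memory[address] = caches[owner][address]
            del self.owner[address]
            self.sharers.setdefault(address, set()).add(owner)
        self.sharers.setdefault(address, set()).add(requester)
        caches[requester][address] = self.memory[address]
        return self.memory[address]


memory = {0x10: 5}
caches = {0: {}, 1: {}}
directory = Directory(memory)
directory.write(0, 0x10, 9, caches)       # processor 0 modifies the value
value = directory.read(1, 0x10, caches)   # processor 1 must see 9, not 5
```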
Because the information regarding the location of the up-to-date version of each data value is kept by the main memory subsystem, the processors do not need to "snoop" the bus transactions. Keeping such a directory, however, can add significant cost to a system due to the additional information that must be held for each data value in main memory. In addition, maintaining a directory for each data value in main memory can also degrade system performance due to the time needed to locate and transfer the required data to a requesting processor.
An alternative to directory-based systems would be a bus interconnect which stores the coherency status information associated with the data values which are actually stored in the cache units. Thus, rather than storage which increases proportionally as the main memory increases (as in directory-based schemes), the amount of storage is only related to the much smaller size of the combined cache units. This approach, however, requires the multiple-bus system to store a duplicate copy of the coherency status information associated with all the data values in each of the cache units.
For example, Sun Microsystems' UltraSparc system uses a bus switch to interconnect multiple buses wherein each bus is in communication with processors having internal cache units. The bus switch maintains a duplicate copy of the coherency status information associated with all the data values in the cache units. In the UltraSparc system, the bus switch is capable of maintaining a duplicate copy of the coherency status information because the processors in the UltraSparc system are configured to provide accurate information as to which data value is being replaced, allowing an external cache tag to be maintained.
Such a bus switch, however, is not feasible with many off-the-shelf processors because they do not output accurate cache data replacement information. For example, many conventional processors keep accurate coherency status information only within their internal cache units. Thus, other devices cannot determine when a data value is removed from an internal cache unit. Without accurate information about the coherency status information in the internal cache units, a bus switch cannot maintain a duplicate copy of the coherency status information.
The present invention provides a cache-coherent, multiple-bus system which effectively raises the total processor performance limits of single-bus systems. The present invention recognizes that multiple-bus, multiprocessing systems need a low-latency, high-bandwidth system which 1) interconnects multiple system buses and multiple I/O devices to a shared main memory and 2) efficiently maintains cache coherency while minimizing the impact to latency and total bandwidth within the system. The subject invention addresses these problems with "coherency filters" which allow the coordination of bus-to-bus communications in such a way as to maintain cache memory coherency while reducing the overhead in cross-bus traffic.
In a preferred embodiment of the present invention, the system buses, I/O buses and memory units are coupled via a multiported bus switch. This bus switch not only connects any system bus or I/O bus to any memory unit, but also handles cross-bus traffic. In addition, the preferred bus switch contains bus interface logic which determines the operation or operations needed to respond to bus transactions. The present invention, however, is not limited to such a multiported bus switch and can be utilized in a wide variety of other bus interconnects, such as when separate bus bridges exist for different data paths.
To ensure cache coherency in a multiple-bus, multiprocessing system, each bus which supports caches has an assigned coherency filter. Each coherency filter contains a tag controller, a cycle encoder and a rules table. In addition, each coherency filter is coupled to a tag memory. Generally speaking, each tag controller interfaces with all of the tag memories. Each cycle encoder determines what kind of bus transaction is occurring on the cycle encoder's assigned bus and each rules table determines what bus transaction or transactions are needed to maintain cache coherency.
Focusing now on the tag memories, each tag memory maintains a record of 1) the addresses of the data values which are located in the cache units connected to the tag memory's assigned bus, and 2) the cache coherency status associated with the data values. As is well known, each data value in main memory is identified with a corresponding memory address. In the preferred embodiment, the tag memories store the data value addresses which identify data values, not the actual data values. In addition to storing the data value addresses, the preferred tag memories also store the coherency status information associated with the data value addresses.
For instance, assume that a first coherency filter and a first tag memory are assigned to a first bus. Further assume that a first processor on the first bus requests a data value from the main memory. The first coherency filter maintains a record of the memory address in the first tag memory. In addition, the first coherency filter also stores the coherency status information associated with the memory address in the first tag memory.
The amount of data accessed in a memory transaction varies from system to system. In most conventional systems, when a processor performs a memory read transaction, the processor accesses enough memory to fill a portion of the processor's internal cache memory. Typically, an internal cache memory stores multiple data values in what is called a cache line.
As is well known, memory in a conventional computer processing system is divided into 8-bit quantities (bytes), 16-bit quantities (words) and 32-bit quantities (double words). In many current 32-bit processors, main memory is organized into double word (32-bit) boundaries. In most 32-bit processors each cache line can hold multiple double words.
In general, when a processor requests a data value, the processor obtains enough data to fill an entire cache line. For example, in the Pentium Pro processor available from Intel Corporation, each internal data value varies in size, but is no larger than 64 bits. The Pentium Pro's cache line, however, holds 32 bytes of data (256 bits). When a Pentium Pro processor desires to obtain a data value from main memory, it typically obtains eight data values (256 bits) needed to fill one of its cache lines.
In conventional systems, each cache line is identified by a cache line address. For example, in a Pentium Pro system, a cache line will have the same cache line address as the memory address of the lowest-order data value in the cache line. However, because each cache line contains 32 bytes of data, the cache line address of each cache line is shorter and does not include the five lowest-order address bits. In the preferred embodiment, each tag memory assigned to a particular bus stores the cache line addresses.
In addition to storing the cache line addresses, each tag memory also stores the coherency status associated with the cache line addresses. The coherency status relates to the status of the cache line in the cache units. In the preferred embodiment, the coherency status contains three different coherency states: an invalid state, a shared state and an owned state.
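The relationship between a byte address and its cache line address can be sketched in a few lines. The 32-byte line size comes from the text; the function names and the sample addresses are hypothetical.

```python
CACHE_LINE_BYTES = 32
OFFSET_BITS = CACHE_LINE_BYTES.bit_length() - 1  # 5 bits address a byte within a line


def cache_line_address(byte_address):
    """Drop the five lowest-order bits to obtain the cache line address."""
    return byte_address >> OFFSET_BITS


def line_base_address(byte_address):
    """Memory address of the lowest-order data value in the same line."""
    return byte_address & ~(CACHE_LINE_BYTES - 1)
```

Under these definitions, every byte address from 0x1220 through 0x123F falls in the same cache line, and the next line begins at 0x1240.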
The invalid state means that the cache line is invalid and that the cache entry which stores the cache line is empty and can store a new cache line. The shared state means that a processor has a copy of the cache line but does not have modification rights. Shared cache lines, for example, are often program instructions which are not modified, or read-mostly data items. The owned state means that the cache line may be modified by a processor which has obtained the cache line.
A person of ordinary skill in the art, however, will appreciate that the coherency status of a cache line is not limited to the invalid, shared and owned protocol. Indeed, a person of skill in the art will recognize that the coherency status could be implemented with a wide range of coherency protocols such as the Modified, Exclusive, Shared and Invalid (MESI) protocol, the Modified, Owned, Exclusive, Shared and Invalid (MOESI) protocol, the Modified, Shared, Invalid (MSI) protocol, a two state Invalid and Owned protocol, the Berkeley protocol, the University of Illinois coherency protocol, Digital Equipment""s Firefly protocol, the Xerox Dragon protocol and the like. The preferred embodiment utilizes the Invalid, Shared and Owned because of its ability to efficiently interface with Pentium Pro processors which utilize the MESI protocol.
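The text does not spell out how the four MESI states of a processor's internal cache correspond to the three tag-memory states. One plausible correspondence, shown here purely as an assumption, treats both states that permit modification (Modified and Exclusive) as owned. The enum and function names are hypothetical.

```python
from enum import Enum


class Mesi(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class TagState(Enum):
    INVALID = 0
    SHARED = 1
    OWNED = 2


def tag_state_for(mesi_state):
    """Assumed mapping: any state with modification rights counts as owned."""
    if mesi_state in (Mesi.MODIFIED, Mesi.EXCLUSIVE):
        return TagState.OWNED
    if mesi_state is Mesi.SHARED:
        return TagState.SHARED
    return TagState.INVALID
```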
In many conventional processors, the processors have internal cache units which do not output accurate coherency status information about the cache lines stored within the internal cache units. For example, an internal cache unit may discard an unmodified cache line without signaling that the cache line has been discarded. In another example, an internal cache unit may obtain a cache line with modification privileges which the cache unit does not modify. In this example, the cache unit may discard the cache line without signaling that the cache line has been discarded. Consequently, devices which monitor the cache unit may believe that the cache unit has a modified copy of the cache line when the cache unit has in fact discarded the cache line. In the preferred embodiment of the present invention, however, each tag memory is uniquely adapted to ensure cache coherency for internal cache units which do not output current coherency status information.
An important aspect of the present invention is that each tag memory ensures cache coherency by maintaining a superset of the cache line addresses which might possibly be currently held in the internal cache units. Thus, the superset of cache line addresses in a tag memory may indicate that a particular cache line in a cache unit is in the shared state when the cache unit has, in fact, discarded the cache line. In other cases, the superset of cache line addresses in a tag memory may indicate that a particular cache line in a cache unit is in the modified state, when the cache unit has, in fact, written the cache line back to main memory.
In order to maintain a superset of the cache line addresses, the preferred coherency filters use what is called the inclusion rule. The inclusion rule ensures that the cache line addresses stored in the cache units connected to a particular bus are always a subset of the cache line addresses in the tag memory assigned to that bus. Because each coherency filter monitors all the cache lines accessed by its associated bus, the address associated with each accessed cache line is maintained in the tag memory assigned to the bus. When a cache line address must be deleted from one of the tag memories, the inclusion rule directs the associated cache units to delete the cache line from their cache memories.
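The inclusion rule amounts to a subset test, which can be sketched as follows. The function name and the sample addresses are hypothetical, and the cache units are modeled simply as sets of cache line addresses.

```python
def inclusion_holds(tag_memory_lines, cache_units):
    """True if every cached line address also appears in the tag memory."""
    cached = set().union(*cache_units)
    return cached <= set(tag_memory_lines)


tag_memory = {0x91, 0x92, 0x93}
first_cache = {0x91}
second_cache = {0x92}  # 0x93 was silently discarded; the tag memory still lists it
```

The tag memory may list a line that no cache unit still holds, as with 0x93 here, but a cache unit must never hold a line that the tag memory has dropped.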
For example, when a tag memory does not have the memory capacity to hold a new cache line address, room must be made in the tag memory for the new cache line address by expelling one of the existing cache line addresses (the old cache line address) from the tag memory. If the old cache line address is in the invalid state (the cache units connected to the bus no longer are using the cache line associated with the old cache address), the coherency filter assigned to the tag memory simply replaces the old cache line address with the new cache line address.
However, when the old cache line address is in the shared or owned state, the coherency filter cannot expel the old cache line address from the tag memory until the cache units invalidate the old cache line address. As explained above, the preferred tag memories maintain a superset of the cache line addresses, thus the old cache line address must first be invalidated in the cache units before the old cache line address can be replaced with the new cache line address.
The coherency filters invalidate the old cache line address in the cache units by performing an invalidation bus transaction. The invalidation bus transaction directs the cache units connected to the bus to internally invalidate the old cache line address and its associated cache line.
For instance, assume that a first processor with a first cache unit and a second processor with a second cache unit are connected to a first bus which has an assigned coherency filter and a tag memory. Furthermore, assume that the first cache unit contains a first cache line in the shared state. In this example, the tag memory contains the first cache line address and the shared status information. In addition, assume that the second processor initiates a read bus transaction which requests a second cache line. Finally, assume that the tag memory does not have the memory capacity for the second cache line address.
In this example, the coherency filter needs to expel the first cache line address to make room for the second cache line address. However, before the coherency filter can expel the first cache line address, the coherency filter must perform a bus transaction which invalidates the first cache line address in the first cache unit. To invalidate the first cache line address, the coherency filter performs an invalidation bus transaction which directs the first cache unit to invalidate the cache line associated with the first cache line address.
While performing the invalidation bus transaction, the coherency filter suspends the read bus transaction for the second cache line address. Because the first cache line address is in the shared state (the first cache line has not been modified) the first cache unit responds to the invalidation bus transaction and invalidates the first cache line. After completion of the invalidation bus transaction, the coherency filter replaces the first cache line address in the tag memory with the second cache line address.
In some cases, however, the first cache unit may have modified the first cache line (i.e., the first cache line is in the owned state). If the first cache line is in the owned state, the first coherency filter again performs the invalidation bus transaction which invalidates the first cache line. However, if the first cache unit has modified the first cache line, the first cache unit responds to the invalidation bus transaction by performing a write bus transaction which writes the modified first cache line back to main memory.
After writing the modified first cache line back to main memory, the first cache unit invalidates the first cache line. The coherency filter then replaces the first cache line address in the tag memory with the second cache line address. Thus, in some cases, maintaining a superset of the cache line addresses in the tag memory requires the cache units to write modified data back to the main memory before invalidating a cache line in the tag memory.
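The replacement sequence described above can be sketched as follows. This is a simplified model with hypothetical names (CacheUnit, replace_tag, the log list); it assumes that the new cache line address enters the tag memory in the shared state because it was requested by an ordinary read.

```python
from enum import Enum


class TagState(Enum):
    INVALID = 0
    SHARED = 1
    OWNED = 2


class CacheUnit:
    def __init__(self):
        self.lines = {}  # cache line address -> (data, modified flag)

    def invalidate(self, line_address, memory):
        """Drop the line; write it back first if it was modified."""
        entry = self.lines.pop(line_address, None)
        if entry is None:
            return False
        data, modified = entry
        if modified:
            memory[line_address] = data
        return modified


def replace_tag(tag_entry, new_line_address, cache_units, memory, log):
    """Expel the old address from a tag entry, invalidating it first if needed."""
    old_address = tag_entry["address"]
    if tag_entry["state"] is not TagState.INVALID:
        log.append(("invalidate", old_address))  # invalidation bus transaction
        for unit in cache_units:
            if unit.invalidate(old_address, memory):
                log.append(("write_back", old_address))
    tag_entry["address"] = new_line_address
    tag_entry["state"] = TagState.SHARED  # assumed state for a plain read


memory = {}
unit = CacheUnit()
unit.lines[0x91] = ("new data", True)  # the first cache line has been modified
entry = {"address": 0x91, "state": TagState.OWNED}
log = []
replace_tag(entry, 0x31, [unit], memory, log)
```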
In the preferred embodiment, each coherency filter stores the cache line addresses in the tag memories using direct mapping techniques. Direct mapping techniques specify that each cache line address is mapped to a specific tag entry in a tag memory. While the preferred embodiment uses direct mapping techniques, one of ordinary skill in the art will recognize that a number of different techniques can be used to organize the cache line addresses within the tag memories. For instance, instead of direct mapping techniques, the tag memories may use fully associative mapping techniques. In a fully associative system, any cache line address can exist in any tag entry. In other embodiments, each cache line address can be stored in only one of two different tag entries (two-way set associative), or one of four different tag entries (four-way set associative), etc.
Focusing now on the direct mapping techniques of the preferred embodiment, each cache line address is used as an index to identify a particular tag entry. In the preferred embodiment, the number of entries in a tag memory defines the size of what is called a tag page. Preferably, the tag memories coupled to each system bus have the same tag page size. The tag page size is related to the amount of total cache memory in the caches of the processors. Furthermore, the tag memory coupled to the I/O bus is smaller in size because of the small cache units which are typically coupled to the I/O bridges.
A tag page should not be confused with a page of main memory. As is well known in the art, the physical memory address space of the computer can be conceptually organized into multiple sections called memory pages wherein each memory page contains multiple cache lines. A memory page is defined by the processing system and is independent of the tag page.
In the preferred embodiment, the cache line address identifies 1) the tag page which contains the cache line address and 2) the location of the cache line address within the tag page. In particular, the high-order bits in the cache line address identify the tag page while the lower-order bits identify the location of the cache line address within the tag page.
Typically, the low-order bits are called indexes because the low-order bits identify the location of a cache line address within a tag page. For example, for the first cache line address in the first tag page, the high-order address bits identify the first tag page and the low-order address bits identify the first cache line address location within the first tag page.
In the preferred embodiment, the tag controller in a coherency filter direct maps the cache line addresses into a tag memory. For example, when a processor connected to a first bus initiates a bus transaction requesting a particular cache line address, the first tag controller evaluates the cache line address. The first tag controller uses the lower address bits as an index to identify a particular tag entry in the first tag memory. The first tag controller then stores the high-order bits (the tag page) in the identified tag entry.
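A minimal sketch of direct mapping, assuming a tag memory with a power-of-two number of entries, is shown below. The class name, the four index bits and the sample cache line addresses are hypothetical; the coherency status stored alongside each tag page is omitted for brevity.

```python
class DirectMappedTagMemory:
    def __init__(self, index_bits):
        self.index_bits = index_bits
        self.entries = [None] * (1 << index_bits)  # each entry holds a tag page

    def split(self, line_address):
        """Return (tag page, index) for a cache line address."""
        index = line_address & ((1 << self.index_bits) - 1)  # low-order bits
        tag_page = line_address >> self.index_bits           # high-order bits
        return tag_page, index

    def lookup(self, line_address):
        tag_page, index = self.split(line_address)
        return self.entries[index] == tag_page

    def store(self, line_address):
        """Store the tag page; return the expelled line address, if any."""
        tag_page, index = self.split(line_address)
        old_page = self.entries[index]
        self.entries[index] = tag_page
        if old_page is None or old_page == tag_page:
            return None
        return (old_page << self.index_bits) | index


tags = DirectMappedTagMemory(index_bits=4)
first_store = tags.store(0x91)   # tag page 9, index 1
second_store = tags.store(0x31)  # tag page 3, index 1: same entry, so 0x91 is expelled
```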
In the preferred embodiment, the tag memories are implemented with static memory. The static memory implementation allows each tag controller to access each tag memory quickly during a bus transaction. While the preferred embodiment is implemented in static memory, a person of ordinary skill in the art will recognize that different types of storage mechanisms may be used to implement the tag memories. Preferably, the different types of storage mechanisms will provide memory access speeds commensurate with the bus clock rates so as to optimize performance.
When two cache line addresses map to the same tag entry, the tag controller expels the previous cache line address to make room for the new cache line address. As explained above, this process can suspend the bus transaction associated with the new cache line address until the old cache line address has been invalidated. Furthermore, invalidating the old cache line address can require additional bus transactions to ensure that the tag memory maintains a superset of the cache line addresses existing in the cache units.
In one embodiment of the present invention, each coherency filter further contains an invalidation queue which holds the old cache line address and the new cache line address without suspending the bus transaction associated with the new cache line address. This can improve system performance because the invalidation bus transactions which invalidate the old cache line can occur at a later time.
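One way to picture such a queue is as a first-in, first-out list of address pairs that is drained when the bus is free. The text gives no detail about the queue's structure, so the class below, including its method names and the callback used to issue the invalidation, is an assumption made for illustration.

```python
from collections import deque


class InvalidationQueue:
    def __init__(self):
        self.pending = deque()

    def defer(self, old_line_address, new_line_address):
        """Record the pair instead of suspending the new bus transaction."""
        self.pending.append((old_line_address, new_line_address))

    def drain(self, invalidate):
        """Issue the deferred invalidations in arrival order."""
        completed = []
        while self.pending:
            old_address, new_address = self.pending.popleft()
            invalidate(old_address)  # invalidation bus transaction
            completed.append((old_address, new_address))
        return completed


queue = InvalidationQueue()
queue.defer(0x91, 0x31)
queue.defer(0x92, 0x32)
invalidated = []
completed = queue.drain(invalidated.append)
```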
Focusing now on maintaining cache coherency in multiple buses, the preferred coherency filters determine when a cross-bus transaction is required by monitoring the bus transactions on their assigned buses. In particular, the cycle encoder in each coherency filter monitors each bus transaction occurring on the coherency filter's assigned bus. In the preferred embodiment, the cycle encoder uses well-known bus monitoring logic which monitors the bus control lines. The cycle encoder then transmits 1) the type of bus transaction and 2) the cache status information in the tag memories which is associated with the bus transaction to the coherency rules table.
Focusing now on the rules table, the rules table determines when to perform cross-bus transactions to ensure cache coherency. In the preferred embodiment, the rules table determines whether to perform a cross-bus transaction based in part on the coherency status information in the tag memories. For example, if a bus read transaction on a first bus identifies a particular cache line address, the rules table assigned to the first bus evaluates the coherency status of the cache line address in tag memories (the remote tag memories) assigned to the other buses.
With the coherency status information from the remote tag memories, the rules table determines whether a remote bus transaction is necessary to ensure cache coherency. As discussed in more detail below, in a particular coherency filter, the tag controller accesses the remote tag memories and inputs the cache status into the rules table. In addition, the cycle encoder determines the type of bus transaction and inputs the bus transaction information into the rules table.
In the preferred embodiment, the rules table acts as a large truth table. Using the bus transaction information and the remote tag memory information, the rules table determines which cross-bus transaction or set of bus transactions is needed to maintain cache coherency.
For example, assume a processor initiates a read bus transaction on a first bus. In this example, the first bus which initiates the bus transaction is referred to as the local bus while the other buses in the multiple bus system are called the remote buses. The read bus transaction transmits the desired cache line address to the coherency filter assigned to the local bus (the local coherency filter). The local coherency filter then evaluates whether the cache line address exists in the tag memories assigned to the remote buses (the remote tag memories).
The remote tag memories in this example indicate that the coherency status of the desired cache line address is the invalid state. In such a situation, there is no need to perform a cross-bus transaction to maintain cache coherency because the cache line address in the remote buses is invalid. Therefore, the local coherency rules table limits the bus transaction to the local bus and the main memory, without generating bus transactions on the remote bus. Limiting the bus transaction to the local bus reduces cross-bus traffic.
If, however, the remote tag memories indicate that a cross-bus transaction is required, the rules table determines the appropriate cross-bus transaction or set of transactions needed to ensure cache coherency. For example, one of the remote tag memories may indicate that the cache line address is in the owned state and thus that a cache unit connected to the remote bus might have a modified version of the cache line. If the remote tag memories indicate that the cache line address is in the owned state, the local rules table indicates that bus master logic connected to the remote bus needs to perform a bus read command on the remote bus to ensure cache coherency.
When the bus read command executes on the remote bus, the remote cache units snoop the bus read command and determine whether they have a modified version of the desired cache line. If one of the remote cache units on the remote bus returns a modified version of the cache line, the rules table forwards the cache line to the requesting processor on the local bus.
However, if none of the cache units on the remote bus have modified the cache line, the cache units do not respond to the bus read command. The rules table then determines that the up-to-date cache line is in main memory. Accordingly, the present invention transmits the up-to-date cache line in main memory to the requesting processor on the local bus. Thus, the preferred embodiment of the present invention uses the superset of cache line addresses in the tag memories to determine when cross-bus transactions are needed to maintain cache coherency.
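By way of illustration only, the read example above can be sketched as a small lookup table. The names `TagState`, `Action`, `RULES` and `resolve_read` are hypothetical and do not appear in the specification; only the two coherency states discussed above (invalid and owned) are modeled, and an actual rules table would cover more bus transactions and states.

```python
from enum import Enum


class TagState(Enum):
    INVALID = "invalid"
    OWNED = "owned"


class Action(Enum):
    LOCAL_ONLY = "local only"            # no cross-bus transaction needed
    REMOTE_BUS_READ = "remote bus read"  # bus master reads on the remote bus


# Hypothetical truth table: (local transaction, remote tag state) -> action.
# Only the two cases described in the text are filled in.
RULES = {
    ("read", TagState.INVALID): Action.LOCAL_ONLY,
    ("read", TagState.OWNED): Action.REMOTE_BUS_READ,
}


def resolve_read(remote_state, snooped_line, memory_line):
    """Return (action, cache line delivered to the requesting processor).

    snooped_line is the modified line returned by a remote cache unit,
    or None when no remote cache unit responds to the bus read.
    """
    action = RULES[("read", remote_state)]
    if action is Action.REMOTE_BUS_READ and snooped_line is not None:
        # A remote cache unit held a modified copy: forward it.
        return action, snooped_line
    # Otherwise the up-to-date line is the one in main memory.
    return action, memory_line
```

In this sketch an owned remote tag triggers the remote bus read even when no remote cache unit turns out to hold a modified copy, which mirrors the superset behavior of the tag memories described above.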
An additional aspect of the present invention includes a third bus which is dedicated to interfacing with input/output devices. In the preferred embodiment, this third bus is called the input/output (I/O) bus. The preferred I/O bus is the same type of bus as the other processor buses; however, one of ordinary skill in the art will recognize that the I/O bus and each of the other processor buses may use different bus protocols.
The preferred I/O bus operates in a similar manner as the other processor buses. Most I/O transfers in high-performance computers are done with direct memory access (DMA) transfers. The DMA transfers are usually initiated by I/O devices which move data directly between main memory and the I/O device without direct central processor involvement. Maintaining memory coherency on the I/O transactions which occur on the I/O bus avoids the flushing of cache lines in the cache units before and after each DMA transfer.
Another type of I/O transfer involves direct programmed access of I/O data by the processors. In the preferred implementation, the bus switch forwards the direct I/O transfers to the I/O bus and forwards all memory accesses, other than accesses to the main memory address space, to the I/O bus as memory-mapped I/O transfers. Such I/O transfers do not involve cache coherency but, as discussed in more detail below, are transmitted from one bus to the other bus in a unique manner.
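By way of illustration only, the forwarding decision described above can be sketched as follows. The constant `MAIN_MEMORY_SIZE`, the function `route` and the transfer kinds `"io"` and `"memory"` are assumptions introduced for this sketch; the specification does not give an address map.

```python
# Assumed size of the main memory address space (hypothetical value).
MAIN_MEMORY_SIZE = 0x4000_0000


def route(kind, address):
    """Return the destination of a processor-initiated transfer.

    kind is "io" for a direct I/O transfer or "memory" for a memory access.
    """
    if kind == "io":
        return "I/O bus (direct I/O transfer)"
    if 0 <= address < MAIN_MEMORY_SIZE:
        return "main memory"
    # Memory accesses outside main memory become memory-mapped I/O.
    return "I/O bus (memory-mapped I/O transfer)"
```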
The preferred I/O bus contains an I/O coherency filter and an I/O bus interface which improves I/O mapping across multiple system buses, improves I/O data processing and reduces system bus complexity. I/O data transactions which occur on the buses are automatically forwarded to the I/O bus. In addition, transactions which originate on the I/O bus are sent to the destination bus without broadcasting the bus transactions to the other buses.
A further aspect of the present invention optimizes communications between multiple buses. Conventional bus switches, for example, interconnect different buses with independent connection paths. Thus, in a conventional multiple-bus system, the first bus and second bus are usually interconnected with one independent connection path, the first bus and third bus are interconnected with another independent connection path while the second bus and third bus are interconnected with yet another independent connection path. As can be expected, such independent connection paths increase bus switch implementation complexity.
For example, when a first bus desires to direct a first bus transaction to a second bus, the first bus places the first bus transaction in a first queue which links the first bus with the second bus. The second bus then obtains the first bus transaction from the output of the first queue. Likewise, when the second bus desires to direct a second bus transaction to the first bus, the second bus places the second bus transaction in a second queue which links the second bus with the first bus. The first bus then obtains the second bus transaction from the output of the second queue.
Therefore, two buses require two queues. When additional buses are interconnected, more queues are required. For example, in a three-bus system each of the three bus-to-bus connections requires two queues. Consequently, a three-bus system requires six queues.
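By way of illustration only, the queue counts above follow from one queue per ordered pair of buses. The function name `queues_required` is hypothetical, and the formula assumes that every bus is linked to every other bus by its own pair of queues, as in the conventional arrangement just described.

```python
def queues_required(num_buses):
    """Queues needed when each ordered (source, destination) pair has its own queue."""
    return num_buses * (num_buses - 1)


for n in (2, 3, 4):
    print(n, "buses:", queues_required(n), "queues")
```

Under this assumption the count grows quadratically with the number of buses.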
The unique approach to implementing a bus switch in the preferred embodiment, however, reduces such system complexity with a multiported pool of memory cells which are accessible by all of the buses. With the unique bus switch, data can flow from any bus to any other bus without interfering with other data transfers which may be occurring at the same time. As discussed in more detail below, the bus transfers from each of the buses enter the common pool of memory cells. The bus transactions in the common pool of memory are then directed to their destination buses. Advantageously, any bus can read from or write to any other bus without using independent connection paths.
In the preferred embodiment, the information associated with each bus transaction is stored in three different memory cells called the data cells, the request cells and the address cells. The data cells store the data associated with a bus transaction. The request cells contain bus transaction information which defines the type of bus transaction sent to the destination bus. Finally, the address cells contain address information and coherency status information related to a bus transaction.
In the preferred embodiment, a one-to-one correspondence exists between each data cell, each request cell and each address cell. Thus each data cell, request cell or address cell, or any combination of these cells can contain the information for a particular bus transaction. While the preferred embodiment uses three memory cells to hold bus transaction information, the bus transaction information could exist in fewer or more than three memory cells.
Conceptually, the data cells, request cells and address cells can be viewed as existing in a single pool of multiported memory. Although, in the preferred embodiment, the data cells, request cells and address cells are located in different components, they continue to maintain their one-to-one correspondence. In the preferred embodiment, a data interface buffer contains the data cells while a system access controller contains the address cells and request cells.
Focusing now on the preferred data interface buffer, each of the data cells in the data interface buffer is multiported and accessible by all of the buses. Each data cell contains the data associated with a particular bus transaction. Advantageously, the pool of data cells in the data interface buffer interconnects the bus data paths.
Focusing now on the preferred system access controller, the system access controller contains a central request list, a buffer manager, a plurality of bus masters and a plurality of bus slaves. As is well known in the art, each bus master initiates bus transactions on one of the buses while each bus slave receives bus transactions initiated by other devices connected to one of the buses. The central request list maintains the pool of request cells and the buffer manager maintains the pool of address cells.
Each of the request cells in the central request list is multiported and accessible by all of the buses. Each request cell contains a target bus identifier, an action code which is also called the bus transaction code, and an owner bus identifier. The target bus identifier identifies a particular destination bus, the bus transaction code identifies a particular bus transaction and the owner bus identifier identifies the originating bus.
Focusing now on the pool of address cells in the buffer manager, each address cell is multiported and contains "in-use" information, a memory address and data cell status information. The in-use information in an address cell indicates whether an address cell is available for use. In the preferred embodiment, the in-use information comprises an in-use bit which is set to indicate whether an address cell is in use or free. In some cases, when an in-use bit is set to free, valid data may exist in the data cells. This allows optimizations which reuse the valid data in the free data cells.
The memory address, on the other hand, contains the memory address associated with a bus transaction while the data cell status indicates the status of the data in the data cells. In addition to the pool of address cells, the buffer manager also includes an address cell priority encoder, multiple first-in-first-out (FIFO) memories and multiple address comparators. The address cell priority encoder determines which address cells are in use and which address cells are free to receive new bus transaction information. In the preferred embodiment, the address cell priority encoder determines which address cells are free by evaluating the in-use information in each address cell.
The address cell priority encoder not only determines which address cells are free, but also assigns the free address cells to the different buses. After assigning a free address cell to a bus, the address cell priority encoder sets the in-use bit to indicate that the address cell is not free. For example, assume that in a three-bus system the priority encoder determines that three address cells are free. The preferred priority encoder assigns the first free address cell to the first bus, the second free address cell to the second bus and the third free address cell to the third bus.
When a fourth address cell becomes free, the address cell priority encoder cycles back to the first bus and assigns the fourth address cell to the first bus. While the preferred address cell priority encoder uses such techniques to assign the free address cells to different buses, one of ordinary skill in the art will appreciate that the address cell priority encoder can employ a wide range of allocation schemes to assign the free address cells to the different buses.
Focusing now on the FIFO memories in the buffer manager, the FIFO memories temporarily store the assigned address cells until they are needed by the buses. In the preferred embodiment, the FIFO memories store address cell identifiers which identify the assigned address cells. An address cell identifier is a data variable which contains the memory location of an assigned address cell. The buses use an address cell identifier to access the address cell memory location identified by the address cell identifier.
In the preferred embodiment, each FIFO memory is assigned to a particular bus. Furthermore, each FIFO memory is coupled to the bus slave and the coherency filter assigned to the same bus as that FIFO memory. When one of the bus slaves or one of the coherency filters desires to send a bus transaction to another bus, it obtains one of the address cell identifiers from its assigned FIFO memory.
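By way of illustration only, the allocation of free address cells to per-bus FIFO memories can be sketched as follows. The class and method names are hypothetical, a Python `deque` stands in for each FIFO memory, and a list index stands in for the address cell identifier; the rotation through the buses follows the three-bus example above, although other allocation schemes are possible.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class AddressCell:
    in_use: bool = False  # models the in-use bit


class AddressCellPriorityEncoder:
    def __init__(self, cells, num_buses):
        self.cells = cells
        # One FIFO of address cell identifiers per bus.
        self.fifos = [deque() for _ in range(num_buses)]
        self.next_bus = 0  # bus that receives the next free cell

    def assign_free_cells(self):
        """Hand each free cell to the next bus in turn and mark it in use."""
        for cell_id, cell in enumerate(self.cells):
            if not cell.in_use:
                cell.in_use = True
                self.fifos[self.next_bus].append(cell_id)
                self.next_bus = (self.next_bus + 1) % len(self.fifos)

    def obtain_identifier(self, bus):
        """Called by a bus slave or coherency filter on the given bus."""
        return self.fifos[bus].popleft()


cells = [AddressCell() for _ in range(3)]
encoder = AddressCellPriorityEncoder(cells, num_buses=3)
encoder.assign_free_cells()
```

With three free cells and three buses, each FIFO receives one identifier; a cell that later becomes free is assigned to the first bus again.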
For example, assume that a first processor on a first bus desires to send a data value to a second I/O device on a second bus. In this example, a first bus slave is connected to the first bus. When the first processor initiates a bus transaction which sends a data value to the second I/O device, the bus transaction is received by the first bus slave. The first bus slave then determines that the bus transaction needs to be forwarded to the second bus.
Accordingly, the first bus slave accesses the first FIFO memory in the buffer manager and obtains an address cell identifier. Using the address cell identifier, the first bus slave accesses the identified address cell and stores the data value address and, if necessary, the data value's coherency status in the address cell. In the corresponding request cell, the first bus slave designates the second bus in the target bus identifier, the appropriate bus transaction code in the action code, and the first bus in the owner bus identifier. Furthermore, the first bus slave stores the data value associated with the bus transaction in the corresponding data cell.
In a different example, assume that a first cache coherency filter assigned to a first bus determines that a cache line access requires a bus transaction on a second bus to ensure cache coherency. In this example, the first cache coherency filter accesses the first FIFO memory in the buffer manager and obtains an address cell identifier.
The first coherency filter uses the address cell identifier to access the identified address cell. The first coherency filter then stores the cache line address and the coherency status information in the address cell. In addition, in the request cell, the first coherency filter designates the second bus in the target bus identifier, the appropriate bus transaction code in the action code and the first bus in the owner bus identifier. In this example, however, the corresponding data cell remains empty because the cache line data is not needed to ensure cache coherency. Once the bus transaction information has been added to the cells, the proper buses must obtain the bus transaction information and execute the desired bus transaction.
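By way of illustration only, the two examples above can be sketched with three parallel lists that keep the one-to-one correspondence between address, request and data cells. All names (`CellPool`, `post`, the field names and the action code strings) are hypothetical, and a `deque` of list indices stands in for the FIFO memory of address cell identifiers.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class AddressCell:
    in_use: bool = False
    memory_address: Optional[int] = None
    coherency_status: Optional[str] = None


@dataclass
class RequestCell:
    target_bus: Optional[int] = None   # destination bus
    action_code: Optional[str] = None  # bus transaction code
    owner_bus: Optional[int] = None    # originating bus


class CellPool:
    def __init__(self, num_cells):
        # The same index selects the corresponding cell in each list.
        self.address_cells = [AddressCell() for _ in range(num_cells)]
        self.request_cells = [RequestCell() for _ in range(num_cells)]
        self.data_cells = [None] * num_cells

    def post(self, fifo, owner_bus, target_bus, action_code,
             address, coherency_status=None, data=None):
        """Take an identifier from the caller's FIFO and fill the cells."""
        cell_id = fifo.popleft()
        self.address_cells[cell_id] = AddressCell(True, address, coherency_status)
        self.request_cells[cell_id] = RequestCell(target_bus, action_code, owner_bus)
        self.data_cells[cell_id] = data  # stays None when no data is needed
        return cell_id


pool = CellPool(num_cells=4)
first_fifo = deque([0, 1])  # identifiers already assigned to the first bus

# Bus slave example: forward a data value from bus 0 to bus 1.
write_id = pool.post(first_fifo, owner_bus=0, target_bus=1,
                     action_code="write", address=0x1000, data=b"\x2a")

# Coherency filter example: request a read on bus 1; the data cell stays empty.
read_id = pool.post(first_fifo, owner_bus=0, target_bus=1,
                    action_code="read", address=0x2000,
                    coherency_status="owned")
```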
In the preferred embodiment, a plurality of bus priority encoders in the central request list are connected to the request cells. As explained above, the target bus identifier in each request cell identifies the destination bus. Generally speaking, the bus priority encoders evaluate the target bus identifiers in the request cells to determine which bus should perform the bus transaction.
For example, assume that the target bus identifiers in the request cells designate a first bus and a second bus. In this example, the first bus priority encoder evaluates the target bus identifiers in the request cells to identify which request cells are for the first bus while the second bus priority encoder evaluates the target bus identifiers to identify which request cells are for the second bus.
In addition to identifying the destination buses, each bus priority encoder also determines which of the bus request cells associated with a particular bus has the highest priority. In the preferred embodiment, each bus priority encoder determines the highest priority bus request cell using round robin techniques. The round robin techniques ensure that each bus priority encoder sequentially assigns the highest priority to the bus request cells.
Each bus priority encoder forwards the highest priority bus request cell to one of the bus masters. As explained above, in addition to having a target bus identifier, the request cell also contains a bus transaction code. The bus master then performs the bus transaction identified in the request cell. In some cases, as explained in more detail below, the bus which executes the transaction may need to write data back to the bus which initiated the bus transaction. In such cases, the bus master will use the data cell to store the write-back data and will reuse the request cell to communicate with the originating bus. As explained above, the originating bus is identified by the owner bus identifier existing in the request cell. Upon completion of the bus transactions, however, the bus master sets the address cell 500, the request cell 600 and the data cell 700 to free.
While the bus master performs the bus transaction, the bus priority encoder identifies the next highest priority request cell assigned to its bus and forwards the request cell to the bus master. When a bus priority encoder reaches the last bus request cell assigned to its bus, the bus priority encoder cycles back to the first bus request cell assigned to its bus. Assigning the highest priority to each bus request cell on a round robin basis ensures that every bus request cell will eventually be forwarded to the buses. As new request cells are added to the central request list, each of the bus priority encoders obtains immediate access to the new request cells and assigns the highest priorities accordingly.
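By way of illustration only, the round robin selection performed by one bus priority encoder can be sketched as follows. The class name `BusPriorityEncoder` and its method are hypothetical, and the request cells are reduced to a list of target bus identifiers in which `None` marks a free cell.

```python
class BusPriorityEncoder:
    def __init__(self, bus_id):
        self.bus_id = bus_id
        self.start = 0  # cell index where the next scan begins

    def next_request(self, target_bus_ids):
        """Return the index of the next request cell for this bus, or None.

        target_bus_ids[i] is the target bus identifier of request cell i,
        or None when that cell is free.
        """
        count = len(target_bus_ids)
        for offset in range(count):
            cell_id = (self.start + offset) % count
            if target_bus_ids[cell_id] == self.bus_id:
                # Resume after this cell so every cell gets its turn.
                self.start = (cell_id + 1) % count
                return cell_id
        return None


targets = [1, 0, 1, None]  # cells 0 and 2 target bus 1, cell 1 targets bus 0
encoder = BusPriorityEncoder(bus_id=1)
order = [encoder.next_request(targets) for _ in range(3)]
```

Because the scan resumes just past the last cell forwarded, the encoder reaches cell 2 before cycling back to cell 0, so no pending request cell for the bus is starved.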
In another aspect of the present invention, the buffer manager contains a plurality of address comparators which identify address conflicts. Typically, address conflicts arise when two different bus transactions relate to the same data value and occur at about the same time. In such situations, it is possible that two bus transactions for the same data may simultaneously try to exist in the address cells, request cells and data cells. As can be expected, such address conflicts can lead to improper results.
In the preferred embodiment, a set of address comparators is assigned to each bus. Each set of address comparators is coupled with one of the coherency filters, one of the bus slaves and all of the address cells in the buffer manager. For each bus transaction, the set of address comparators assigned to that bus compares the bus transaction address with all of the addresses in the address cells. If an address conflict is detected, appropriate actions must be taken to ensure proper operation, as detailed below.
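By way of illustration only, the comparison performed by one set of address comparators can be sketched as follows. The function name `find_conflicts` is hypothetical, each address cell is reduced to an `(in_use, memory_address)` pair, and the sketch assumes that only cells marked in use can conflict, which the specification does not state at this point.

```python
def find_conflicts(transaction_address, address_cells):
    """Return the indices of address cells that match the transaction address.

    address_cells is a list of (in_use, memory_address) pairs.
    Assumption: a free cell is never reported as a conflict.
    """
    return [
        cell_id
        for cell_id, (in_use, memory_address) in enumerate(address_cells)
        if in_use and memory_address == transaction_address
    ]


address_cells = [(True, 0x1000), (False, 0x1000), (True, 0x2000)]
conflicts = find_conflicts(0x1000, address_cells)
```

In hardware the comparisons would run in parallel, one comparator per address cell; the sequential scan here only models the result.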