This invention relates in general to the field of computer architecture, and more specifically to a data coherency mechanism for an on-chip split transaction system bus.
A system bus in a computing system provides a communication channel between computing devices, such as microprocessors, graphics processors, direct-memory-access (DMA) controllers, and other devices such as memory, keyboard, monitor, video controllers, sound generation devices, etc. The system bus typically includes data paths for memory addresses, data, and control information. In some instances, a processor multiplexes (i.e., shares) address and data information over the same signal lines, albeit at different times. That is, a processor sends address information out over the address/data pins during a first time period and later uses the same address/data pins to send or receive data. Alternatively, many processors utilize separate signal lines for address and data information.
In operation, processors communicate with memory when they need to fetch instructions. During execution of instructions, processors might be required to read data from memory, or from another device such as an input/output (I/O) port. And, upon completion of instructions, processors might be required to write data to memory, or to another device. A typical scenario for accessing memory to obtain instructions and data is similar to the following:
1. A processor presents a memory address for an instruction on address lines of a system bus, and provides control information on control lines of the system bus to indicate that the operation is a read.
2. In response to the address and control information being placed on the system bus, memory places an instruction on data lines of the system bus, which are then read by the processor. The data is typically placed on the data lines N cycles after the address information has been placed on the address lines, where N is a positive integer and varies depending on the speed of the memory.
3. During execution of the instruction, if data is required, a memory address for the data is placed on the address lines of the system bus, and control information is placed on the control lines of the system bus to indicate a read.
4. Again, the memory places data corresponding to the memory address on the data lines of the system bus.
5. If the instruction needs to write to memory, the memory address for the write is placed on the address lines of the system bus, and control information is placed on the control lines to indicate a write.
6. N cycles after the memory address is presented, the data to be written is placed by the microprocessor on the data lines of the system bus. The memory uses the memory address presented in step 5, and places the data on the data lines into memory at that address.
One skilled in the art will appreciate from the above that the system bus provides the necessary physical interface between a processing device, and other devices (such as memory) that are external to it. A system bus also provides the protocol necessary for communicating between devices. That is, the protocol defines when address, data, and control signals must appear on the system bus, in relation to each other. For example, in the illustration presented above, address information appears in parallel with control information. At some time later, data information is presented by the processor, or is provided by memory.
In environments where there is only one device capable of initiating bus activity (a uni-master environment), the above described sequence is generally sufficient. However, in environments where multiple processors compete for access to shared devices, arbitration is needed to assign time on the bus to the multiple processors.
For example, if there are two processors on a system bus, both competing for access to slave devices (such as memory), typical systems provide an arbitration protocol between the devices to establish which one has the right to begin. On the Pentium bus (designed by Intel Corporation), a processor requests access to the bus by asserting a “bus request” signal. If the processor receives a “grant” signal, either from another processor, or from an external arbitration device, then it begins a transaction by placing address and control information on the bus. When it receives (or writes) data on the bus, it relinquishes control of the bus to the next processor. If another processor required access to the bus during the transaction, it would have to wait until the entire transaction (including the address and data portions of the transaction) completed. In most situations, it is undesirable to deny a processor access to a bus pending completion of an entire transaction by another processor.
One solution to this problem has been to separate the address and data bus portions of the system bus, and to provide separate arbitration for gaining access to each of the buses. For example, rather than requesting access to (or mastership of) the entire system bus, a first processor may request access to the address bus alone. If the address bus is available, the first processor can present address information on the address lines, even though a second processor is bus master of the data bus. Access to the data bus by the first processor operates in a similar fashion.
Thus, by separating arbitration for accessing the address bus from that of the data bus, multiple masters are allowed to utilize portions of the system bus simultaneously. An example of an environment that provides for such split address and data buses is the system bus for the PowerPC 603, manufactured by Motorola.
When the address and data portions of a bus are separate, and are shared by multiple bus masters, a system is required to allow master devices to request, and gain access to the address and data buses, independently. This is typically provided via an arbiter, and an arbitration protocol.
The arbiter is coupled to each device on the bus that can act as a master device. A master that wishes to access either the address or data portions of the system bus presents a bus request (address bus request, or data bus request) to the arbiter. The arbiter, upon receipt of a request, utilizes its predefined protocol to determine when to grant the master access to either of the address or data bus. When it determines that the requesting master can access the address bus or the data bus, it provides that master with a bus grant signal (pertaining to the requested bus). Upon receipt of the grant signal, the requesting master begins driving the bus (address or data).
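The request and grant sequence described above can be illustrated with a minimal sketch. It is not taken from any particular bus specification: the class name `SplitBusArbiter`, the master names, and the first-come, first-served grant policy are all assumptions made for illustration, since the text leaves the arbiter's predefined protocol open.

```python
from collections import deque


class SplitBusArbiter:
    """Toy arbiter with independent arbitration for the address and data buses."""

    BUSES = ("address", "data")

    def __init__(self):
        # One queue of waiting masters and one current owner per bus.
        self.waiting = {bus: deque() for bus in self.BUSES}
        self.owner = {bus: None for bus in self.BUSES}

    def request(self, master, bus):
        """A master presents an address bus request or a data bus request."""
        self.waiting[bus].append(master)

    def grant(self, bus):
        """Grant a free bus to the oldest waiting master (assumed FIFO policy)."""
        if self.owner[bus] is None and self.waiting[bus]:
            self.owner[bus] = self.waiting[bus].popleft()
        return self.owner[bus]

    def release(self, master, bus):
        """The owning master stops driving the bus."""
        if self.owner[bus] == master:
            self.owner[bus] = None


arbiter = SplitBusArbiter()
arbiter.request("cpu0", "address")
arbiter.request("memctl", "data")
address_owner = arbiter.grant("address")  # cpu0 drives the address bus
data_owner = arbiter.grant("data")        # memctl drives the data bus at the same time
```

Because each bus has its own queue and owner, one master can hold the address bus while a different master holds the data bus, which is the property that split arbitration is meant to provide.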
In multi-master environments, there are typically a number of locations where data may be stored. For example, it is common for a memory, or cache, to be placed within modern microprocessors to allow them to quickly access data or instructions without requiring that the processors access the memory on the system bus. Although the size of the cache is usually small compared to the memory on the system bus, every time a processor reads or writes data to a memory location that is already in its cache, activity on the system bus is reduced, or temporarily eliminated. An example of this is provided below using the split transaction bus described above.
1. If the data at address 1FFFH (where “H” stands for hexadecimal) is not in the cache, the master requests access to the address bus.
2. The arbiter grants the master access to the address bus.
3. The master asserts a read command, and places the address 1FFFH on the address bus.
4. When the memory controller is ready to respond to the read, it requests access to the data bus.
5. The arbiter grants the memory controller access to the data bus.
6. The memory controller places the data at address 1FFFH on the data bus.
7. The master receives the data.
If the data at memory address 1FFFH were already in the processor's cache, none of the above steps would have been necessary. Rather, the processor would have simply retrieved the data from its cache, without initiating any activity on the system bus. Thus, the use of caches by processors improves system performance, both by providing data/instructions immediately to requesting processors, and by reducing system bus activity.
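The effect of a cache on bus activity can be sketched as follows. The class `CachedMaster`, the dictionary standing in for main memory, the stored value, and the `bus_reads` counter are hypothetical names and values chosen for illustration; the seven bus steps listed above are collapsed into a single counted event.

```python
class CachedMaster:
    """Toy master that counts how often a read has to go out on the system bus."""

    def __init__(self, memory):
        self.memory = memory  # dict standing in for main memory on the bus
        self.cache = {}       # address -> data held on-chip
        self.bus_reads = 0    # number of reads that required bus activity

    def read(self, address):
        if address in self.cache:
            # Cache hit: the data is returned with no bus activity at all.
            return self.cache[address]
        # Cache miss: the address and data phases described above take place.
        self.bus_reads += 1
        data = self.memory[address]
        self.cache[address] = data
        return data


memory = {0x1FFF: 0xAB}
master = CachedMaster(memory)
first = master.read(0x1FFF)   # miss, goes out on the bus
second = master.read(0x1FFF)  # hit, served from the cache
```

After both reads, `master.bus_reads` is 1: only the first access generated bus traffic.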
However, when processors (or other devices) provide alternate locations where data/instructions may be stored, some coherency mechanism must exist to ensure either that the data/instructions in the alternate locations are the same as the data/instructions in the main memory, or that all of the devices that access the main memory always obtain the latest or best copy of the data/instructions.
For example, suppose a processor reads data at memory address 2FFFH from the main memory and places a copy of this data in its cache. Future accesses by this processor to address 2FFFH may then be provided by the cache without requiring it to access the system bus. If a second processor requests data at memory address 2FFFH, it is acceptable to provide the data from the main memory, since no change has been made to the data. If, however, prior to the second processor making the request, the first processor changes the data at address 2FFFH, the system must ensure that the second processor is provided the new or changed data. The methodology that ensures that all devices that access memory provide or obtain the latest or best data is known as “coherency”. Although a complete description of coherency methodologies is beyond the scope of this application, an overview of a few well known methodologies is considered appropriate.
An early methodology that was developed required any processor making a change to data to “write-thru” the data all the way to main memory. That is, data could be copied into multiple caches on the system bus, and could be shared by multiple processors as long as the data remained unchanged. However, if any processor modified the data, it was required to write the data thru its own cache, and back out to main memory. Other processors on the system bus would “snoop” the write (by continuously examining the address bus), and would tag their copies of the data as “invalid”. Subsequent accesses to the modified area of memory would require those processors to go back out to main memory to retrieve the latest data.
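A minimal sketch of the write-thru methodology with snoop invalidation follows. The names `Bus`, `WriteThroughCache`, and `snoop_write` are hypothetical, and a Python dictionary stands in for main memory; a cached line is treated as invalid simply by removing it from the cache.

```python
class Bus:
    """Toy system bus: holds main memory and lets attached caches snoop writes."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def write(self, source, address, value):
        self.memory[address] = value
        # Every other cache observes the write address and invalidates its copy.
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(address)

    def read(self, address):
        return self.memory[address]


class WriteThroughCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}  # address -> value; an absent address is invalid
        bus.caches.append(self)

    def read(self, address):
        if address not in self.lines:
            self.lines[address] = self.bus.read(address)
        return self.lines[address]

    def write(self, address, value):
        # Update the local copy and write thru to main memory immediately.
        self.lines[address] = value
        self.bus.write(self, address, value)

    def snoop_write(self, address):
        self.lines.pop(address, None)


bus = Bus()
bus.memory[0x2FFF] = 1
cache_a, cache_b = WriteThroughCache(bus), WriteThroughCache(bus)
cache_a.read(0x2FFF)
cache_b.read(0x2FFF)         # both caches now share the unchanged data
cache_a.write(0x2FFF, 2)     # cache_b snoops the write and invalidates its copy
latest = cache_b.read(0x2FFF)  # cache_b must go back out to main memory
```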
Since writes, or data modifications, often occur in a processing environment, an improvement was made that required processors to “write-back” the modified data to the main memory only if another device requested the modified data. For example, a processor might change data at address 3FFFH, and place the modified data in its cache. This modified data, rather than being written immediately to main memory, would be held in the cache and tagged as modified. The processor would continuously monitor the address/control lines of the system bus to determine whether another device requested data at 3FFFH. If so, the processor would cause the request to be held until it could write the modified data back into the main memory. Thus, until other processors requested the “modified” data, no activity was required on the system bus.
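The write-back behavior can be sketched in the same style. The names `WriteBackBus`, `WriteBackCache`, and `snoop_read` are hypothetical. The sketch covers only the case described above, in which a snooped read forces the holder of modified data to write it back first; it deliberately omits invalidation of other cached copies when a write occurs.

```python
class WriteBackBus:
    """Toy system bus on which caches snoop read requests from other devices."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def read(self, requester, address):
        # The request is held while any other cache writes back modified data.
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_read(address)
        return self.memory[address]


class WriteBackCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}        # address -> value
        self.modified = set()  # addresses whose cached value is newer than memory
        bus.caches.append(self)

    def read(self, address):
        if address not in self.lines:
            self.lines[address] = self.bus.read(self, address)
        return self.lines[address]

    def write(self, address, value):
        # Hold the new value in the cache and tag it as modified; no bus activity.
        self.lines[address] = value
        self.modified.add(address)

    def snoop_read(self, address):
        if address in self.modified:
            self.bus.memory[address] = self.lines[address]
            self.modified.discard(address)


bus = WriteBackBus()
bus.memory[0x3FFF] = 1
cache_a, cache_b = WriteBackCache(bus), WriteBackCache(bus)
cache_a.write(0x3FFF, 7)
memory_before = bus.memory[0x3FFF]  # main memory still holds the old value
seen_by_b = cache_b.read(0x3FFF)    # triggers the write-back from cache_a
memory_after = bus.memory[0x3FFF]
```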
A number of different coherency systems have been developed which ensure data consistency. Such systems track the state of data in caches depending on whether the data has been “modified” (or is “dirty”), and whether the data is “shared” by more than one device. However, the systems that have been developed thus far were designed for system buses that are off-chip. As more devices are integrated on-chip, it is becoming increasingly important to develop coherency protocols that work in on-chip buses as well. Having on-chip coherency simplifies the task of software programmers who write embedded applications, since the hardware ensures coherency between multiple instances of data/instructions.
Unlike off-chip coherency, on-chip coherency protocols can implement point-to-point controls that can simplify the protocol and enable more efficient implementations. Another significant difference is that unlike off-chip buses that must maintain coherence between heavy-weight I/O devices that connect to the system bus, an on-chip bus can time the protocol for faster devices.
Therefore, what is needed is an on-chip system bus that ensures coherency for multiple instances of data where the system bus is implemented within a split transaction environment.
The present invention provides an on-chip system bus having a plurality of data master devices that perform data transfers with memory. The master devices include a bus interface and a cache coherency system. The bus interface allows its master device to communicate with the on-chip system bus. The cache coherency system maintains coherency between a cache and the memory. The cache coherency system includes a coherency credit counter to count pending coherent operations on the bus. The coherency system also includes a coherency input buffer that is designated to hold coherent transactions. The bus interface communicates with a memory controller that includes coherency buffer management that manages coherent transactions initiated by the master devices.
In another aspect, the present invention provides a processing device configured to access an on-chip bus to perform a coherent data transfer. The processing device includes a bus interface and a cache coherency system. The bus interface couples the processing device to the on-chip bus. The cache coherency system is coupled to the bus interface, and determines whether the coherent data transfer can begin on the on-chip bus. The coherent data transfer is delayed until coherent transaction buffer space external to the processing devices is available. The processing device further includes split transaction tracking and control to establish a transaction ID for the coherent data transfer, the transfer having split address and data portions.
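One plausible reading of the coherency credit counter is a counter of pending coherent operations that is compared against the number of coherent transaction buffer entries available outside the processing device. The sketch below follows that reading only as an assumption; the class name, the `buffer_entries` parameter, and the method names are hypothetical and are not drawn from the description.

```python
class CoherencyCreditCounter:
    """Counts pending coherent operations against a fixed number of buffer entries."""

    def __init__(self, buffer_entries):
        self.buffer_entries = buffer_entries  # assumed size of the external buffer
        self.pending = 0                      # coherent operations now on the bus

    def can_issue(self):
        return self.pending < self.buffer_entries

    def issue(self):
        """Try to begin a coherent transfer; return False if it must be delayed."""
        if not self.can_issue():
            return False
        self.pending += 1
        return True

    def complete(self):
        """A pending coherent operation finished, freeing one buffer entry."""
        if self.pending == 0:
            raise ValueError("no coherent operation is pending")
        self.pending -= 1


counter = CoherencyCreditCounter(buffer_entries=2)
results = [counter.issue() for _ in range(3)]  # the third attempt is delayed
counter.complete()                             # buffer space becomes available
retry = counter.issue()                        # the delayed transfer can now begin
```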
In yet another aspect, the present invention provides a multi-master split-transaction on-chip system bus for interfacing a number of master devices to a main memory, wherein each of the master devices has a bus interface. The master devices include a cache coherency system and split transaction tracking and control. The cache coherency system includes a cache to temporarily store data retrieved from the main memory. The coherency system ensures that its master device does not operate on invalid data, and monitors a number of coherent data transactions. The split transaction tracking and control establishes transaction IDs for each of the number of coherent data transactions, where each of the transactions has split address and data portions.
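The role of a transaction ID in a split transaction can be sketched as follows, under the assumption that the ID is allocated during the address portion and is used to match the later data portion to its address. The class `SplitTransactionTracker`, the `max_outstanding` limit, and the method names are hypothetical.

```python
class SplitTransactionTracker:
    """Associates the address and data portions of a transaction through an ID."""

    def __init__(self, max_outstanding):
        self.free_ids = list(range(max_outstanding))
        self.outstanding = {}  # transaction ID -> address awaiting its data portion

    def start(self, address):
        """Address portion: allocate a transaction ID and record the address."""
        if not self.free_ids:
            raise RuntimeError("no free transaction IDs")
        transaction_id = self.free_ids.pop(0)
        self.outstanding[transaction_id] = address
        return transaction_id

    def finish(self, transaction_id):
        """Data portion: look up the matching address and release the ID."""
        address = self.outstanding.pop(transaction_id)
        self.free_ids.append(transaction_id)
        return address


tracker = SplitTransactionTracker(max_outstanding=4)
id_a = tracker.start(0x1000)
id_b = tracker.start(0x2000)
# The data portions may complete in a different order than the address portions.
address_b = tracker.finish(id_b)
address_a = tracker.finish(id_a)
```

Tagging each transaction in this way lets the data portions return out of order without the master losing track of which address each one belongs to.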