This invention relates in general to the field of computer system bus architecture, and more particularly to an apparatus that allows atomic read-modify-write operations to be executed over a pipelined, split-transaction on-chip data bus.
Computer systems have historically consisted of a number of distinct components such as a central processing unit (CPU), a memory, and input/output (I/O) logic. The CPU performs all of the computational functions, the memory stores program instructions and data that direct the CPU to perform specific functions, and the I/O logic provides an interface to devices such as video monitors, keyboards, and storage devices. The CPU must constantly transfer data to/from the memory to retrieve program instructions and to store results of computations. The CPU must also communicate with the I/O logic to retrieve commands and to display results. In many systems today, the I/O logic directly retrieves large blocks of data from the memory to allow video monitors to be refreshed without burdening the CPU.
To standardize interface signals, interface logic, and the communication protocol between devices in a computer system, it is standard practice to interconnect all devices via a bused architecture rather than providing point-to-point connections between devices. In a bused architecture, a common set of communication signals, that is, a system bus consisting of address signals and data signals, is connected in parallel to all devices. The address signals on the system bus identify a device that is the target of a data transfer. The data signals on the system bus are used to transfer the data itself to the target device.
And because devices in the system configuration are connected in parallel to the system bus, it follows that only one instance of a data transfer can occur at any given point in time. If two devices execute a data transfer at the same time, then signals on the bus are corrupted, thus precluding any transfer of data at all. As a result, system designers provide logic elements in a system configuration whose purpose is to arbitrate access to the bus so that contention is avoided and devices are provided fair and timely access to the bus.
In early years, computer system buses were simple: eight bits in width, the CPU was the only device that was capable of initiating a data transfer, and the number of other devices connected to the bus consisted primarily of memory and I/O logic. Accordingly, access to the bus was easily managed. If the CPU required a byte of data from the memory, it grabbed the bus, issued the address of the data byte to the memory, and the memory supplied the byte of data to the CPU. Operation of the bus was efficient because the bus was always available for use by the CPU.
But in more recent years, a host of associated technological advances have completely changed the bus architecture. Because a digital computer can be used to control a wide range of automated processes, whole industries have migrated toward the incorporation of computers into their products. Today we see computer systems in telecommunication devices, televisions, home appliances, automobiles, industrial process controllers, musical instruments, games, and vending machines, not to mention aircraft, spacecraft, weapons systems, and data network servers. It could be said that it is the demand for faster, more precise, more application-specific, more robust computer systems that is driving the computer industry toward further advances instead of advances in the industry identifying opportunities for application of computer devices. We are experiencing an era where demands are pulling enabling technologies along.
Today, there are literally thousands of different devices that can be connected to a computer bus. Today's data buses are no longer 8 bits wide; 64-bit buses are more commonly found with new devices coming to the field having 128-bit buses, or wider. And today's systems no longer have only one device that is capable of initiating a transaction over the data bus. A high-end performance computer may have a CPU that is dedicated to performing general purpose computations, a graphics processor that performs video-intensive computations, and a digital signal processor (DSP) that performs intensive audio signal manipulations. The high-end system may also have a communications processor that is dedicated to interacting with other computers over a network. All of these processors must communicate over a system bus to memory, to I/O logic, to each other, and to innumerable other kinds of special-purpose devices. In fact, it is not uncommon today to find four or more CPUs in a system configuration, each of which is capable of initiating data transfer operations over the system bus. Because of these advances in the art, system designers have been forced to provide more sophisticated techniques and algorithms that enable a system bus to be used more efficiently.
One such technique is known as transaction pipelining. Simply put, rather than executing a first address transaction over the address signals and a first data transaction over the data signals to accomplish a first data transfer operation, then following this with a second address transaction over the address signals and a second data transaction over the data signals to accomplish a second data transfer operation, pipelining allows transactions over both sets of signals to occur simultaneously, very much like an assembly line process. In the case stated above, pipelining allows the second address transaction to occur over the address signals concurrent with execution of the first data transaction. Consequently, rather than arbitrating for access to the bus as a whole, devices that support a pipelined bus arbitrate separately for the address signals and for the data signals. A device can be granted access to the address signals while a different device is granted access to the data signals.
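The overlap described above can be sketched in a few lines of Python. This is only an illustrative model, not a description of any particular bus: it assumes every address transaction and every data transaction takes exactly one bus cycle, and the function names `serial_cycles` and `pipelined_schedule` are hypothetical.

```python
def serial_cycles(n_transfers):
    """Cycles needed when each transfer's address and data phases run back to back.

    Assumption: each phase takes exactly one bus cycle.
    """
    return 2 * n_transfers


def pipelined_schedule(n_transfers):
    """Return a list of (cycle, address_owner, data_owner) tuples.

    Transfer k drives the address signals in cycle k and the data signals
    in cycle k + 1, so the address phase of transfer k + 1 overlaps the
    data phase of transfer k. None means the signals are idle that cycle.
    """
    schedule = []
    for cycle in range(n_transfers + 1):
        addr_owner = cycle if cycle < n_transfers else None
        data_owner = cycle - 1 if cycle >= 1 else None
        schedule.append((cycle, addr_owner, data_owner))
    return schedule
```

Under these assumptions, three transfers occupy six cycles when executed serially but only four cycles when pipelined, and in cycle 1 the address signals carry transfer 1 while the data signals carry transfer 0.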
A second technique that has been developed to improve the efficiency of data transfers over a system bus is transaction splitting. A transaction is typically defined as a read or a write, to or from, memory or I/O. The transaction begins with an address phase that defines the type of transaction, and the address of the data, and concludes with a data phase where the data associated with the address is presented to/from the requester. In a split transaction system, the address and data phases of a transaction are split. That is, the address phase of a transaction is decoupled from the data phase of a transaction. This allows the address bus to be utilized for subsequent transactions, even though the data bus is still completing an initial transaction.
Furthermore, a split-transaction bus allows data transactions to occur out-of-order. What this means is that devices that are capable of providing data more rapidly can access the data bus, which would otherwise be idle, ahead of slower devices, even though address transactions to the slower devices preceded address transactions to the faster devices (presuming some form of transaction tracking is provided). The addition of transaction splitting to a pipelined architecture significantly improves the usage efficiency of a system bus.
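A minimal sketch of out-of-order data return follows. It rests on assumptions that are not taken from the text: each transaction carries a tag (standing in for the transaction tracking mentioned above), each device has a fixed latency of at least one cycle, the data bus carries one data phase per cycle, and the data bus is granted to whichever pending transaction became ready first. The function name `data_phase_order` is hypothetical.

```python
def data_phase_order(requests):
    """Compute when each tagged transaction gets the data bus.

    requests: list of (tag, address_cycle, latency), in address-phase order.
    Returns a list of (tag, data_cycle) in the order the data phases occur.
    Assumptions: one data phase per cycle; earliest-ready transaction wins,
    with ties broken in favor of the earlier address phase.
    """
    ready = sorted(
        (addr_cycle + latency, index, tag)
        for index, (tag, addr_cycle, latency) in enumerate(requests)
    )
    order = []
    cycle = 0
    for ready_cycle, _, tag in ready:
        # The data phase starts when the data is ready and the bus is free.
        cycle = max(cycle + 1, ready_cycle)
        order.append((tag, cycle))
    return order
```

For example, if a slow device is addressed in cycle 0 with a latency of five cycles and a fast device is addressed in cycle 1 with a latency of one cycle, the fast device's data phase occurs in cycle 2, ahead of the slow device's data phase in cycle 5.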
Both pipelining and transaction splitting are essential features of a present day system bus where the demand for access is heavy and the amount of data transferred is large. Yet in spite of the necessity of these features, there is one class of data transfer operations that cannot be efficiently executed in a pipelined, split-transaction environment: read-modify-write operations.
A read-modify-write operation, generally speaking, is a series of two dependent data transfers to the same location: a first data transfer wherein the contents of the location are read by a requesting device and a second data transfer wherein the requesting device writes new data to the location. Read-modify-writes are commonly employed in a system configuration that provides shared resources to multiple CPUs. Most often, the availability of a resource is indicated by the state of a location, say, an address in memory. If the contents of the addressed location are, say, all zeros, then the resource is not being used. If the contents are all ones, then the resource is in use. Hence, to acquire and use the resource, a given CPU will read the address. The CPU has acquired the resource if it just read in all zeros. If it read in ones, then it has to wait until the current owner relinquishes control by writing zeros. Irrespective of whether the CPU read zeros or ones, it writes back ones. Note that if the CPU became the owner, this is the correct modification to the location. If the CPU is not the owner, then this write has no impact.
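The acquisition sequence just described can be illustrated as follows. The sketch models memory as a Python dictionary and uses a one-byte flag; the flag width and the names `FREE`, `BUSY`, `try_acquire`, and `release` are assumptions made for illustration only.

```python
FREE = 0x00  # "all zeros": resource is not in use
BUSY = 0xFF  # "all ones": resource is in use (one-byte flag is an assumption)


def try_acquire(memory, flag_addr):
    """Read the flag, then write back ones regardless of what was read.

    Returns True if the caller became the owner (it read all zeros).
    This is correct only if nothing else touches flag_addr between the
    read and the write; that holds here because the sketch is sequential.
    """
    old = memory[flag_addr]    # read portion
    memory[flag_addr] = BUSY   # write portion (no effect if already BUSY)
    return old == FREE


def release(memory, flag_addr):
    """The owner relinquishes the resource by writing zeros."""
    memory[flag_addr] = FREE
```

A first call to `try_acquire` on a free flag returns True; a second call before `release` returns False and leaves the flag unchanged, which is why the unconditional write of ones has no impact for a non-owner.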
But to properly perform a read-modify-write operation, the memory location must be protected from inadvertent accesses by other CPUs during the interim between the read by the given CPU and the ensuing write. Otherwise, another CPU may be allowed to think that it can also obtain the resource.
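To make the hazard concrete, the following sketch contrasts an unprotected interleaving of two CPUs with a protected one. It is a sequential illustration rather than a concurrent program, and the function names and the single-integer flag values are hypothetical.

```python
FREE, BUSY = 0, 1  # stand-ins for "all zeros" and "all ones"


def interleaved_acquire(memory, addr):
    """Two CPUs each read, then each write, with no protection in between."""
    read_a = memory[addr]  # CPU A reads the flag
    read_b = memory[addr]  # CPU B reads before CPU A's write lands
    memory[addr] = BUSY    # CPU A writes ones
    memory[addr] = BUSY    # CPU B writes ones
    return read_a == FREE, read_b == FREE


def protected_acquire(memory, addr):
    """The same two CPUs, with CPU B held off until CPU A's write completes."""
    read_a = memory[addr]  # CPU A reads the flag
    memory[addr] = BUSY    # CPU A writes ones before anyone else gets in
    read_b = memory[addr]  # CPU B now observes ones
    memory[addr] = BUSY    # CPU B writes ones (no effect)
    return read_a == FREE, read_b == FREE
```

In the unprotected case both CPUs read zeros and both conclude that they own the resource; in the protected case only the first CPU does.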
But present day bus architectures provide only work-around approaches to enable read-modify-writes to be accomplished. One approach, bus locking, suspends pipelining and transaction-splitting features altogether during a read-modify-write. Another approach, address reservation, requires that the requesting device be responsible for ensuring that the write portion of the read-modify-write operation is performed, when in fact the requesting device cannot prevent any other device from writing to the address in the interim. The requesting device is at the mercy of the bus and may very well experience problems because of the unpredictable latency of the write.
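The text does not detail how address reservation is implemented. The sketch below assumes the common load-reserved/store-conditional style, in which an intervening write by any other device cancels the reservation and forces the requester to start over. The class `ReservationMemory` and its method names are hypothetical.

```python
class ReservationMemory:
    """Toy memory that tracks one reserved address per requester."""

    def __init__(self):
        self.cells = {}
        self.reservations = {}  # requester id -> reserved address

    def load_reserved(self, requester, addr):
        """Read the address and record a reservation for the requester."""
        self.reservations[requester] = addr
        return self.cells.get(addr, 0)

    def write(self, addr, value):
        """Ordinary write: cancels every reservation held on this address."""
        self.cells[addr] = value
        for req in [r for r, a in self.reservations.items() if a == addr]:
            del self.reservations[req]

    def store_conditional(self, requester, addr, value):
        """Write only if the requester's reservation is still intact."""
        if self.reservations.get(requester) != addr:
            return False  # reservation lost; the requester must retry
        self.write(addr, value)
        return True
```

Because the requester cannot bound how many times its reservation will be cancelled, the number of retries, and hence the latency of the write, is not predictable in this model.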
Therefore, what is needed is an apparatus for performing a read-modify-write operation that preserves both pipelining and split-transaction features of a system bus during execution of the operation.
In addition, what is needed is a system bus apparatus that enables read-modify-write operations to be executed with certainty within a pipelined, split-transaction environment.
To address the above-detailed deficiencies, it is an object of the present invention to provide an apparatus for performing a read-modify-write operation over a system bus that does not require suspension of either pipelining or transaction splitting.
Accordingly, in the attainment of the aforementioned object, it is a feature of the present invention to provide an apparatus for controlling a read-modify-write transaction to an address in a bus slave device. The apparatus includes transaction control logic and transaction response logic. The transaction control logic provides a write barrier command from a bus master device over an on-chip system bus to the bus slave device. The transaction response logic is coupled to the transaction control logic. The transaction response logic receives the write barrier command and precludes execution of any other transaction to the address within the bus slave device until completion of the read-modify-write transaction while simultaneously allowing execution of other transactions to other addresses within the bus slave device.
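The behavior of the transaction response logic can be sketched as a small behavioral model. This is an illustration under stated assumptions, not the claimed apparatus: the class `BarrierSlave`, its method names, and the choice to refuse a blocked transaction by returning False (rather than queuing or retrying it) are all hypothetical, since the text above does not specify how a precluded transaction is handled.

```python
class BarrierSlave:
    """Toy bus slave that honors a write barrier on a single address."""

    def __init__(self):
        self.cells = {}
        self.barriers = {}  # address -> master holding the barrier

    def read(self, master, addr):
        """Return (accepted, data); refused while another master's barrier is up."""
        if self.barriers.get(addr, master) != master:
            return False, None
        return True, self.cells.get(addr, 0)

    def write_barrier(self, master, addr):
        """Write barrier command: fence the address until the write data arrives."""
        if addr in self.barriers:
            return False
        self.barriers[addr] = master
        return True

    def write(self, master, addr, value):
        """Write data; a write by the barrier holder lifts the barrier."""
        holder = self.barriers.get(addr)
        if holder is not None and holder != master:
            return False  # assumption: a blocked write is simply refused
        self.cells[addr] = value
        self.barriers.pop(addr, None)
        return True
```

In this model, after master A reads an address and issues a write barrier to it, master B is refused at that address but can still read and write any other address in the slave; once A's write data arrives, the barrier is lifted and B observes the new contents.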
An advantage of the present invention is that other transactions can occur over a system bus during the interim between the read portion and the write portion of a read-modify-write operation.
Another object of the present invention is to provide a system bus apparatus that can perform a write portion of a read-modify-write operation without being required to ensure that other devices have not perturbed the target address of the write in the interim.
In another aspect, it is a feature of the present invention to provide a computer system bus apparatus for executing a read-modify-write transaction to an address within a bus slave device. The computer system bus apparatus has a bus master device and write barrier logic. The bus master device requests the read-modify-write transaction. The bus master device includes an arbitration signal generator and command generation logic. The arbitration signal generator indicates to an address bus arbiter an intent to perform the read-modify-write operation. The command generation logic is coupled to the arbitration signal generator. The command generation logic issues, over an address bus, a read command to the address followed immediately by a write barrier command to the address. The write barrier logic is coupled to the bus master device. The write barrier logic receives the read command and the write barrier command, and prevents reads/writes from/to the address until data corresponding to the read-modify-write transaction is written to the address while simultaneously allowing reads/writes from/to other addresses within the bus slave device.
Another advantage of the present invention is that a system bus is not encumbered with unnecessary retries to accomplish a write portion of a read-modify-write.
In a further aspect, it is a feature of the present invention to provide a computer program product for use in designing, simulating, fabricating, or testing an integrated circuit device. The computer program product includes a storage medium. The storage medium has computer readable instructions embodied thereon, for causing a computer upon which the computer readable instructions are executed to describe the integrated circuit device such that it can be modified, simulated, fabricated, or tested. The computer readable instructions include first instructions and second instructions. The first instructions cause the computer to describe transaction control logic, where the transaction control logic provides a write barrier command from a bus master device over an address bus to a bus slave device. The second instructions cause the computer to describe transaction response logic, where, upon receipt of the write barrier command, the transaction response logic precludes reads or writes to an address within the bus slave device until data, provided over a separate data bus, corresponding to the write barrier command is written into the address, and where the transaction response logic simultaneously allows the execution of reads/writes to other addresses within the bus slave device.
A further advantage of the present invention is that its utilization improves the efficiency at which a system bus transfers data between devices.