It is known that modern data processing systems use symmetric multiprocessor architectures, in which the processors are interconnected by a system bus, to achieve ever higher performance levels, and that the accessible, or system, memory space of such systems has grown enormously.
For example, modern PowerPC 604/620 microprocessors use 48-bit addresses, which allow a system space of 2^48 bytes to be addressed.
This space is termed system space because it comprises not only a memory space but also spaces for I/O peripherals and for registers.
It is also known that the number of processors, memories and I/O devices that can be connected to a system bus is limited, and that tree structures have been developed in order to overcome these limits.
Only a few fast units, essentially the microprocessors, are connected directly to the system bus. I/O devices are connected to local buses, which are in turn connected to the system bus through logic units acting as collectors and interface bridges, known as PCI bridges.
PCI bridges perform a variety of functions. They arbitrate access to the local bus among the units connected to it, determine which of the requesting local units require access to the system bus, latch their requests in suitable buffers, and act on their behalf in accessing the system bus. Since the local buses and the system bus may, and in general do, have different timing and different communication protocols, the PCI bridges interact with each type of bus in compliance with its own timing and protocol.
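As a rough illustration of the local-bus side of these functions, the Python sketch below models a bridge that grants the local bus to one requester at a time, forwards only requests that fall outside an address window claimed by local units, and latches those requests in a FIFO for later replay on the system bus. Round-robin arbitration, the single `local_window` range and the names `BridgeSketch` and `Request` are hypothetical choices made for the example. The text above does not specify any of them.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    unit: int              # index of the requesting local unit
    address: int           # 32-bit local bus address
    data: Optional[int]    # payload for a write, None for a read


class BridgeSketch:
    """Hypothetical model of a bridge's local-bus functions (not a real design)."""

    def __init__(self, num_units, local_window):
        self.num_units = num_units
        self.local_window = local_window  # ASSUMPTION: [start, end) range served on the local bus
        self.next_grant = 0               # round-robin pointer (ASSUMPTION: arbitration policy)
        self.posted = deque()             # buffer latching requests bound for the system bus

    def arbitrate(self, pending):
        """Grant the local bus to one unit from the set `pending`, round-robin."""
        for i in range(self.num_units):
            unit = (self.next_grant + i) % self.num_units
            if unit in pending:
                self.next_grant = (unit + 1) % self.num_units
                return unit
        return None  # nobody is requesting

    def handle(self, req):
        """Keep local-bus traffic local; latch system-bus requests in the buffer."""
        start, end = self.local_window
        if start <= req.address < end:
            return "local"
        self.posted.append(req)
        return "posted"

    def drain(self):
        """Replay latched requests, in order, on the system bus on the units' behalf."""
        while self.posted:
            yield self.posted.popleft()
```

A usage pattern would be to call `arbitrate` once per local bus cycle, pass the winning unit's request to `handle`, and let the system-bus side consume `drain()` under its own timing and protocol.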
A problem that these bridges have to solve is that of permitting the local units to access the entire system space through the system bus.
The problem arises because local buses implement communication protocols wherein addresses of only 32 bits are used, conveyed on an address bus having only 32 wires.
Hence the space addressable by local units through the local bus is only 4 GB (2^32 bytes).
Moreover, since the local units are themselves assigned a peripheral space within this same range of addresses, visible to the processors through the system bus, the outer space actually visible to the peripheral units is less than 4 GB.
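The address-space arithmetic behind this limitation can be worked out directly. In the sketch below, the 32-bit and 48-bit widths come from the text. The 512 MB size of the peripheral window is a purely hypothetical value, chosen only to show how that window reduces the outer space that remains visible.

```python
GiB = 2**30

local_space = 2**32              # 32-bit local bus addresses (from the text)
system_space = 2**48             # 48-bit PowerPC 604/620 addresses (from the text)
peripheral_window = 512 * 2**20  # HYPOTHETICAL size of the peripheral space given to local units


def outbound_visible(addr_bits, window):
    """Bytes of outer system space reachable with addr_bits-wide addresses,
    once the peripheral window occupying the same range is subtracted."""
    return 2**addr_bits - window


print(local_space // GiB)                             # 4
print(system_space // local_space)                    # 65536
print(outbound_visible(32, peripheral_window) / GiB)  # 3.5
```

Under these assumptions a local unit sees at most 3.5 GB of a system space that is 65536 times larger than its whole address range.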
The state-of-the-art solutions to this problem are entirely unsatisfactory.
In a first solution, the software is arranged to execute move operations.
To write data into a memory space that is not directly accessible, a local unit first transfers the data to a DMA (Direct Memory Access) buffer located in a directly accessible memory space (i.e. a system memory space addressable with only 32 bits). The data are then transferred to a space beyond 4 GB by a software move command, executed by direct memory access control logic that is necessarily interfaced with the system bus.
Similarly, for reading, the data must first be moved from the memory space that is not directly accessible into the DMA buffer in the accessible space, where the local unit can read them.
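The write and read paths of this first solution can be modelled with a sparse memory, so that addresses beyond 4 GB can be represented without allocating them. The sketch below is illustrative only. The function names and the buffer and target addresses are hypothetical, and `software_move` stands for the extra copy that software performs through the system bus.

```python
FOUR_GB = 1 << 32


class SparseMemory:
    """Byte-addressable system memory held in a dict, so 64-bit addresses are cheap."""

    def __init__(self):
        self.cells = {}

    def write(self, addr, data):
        for i, b in enumerate(data):
            self.cells[addr + i] = b

    def read(self, addr, n):
        return bytes(self.cells.get(addr + i, 0) for i in range(n))


def _check_32bit(addr32):
    # A local unit can only emit 32-bit addresses on the local bus.
    if not 0 <= addr32 < FOUR_GB:
        raise ValueError("local unit cannot address beyond 4 GB")


def device_dma_write(mem, addr32, data):
    _check_32bit(addr32)
    mem.write(addr32, data)


def device_dma_read(mem, addr32, n):
    _check_32bit(addr32)
    return mem.read(addr32, n)


def software_move(mem, src, dst, n):
    """The additional copy executed by software over the system bus."""
    mem.write(dst, mem.read(src, n))


def write_high(mem, dma_buffer, target, data):
    device_dma_write(mem, dma_buffer, data)            # 1: device -> DMA buffer below 4 GB
    software_move(mem, dma_buffer, target, len(data))  # 2: DMA buffer -> target beyond 4 GB


def read_high(mem, dma_buffer, source, n):
    software_move(mem, source, dma_buffer, n)          # 1: source beyond 4 GB -> DMA buffer
    return device_dma_read(mem, dma_buffer, n)         # 2: device reads the DMA buffer
```

For example, `write_high(mem, 0x0010_0000, 0x2_0000_0000, b"disk block")` stores the block at 8 GB, but every byte crosses memory twice. This double copy is the cost that makes the approach unattractive for heavily used units.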
This solution is clearly inefficient, since every transfer to or from a space beyond 4 GB requires an additional copy, and it restricts performance. It is generally unacceptable for very frequently used local units, such as disk units and the like.
In a second solution, the 32-bit address is translated into an address with more bits, for example by concatenating the 32 bits with an address range (the high-order address bits) contained in a register of the bridge.
This solution is inflexible because it forces all the local units connected to the local bus to use the same 4 GB range of system space, even though the contents of the register can be changed from time to time.
Furthermore, this second solution is incompatible with the first unless each addressing operation can carry a qualifying signal, consisting of one of the 32 address bits, that marks the address either as a direct access address or as an address to be concatenated.
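A minimal sketch of the concatenation scheme and of the qualifier-bit variant follows. `high_reg` is a hypothetical name for the bridge register. The choice of bit 31 as the qualifier and the 2 GB window that remains once that bit is spent are assumptions made for illustration, not details from the text.

```python
MASK32 = 0xFFFF_FFFF
QUALIFIER = 1 << 31  # HYPOTHETICAL: bit 31 chosen as the direct/concatenate qualifier


def concat_translate(addr32, high_reg):
    """Second solution: prepend the high-order bits held in a bridge register.

    Every local unit on the bus shares high_reg (hypothetical name), so all of
    them see the same 4 GB range of system space.
    """
    return (high_reg << 32) | (addr32 & MASK32)


def qualified_translate(addr32, high_reg):
    """Variant coexisting with direct access: the qualifier bit selects the mode.

    Spending one address bit leaves 31 bits (2 GB) per mode; here the register
    is ASSUMED to select a 2 GB window when the bit is set.
    """
    if addr32 & QUALIFIER:
        return (high_reg << 31) | (addr32 & (QUALIFIER - 1))
    return addr32 & MASK32  # direct access: first 2 GB of system space, untranslated
```

In this sketch, `concat_translate(0x1234_5678, 0x3)` yields `0x3_1234_5678` for every unit on the bus. The qualified variant keeps direct access available, but each mode reaches only half of the 32-bit range.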
Greater addressing flexibility has recently been obtained through the definition of a standard reference architecture called CHRP (Common Hardware Reference Platform), which allows units supplied by different manufacturers complying with this standard to be interconnected through local buses and a system bus.
The CHRP architecture proposes a third solution and defines a mechanism for translating direct memory access addresses in the 32-bit address space of a local bus such as the PCI bus into the 64-bit addresses of a system bus.
For each bridge of the system (there may be several bridges in the system and, more specifically, dual or twin PCI bridge units are available for interconnecting a system bus with two local buses), a table called the TCE (Translation Control Entry) table is allocated in memory.
The bridge unit includes a TCE ADDR REG register (one for each bridge to a local bus), whose contents determine the base, or starting, address of the TCE table in memory and its size.
Each entry in the TCE table describes one 4 KB page of the 32-bit address space and associates with that page a translated address, used to address, through the system bus, the system space defined by 64-bit addresses.
Accordingly, using the contents of the TCE ADDR REG register and the 20 most significant bits of a 32-bit address received from the local bus (the 12 least significant bits constitute a page OFFSET), the bridge unit can point to a TCE table entry, read from it the translated address identifying the requested page in the system space, and then access the requested location using this address together with the page OFFSET.
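The lookup just described is sketched below in Python, with a dictionary standing in for system memory. Several details are assumptions made for illustration and are not taken from the CHRP specification: the 8-byte entry size, the decoding of TCE ADDR REG into a `base` and `num_entries` pair, and the treatment of each entry as a page-aligned 64-bit address (any permission bits a real entry may carry are simply masked off).

```python
from dataclasses import dataclass

PAGE_SHIFT = 12                    # 4 KB pages: 12-bit page OFFSET
PAGE_MASK = (1 << PAGE_SHIFT) - 1
TCE_SIZE = 8                       # ASSUMPTION: one 64-bit word per TCE entry


@dataclass
class TceAddrReg:
    """HYPOTHETICAL decoding of TCE ADDR REG into table base and size."""
    base: int
    num_entries: int


def translate(addr32, reg, memory):
    """Translate a 32-bit local bus address into a 64-bit system address."""
    page = addr32 >> PAGE_SHIFT    # 20 most significant bits select the entry
    offset = addr32 & PAGE_MASK    # 12 least significant bits: page OFFSET
    if page >= reg.num_entries:
        raise ValueError("address not covered by the TCE table")
    entry_addr = reg.base + page * TCE_SIZE
    tce = memory[entry_addr]       # first system-space access: read the TCE entry
    return (tce & ~PAGE_MASK) | offset  # low bits masked off, then OFFSET appended
```

With `reg = TceAddrReg(0x1000_0000, 1024)` and an entry `0x7_ABCD_E000` stored at `0x1000_0000 + 5 * 8`, the local address `0x5123` selects entry 5 and translates to `0x7_ABCD_E123`. Only after this translation can the second system-space access, to the data itself, take place.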
To avoid having to perform two system space accesses each time, the first to read the TCE entry and the second to access the desired location, the bridge unit is preferably provided with an address cache that directly associates each 32-bit local bus address with the corresponding 64-bit address to be used for accessing the system bus.
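The benefit of such an address cache can be shown by counting system bus accesses. The sketch below caches translations per 4 KB page in front of a TCE table walk. The cache size, the least-recently-used replacement policy, the 8-byte entry size and the class name `TranslatingBridge` are all assumptions for illustration. The text only states that a cache associates local addresses with system addresses.

```python
from collections import OrderedDict

PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1
TCE_SIZE = 8  # ASSUMPTION: 8-byte TCE entries


class TranslatingBridge:
    """Hypothetical bridge model: address cache in front of the TCE table walk."""

    def __init__(self, tce_base, memory, cache_entries=4):
        self.tce_base = tce_base
        self.memory = memory              # dict: system address -> 64-bit word
        self.cache = OrderedDict()        # page number -> translated page address
        self.cache_entries = cache_entries
        self.system_bus_accesses = 0

    def _read_tce(self, page):
        self.system_bus_accesses += 1     # extra access, needed only on a cache miss
        return self.memory[self.tce_base + page * TCE_SIZE] & ~PAGE_MASK

    def translate(self, addr32):
        page, offset = addr32 >> PAGE_SHIFT, addr32 & PAGE_MASK
        if page in self.cache:
            self.cache.move_to_end(page)  # hit: mark as most recently used
        else:
            self.cache[page] = self._read_tce(page)
            if len(self.cache) > self.cache_entries:
                self.cache.popitem(last=False)  # ASSUMPTION: evict least recently used
        return self.cache[page] | offset

    def access(self, addr32):
        addr64 = self.translate(addr32)
        self.system_bus_accesses += 1     # the access actually requested
        return addr64
```

Two accesses to the same page cost three system bus operations instead of four. The second access hits the cache and skips the TCE read entirely.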
The CHRP architecture specification requires the bridges to be provided with a programmable bit (and an associated bistable storage cell), called the global bit, that enables or disables the TCE-based address translation mechanism for the whole local bus.
Accordingly, the system space can be accessed from a local bus with or without translation, with system space visibility being restricted in the latter case to the first 4 GB only.
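The effect of the global bit can be sketched as a single switch applied per local bus. In the sketch below, `tce_lookup` is a hypothetical hook standing in for the TCE translation, and the lookup used in the usage note is likewise invented for illustration.

```python
FOUR_GB = 1 << 32


def bridge_address(addr32, global_bit, tce_lookup):
    """Map a local bus address to a system bus address (sketch).

    global_bit -- programmable bit enabling TCE translation for the WHOLE local bus
    tce_lookup -- HYPOTHETICAL callable performing the TCE-based translation
    """
    if not 0 <= addr32 < FOUR_GB:
        raise ValueError("local bus addresses are 32-bit")
    if global_bit:
        return tce_lookup(addr32)
    return addr32  # untranslated: only the first 4 GB of system space is reachable
```

With a hypothetical `lookup = lambda a: 0x5_0000_0000 | (a & 0xFFF)`, address `0x1234` becomes `0x5_0000_0234` when the bit is set and stays `0x1234` when it is clear. Because `global_bit` is one value per bus, every unit on that bus gets the same treatment.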
Therefore this third solution, though it at least partly overcomes the limitations of the first two, is still unsatisfactory, because all the local units connected to a given local bus must either all use the translation mechanism or all not use it.
The system is therefore still inflexible.