1. Field of the Invention
This invention relates to a bridge method and a bus bridge for bridging a plurality of buses, and a multiprocessor system utilizing the bridge method and bus bridge.
2. Description of the Related Art
A close-coupled system, in which a plurality of processors are connected to a single bus, is a typical example of a multiprocessor system capable of operating a plurality of processors in parallel. However, such a system has a shortcoming in that the number of processors cannot be increased beyond a certain physical upper limit, because the load capacity of the bus signal lines constrains the number of devices that can be connected to a single bus. In comparison, a multiprocessor system described in Japanese Laid-Open Patent Publication No. Hei 9-128346 has a configuration in which a plurality of buses are mutually connected with bus bridges. The advantage is that, although the number of processors connected to each bus is limited to within the above-mentioned physical upper limit, the overall system can operate more processors in parallel than that limit. Namely, the system of the above-mentioned publication can include more participating processors than the close-coupled system.
However, when a device is to receive data from another device in the multiprocessor system described in the above-mentioned publication, the device that is to receive the data may be made to wait temporarily, resulting in a processing delay that impedes improvement of system performance.
An example can be considered where a processor connected to a first bus is to read data from an arbitrary address in a memory connected to a second bus. In response to receiving a data read request from the reading processor via the first bus, a bus bridge connecting the first bus to the second bus transmits a data read request to the second bus. This bus bridge receives data from the memory via the second bus and transmits the data to the first bus. Thus, from the time the data read request is transmitted onto the first bus until the data is received, the reading processor cannot execute any processing regarding the data. Cache memories usually found at each processor and bus bridge are useful to shorten this idle time. For example, the use of the bridge cache (the cache memory of the bus bridge) together with an appropriate bus snoop technique can eliminate the process in which the bus bridge requests the data from the memory via the second bus. However, when executing a program in which the cache hit rate for the bridge cache is low, namely, a program that often requests data not found in the bridge cache, the drop in system performance due to the above-mentioned delay becomes particularly noticeable.
One object of the present invention is to suppress delays in processing as well as the resulting drop in system performance caused by the idle waiting of a device, such as a processor, that has requested data.
A first aspect of the present invention is a bridge method having a predictive cache process and a response process. In the predictive cache process, on the basis of the contents of a request signal predicted to be issued in the future from a device connected to a first bus, a request signal (an anticipatory request signal) is transmitted onto a second bus to which a device or devices are connected, to request data. Namely, on the prediction that the device connected to the first bus will issue a signal requesting data held by one of the devices connected to the second bus, an anticipatory request signal is issued in the predictive cache process and transmitted onto the second bus, to which the various types of devices including the device or devices holding the requested data are connected. The data is thus cached into a bridge cache from one of the devices that hold the data and are connected to the second bus.
In the response process, when the data requested by a request signal actually issued from the device connected to the first bus is found in the bridge cache, the data is sent from the bridge cache to the device that issued the request signal.
One case where the data concerning the request signal issued from a device connected to the first bus has already been cached into the bridge cache is a case where the data is already cached into the bridge cache by execution of the predictive cache process. Therefore, according to this aspect, the frequency at which the data can be sent immediately to the device originating the request signal increases and at the same time the frequency at which the device originating the request signal is forced to wait decreases, so that processing delays decrease and system performance improves. Also, since wait instructions are furnished at a lower frequency to the device originating the request signal, the load on the first bus decreases.
A second aspect of the present invention is a bus bridge comprising a bridge cache, a request prediction unit, a cache hit judgment unit, a response generator, and a request issuer. The bridge cache is a cache memory for caching the data held in the devices connected to the first or second bus. The request prediction unit, the cache hit judgment unit, and the request issuer provide functions relating to the predictive cache process in the first aspect. The cache hit judgment unit and the response generator provide functions relating to the response process in the first aspect.
First, the request prediction unit predicts the contents of the request signal to be issued in the future from a device connected to the first bus and issues a prediction signal indicating the contents of the predicted request signal. When the prediction signal is issued from the request prediction unit, the cache hit judgment unit judges whether or not the data requested by the prediction signal is found in the bridge cache. When it is judged that the data requested by the prediction signal is not found in the bridge cache, the request issuer issues a request signal for requesting the data to the device or devices connected to the second bus. Therefore, if there is a device (or devices) responding to the request signal with the requested data, data predicted to be requested in the future from the device connected to the first bus is cached into the bridge cache in advance of any actual request.
When a request signal is actually issued from the device connected to the first bus, the cache hit judgment unit judges whether or not the data requested by the request signal is found in the bridge cache. If found, the bus bridge can respond with the data to the device originating the request. Conversely, when it is judged that the data requested by the request signal actually issued from the device connected to the first bus is not found in the bridge cache, the response generator on one hand issues a response signal to the device that issued the request signal to instruct the device to wait for a subsequent transmission of that data, while the request issuer on the other hand issues a request signal for requesting that data to the device or devices connected to the second bus. Namely, the device originating the request is made to temporarily wait, during which time the requested data is cached into the bridge cache.
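By way of illustration only, the cooperation of the units of the second aspect may be sketched as follows. This is a minimal software model, not the circuit of the invention: the class name `BusBridge`, the `"DATA"`/`"WAIT"` response labels, the dictionary standing in for the memory on the second bus, and the pluggable `predict` function are all hypothetical names introduced for the sketch.

```python
from typing import Callable, Dict, List, Optional, Tuple


class BusBridge:
    """Toy model of the bus bridge of the second aspect (all names hypothetical)."""

    def __init__(self, second_bus_memory: Dict[int, bytes],
                 predict: Callable[[int], Optional[int]]) -> None:
        self.memory = second_bus_memory           # stands in for a device on the second bus
        self.bridge_cache: Dict[int, bytes] = {}  # bridge cache
        self.predict = predict                    # request prediction unit
        self.second_bus_requests: List[int] = []  # log of requests sent onto the second bus

    def _cache_hit(self, address: int) -> bool:
        # Cache hit judgment unit.
        return address in self.bridge_cache

    def _issue_request(self, address: int) -> None:
        # Request issuer: request the data on the second bus and cache the reply.
        self.second_bus_requests.append(address)
        if address in self.memory:
            self.bridge_cache[address] = self.memory[address]

    def handle_request(self, address: int) -> Tuple[str, Optional[bytes]]:
        # Response process for a request actually issued on the first bus.
        if self._cache_hit(address):
            response = ("DATA", self.bridge_cache[address])
        else:
            response = ("WAIT", None)  # response generator instructs the device to wait
            self._issue_request(address)
        # Predictive cache process: fetch the predicted data ahead of any request.
        predicted = self.predict(address)
        if predicted is not None and not self._cache_hit(predicted):
            self._issue_request(predicted)
        return response


if __name__ == "__main__":
    memory = {0x00: b"a", 0x20: b"b", 0x40: b"c"}
    bridge = BusBridge(memory, predict=lambda address: address + 0x20)
    print(bridge.handle_request(0x00))  # miss: the device is told to wait
    print(bridge.handle_request(0x20))  # hit: the data was cached by prediction
```

In this model the second request is answered directly from the bridge cache because the first request triggered an anticipatory request for the following block.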
Therefore, in this aspect, the frequency at which a device originating a request is forced to temporarily wait, because the data requested by a request signal actually issued from the device connected to the first bus is judged not to be found in the bridge cache, is lower than in the related art. This is realized by providing the request prediction unit and by having the cache hit judgment unit and the request issuer process the prediction signal in addition to the request signal that was actually issued. As a result, processing delays caused by a device originating a request being forced to wait and the resulting drop in system performance are less likely to occur. Furthermore, since response signals instructing a device originating a request signal to temporarily wait for a subsequent transmission of the data are issued less frequently, the load on the first bus can be reduced.
A third aspect of the present invention is a multiprocessor system comprising a plurality of buses to which a single device or a plurality of devices are connected, and a single bus bridge or a plurality of bus bridges for connecting these buses together. Furthermore, a plurality of processors are connected to the first bus among the plurality of buses, and memory is connected to the second bus, which is connected to the first bus via a bus bridge. This aspect utilizes the bus bridge of the second aspect of the present invention to bridge access to the memory on the second bus by the processor on the first bus. Therefore, according to this aspect, the advantage of the second aspect can be obtained in a system where the memory (such as main memory) is connected to the second bus and the processor (such as a local CPU) is connected to the first bus. Namely, a system can be realized in which a performance drop in the overall system caused by processor delays arising from memory accesses is less likely to occur, and in which the load on each bus is relatively low.
In this aspect, for example, a cache block size may be added to the value of an address signal included in the request signal actually issued from a device (such as a processor) connected to the first bus so as to determine a value of the address signal to be included in the prediction signal. This process can be realized simply by providing an adder in the request prediction unit in the second and third aspects. Namely, a relatively simple configuration is sufficient for the request prediction unit.
In embodying the present invention, it is preferable to execute a predictive request inhibiting process for inhibiting the issuance of request signals from the predictive cache process when a predetermined predictive request inhibition condition is satisfied. This enables an adverse effect to be prevented from occurring with the issuance of request signals based on the predicted result. For example, signal transmissions on the second bus, namely, the load on the second bus, can be reduced.
There are several types of predictive request inhibiting processes. A first type is applicable to preventing page boundary overflow, a second type is related to registering predictive access inhibited addresses, a third type is related to inhibiting access to uncacheable spaces, a fourth type prevents data invalidation, and a fifth type uses a result of monitoring the bus load. These types can be selected or combined arbitrarily.
In implementing the first type, a request prediction unit having an adder and a gate is provided in the bus bridge. The adder adds a cache block size to the value of an address signal included in the request signal actually issued from a device (such as a processor) connected to the first bus so as to determine a value of the address signal to be included in the prediction signal. The adder further sets an OVF flag to indicate an overflow if, as a result of adding the cache block size, a carry occurs to a position exceeding the page size. The gate inhibits the issuance of the prediction signal when the OVF flag has been set. Therefore, in this type, the issuance of the prediction signal is inhibited when the address signal obtained from the adder and the address signal in the request signal that was actually issued (the request signal that is the basis of the prediction) point to addresses belonging to different pages. As a result, an address having a low probability of being consecutively accessed by a single device, such as a processor, is less likely to be used as an address signal in the prediction signal, and the load on the second bus is thus reduced. Further, since it is sufficient simply to use an adder having a function for setting the OVF flag, a prediction signal issue inhibiting process for page boundary overflows can be realized with a relatively simple circuit configuration.
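By way of illustration only, the adder and gate of the first type may be modeled in software as follows. The block size of 32 bytes and the page size of 4096 bytes are assumed values chosen for the sketch, and the names `add_block` and `gated_prediction` are hypothetical.

```python
from typing import NamedTuple, Optional

CACHE_BLOCK_SIZE = 32  # bytes; assumed value for illustration
PAGE_SIZE = 4096       # bytes; assumed value for illustration


class AdderResult(NamedTuple):
    address: int  # address for the prediction signal
    ovf: bool     # OVF flag: the addition carried past the page offset


def add_block(address: int,
              block_size: int = CACHE_BLOCK_SIZE,
              page_size: int = PAGE_SIZE) -> AdderResult:
    """Adder: add the cache block size and flag a carry out of the page offset."""
    offset_in_page = address % page_size
    ovf = offset_in_page + block_size >= page_size
    return AdderResult(address + block_size, ovf)


def gated_prediction(address: int) -> Optional[int]:
    """Gate: pass the predicted address only when the OVF flag is clear."""
    result = add_block(address)
    return None if result.ovf else result.address


if __name__ == "__main__":
    print(gated_prediction(0x0FC0))  # same page: prediction is issued
    print(gated_prediction(0x0FE0))  # next block lies in the next page: inhibited
```

With these assumed sizes, a request for address 0x0FE0 yields a candidate address of 0x1000, which belongs to the next page, so the prediction signal is inhibited.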
In implementing the second type, a request prediction unit having an adder, an address table, and a gate is provided in the bus bridge. The adder adds the cache block size to the value of the address signal included in the request signal actually issued from a device (such as a processor) connected to the first bus so as to determine a value of the address signal to be included in the prediction signal. The address table issues a predictive request disable signal when the value of the address signal obtained from the adder points to a predetermined address. The gate inhibits the issuance of the prediction signal in response to the predictive request disable signal. Therefore, in this type, registering into the address table an address for which the issuance of a prediction signal must be inhibited prevents a prediction signal including that address from being issued. This is useful, for example, in preventing a prediction signal from being issued for an address whose contents may be changed by the access itself.
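By way of illustration only, the address table and gate of the second type may be sketched as follows. The class name `AddressTable`, its `register` and `disable_signal` methods, the function `predict_with_table`, and the block size of 32 bytes are hypothetical and are introduced only for the sketch.

```python
from typing import Iterable, Optional, Set

CACHE_BLOCK_SIZE = 32  # bytes; assumed value for illustration


class AddressTable:
    """Holds addresses for which predictive access is inhibited."""

    def __init__(self, inhibited: Iterable[int] = ()) -> None:
        self._inhibited: Set[int] = set(inhibited)

    def register(self, address: int) -> None:
        self._inhibited.add(address)

    def disable_signal(self, address: int) -> bool:
        # Predictive request disable signal for a registered address.
        return address in self._inhibited


def predict_with_table(address: int, table: AddressTable,
                       block_size: int = CACHE_BLOCK_SIZE) -> Optional[int]:
    candidate = address + block_size      # adder
    if table.disable_signal(candidate):   # gate
        return None
    return candidate


if __name__ == "__main__":
    # 0x1020 might be, for example, a register whose contents change when read.
    table = AddressTable([0x1020])
    print(predict_with_table(0x1000, table))  # inhibited
    print(predict_with_table(0x1020, table))  # issued
```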
In implementing the third type, a request prediction unit having an adder, a cacheable/uncacheable discriminator, and a gate is provided in the bus bridge. The adder has the same function as that in the second type. The cacheable/uncacheable discriminator determines whether or not a type signal, included in the request signal actually issued from a device connected to the first bus, includes a flag indicating an access to an uncacheable space. The gate inhibits the issuance of the prediction signal when it is determined the flag is included. This makes it possible to prevent an uncacheable space from being accessed on the basis of the predicted result.
In implementing the fourth type, a request prediction unit having an adder, a type discriminator, and a gate is provided in the bus bridge. The adder has the same function as that of the second type. The type discriminator determines whether or not the type signal included in the request signal actually issued from a device connected to the first bus indicates data read. The gate inhibits the issuance of the prediction signal when it is determined that data read is not indicated. Therefore, it is possible, for example, to prevent an adverse effect of invalidating the data required by a device on the basis of a predicted result, such as when a request signal concerns cache invalidation.
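By way of illustration only, the discriminators of the third and fourth types may be sketched together as follows. The `Request` record, the `RequestType` values, and the representation of the uncacheable flag as a boolean field are assumptions made for the sketch; the actual encoding of the type signal is not specified here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

CACHE_BLOCK_SIZE = 32  # bytes; assumed value for illustration


class RequestType(Enum):
    DATA_READ = "data_read"
    DATA_WRITE = "data_write"
    CACHE_INVALIDATE = "cache_invalidate"


@dataclass(frozen=True)
class Request:
    address: int
    type: RequestType
    uncacheable: bool = False  # flag carried in the type signal (assumed encoding)


def predict_with_type_checks(request: Request,
                             block_size: int = CACHE_BLOCK_SIZE) -> Optional[int]:
    # Third type: cacheable/uncacheable discriminator.
    if request.uncacheable:
        return None
    # Fourth type: type discriminator; only data reads give rise to predictions.
    if request.type is not RequestType.DATA_READ:
        return None
    return request.address + block_size  # adder


if __name__ == "__main__":
    print(predict_with_type_checks(Request(0x100, RequestType.DATA_READ)))
    print(predict_with_type_checks(Request(0x100, RequestType.CACHE_INVALIDATE)))
    print(predict_with_type_checks(Request(0x100, RequestType.DATA_READ, uncacheable=True)))
```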
In implementing the fifth type, a request prediction unit having a gate, and a load monitor are provided in the bus bridge. The load monitor monitors the load on the first bus and/or the second bus and sets a high-load flag if a detected load exceeds a predetermined value. The gate inhibits the issuance of the prediction signal when the high-load flag has been set. Therefore, it is possible to prevent the load on the second bus from further increasing due to the request signal issued on the basis of the predicted result, and indirectly the load on the first bus from further increasing.
Furthermore, the load monitor in the fifth type can be realized with an incremental counter, a comparator, and a selector. The incremental counter counts the number of occurrences of effective validity signals included in the request signals on the bus being monitored. The comparator compares the counted result and a criterion. When it is determined as a result of the comparison that the counted result exceeds a predetermined value, the selector thereafter sets a high-load flag within a predetermined period. This makes it possible for the load monitor, which is a means for monitoring the load, to have a relatively simple configuration. Further, the load monitor in the fifth type preferably includes a criterion setting unit for varying the criterion. Providing the criterion setting unit enables the operation of the load monitor to be adjusted according to the performance required of the system.
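By way of illustration only, the load monitor may be sketched as follows. The sketch assumes that the bus is sampled once per cycle, that the count is evaluated at the end of a fixed window of cycles, and that the resulting high-load flag is held for the following window; the window scheme and the names `LoadMonitor`, `tick`, and `set_criterion` are assumptions, not details of the invention.

```python
class LoadMonitor:
    """Counts valid request cycles per window and compares with a criterion."""

    def __init__(self, window: int, criterion: int) -> None:
        self.window = window        # length of the monitoring period in cycles (assumed)
        self.criterion = criterion  # threshold used by the comparator
        self._count = 0             # incremental counter
        self._cycle = 0
        self.high_load = False      # high-load flag, held until the next evaluation

    def set_criterion(self, criterion: int) -> None:
        # Criterion setting unit: vary the threshold at run time.
        self.criterion = criterion

    def tick(self, valid: bool) -> None:
        """Observe one bus cycle; `valid` mirrors the validity signal."""
        if valid:
            self._count += 1
        self._cycle += 1
        if self._cycle == self.window:
            # Comparator and selector: set or clear the flag for the next window.
            self.high_load = self._count > self.criterion
            self._count = 0
            self._cycle = 0


if __name__ == "__main__":
    monitor = LoadMonitor(window=4, criterion=2)
    for valid in (True, True, True, False):
        monitor.tick(valid)
    print(monitor.high_load)  # three valid cycles exceed the criterion of two
```

A gate in the request prediction unit would then inhibit the prediction signal whenever `high_load` is set.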
In embodying the present invention, it is preferable to devise various schemes for improving the prediction accuracy. For example, a request queue, a subtracter, a comparator, and a gate are provided in the request prediction unit. Request signals actually issued from a device connected to the first bus are queued in the request queue. Each time a request signal is issued from a device connected to the first bus, the subtracter estimates the request signal that is thought to have been issued in the past by subtracting the cache block size from the address signal in that request signal. The comparator compares the estimated request signal and the queued request signals. The gate permits the issuance of the prediction signal when the comparison detects a match, and inhibits the issuance when a match is not detected. In this manner, the issuance of request signals onto the second bus is limited on the basis of the transaction history of request signals issued from devices connected to the first bus, so that not only does the prediction accuracy increase but the load on the second bus is also reduced. Furthermore, if a request queue is provided for each device connected to the first bus and the request queue used for queuing is selected on the basis of the source signal included in the request signal, a request signal from a device that does not frequently issue request signals can be prevented from being displaced from the request queue by a request signal from a device that does frequently issue request signals, and prediction can be performed at a relatively high accuracy for any device connected to the first bus. Furthermore, a type discriminator is preferably provided in the request prediction unit to exclude request signals whose signal type is not data read from being queued in the request queue, so that a relatively high prediction accuracy can be realized using a relatively shallow request queue.
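By way of illustration only, the history-based scheme may be sketched as follows. The sketch assumes one bounded queue per source, that a non-read request is neither queued nor used as a basis for prediction, and that the predicted address is the requested address plus the cache block size; the names `HistoryPredictor` and `observe` and the default sizes are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Optional


class HistoryPredictor:
    """Issues a prediction only when the preceding block was requested earlier."""

    def __init__(self, depth: int = 4, block_size: int = 32) -> None:
        self.depth = depth            # queue depth per source (assumed value)
        self.block_size = block_size  # cache block size in bytes (assumed value)
        self._queues: Dict[str, Deque[int]] = {}

    def observe(self, source: str, address: int, is_read: bool = True) -> Optional[int]:
        if not is_read:
            return None  # type discriminator: non-read requests are not queued
        # Select the request queue on the basis of the source signal.
        queue = self._queues.setdefault(source, deque(maxlen=self.depth))
        estimated_previous = address - self.block_size  # subtracter
        match = estimated_previous in queue             # comparator
        queue.append(address)
        # Gate: permit the prediction only on a match.
        return address + self.block_size if match else None


if __name__ == "__main__":
    predictor = HistoryPredictor(depth=2)
    print(predictor.observe("cpu0", 0x100))  # no history yet: inhibited
    print(predictor.observe("cpu0", 0x120))  # sequential access: prediction issued
```

Because each source has its own queue in this sketch, requests from one processor do not displace the history of another.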
As is clear from the preceding description, the mode of prediction in the present invention takes the form of a transaction history usage type or a transaction history non-usage type. In a preferred embodiment according to the present invention, the request prediction unit includes a transaction history usage request prediction unit, a transaction history non-usage request prediction unit, and a prediction logic selector. The transaction history usage request prediction unit predicts the contents of a request signal to be issued in the future from a device connected to the first bus on the basis of the contents of a plurality of request signals issued heretofore by the device. On the other hand, the transaction history non-usage request prediction unit predicts the contents of the request signal to be issued in the future by the device connected to the first bus on the basis of the contents of one request signal recently issued by the device. The prediction logic selector selects a predicted result from either the transaction history usage request prediction unit or the transaction history non-usage request prediction unit, and issues a prediction signal based on the selected predicted result. This configuration allows the mode of prediction to be switched according to operating conditions, which enhances the flexibility of the system.
The prediction logic selector selects, for example, the predicted result of the transaction history non-usage request prediction unit when the load on the second bus is lower than a first criterion, selects the predicted result of the transaction history usage request prediction unit when the load on the second bus is higher than the first criterion and lower than a second criterion, and inhibits the issuance of the prediction signal when the load on the second bus is higher than the second criterion. This achieves both advantages of limiting the load on the second bus and increasing the prediction accuracy. In another example, the prediction logic selector selects the predicted result of the transaction history non-usage request prediction unit when the load on the first bus is lower than the first criterion, selects the predicted result of the transaction history usage request prediction unit when the load on the first bus is higher than the first criterion and lower than the second criterion, and inhibits the issuance of the prediction signal when the load on the first bus is higher than the second criterion. This achieves both advantages of limiting the load on the first bus and increasing the prediction accuracy. In yet another example, the prediction logic selector is provided with a load monitor for informing the prediction logic selector of the result of comparing the load on the second bus with the first and second criteria and a load monitor for informing the prediction logic selector of the result of comparing the load on the first bus with the first and second criteria. This enables the above-mentioned advantages to be achieved with a relatively simple circuit configuration.
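By way of illustration only, the selection among the three load ranges may be sketched as follows. The function name `select_prediction` and the numeric representation of the load are hypothetical. The handling of a load exactly equal to a criterion is not stated above; the sketch assumes that an equal load is treated as belonging to the lower range.

```python
from typing import Optional


def select_prediction(load: int,
                      first_criterion: int,
                      second_criterion: int,
                      non_history_prediction: Optional[int],
                      history_prediction: Optional[int]) -> Optional[int]:
    """Prediction logic selector driven by the monitored bus load."""
    if load > second_criterion:
        return None                 # heavy load: inhibit the prediction signal
    if load > first_criterion:
        return history_prediction   # moderate load: favor prediction accuracy
    return non_history_prediction   # light load: predict from one recent request


if __name__ == "__main__":
    # Suppose the non-history unit proposes 0x120 and the history unit found no match.
    print(select_prediction(1, 5, 10, 0x120, None))   # 0x120 is issued
    print(select_prediction(7, 5, 10, 0x120, None))   # nothing is issued
    print(select_prediction(12, 5, 10, 0x120, 0x120)) # inhibited under heavy load
```

Either bus load, or the result of combining two load monitors, could be supplied as `load` in this sketch.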