In many multiprocessing systems, processors operate independently on various programs in order to execute the task at hand. Such systems are required when high performance is necessary and cannot be achieved with a single processing unit. Many such systems have been developed over time, and multiple solutions exist for standard interface busses. The bus is used to communicate between separate systems over an interface that is jointly used by two or more processing units. By way of illustration, FIG. 1 shows such a system. Each CPU (100) generally comprises a processor and a local memory used by that processor for its operation. In addition, it may include other input and output (IO) devices; however, such IO devices are not relevant to the present invention. Each CPU (100) is connected to a standard interface bus (110), whereby data is transferred between the CPUs of the system.
Interface bus (110) can be implemented as an available standard bus or, alternatively, as a proprietary bus. In the past, two common busses were used in personal computers (PCs) and were also widely used in other computer systems. The two were known as the Industry Standard Architecture (ISA) and Extended ISA (EISA) busses. Other known standard busses are the Micro Channel and the Video Electronics Standards Association (VESA) busses. However, with the development of higher speed processors and peripheral devices, higher speed busses had to be developed, one of which is the Peripheral Component Interconnect (PCI) bus. In a PCI system, with the exception of certain refresh cycles, a write request has the highest priority and is therefore handled earlier than any other request, including a read request. Therefore, a write is generally performed faster than a read. Moreover, a write operation is performed to a buffer, thereby releasing the CPU immediately to perform other operations. In contrast, a read operation does not release the CPU until the data is made available to the CPU. The time difference can become even more significant when a multiple-layer PCI system is put in place. Even more important is the case where wire-speed operation is required for a SAN system, where the use of read operations across the bus degrades the overall response time of the system.
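The imbalance described above can be illustrated with a toy model. The sketch below is not taken from the PCI specification: the cycle counts POST_COST and HOP_COST and both function names are hypothetical, chosen only to show that a buffered write stalls the CPU for a fixed time while a read stalls it for a full round trip that grows with the number of bus layers.

```python
# Toy model of CPU stall time; all constants are hypothetical illustration values.
POST_COST = 1    # cycles for the CPU to hand a write to a posting buffer
HOP_COST = 10    # cycles to cross one bus layer in one direction


def cpu_stall_for_write(layers: int) -> int:
    """A buffered write releases the CPU once posted, whatever the bus depth."""
    return POST_COST


def cpu_stall_for_read(layers: int) -> int:
    """A read holds the CPU until the request has gone out and the data is back."""
    return 2 * HOP_COST * layers


if __name__ == "__main__":
    for layers in (1, 2, 3):
        print(layers, cpu_stall_for_write(layers), cpu_stall_for_read(layers))
```

Under these assumed numbers, the write stall stays constant while the read stall scales linearly with the number of layers, which is the effect the multiple-layer case refers to.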
Several patents disclose a variety of methods related to the overall performance of a PCI system, attempting to address the issue of the time imbalance between a read operation and a more time-efficient write operation. Larson et al. disclose in U.S. Pat. No. 5,524,235 an arbiter circuit to control access to main memory through a PCI bus. The disclosure describes how, under certain conditions, the processor-to-memory write requests are delayed to allow other cycles to proceed. Wade et al. disclose in U.S. Pat. No. 5,613,075 a method by which a master on the bus can guarantee certain performance levels, including for read operations. This allows the system to predict the worst case for granting access to read operations, and this level can be fixed according to an arbitrary threshold.
U.S. Pat. Nos. 5,634,073 and 5,634,073 to Collins et al. describe a more complex system in which a controller handles a multiple-queue system between the processor and the CPU. The system is also capable of checking whether a write operation is already pending to the same address to which a read request is made. They also propose various ways of improving the prediction of the rules to be used to increase system efficiency.
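The general idea of checking a read address against queued writes can be sketched as follows. This is not the mechanism of the cited patents; the class WriteBuffer, its methods, and the dictionary used as backing memory are hypothetical names introduced only for illustration.

```python
from collections import OrderedDict


class WriteBuffer:
    """Hypothetical write queue that is consulted before a read reaches memory."""

    def __init__(self, memory: dict):
        self.memory = memory            # backing store: address -> value
        self.pending = OrderedDict()    # queued writes not yet flushed, oldest first

    def write(self, addr: int, value: int) -> None:
        # A newer write to the same address supersedes the older queued one.
        self.pending.pop(addr, None)
        self.pending[addr] = value

    def read(self, addr: int):
        # If a write to this address is still queued, return its data so the
        # reader never observes the stale value in memory.
        if addr in self.pending:
            return self.pending[addr]
        return self.memory.get(addr)

    def flush(self) -> None:
        # Drain queued writes to memory in the order they were queued.
        while self.pending:
            addr, value = self.pending.popitem(last=False)
            self.memory[addr] = value
```

In this sketch a read that hits a queued write is served from the queue; an alternative design would flush the queue first and then read from memory.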
U.S. Pat. No. 5,835,741 to Elkhoury et al. discloses a system that addresses the performance issues relating to a burst mode. The fast burst mode allows efficient access by means of sequential accesses to sequential memory addresses.
U.S. Pat. No. 5,754,802 to Okazawa et al. suggests a method and apparatus for increasing data transfer efficiency, and specifically for preventing a deadlock situation, of a read operation in a non-split transaction bus environment by substituting a write operation for the read operation. Basically, the read operation to an IO device is replaced with a write operation, and the IO device then executes the write in the local environment.
A more complicated approach is described in U.S. Pat. No. 6,134,619, which, however, requires specialized hardware for the indication of space availability in the queue, and a read operation on the PCI bus. This solution is tuned for the case of multiple processors using different operating systems. In U.S. Pat. No. 6,145,061, Garcia et al. propose a scheme for a circular queue with head and tail pointers, and certain ways to access the queue that further allow dynamic allocation of the queue size.
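For readers unfamiliar with the data structure, a circular queue with head and tail pointers can be sketched generically as below. This is a textbook fixed-capacity version, not the scheme of the cited patent; the class name RingQueue and its methods are hypothetical, and dynamic allocation of the queue size is deliberately not modeled.

```python
class RingQueue:
    """Generic fixed-capacity circular queue with head and tail indices."""

    def __init__(self, capacity: int):
        self._slots = [None] * capacity
        self._head = 0     # index of the next item to remove
        self._tail = 0     # index of the next free slot
        self._count = 0    # number of items currently stored

    def push(self, item) -> bool:
        """Store item at the tail; return False if the queue is full."""
        if self._count == len(self._slots):
            return False
        self._slots[self._tail] = item
        self._tail = (self._tail + 1) % len(self._slots)  # wrap around
        self._count += 1
        return True

    def pop(self):
        """Remove and return the item at the head, or None if empty."""
        if self._count == 0:
            return None
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)  # wrap around
        self._count -= 1
        return item
```

The explicit item count is one common way to tell a full queue from an empty one, since head and tail are equal in both states.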
The prior art does not address the need of multiple processors to access data over busses such as PCI in a manner that (a) significantly reduces the overhead associated with the read cycles, and (b) allows a system, such as a SAN system, to operate at wire speed.