1. Field of the Invention
This invention relates to array processor systems and in particular to a new bus structure which interconnects elements of a control/memory section and elements of a processor section.
The processing speed of data processors has been increasing at an astounding pace in recent years, and the demand for higher-speed processors is growing accordingly. Efforts to meet this demand have been concentrated not only on manufacturing higher-speed integrated circuits (ICs) but also on improving processor architectures. Increasing processing speed by improving individual processor architectures, however, appears to be approaching its limits. More recently, the need for higher processing speed is being met by providing parallel processing systems. The performance of these systems depends upon the effectiveness of communication among processors. In many currently available systems, interprocessor communication relies upon complex interboard wiring through which many signals are conveyed among the processors. Such wiring tends to increase the cost of the processors. To cope with this problem, attempts have been made to construct an architecture (e.g., a multi-chip module, or MCM) that divides functions within a processor. These attempts, however, have not yet reached commercial usefulness.
The present applicant and others have proposed a processor architecture which incorporates the concept of MCM in their patent application, Ser. No. 07/857,626 filed on Mar. 26, 1992, under the title of "PARALLEL DIGITAL PROCESSING SYSTEM USING OPTICAL INTERCONNECTION." In this proposed architecture, the processor section is separated from the control/memory section. Effective interconnection among the processor units in this architecture is simplified because a group of processor units is arranged in an array on a single chip. A large number of control/memory chips, which are separate from the processor chips, may be independently interconnected on a separate circuit board.
2. Description of the Prior Art
The interconnecting bus for a parallel processor system is inseparable from the parallel-processing mechanism. Parallel-processing mechanisms can be classified in terms of their constituent multiprocessors, as either a closely-connected mesh system or a sparsely-connected tree system. In addition, there is a reconfiguration system classification in which connections can be varied freely in accordance with problems to be solved. The inter-processor connecting system for structuring each of these parallel processing mechanisms may be classified as being either a direct connection system or an indirect connection system.
In the direct connection system, processors which are arranged in a mesh or a hierarchy are directly connected to each other. Communications among processors that are not directly connected to each other are relayed by intervening processors. In the indirect connection system, switches are closed for interconnection whenever processing elements are needed. Bus interconnection falls into this classification, and is of a non-hierarchical structure. To execute processing, however, a hierarchical method is often imposed on the bus communication function.
By this hierarchical method, the function of bus management is assigned to one of the processors, or to a special-purpose processor (arbiter), with the processing functions assigned to the remaining processors. The bus interconnection communications method includes not only multiprocessors, but also LANs using ring buses to implement parallel-processing mechanisms, such as torus-connected networks, and hypercube-connected processor networks.
The types of circuit interconnection may be classified as bus interconnection, multi-bus interconnection, multi-port memory interconnection and channel interconnection. In bus interconnection, multiple processing elements are connected to a single bus, and connection is effected between arbitrary processing elements through the arbitration of the bus arbiter. In multi-bus interconnection, multiple sets of connections between processing elements are formed simultaneously by a plurality of buses. In multi-port memory interconnection, multiple processor elements and multi-port memories as processing elements are connected to each other; each memory may be accessed by a plurality of processor elements. Channel interconnection involves connecting processor elements to other processor elements in the same manner as with input/output units; the connected processor elements are used exclusively for their specific purposes. This connection, however, requires one channel to connect one processor element, and so precludes connections among a large number of processors.
The following is a description of an exemplary bus interconnection structure and an exemplary directly connected mesh structure, both of which are closely related to this invention. Desirable functions for the bus structure include not only a high-speed data transfer capability but also a high-speed arbitration function. A block transfer function and the ability to broadcast memory data are also desirable. While a single bus is disadvantageous in terms of fault tolerance, multiple buses tend to involve complex control mechanisms. For this reason, a single bus architecture is more often adopted. For example, 32-bit wide buses are frequently employed for both data and address buses to match the bit width of the ALU (arithmetic and logic unit) used by the processors. In some cases, 64-bit data buses are employed to improve data transfer capacity.
Synchronization mechanisms for data transfer operations are divided into synchronous bus and asynchronous bus systems. In a synchronous bus system, the bus controllers for all processors are operated in synchronism with the same clock. This system has advantages in that it uses relatively simple hardware, and is capable of high-speed transfer. In practice, however, high-speed transfer is difficult to implement with this system because of the occurrence of clock skew among the processors. Data transfer speed is lost when the synchronization of operations must compensate for the skewed clock.
In an asynchronous bus system, on the other hand, data are transferred using dedicated synchronization signal lines, i.e., strobe and acknowledge lines. In this system, the transmit side provides addresses and data to the bus, and then sends a message, on the strobe line, to the receive side. Upon receipt of the data, the receive side sends a message, via the acknowledge line, acknowledging the receipt, to the transmit side. Upon receipt of the acknowledge message, the transmit side returns to its initial state, as recognized by the receive side, and also returns the acknowledge line to its initial state. This system is called the two-wire handshake. To transfer data to multiple receive sides, three-wire handshaking is employed by using another acknowledge wire. While the asynchronous system eliminates the need for a common clock as with the synchronous system, its transfer rate is slowed due to the handshaking.
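The two-wire handshake described above can be illustrated by the following minimal sketch. The class name HandshakeBus, the function transfer and the trace format are hypothetical names introduced only for this illustration; the sketch models the order of the strobe and acknowledge transitions and makes no claim about timing.

```python
class HandshakeBus:
    """Hypothetical model of a bus with data, strobe and acknowledge lines."""

    def __init__(self):
        self.data = None
        self.strobe = False
        self.ack = False
        self.trace = []  # (strobe, ack) after every transition

    def record(self):
        self.trace.append((self.strobe, self.ack))


def transfer(bus, word, received):
    """Move one word from the transmit side to the receive side."""
    # Transmit side: place the data on the bus, then raise strobe.
    bus.data = word
    bus.strobe = True
    bus.record()
    # Receive side: take the data, then raise acknowledge.
    received.append(bus.data)
    bus.ack = True
    bus.record()
    # Transmit side: sees acknowledge and returns strobe to its initial state.
    bus.strobe = False
    bus.record()
    # Receive side: sees strobe released and returns acknowledge as well.
    bus.ack = False
    bus.record()


bus = HandshakeBus()
received = []
transfer(bus, 0x2A, received)
```

Each word costs four line transitions in this model, which is one way to see why the handshake slows the transfer rate relative to a clocked transfer.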
The arbitration mechanism is divided into a centralized system in which processors are controlled centrally at one location, and a distributed system in which control functions are distributed among processors. In a centralized system, each processor presents a request to the arbiter and receives a reply from the arbiter. Consequently, this system requires as many request lines as there are processors. In an alternative system of this type, a single bus is used and the time which each processor may use is determined in advance by time-division multiplexing the bus itself. Centralized systems are suitable only for relatively small numbers of processors because their packaging design becomes complex with increasing numbers of processors.
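The two centralized schemes described above may be sketched as follows. The function names central_arbiter and tdm_owner are hypothetical, and the fixed-priority policy (lowest index wins) is an assumption made only for this illustration; the text does not specify how the arbiter chooses among simultaneous requests.

```python
def central_arbiter(request_lines):
    """Grant the bus to one requester.

    request_lines holds one boolean per processor, which mirrors the
    statement that the system needs as many request lines as processors.
    Assumed policy: the lowest-numbered requester wins.
    """
    for index, requesting in enumerate(request_lines):
        if requesting:
            return index
    return None  # nobody is requesting the bus


def tdm_owner(slot, num_processors):
    """Time-division alternative: the owner of each slot is fixed in advance."""
    return slot % num_processors


grant = central_arbiter([False, True, True, False])
schedule = [tdm_owner(slot, 3) for slot in range(6)]
```

In the time-division variant no request lines are needed, but a processor with nothing to send still occupies its slot.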
One exemplary distributed system cascade connects the processors in a daisy-chain configuration. The right to use the bus is checked in the priority order of processors. Once the processor having the highest priority obtains the right to use the bus, that fact is conveyed to the lower-priority processors. While the daisy-chain connection system is simple in construction, it requires decision-making time proportional to the number of processors.
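The daisy-chain behavior described above may be sketched as follows. The function name daisy_chain_grant is hypothetical, and counting the stages the grant passes through is a simplification used here in place of real propagation delay.

```python
def daisy_chain_grant(requests):
    """Propagate a grant along the chain, highest priority first.

    requests[0] belongs to the highest-priority processor.
    Returns (winner, stages), where stages is the number of processors
    the grant signal visited before it was absorbed or left the chain.
    """
    stages = 0
    for index, wants_bus in enumerate(requests):
        stages += 1
        if wants_bus:
            # This processor keeps the grant; lower-priority
            # processors downstream never see it.
            return index, stages
    return None, stages


winner, stages = daisy_chain_grant([False, False, True, True])
```

When only the last processor in the chain requests the bus, the grant visits every stage, so the worst-case decision time grows with the number of processors.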
Another exemplary method is that used, for example, by Futurebus. This method assigns each processor which needs to access the bus its own priority/identification number, expressed in binary digits. These priority numbers are applied to an open collector bus; when a logic-low level is applied to any one of the open collectors, the corresponding bit line also becomes a logic-low level. The open collector bus arbitration system uses multiple lines, with the bit width required to express all of the processor numbers in binary digits, and selects as the holder of the right to access the bus the requestor whose identification/arbitration number is the same as the number expressed on those lines.
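The open collector arbitration described above may be sketched in simplified form as follows. This is not the Futurebus protocol itself: the function name wired_or_arbitrate is hypothetical, the identification numbers are assumed to be unique, a set bit stands for a line pulled to the logic-low level, and the loop from the most significant bit downward stands in for the settling of the lines, which occurs in parallel in hardware.

```python
def wired_or_arbitrate(ids, width):
    """Return the winning identification number, or None if nobody competes.

    ids   -- unique identification numbers of the competing processors
    width -- number of arbitration lines (bits)
    """
    contenders = set(ids)
    if not contenders:
        return None
    line_value = 0
    for bit in reversed(range(width)):
        mask = 1 << bit
        # A line goes low if any remaining contender drives it.
        if any(number & mask for number in contenders):
            line_value |= mask
            # Contenders that did not drive this line have lost and
            # stop driving the lower-order lines.
            contenders = {n for n in contenders if n & mask}
    # The pattern left on the lines equals the winner's number.
    return line_value


winner_id = wired_or_arbitrate([5, 3, 6], width=3)
```

With unique numbers, exactly one contender remains at the end, and it is the one with the highest number.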
In this circuit, winners are decided in order of higher identification/arbitration numbers, and losers become unable to output logic-low levels to the lower-level buses. In Futurebus, therefore, a remedy for lower-level processors is adopted by implementing the decision-making process as a high-speed operation. In this system, each arbitration operation occurs in an amount of time equal to the total delay time of the buses. This method is being employed in systems other than Futurebus, for example Multibus-II (Intel) and Nubus (Texas Instruments). Advantages of Futurebus compared with the conventional buses are as follows.
1) It is a perfect asynchronous bus.
2) Addresses and data are transferred on a time-multiplex basis on 32-bit lines.
3) All lines are open collectors, on which arbitration is implemented.
4) Emphasis is placed on the block transfer and broadcasting of data.
5) Consideration is given to fault tolerance.
In this category of the direct connection structures, there are a number of interconnection systems which have been designed to meet specific applications. Depending on the layout of processors, there are a wide variety of connection methods, such as linear interconnection, ring interconnection, grid interconnection. As a variation of the grid interconnection, there is torus interconnection. In addition, there is a tree interconnection, suitable for hierarchical structure, and a hypercube interconnection, which takes advantage of communications lines having a uniform length. When processors are placed on nodes of these circuit networks, their branches become connecting circuits. The performance of a circuit network is determined by how many communication paths connecting two processors can be used simultaneously and by the number of relays that are needed on a particular communication path. A communications path which requires a large number of relays is much slower than one which uses a smaller number of relays. The number of relays may be reduced by increasing the number of circuit networks. Increasing the number of networks, however, has its limitation in terms of cost. General requirements for circuit-connected networks are as follows:
1) The number of connecting circuits is desirably small.
2) The number of relays is desirably small.
3) The number of communication paths which may be used simultaneously is desirably large.
4) Connecting circuits should not intersect with each other wherever practicable.
Some of these requirements conflict with others, and there are no connecting networks that satisfy all of them.
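The dependence of the number of relays on the network layout may be illustrated by the following sketch. The function names are hypothetical, shortest-path routing is assumed, and a relay is counted as an intervening processor, so a path of h connecting circuits has h - 1 relays.

```python
def ring_hops(a, b, n):
    """Connecting circuits crossed between nodes a and b on a ring of n nodes."""
    d = abs(a - b) % n
    return min(d, n - d)


def grid_hops(a, b):
    """a and b are (row, column) positions on a grid without wraparound."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def torus_hops(a, b, rows, cols):
    """A torus is a grid whose rows and columns each close into a ring."""
    return ring_hops(a[0], b[0], rows) + ring_hops(a[1], b[1], cols)


def hypercube_hops(a, b):
    """Nodes are numbered in binary; neighbors differ in exactly one bit."""
    return bin(a ^ b).count("1")


def relays(hops):
    """Intervening processors on a path that crosses hops circuits."""
    return max(hops - 1, 0)


corner_to_corner_grid = grid_hops((0, 0), (3, 3))
corner_to_corner_torus = torus_hops((0, 0), (3, 3), 4, 4)
```

For 16 processors, the corner-to-corner path crosses 6 circuits on a 4 x 4 grid and 2 on a 4 x 4 torus, while the most distant pair in a 4-dimensional hypercube is 4 circuits apart. The torus and the hypercube obtain shorter paths at the price of more, or longer, connecting circuits, which is the conflict among the requirements noted above.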
Connecting circuits include, for example, buses, channels, communication memories, FIFO buffers and switching circuits. These connecting circuits are further divided into dedicated connecting circuits used by only two processor elements, and shared connecting circuits used by three or more processor elements. Besides these, there is a system where a dedicated circuit is used in common by processor elements which may change the circuit by switches. The data transmitted to connecting circuits are normally bit-parallel data. In some cases, however, bit-serial data may be transmitted.
Dedicated connecting circuits are generally used for normal direct connection networks. These circuits are typically channels, communication memories and/or FIFO buffers. Shared connecting circuits include buses and multi-port memories. By alternately arranging 4-port memories and processor elements on a plane, a torus-connected network can be formed. Furthermore, a mesh-connected network having a two-dimensional configuration of m^2 processor elements can be implemented by using 2m buses. An octuple grid connecting network can also be formed.
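One way to arrive at 2m buses for m^2 processor elements is sketched below. It assumes, as an interpretation not stated in the text, that the elements form an m x m array with one bus per row and one bus per column, each element being attached to its row bus and its column bus. The function names buses_for and shared_bus are hypothetical.

```python
def buses_for(pe, m):
    """Return the row bus and the column bus of processor element pe.

    Elements are numbered 0 .. m*m - 1 in row-major order (assumed).
    """
    row, col = divmod(pe, m)
    return ("row", row), ("col", col)


def shared_bus(a, b, m):
    """Return a bus that both elements are attached to, or None."""
    row_a, col_a = divmod(a, m)
    row_b, col_b = divmod(b, m)
    if row_a == row_b:
        return ("row", row_a)
    if col_a == col_b:
        return ("col", col_a)
    return None  # a relay through a third element is needed


m = 4
all_buses = {bus for pe in range(m * m) for bus in buses_for(pe, m)}
```

Under this assumption, two elements in different rows and columns can still communicate through a single relay, namely the element at the intersection of one element's row and the other's column.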
In synchronous communications, data are transferred by synchronizing the transmit side with the receive side. As a result, one pair of processor/memory elements has to wait as long as the other pair is busy. This waiting time, during which processors are kept idle, is called the synchronization time. It tends to reduce the efficiency of the parallel processor system. If synchronous communication occurs in a network having a relatively large number of relays, synchronization time must be provided at every relay, leading to increased communication time. One of the advantages of the synchronous system, however, is that blocks of data may be transferred irrespective of their size. Thus, the amount of data to be transferred at any one time is not limited by the size of buffers.
The asynchronous system, on the other hand, has buffers of a size sufficient to store the data once. The transmit side writes data into the buffer and the receive side reads data from the buffer. Consequently, a data transaction can be executed even when one of the processors is busy. In terms of data transfer rate, since the buffers are accessed twice in the asynchronous system, the asynchronous system takes more time than the synchronous system. Dual-port memories may be used in a time-sharing mode as the buffers. Alternatively, simple FIFO buffers may be used in the same way. To implement two-way communication, separate buffers are provided for data transmission and reception.
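The buffered, asynchronous exchange described above may be sketched as follows. The class name BufferedLink, its method names and the fixed capacity are hypothetical choices made only for this illustration; a FIFO buffer is modeled with the standard collections.deque.

```python
from collections import deque


class BufferedLink:
    """Two-way link between elements A and B with one FIFO per direction."""

    def __init__(self, capacity):
        self.capacity = capacity  # words each buffer can hold (assumed limit)
        self.a_to_b = deque()     # written by A, read by B
        self.b_to_a = deque()     # written by B, read by A

    def write(self, fifo, word):
        """Transmit side: store a word without waiting for the receiver."""
        if len(fifo) >= self.capacity:
            return False  # buffer full; the sender must try again later
        fifo.append(word)
        return True

    def read(self, fifo):
        """Receive side: fetch the oldest word, or None if the buffer is empty."""
        return fifo.popleft() if fifo else None


link = BufferedLink(capacity=2)
results = [link.write(link.a_to_b, w) for w in (10, 20, 30)]
first = link.read(link.a_to_b)
```

The sender completes its first two writes while the receiver is busy, and the third write is refused because the buffer is full. This shows both the decoupling that buffering provides and the limit that buffer size places on the amount of data in flight.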
Even when buses or channels are used, asynchronous communication can be facilitated by providing buffers for processor elements on both the transmit and receive sides, and transferring data using processor interrupts. This system has slightly lower efficiency compared with a connecting circuit that has buffers because the processors must spend much of their time managing the buffers. Nonetheless, in some applications, asynchronous systems can achieve more efficient communications than synchronous systems.
Since a mesh connection system generally involves a large number of connecting circuits, it is generally used only in large-scale multi-processor systems. This poses a problem in terms of cost. For this reason, this type of system may use a serial transmission circuit, which has fewer transmission lines but sacrifices communication speed.
It is an object of this invention to provide a bus structure for multiprocessor systems of a type in which the processor section and the control/memory section are separated from each other.
It is another object of this invention to provide a more efficient bus structure suitable for image processing.