A conventional VLSI central processing unit (CPU) consists of four functional parts, namely the instruction unit, the arithmetic logic unit, a file of registers, and the bus (or buses) connecting the CPU to its external memory devices. Normal operation consists of fetching instructions from the memory devices and supplying them to the instruction unit via a bus. The arithmetic logic unit then reads data from, and writes new data back to, the registers and memory devices via a bus in response to control signals from the instruction unit. The register file usually contains a number of general purpose registers (address and data) and special purpose registers such as stack pointers and status registers. However, one register common to all conventional central processing units is the program counter, which identifies the address of the next instruction.
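The fetch and execute cycle described above can be illustrated with a minimal toy model. This is a sketch under stated assumptions, not a description of any particular CPU: the instruction set (LOAD, STORE, ADD, HALT), the register names A and B, and the single list standing in for memory are all hypothetical. The model also counts every access to memory as one bus transfer, whether it is an instruction fetch or a data read or write.

```python
# Toy model of the fetch/execute cycle of a conventional CPU.
# The instruction set, register names and memory layout are hypothetical.

def run(memory, start=0):
    """Execute a program held in `memory` (a list shared by code and data).

    Returns the final register file and the number of bus transfers, i.e.
    every read or write of `memory`, whether instruction fetch or data access.
    """
    registers = {"A": 0, "B": 0}  # general purpose registers
    pc = start                    # program counter: address of next instruction
    bus_transfers = 0

    while True:
        # Instruction unit: fetch the next instruction over the bus.
        instruction = memory[pc]
        bus_transfers += 1
        pc += 1
        op = instruction[0]

        # Arithmetic logic unit: act on registers and memory as directed.
        if op == "LOAD":              # register <- memory[addr]
            _, reg, addr = instruction
            registers[reg] = memory[addr]
            bus_transfers += 1
        elif op == "STORE":           # memory[addr] <- register
            _, reg, addr = instruction
            memory[addr] = registers[reg]
            bus_transfers += 1
        elif op == "ADD":             # dst <- dst + src, registers only
            _, dst, src = instruction
            registers[dst] += registers[src]
        elif op == "HALT":
            return registers, bus_transfers
        else:
            raise ValueError(f"unknown opcode {op!r} at address {pc - 1}")


# Addresses 0-4 hold the program; addresses 5-7 hold data.
memory = [
    ("LOAD", "A", 5),
    ("LOAD", "B", 6),
    ("ADD", "A", "B"),
    ("STORE", "A", 7),
    ("HALT",),
    2, 3, 0,
]
registers, transfers = run(memory)
print(registers, memory[7], transfers)  # {'A': 5, 'B': 3} 5 8
```

Even this five-instruction program needs eight bus transfers to perform a single arithmetic operation, which gives a rough sense of why the rate of transfer between CPU and memory, rather than the execution speed of the CPU alone, tends to bound overall throughput.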
Over recent years the processing speed of central processing units has increased dramatically, and it is now accepted that, when implemented using similar techniques, the overall speed of operation of a data processing system is determined not by the speed at which the CPU can execute instructions but by the speed at which data can be transferred between the CPU and its associated memory devices. This problem is exacerbated in machines designed to run large programs, in which large but slow memory devices are chosen in preference to the faster but smaller devices that would improve system performance.
Techniques are known for improving the operating speed of a system but, in addition to increasing the price, they also create new constraints. Furthermore, these techniques do not resolve speed problems related to the bus, which also limits performance, particularly when many separate memory devices make up the total addressable space.
It is therefore an object of a first aspect of the present invention to provide an improved data processing apparatus of the type in which processing is distributed between two or more processing units.
In a data processing system of the type having distributed processors, a communication system must be provided between the processors and remote memory devices. In a conventional processing system, adding further processing units yields diminishing returns in overall processing power as the buses become fully loaded, so that the available bandwidth must be increased. Problems of this type occur in many environments where a number of devices must communicate with each other and it is not possible to provide an individual communication link between each device and every other device. Known solutions may be classified into four types, namely bus, ring, crosspoint and store-and-forward, together with, of course, many hybrid systems. It is therefore an object of a second aspect of the present invention to provide an improved communication system.
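The two scaling pressures mentioned above can be made concrete with simple arithmetic. The sketch below is idealized and its assumptions are stated in the code: a dedicated link between every pair of n devices requires n(n-1)/2 links, which is why individual links between each device and every other device quickly become impractical, while a single shared bus that is saturated and fairly arbitrated gives each device on average only 1/n of its bandwidth. The function names and the bandwidth figure of 100.0 (in arbitrary units) are hypothetical.

```python
# Idealized comparison of two interconnect costs; assumptions are simplified.

def point_to_point_links(n: int) -> int:
    """Dedicated links needed to join each of n devices to every other device."""
    return n * (n - 1) // 2


def bus_share(bandwidth: float, n: int) -> float:
    """Average bandwidth per device when n devices share one bus equally.

    Assumes the bus is saturated and arbitration is perfectly fair.
    """
    return bandwidth / n


# Links grow quadratically while the per-device share of one bus shrinks as 1/n.
for n in (2, 4, 8, 16, 32):
    print(n, point_to_point_links(n), bus_share(100.0, n))
```

Ring, crosspoint and store-and-forward arrangements sit between these two extremes, trading the number of physical links against contention or forwarding delay; the sketch does not model them.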