1. Field of the Invention
The invention relates to bused digital intercommunication and interconnect, and more particularly to high performance interconnection of very large scale integrated circuit (VLSIC) logics.
2. Description of the Prior Art
The description of the prior art in the following sections presents an overview of the VLSIC interconnect problem, the parameters in consideration of which a VLSIC interconnect scheme should be configured, and some considerations which indicate that a standard electrical protocol is desirable for VLSIC interconnect while a number of communication protocols are required to efficiently service many types of VLSIC networks. The underlying situation in the prior art is that no digital bused intercommunication scheme or apparatus--from whatsoever area of the digital component, digital device, or digital system prior art it is derived--is adequately versatile in satisfaction of the VLSIC interconnect problem. Versatility simply means that a single apparatus building block in VLSIC should efficiently and effectively service a broad spectrum of interconnect requirements: from few to many interconnected devices, from few to many bidders for bus access contending individually and collectively at low rates or high rates, from pin limited interconnects to bandwidth limited interconnects, from few to many addressable slave devices and/or slave commandable functions, from high data rates supported by wide words on many pins to severely pin limited data interfaces operating at lower data rates.
Although prior art attempting generalized satisfaction of the VLSIC interconnect problem, particularly in the versatility required, is not known to the inventors of the present invention, there exist certain individual prior art designs pertinent to various aspects of the present invention. A discussion of these prior art areas is amplified in this section so that the divergence, as well as the varying scope, of the present invention from prior art techniques may later be recognized.
2.1 Bus Topology and Performance
Very large scale integrated circuit (VLSIC) interconnect requirements span a wide range of topology and performance. Information must in some cases be transferred almost instantaneously between two chips with little if any preparation time for the transfer. In other cases a complex of information must be sent to any of dozens of possible recipients. The pins used to transfer the information on and off chips are always at a premium.
It is impossible to meet these requirements in a satisfactory way with any single definition of an information transfer path. Yet the use of several different definitions between different VLSIC chips may reduce the number of ways that available chips can be interconnected and will, therefore, reduce the usefulness of the chips.
Compared to the two or three printed circuit (PC) cards that it replaces, a VLSIC device will run at ten to one hundred times the speed, and consume 1/100 to 1/10,000th of the power. In order to derive these advantages, the costs of development must be absorbed. Here, too, there is a major difference between medium scale integration (MSI) and VLSI. The VLSI circuit device may involve development costs of up to ten times what implementation of the same function would cost on PC cards. In order to reduce the impact of high VLSI device development costs, both the supplier and the customer would like to be able to amortize these costs over several production units. The production volume required to do this might be greater than one program can guarantee, so that the device design would have to anticipate requirements which might be initially undefined. Interfaces are the chief impediment to this objective. In the environment where PC card design is relatively cheap, each designer has a tendency to optimize the interfaces between cards to best suit his immediate objectives. If this latitude were permitted with respect to VLSI devices, it would defeat the objective of minimizing development costs by creating new development costs in satisfaction of interface requirements. Furthermore, it would add still another part number to inventory, with those associated costs. Yet standardization on one interface or another always carries with it a penalty, the penalty of mismatch between the standard and the actual requirement. This is the familiar dilemma of optimization-versus-development cost.
Prior art digital interconnect bus topologies have generally traded flexibility in the information transfer path for data transfer performance. Data processors implemented in VLSIC typically require 10^6 data transfers (words) per second, preprocessors may require 10^7 data transfers per second, and signal processors may require 10^8 data transfers per second. Bus interconnect at these performance levels is supported by crosspoint switches interconnecting two to eight devices (Users) and configurable gate arrays (CGA) usually interconnecting two devices. Specification of the topology of a crosspoint switch or CGA is rigid; one apparatus will support one operational interface (communication protocol) in the interconnect of a network of rigid form (i.e., shared or point-to-point interconnect paths, hub of a wheel or daisy chain linkage). To obtain the desired data transfer performance in such a network, communication with more than a set number of network interconnects is sharply proscribed. Conversely, digital interconnect buses offering flexibility in interconnect topology (such as types interconnecting functional sections of computer systems) do not support data transfer performance at the rates desired for VLSIC on the limited pins available to such circuitry. The pin limitations are further discussed in section 2.3.
2.2 Bus Variability
The requirements placed on interconnect paths in a system vary so widely that different solutions to their implementation are essential to efficient pin use (pins are a precious resource in VLSIC technology). On the other hand, there are many examples of differences in interconnect in today's technologies that add nothing to chips' usefulness and instead merely add to the need for "glue" chips. For example, there are TTL chips that use a positive-going clock while others use a negative-going clock. If both kinds of chips are used in a system, both clocks must be supplied, using more pins and "glue", and obtaining no new usefulness.
It is the intent of the present invention to accommodate the variability needed to meet different requirements without permitting extraneous variations. In order to approach the problem, the essential reasons for variability must be examined.
2.2.1 Fundamental Interconnect Requirements
The reason for interconnect is to transfer information among two or more physically separated locations, whether the information is a single status bit or a complex communication packet. It should be self-evident that the information rate and the amount of elapsed time allowable for a given transfer will affect the interconnect design.
The other major kind of variability has to do with the number of separate locations possibly involved in a transfer of information. Two devices may be directly interconnected for an information transfer facility. This facility might be used unidirectionally, with information always moving in the same direction. Or it might be bidirectional, with information moving in different directions at different times. These two cases have different requirements for coordinating the use of the transfer facility, and should be expected to require different capabilities in the transfer facility. These are three different cases: Transfer from Location 1 to 2 only, from 2 to 1 only, and bidirectional.
Imagine both the daisy chain (1 connects 2 connects 3 connects 1) and hub of a wheel (1 connects to 2 and 3, 2 connects to 1 and 3, 3 connects to 1 and 2) cases wherein three locations are interconnected. If they are connected as in a daisy chain, then each separate interconnect has the above three possibilities for a total of nine possibilities. If the three locations use a common transfer facility, as in the hub of a wheel interconnection, then the number of possibilities is no different. The nine possibilities must, however, be coordinated within the common transfer facility to avoid interference. As more devices are interconnected, the number of variations increases rapidly, creating significant coordination problems requiring a large amount of information transfer to solve. The complexity of the interconnect is strongly affected by the number of locations that are to be interconnected and by the directions of their information transfers. An efficient interconnect system cannot saddle simple interconnects with the coordination overhead required for the complex ones.
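The counting argument above may be illustrated by a minimal Python sketch. It assumes, as in the three-location example, that every pair of locations is to be considered and that each pair admits the three directional cases named earlier. The names `DIRECTIONS` and `link_cases` are hypothetical and are introduced only for illustration.

```python
from itertools import combinations

# The three directional cases described for a pair of locations (a, b).
DIRECTIONS = ("a_to_b_only", "b_to_a_only", "bidirectional")


def link_cases(n_locations):
    """List every (location_a, location_b, direction) case.

    Assumption: every pair of locations is counted, matching the
    three-location example in which 1, 2 and 3 are all interconnected.
    """
    cases = []
    for a, b in combinations(range(1, n_locations + 1), 2):
        for direction in DIRECTIONS:
            cases.append((a, b, direction))
    return cases


if __name__ == "__main__":
    for n in (2, 3, 4, 8):
        print(n, "locations:", len(link_cases(n)), "cases")
```

Under these assumptions the count is three times the number of pairs, so it grows with the square of the number of locations. The sketch counts only individual pairwise cases; it does not attempt to model the coordination traffic needed when those cases share a common transfer facility.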
2.2.2 Parameters of Variation
Ideally, an optimum interconnect would be rigorously determined from the system requirements for the interconnect. This section discusses the parameters that might be used as input to such a rigorous determination.
A first parameter is the transfer rate. The rates at which information is transferred will be an important requirement. Normally, the peak rate is used, though an average rate can sometimes be useful if sufficient buffering is provided.
A second parameter is transfer latency. Latency is the amount of time that is permitted to elapse from the initial decision to send information until it has been completely sent. Because of pipelining and time overlapping methods, latency is somewhat independent of transfer rate and is therefore specified separately.
A third parameter is the number of interconnected locations. As discussed above, the number of locations strongly affects complexity; therefore, an efficient interconnect must take it into account. It is useful to subdivide the locations into those that independently choose to use the interconnect, and those that only respond to information on the interconnect. The latter will be called "slaves" as they are subordinate to the master's selection of transaction and timing. The former are called "masters," and are also called "owners" when they control the interconnect. The number of locations that are masters has the strongest influence on interconnect complexity. If a single location can, at various times, serve as both a master and a slave it is called a master-slave.
2.2.3 Prior Art Standards of Interconnect
The creative designer sometimes finds himself restricted by some standard that stands in the way of his optimum design. It is in fact true that standards often provide for things that a particular design may not need, and in that sense force a non-optimum design. But well chosen standards provide a tremendous return for that inefficiency; they make it possible to design and build subsystems that can be stocked, and then later combined into systems of greater complexity and specialization.
To illustrate, consider the transistor-transistor logic (TTL) families. Each TTL integrated circuit is designed so that its input and output pins obey certain voltage, current and capacitance standards. Then the devices are manufactured in volume for project designers who use logic design rules that incorporate fan-in, fan-out and delay times rather than voltage, current and capacitance. These rules are easier to use in digital design than are electrical rules, and the designer's task is therefore easier. The inefficiencies introduced into IC designs, as evidenced by the extra transistors, enabled the explosive growth of digital logic in today's systems.
Subfamilies have also grown within the major digital families. For example, there are edge triggered devices, with further subdivision into positive edge triggered and negative edge triggered. Bus drivers, receivers, transceivers, etc., have been developed to help interconnect TTL systems. These kinds of chips make it even easier to build the systems because they begin to form subfamilies that provide not only electrical standardization, but also electrical protocol standardization (e.g., positive edge triggering). More recent chips have also shown some concern for the arrangement of pin assignments to simplify routing on the PC board. These improvements increase the applicability of digital logic.
A higher level of digital logic families has also evolved. These families are constructed on PC cards and use standard bus systems to interconnect them. Just as for TTL, the bus standards provide for easier use of the cards at a cost of extra logic on each card. The cards can be constructed ahead of time and stocked on the shelf. Examples of such families are the Sperry Univac® RMF bus, Intel's Multi-bus, and the S-100 bus.
2.3 Requirements for a VLSIC Standard Interconnect and Pinout Problem
A major emphasis in VLSIC development programs is to achieve on-chip speed and density goals; that is, VLSIC development programs are semiconductor technology programs. But it is an empty achievement to produce 100,000 unreachable gates, and there is also concern that useful chips be produced.
A family of VLSIC chip types (in the sense of family discussed above) would be a potent set of building blocks for future electronic systems. As in the TTL families, much versatility stems from the ability to interconnect the basic building blocks in many different ways. While some applications will always require gate array or custom implementation, many will be well served by off-the-shelf VLSIC chips.
But VLSIC chips have gate complexities comparable to cards in the bus families mentioned above. It would be just as impractical to require an interface device between VLSIC chips as it would be to require interface cards between each card of a bus interconnected family of cards. There are other analogies between VLSIC and bus interconnected families. The number of pins available is similar, and the functionality of VLSIC chips is often similar to that of cards today.
But there are important differences too. VLSIC technology promises much higher performance than that of cards. But it cannot currently provide for as much memory as can be placed on a card, and the development cost for a VLSIC chip is perhaps ten times higher than for a card.
These differences accentuate the need for a standard interconnect, but at the same time prohibit use of the present bus standards. The standards are too slow, for example, to connect a high speed processor and its memory while they use too many pins to allow more than one bus to connect to a chip.
Let us characterize the VLSIC interconnect requirements. The technology is projected to drive signals from chip to chip in 20 to 40 nanoseconds, with internal gate delays of 1 to 2 nanoseconds. Up to 120 signal pins are currently considered practical on each chip. Because the interchip time (rooted in the finite speed of electromagnetic propagation) is long compared to gate delay, it is an important performance limitation, and an acceptable interconnect must not increase it.
Another problem that emerges in VLSI is also related to interfaces--the pinout problem. The pinout problem has to do with the way in which the semiconductor die is mounted in a package. Wires emanate from the four edges of a die to pads on the edge of a carrier cavity or mounting surface. These wires must be far enough apart to prevent shorting during vibration or thermal expansion. The number of wires is therefore limited by the perimeter of the die. One hundred seventy-five microns center-to-center is the current practical limit for pad spacing on the die. Therefore, a chip with edge dimensions of 250 mils can support no more than 136 wires (corners cannot be used). Chips could be made larger for no reason other than to increase their periphery, but a 400-mil chip, the largest contemplated in the next five years, will still only support 220 pins. Studies reported in the literature, based on observed data for PC cards, have been used to develop an empirical rule for the gate-to-pin relationship. This formula, known as Rent's rule, indicates that for the various chip sizes and gate counts projected for the immediate future, there would be a shortage of pins if the same unrestricted use of interface pins employed in PC cards were to be allowed. This projection is shown in FIG. 2. With Rent's rule, even the most optimistic estimate for a 15,000-gate module is 306 pins. The die for 15,000 gates in 1.25 micron feature size Complementary Metal Oxide Semiconductor (CMOS) geometry has to be only 260 mils square, big enough for only 140 pins. Therefore there is a severe deficit, of 166 pins, in the number of interface pins which would conservatively be required to connect to the logics within such a module.
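The pin arithmetic of this section may be sketched in Python as follows. The 175 micron pad pitch, the die edge dimensions, and the 306-pin and 140-pin figures come from the text. Everything else is an assumption made only for illustration: the corner keep-out of 190 microns at each end of an edge, and the Rent's rule form T = k * G^p with k = 2.5 and p = 0.5, are not stated in this description. They are chosen because they happen to reproduce the quoted figures.

```python
import math

MICRONS_PER_MIL = 25.4
PAD_PITCH_UM = 175.0          # center-to-center pad spacing, from the text
CORNER_KEEPOUT_UM = 190.0     # ASSUMPTION: unusable length at each end of an edge


def max_bond_wires(edge_mils, pitch_um=PAD_PITCH_UM,
                   corner_keepout_um=CORNER_KEEPOUT_UM):
    """Estimate the wires a square die can support on its four edges."""
    usable_um = edge_mils * MICRONS_PER_MIL - 2.0 * corner_keepout_um
    return 4 * math.floor(usable_um / pitch_um)


def rent_pins(gates, k, p):
    """Rent's rule in its usual form, T = k * G**p.

    ASSUMPTION: the coefficients are supplied by the caller; the text
    gives only the resulting estimate, not k or p.
    """
    return k * gates ** p


if __name__ == "__main__":
    for edge in (250, 260, 400):
        print(edge, "mil die:", max_bond_wires(edge), "wires")
    needed = round(rent_pins(15_000, k=2.5, p=0.5))
    available = max_bond_wires(260)
    print("needed", needed, "available", available,
          "deficit", needed - available)
```

With these assumed values the sketch yields the 136, 140 and 220 wire limits and the 166 pin deficit quoted above; other keep-out distances or Rent coefficients would give different numbers.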
2.4 Interconnect Efficiency
Any realizable interconnect imposes finite limitations on the amount of information that can be transferred in a given amount of time. A useful way of judging the efficiency of an interconnect is to compare its transfer capability with that of the underlying physical limitations, expressed, for example, in baud, bandwidth, or bits per second. Interconnect efficiency provides some idea of the cost of using a particular interconnect over a theoretically ideal one. For example, if the operational bandwidth of each line utilized in a bused digital interface were 25 MHz, and such interface had the net effective capability of transferring 25 million data words per second, then the efficiency of such interface would be 100%. Unless control sequences and activities (if any) are time overlapped (i.e., pipelined) with data transfer sequences, an efficiency of 100% is impossible.
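The efficiency measure described above may be sketched in Python as follows. The first function is the ratio used in the 25 MHz example. The second is a simplified model, assumed here only for illustration, in which each transaction occupies the bus for a number of data cycles plus a number of control cycles; the function names and the cycle-count model are hypothetical.

```python
def interconnect_efficiency(words_per_second, line_bandwidth_hz):
    """Ratio of achieved word transfers to the per-line physical limit."""
    return words_per_second / line_bandwidth_hz


def efficiency_with_control(data_cycles, control_cycles, overlapped):
    """Efficiency of one transaction under a simple cycle-count model.

    ASSUMPTION: when control activity is fully overlapped (pipelined)
    with data transfer it costs no extra bus cycles; otherwise the
    control cycles are added to the data cycles.
    """
    if overlapped:
        return 1.0
    return data_cycles / (data_cycles + control_cycles)


if __name__ == "__main__":
    print(interconnect_efficiency(25e6, 25e6))         # the example in the text
    print(efficiency_with_control(4, 1, overlapped=False))
    print(efficiency_with_control(4, 1, overlapped=True))
```

In this model any nonzero number of non-overlapped control cycles holds the efficiency below 100%, which is the point made in the preceding paragraph.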
2.5 Prior Art Error Detection and Correction
The problem of avoiding system errors in the face of individual failed interconnect lines can be addressed with single error correct, double error detect (SEC/DED) Hamming codes if the data being sent on a set of lines all originates at one place. Under these conditions an appropriate check digit can be calculated, transmitted and decoded along with the data so that any single error, including one occurring in the check digit itself, can be corrected by the receiving chip(s). It is asserted that SEC/DED codes compatible with the variable pin count characteristics of the present invention are practical. Such accommodation of SEC/DED to variable word widths is neither taught in the literature of the prior art nor taught within this disclosure, because a superior alternative method will be taught instead.
If conventional SEC/DED were to be employed for the bus apparatus of the present invention, then the number of pins needed to transmit a check digit for 2^n bits is n+2, and the amount of time needed to check for and correct errors at the receiver is on the order of one clock cycle. In practice, about 10 pins would be needed for check digits that would cover 75% of the intercommunication lines of the present invention. Coverage is limited because data originating at multiple locations is by definition not available at any single point for generation of a check digit.
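The n+2 figure may be checked with a short Python sketch based on the standard Hamming bound: a single error correcting code over m data bits needs the smallest r satisfying 2^r >= m + r + 1, and double error detection adds one overall parity bit. The function names are hypothetical and are used only for illustration.

```python
def hamming_sec_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single error correct)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits):
    """SEC check bits plus one overall parity bit for double error detect."""
    return hamming_sec_check_bits(data_bits) + 1


if __name__ == "__main__":
    for n in range(2, 8):
        print(2 ** n, "data bits:", secded_check_bits(2 ** n), "check bits")
```

For data widths of 2^n bits with n of 2 or more, the result equals n+2, in agreement with the text; for example, 64 data bits require 8 check bits.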
A more effective SEC/DED system has been invented instead for the bused interconnection purposes of the present disclosed apparatus. It is considered more effective because it eliminates the error checking delay when no error actually occurs, it provides 100% pin coverage, and requires only two extra pins. Correspondingly, this specification does not teach methods and apparatus of prior art error correction codes, including any adaptation of prior art SEC/DED to the variable word widths (i.e., variable pin count) of the present invention.
2.6 Prior Art VLSI Wired-OR Interconnection
The current invention will, through a special two-time phase electrical communication protocol plus the synergistic utilization of all interconnected drivers in combination to charge the interconnecting bus lines, take a communication method previously found only on VLSIC chip internal buses and expand and modify that method to allow high performance interchip communication--an area currently dominated by tri-state drivers. The discussion in the book "Introduction to VLSI Systems" (© 1980) by Mead and Conway, published by Addison-Wesley, is especially pertinent to an understanding of prior art VLSIC interconnect.
This prior art reference to both VLSIC chip internal and external buses is summarized in related U.S. patent application Ser. No. 355,803, the contents of which are expressly incorporated herein by reference.
For the purposes of the apparatus and communication method of this application it should be summarily noted that the prior art drive current and resultant size problem for prior art VLSIC tri-state pad (bus) driver stage output transistors is very severe. If 37 pads were to be driven on each VLSIC chip and a one meter bus interconnecting 20 chips supported, the equivalent current transistors might be as large as 1.25 microns x 800 microns for the N-type transistor and 1.25 microns x 2400 microns for the P-type transistor utilized within a tri-state driver. For a reasonable size VLSIC substrate this means that one-third of the available area is devoted to interconnecting lands, one-third to logics, and one-third to the two drive and one receive per bus line interface transistors. The apparatus and method of the present application will later be seen to be much more economical in the interface transistor size required. Moreover, for the purposes of the invention of this application it should be noted that all the capabilities of a prior art tri-state pad (bus) driver--the ability to charge, or drive high, a connected bus line; the ability to discharge, or drive low, a connected bus line; and the ability to do naught save present high impedance to a connected bus line--are alternatively achieved by the apparatus and method of the present disclosure. In particular, these capabilities mean that communication may be wired-OR between interconnected devices if the High condition of each bus line is defined as a logical "0". Wired-OR means simply that the logical OR function is enabled on bus communication lines when any interconnected device may drive a bus line Low, or to a logical "1", regardless of which of the three states any other device is assuming. Such wired-OR communication continues to be implemented by the apparatus and method of the present disclosure, in an alternative manner to the tri-state drivers of the prior art.
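The wired-OR convention just described may be sketched logically in Python. The sketch models only the logical rule stated above, namely that a line is Low (logical "1") when any device drives it Low, whatever state the other devices assume; it does not model drive currents, timing, or the two-time phase protocol. The names `Drive`, `line_logic_value` and `wired_or_word`, and the treatment of an undriven line as High, are assumptions made for illustration.

```python
from enum import Enum
from functools import reduce


class Drive(Enum):
    HIGH = "high"    # device charges the line
    LOW = "low"      # device discharges the line
    HI_Z = "hi-z"    # device presents high impedance


def line_logic_value(drivers):
    """Logical value of one bus line under the wired-OR convention.

    High is defined as logical 0; the line is Low (logical 1) if any
    device drives it Low.  ASSUMPTION: a line that nobody drives Low
    rests High.
    """
    return 1 if any(d is Drive.LOW for d in drivers) else 0


def wired_or_word(words):
    """Logical word seen on a multi-line bus: bitwise OR of all devices' words."""
    return reduce(lambda acc, w: acc | w, words, 0)


if __name__ == "__main__":
    print(line_logic_value([Drive.HI_Z, Drive.HIGH, Drive.HI_Z]))
    print(line_logic_value([Drive.HIGH, Drive.LOW, Drive.HI_Z]))
    print(bin(wired_or_word([0b0101, 0b0011])))
```

A device wishing to contribute a logical "1" therefore drives the line Low, and a device contributing a logical "0" need only refrain from doing so.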
Certain inventions of this application, such as the communication activity of distributed arbitration, require this wired-OR capability. Therefore related U.S. patent application Ser. No. 355,803 may be perceived as teaching, amongst other things, how to perform efficiently a certain wired-OR communication function which is integral to the invention of this application.