This invention relates generally to an integrated circuit (IC) interface design and, more particularly, to a system and method of using a crossbar switching network to provide access to a plurality of selectable internal IC nodes from a smaller plurality of external interface pins.
System On Chip (SOC) design faces conflicting requirements. On the one hand, designers want to increase the number of on-chip peripherals (SRAM, cache, serial and parallel I/O, DMA, etc.) for maximum flexibility. On the other hand, they want to reduce cost by limiting the number of external I/O pins and reducing package size. It is also desirable to provide visibility of some internal signals for silicon debugging.
The term SOC, as used herein, refers to an IC that consists of a processor, embedded memory, various peripherals, and an external bus interface. FIG. 1 illustrates an example of a System On Chip from Sharp based on the ARM7 Thumb™ Core (prior art).
The processor in a SOC can be a CISC (Complex Instruction Set Computing) CPU such as x86 or 68k, or a RISC (Reduced Instruction Set Computing) CPU such as ARM™. The processor can also be a general purpose DSP (Digital Signal Processor) such as TI's DSP, a specialized DSP such as Sharp's Butterfly DSP™, or a combination of a CPU and a DSP.
Embedded memory can be either volatile (SRAM, DRAM) or non-volatile (ROM, Flash). Peripherals range from the general purpose (counter/timers, UART, parallel I/O, interrupt controller, etc.) to the specialized (LCD controller, graphics controller, network controllers, etc.). The external bus interface allows the SOC to interface with external memory devices and peripherals with little or no glue logic. The interface varies from a simple SRAM interface to a fully programmable universal interface.
In previous designs, an electronic system would be based on a board populated with a microprocessor or microcontroller, memory, discrete peripherals, and a bus controller. Today, such a system can fit on a single chip, hence the term System On Chip. Almost every semiconductor company that has a processor, or access to one, is developing System On Chip products. This advancement in technology allows system designers to reduce system testing and size, improve reliability, and shorten the time to market for their products.
Modern system designs require high speed and high integration at low cost, along with short time to market, from SOC vendors. These requirements are contradictory in nature. A faster CPU requires a smaller process technology (0.35 µm or 0.25 µm), which tends to cost more than an established older process. Higher integration produces a larger die area, increases the number of I/O pins, and requires a larger package. This leads to higher die cost, higher pin test cost, and a more expensive package. The challenge is to achieve high integration while lowering cost to remain competitive in the marketplace.
System designers can reduce the cost of SOC in several ways such as using an older process, using less expensive packaging technology, reducing the number of I/O pins, or repositioning the I/O pads.
Using an older technology such as 0.65 µm to fabricate the SOC will reduce its cost. Older technologies are mature, and their wafer cost is significantly lower than that of a newer technology such as 0.35 µm. However, the older technology produces SOCs with a large die size and a low die count per wafer. In addition, the processor speed will be slow and, depending on the application, the SOC may not be competitive. Competitors are always striving to use the latest process technology.
Choosing a mature package technology such as TQFP (Thin Quad Flat Package) or QFP (Quad Flat Package) minimizes the cost of the package. More advanced packaging technologies, such as CSP (Chip Scale Packaging), tend to be higher in cost. However, CSP offers smaller size, lighter weight, and faster speeds for applications that demand them. For example, handheld devices such as cell phones or PDAs (Personal Digital Assistants), which are area- and weight-constrained, can benefit from such advanced packaging technologies.
Reducing the number of I/O pins on a SOC reduces package cost and die size. Mature packages (e.g., QFP/TQFP) tend to have a cost of $0.01 to $0.015 associated with each I/O pin. Newer packages (e.g., CSP) tend to have a slightly higher cost associated with each I/O pin. As for die size, each unique I/O pin requires a unique bond pad. FIG. 2 illustrates bond pads located along the sides of the die, defining a minimum die area (prior art). This minimum die area determines the minimum die cost. An IC is said to be "Pad Limited" when the die size required by its circuitry is less than the minimum die area defined by the I/O pads. In this case, reducing the die size will not reduce its cost. However, reducing the number of I/O pins or staggering their pads will reduce the die area and thus the die cost. As process technology advances (0.35 µm → 0.25 µm → 0.18 µm → ...), SOC devices will become more highly integrated and will tend to become "Pad Limited". This means that the number of I/O pins on a SOC will be a critical factor in determining die size and SOC cost.
"Die Limited" IC is an IC with a die size that is greater than the minimum die area and the I/O pads have to be spread apart to make room for the die. In this case, reducing the die size will reduce its cost but will also sacrifice high integration.
Typically, bond pads are aligned along the sides of the die, as shown in FIG. 2. FIG. 3 illustrates a staggered pad layout that reduces the die area (prior art). Staggering the pads reduces die area while maintaining all the I/O needed by the SOC. However, staggered pads introduce design and assembly challenges. On the design side, more I/Os introduce noise that requires adding more power pins. Staggering pads significantly reduces the die area, thus limiting the number of functions that can be integrated on the SOC. Staggered pads are generally used with very small die designs. On the assembly side, staggered pads can require special lead frames and fine-pitch bonding machines, adding to assembly cost and time.
Reducing I/O pins on a SOC to reduce die size and cost requires multiplexing. For example, if a SOC requires 180 functional I/Os but the package offers only 140 physical I/Os (excluding power pins), the remaining 40 (180-140) functional I/Os have to be multiplexed. Assume that, of the 140 available physical I/Os, 120 are dedicated and cannot be multiplexed for functional or timing reasons (e.g., address bus, data bus, ...). Then 60 (180-120) functional I/Os and 20 (140-120) physical I/Os remain. That is, each remaining physical I/O pin has three (60 ÷ 20) functional I/Os associated with it. Table 1 summarizes this example.
TABLE 1: I/O Multiplexing Example
Functional I/Os required               180
Physical I/Os available                140
Physical I/Os dedicated                120
Physical I/Os multiplexed               20
Functional I/Os per multiplexed pin      3
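The arithmetic behind Table 1 can be captured in a few lines. This is a minimal sketch; `MuxPlan` and `plan_io_multiplexing` are hypothetical names, and rounding up the functions-per-pin figure is an assumption for cases where the division is not exact (in the example it is exactly 3).

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class MuxPlan:
    shared_functions: int   # functional I/Os that must share pins
    shared_pins: int        # physical pins left after the dedicated ones
    functions_per_pin: int  # functional I/Os each shared pin must carry

def plan_io_multiplexing(functional_required: int,
                         physical_available: int,
                         physical_dedicated: int) -> MuxPlan:
    """Reproduce the I/O multiplexing arithmetic of Table 1."""
    shared_pins = physical_available - physical_dedicated
    if shared_pins <= 0:
        raise ValueError("no physical pins remain for multiplexing")
    shared_functions = functional_required - physical_dedicated
    # Round up: a pin cannot carry a fraction of a function.
    per_pin = math.ceil(shared_functions / shared_pins)
    return MuxPlan(shared_functions, shared_pins, per_pin)

print(plan_io_multiplexing(180, 140, 120))
# MuxPlan(shared_functions=60, shared_pins=20, functions_per_pin=3)
```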
Traditionally, I/O multiplexing takes the form of assigning each physical pin a fixed number of functional I/Os. In the previous example, one of the non-dedicated I/O pins multiplexes three functional I/Os: F1, F2, and F3. The system designer is forced to select among functions F1, F2, and F3, unless these functions are also repeated on other non-dedicated I/O pins.
A more flexible solution is to allow each functional I/O to map to every physical I/O pin. In the example provided, each of the 20 physical I/O pins would have all 60 functional I/Os mapped to it. This gives the system designer total flexibility to customize the system's I/O according to the target application. It also gives the SOC designer visibility of internal signals for debugging purposes. In the past, however, this flexibility has come at a price: the mapping logic is gate-intensive, resulting in added delay and loading, and it requires extra testing.
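The trade-off between the fixed scheme and the fully flexible scheme can be made concrete with a small model. In this sketch, the 60 shared functions are numbered 0 to 59 and the 20 shared pins 0 to 19; these labels, the helper names, and the choice of F1 and F2 as functions 0 and 1 are hypothetical. The total count of multiplexer inputs is used only as a rough proxy for the gate cost the text mentions, not as a real gate count.

```python
def select_bits(choices: int) -> int:
    """Control bits a multiplexer needs to pick one of `choices` inputs."""
    return (choices - 1).bit_length()

def fixed_mapping(num_pins: int, per_pin: int) -> dict[int, list[int]]:
    """Traditional scheme: pin p carries only functions p*per_pin .. p*per_pin + per_pin - 1."""
    return {p: list(range(p * per_pin, (p + 1) * per_pin)) for p in range(num_pins)}

def full_mapping(num_pins: int, num_functions: int) -> dict[int, list[int]]:
    """Flexible scheme: every pin can carry every function."""
    return {p: list(range(num_functions)) for p in range(num_pins)}

def can_assign(mapping: dict[int, list[int]], wanted: dict[int, int]) -> bool:
    """True if every pin in `wanted` (pin -> function) can carry the requested function."""
    return all(func in mapping[pin] for pin, func in wanted.items())

def mux_inputs(mapping: dict[int, list[int]]) -> int:
    """Total multiplexer inputs, a rough proxy for mapping-logic size."""
    return sum(len(funcs) for funcs in mapping.values())

fixed = fixed_mapping(20, 3)   # 20 shared pins, 3 fixed functions each
full = full_mapping(20, 60)    # 20 shared pins, any of the 60 functions

# Hypothetical: F1 and F2 are functions 0 and 1, both fixed to pin 0, so the
# fixed scheme cannot bring them out on pins 0 and 1 at the same time.
wanted = {0: 0, 1: 1}
print(can_assign(fixed, wanted), can_assign(full, wanted))   # False True
print(mux_inputs(fixed), mux_inputs(full))                   # 60 1200
print(select_bits(3), select_bits(60))                       # 2 6
```

In this model the flexible scheme needs 6 select bits per pin instead of 2, and twenty times as many multiplexer inputs, which is the cost the crossbar network described below aims to contain.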
It would be advantageous if, in an SOC device, a physical I/O pin of the IC could be assigned to any of a large number of functional I/Os. Further, it would be advantageous if many physical I/O pins each had the capability of being assigned to that large number of functional I/Os.
It would be advantageous if a crossbar switch could be developed to interface between functional I/Os and physical I/Os in an IC with a minimum number of gates and stages of switching so that the time delay across the switch is minimized.
It would be advantageous if an IC crossbar switch could give a digital systems designer greater flexibility, with simplified design mapping and minimal added delay and loading, while allowing visibility of internal signals.
Accordingly, a System On Chip (SOC) crossbar switching network with a small time delay has been provided. The crossbar switching network comprises N input nodes, or functional I/Os, and N output nodes, or physical I/Os. In one aspect of the invention, N = 64. The network includes n layers of N switches that multiplex signals between the input and output nodes. In one aspect of the invention, n = 3. Each switch has $2^i$ signal inputs operatively connected to the input nodes. Each switch multiplexes the input signals to provide an output signal at a signal output. Further, each switch has i control inputs to select which input signal is output by the switch. In one aspect of the invention, i = 2. In the minimal stage concept of the present invention, $N = 2^{n+i+1}$.
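The sizing relations in the preceding paragraph can be checked numerically. This sketch takes the relations as stated ($N = 2^{n+i+1}$, n layers of N switches, $2^i$ inputs and i control bits per switch); the function name and the derived quantities (total control bits, bits along one path, maximum fan-in of an n-layer tree of switches) are illustrative additions, not figures from the text.

```python
def crossbar_size(n: int, i: int) -> dict[str, int]:
    """Size a network of n layers of N switches, each switch having i control
    inputs that select one of 2**i signal inputs, with N = 2**(n + i + 1)."""
    N = 2 ** (n + i + 1)
    inputs_per_switch = 2 ** i
    switches = n * N
    return {
        "N": N,
        "inputs_per_switch": inputs_per_switch,
        "switches": switches,
        "control_bits": switches * i,   # i select bits per switch
        "bits_per_path": n * i,         # one selection per layer along a path
        # Upper bound on distinct inputs one output can reach through n layers.
        "max_fan_in": inputs_per_switch ** n,
    }

print(crossbar_size(3, 2))
# {'N': 64, 'inputs_per_switch': 4, 'switches': 192, 'control_bits': 384, 'bits_per_path': 6, 'max_fan_in': 64}
```

For n = 3 and i = 2, the six control bits along a path equal log2(64), and the maximum fan-in equals N. The two quantities are reported separately so that other (n, i) choices can be checked against the stated relation.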
Switch networks can be added to permit bidirectional signal flow, from input nodes to output nodes or from output nodes to input nodes. In this manner, signals at the output nodes are made operatively connectable to any input node.
In some aspects of the invention, there are only 40 physical I/O pins to interface to the 64 functional I/Os. In this case, the switching is slightly simplified: two layers of 64 switches and one layer of 40 switches are needed to multiplex signals between the physical and functional I/Os.
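The saving from the 40-pin variant can be tallied by counting switches per layer. This sketch assumes 4-input switches (i = 2) and counts only switches, select bits, and switch inputs; it ignores wiring and pad drivers. `layer_costs` is a hypothetical helper.

```python
def layer_costs(layer_sizes: list[int], inputs_per_switch: int = 4) -> dict[str, int]:
    """Totals for a multistage network, given the number of switches in each layer."""
    bits_per_switch = (inputs_per_switch - 1).bit_length()   # 2 bits for a 4-input switch
    switches = sum(layer_sizes)
    return {
        "switches": switches,
        "control_bits": switches * bits_per_switch,
        "switch_inputs": switches * inputs_per_switch,
    }

square = layer_costs([64, 64, 64])   # 64 functional I/Os to 64 physical I/Os
reduced = layer_costs([64, 64, 40])  # 64 functional I/Os to 40 physical pins
print(square["switches"] - reduced["switches"])   # 24
```

Only the last layer shrinks, because each of the 40 pins still needs a full path back to all 64 functional I/Os through the two full-width layers.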
A method is also provided for crossbar networking input signals from N input nodes to N output nodes in n stages of decision making, where $N = 2^{n+i+1}$. For example, when N = 64, n = 3, and i = 2, the method comprises the steps of:
a) combining the 64 input signals into 16 vectors of 4 bits;
b) replicating the vectors of Step a) a total of 4 times, to generate a total of 64 vectors;
c) selecting one signal from each vector to provide an input signal to the next stage;
d) cycling through Steps a)-c) a total of 3 times, whereby each output node is selectively connectable to each of the 64 input nodes through 3 steps of decision making. At the end of the last cycle, each of the 64 vectors is programmable to provide any one of the 64 input signals.
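Steps a) through d) can be simulated directly. The text does not say which four signals form each vector or how the replicated copies are assigned to switches, so this sketch assumes one possible wiring: in every stage, vector k holds signals 4k to 4k+3, and switch j reads vector j mod 16. The names `run_stage`, `run_network`, and `route` are hypothetical. The sketch checks that each output can be connected to each input, one pair at a time; under this assumed wiring different outputs share intermediate switches, so it makes no claim that arbitrary simultaneous mappings are free of conflicts.

```python
N, W, STAGES = 64, 4, 3   # 64 nodes, 4-input switches (i = 2), 3 stages (n = 3)
G = N // W                # 16 vectors of 4 signals per stage

def run_stage(signals: list, selects: list[int]) -> list:
    """One decision stage under the assumed wiring."""
    # Step a): combine the N signals into G vectors of W signals each.
    vectors = [signals[W * k: W * k + W] for k in range(G)]
    # Steps b) and c): each vector is replicated N // G times, so switch j
    # reads vector j % G and passes on the element chosen by its select value.
    return [vectors[j % G][selects[j]] for j in range(N)]

def run_network(inputs: list, selects_per_stage: list[list[int]]) -> list:
    """Step d): apply the stage STAGES times."""
    signals = list(inputs)
    for selects in selects_per_stage:
        signals = run_stage(signals, selects)
    return signals

def route(out_node: int, in_node: int) -> list[list[int]]:
    """Select settings connecting out_node to in_node; switches off the path stay at 0."""
    selects = [[0] * N for _ in range(STAGES)]
    j = out_node
    for stage in reversed(range(STAGES)):
        digit = (in_node // W ** stage) % W   # base-4 digit of the input index
        selects[stage][j] = digit
        j = W * (j % G) + digit               # previous-stage switch feeding this one
    return selects

labels = [f"in{k}" for k in range(N)]
print(run_network(labels, route(out_node=5, in_node=42))[5])   # in42
```

Under this wiring, the select value at each stage is simply one base-4 digit of the input node's index (least significant digit in the first stage), so the six control bits along a path spell out the index of the selected input.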
The method also allows input signals to be passed from the output connector pins to the internal functional I/O nodes, generally following the steps described above. The method also provides for crossbar switching between N inputs and M outputs, where N > M. Further, the method provides for bidirectional switching between the N and M sets of nodes.