1. Field of the Invention
The present invention relates to neural networks and VLSI technology. More particularly, the present invention relates to apparatus and methods for inter-chip communication of large numbers of events between multiple senders and receivers in silicon neural networks and the like over a limited I/O structure.
2. The Prior Art
Although the nervous system can perform many specialized tasks, at a gross level its primary function is to gather sensory data and to translate them into effective action. Animals learn from experience so that their responses become more appropriate. We hope to capture the essential nature of biological nervous systems by evolving our artificial system in a real-time sensorimotor context. Real-time sensorimotor processing as complex as that performed by the common house fly is unattainable even with today's fastest digital computers. The fly's computational ability far exceeds that of the digital computer because the principles of digital computation are fundamentally unlike those used in the nervous system. The computer reduces the information on each wire to a single bit and combines the information in a sparsely connected array of logic gates. In contrast, neurons communicate in analog values and are richly interconnected.
CMOS VLSI is an analog electronic computational medium that has many properties in common with nervous tissue, and has the potential to achieve real-time sensorimotor processing. Although we are a long way from realizing an autonomous artificial neural system in this medium, we have made progress in key areas. Fast, high-density sensory processing and simple sensorimotor feedback systems now exist in CMOS. Analog computation allows these chips to perform complicated functions in real time. Some of these chips are able to modify themselves based on their past history. For example, a prior art adaptive retina adapts to a long time-average intensity to center itself in the correct operating range. Ideally, a system that incorporates real-time sensory and motor processing and on-chip learning would be able to learn directly from experience and optimize its own performance in a changing environment. This application discloses one of the components of an analog neuron, the "axon", which serves as the means of communication between the major functional units of a neural network.
Communication between neuronal elements is a principal limiting factor in the design of VLSI neuromorphic systems. This fact is not surprising considering that a large fraction of the nervous system is devoted to myelinated axons. The degree of convergence and divergence of single neurons is staggering in comparison with that in man-made computers. It might appear impossible, even in principle, to build such structures in VLSI circuits, which are limited to a virtually two-dimensional plane of silicon. Surprisingly, the cortices of the brain are nearly two-dimensional as well. In fact, it has been shown that the degree of connectivity in a system whose wires occupy space cannot be markedly increased by employing a structure in which nodes are arrayed in three dimensions.
There is nothing fundamental about the structure of neural tissue that cannot be embedded in silicon. The thickness of cortical structures can be represented with a correspondingly larger silicon surface area. However, silicon surface area is available only in small dies, each several millimeters on a side. The number of neurons that can be fabricated on a single die is therefore limited. Consequently, connections between silicon neurons located on different chips are essential for building even moderately sized artificial neural systems with presently available technology.
The degree of connectivity and the real-time nature of neural processing demand different approaches to the problem of inter-chip communication than those used in traditional digital computers. VLSI designers have adopted several strategies for inter-chip communication in silicon neural networks. Each strategy has unique advantages and the choice of method depends on which factors are most crucial to the system.
One of the most literal approaches to interconnecting processing nodes has been adopted by Paul Mueller's group at the California Institute of Technology. Mueller uses a direct physical connection between nodes on different chips through a crossbar switching array. One major advantage of this approach is that it allows continuous-time communication between nodes. In addition, the switching arrays provide flexible connectivity and can be programmed digitally by a host computer. The system is able to handle large connectivities because the dendrites of a single artificial neuron can be extended over multiple chips.
However, this approach requires many chips to model even a small number of neurons. The number of artificial neurons on each output chip is limited to roughly half the number of available pins. Currently available packaging supports 84-pin grid arrays, and this may be expected to increase to 128 pins in the near future. A further disadvantage of this design is that, in order to achieve a reasonable degree of matching between the analog performance of the different chips in the system, the transistors are operated in their above-threshold regime, where power dissipation is high.
Some applications, such as sensory transduction in which the silicon surface acts as a sensory epithelium, require many neurons to be located on the same chip. The total number of neurons in such a structure exceeds the number of pins available for transmitting their outputs to off-chip targets. In this case, continuous-time communication is sacrificed in order to time-multiplex the outputs of many neurons onto a small number of wires. The output of each neuron is sampled and transmitted for a brief time. The speed at which data can be transmitted determines the frequency above which information will be lost due to temporal aliasing.
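The relation between transmission speed and aliasing can be illustrated with simple arithmetic. The following Python sketch assumes that a single wire carries a fixed number of samples per second and that the neurons share it in equal, uniformly spaced time slots; the function names and figures are hypothetical. Under these assumptions each neuron is sampled at the link rate divided by the number of neurons, and by the Nyquist criterion any component of a neuron's output above half that rate is aliased.

```python
def per_neuron_sample_rate(link_rate_hz: float, num_neurons: int) -> float:
    """Samples per second each neuron receives when one output wire is
    shared round-robin among num_neurons neurons (assumed uniform slots)."""
    if num_neurons <= 0:
        raise ValueError("num_neurons must be positive")
    return link_rate_hz / num_neurons


def aliasing_limit_hz(link_rate_hz: float, num_neurons: int) -> float:
    """Highest frequency in a neuron's output that survives sampling without
    temporal aliasing: half the per-neuron sample rate (Nyquist limit)."""
    return per_neuron_sample_rate(link_rate_hz, num_neurons) / 2.0


# Illustrative (hypothetical) figures: 1e6 samples/s shared by 1000 neurons.
print(aliasing_limit_hz(1e6, 1000))  # 500.0
```

Doubling the number of neurons on the same wire halves the usable bandwidth of each neuron, which is why the transmission speed sets the aliasing limit.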
Traditional multiplexing uses serial-access schemes. Each node is polled in sequence and its output sent off-chip. Each time slot is allocated to a particular node, and the receiving device must be synchronized with the sending device in order to preserve the identity of the transmitting node. Most multiplexing schemes rely on a global clock to perform this synchronization. Global clock signals may be skewed to the point of dysfunction if the chips comprising the system are too far apart.
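The dependence of such a scheme on synchronization can be shown in a few lines. In the following Python sketch, which is a hypothetical illustration rather than any particular implementation, the sender polls nodes in a fixed order and transmits only their values, and the receiver infers each sample's source purely from its time slot. The `slot_offset` parameter is an assumed model of a receiver whose slot counter has slipped relative to the sender; a slip of a single slot attributes every sample to the wrong node.

```python
from typing import List, Sequence, Tuple


def poll(outputs: Sequence[int], num_frames: int) -> List[int]:
    """Sender: visit every node in a fixed order, one node per clock tick,
    and put only its value on the wire. No node address is transmitted."""
    stream: List[int] = []
    for _ in range(num_frames):
        stream.extend(outputs)
    return stream


def demultiplex(stream: Sequence[int], num_nodes: int,
                slot_offset: int = 0) -> List[Tuple[int, int]]:
    """Receiver: attribute each sample to a node purely from its time slot.
    slot_offset models a receiver whose slot counter has slipped relative to
    the sender, e.g. because of clock skew."""
    return [((tick + slot_offset) % num_nodes, value)
            for tick, value in enumerate(stream)]


stream = poll([10, 20, 30], num_frames=1)
print(demultiplex(stream, 3))                 # [(0, 10), (1, 20), (2, 30)]
print(demultiplex(stream, 3, slot_offset=1))  # [(1, 10), (2, 20), (0, 30)]
```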
The choice of multiplexing technique depends on how the neural elements in the system encode information. Some systems use analog-valued outputs, which encode several bits of information on a single wire. In analog multiplexed systems, the receiver chip samples the data stream and holds the data in a buffer until the next frame. This approach is particularly useful for interacting with video equipment, because such equipment is designed to work with analog-valued image frames. However, analog data transfer between chips is difficult, in part because the analog data is easily corrupted by noise introduced by multiplexing. More importantly, variations in fabrication parameters between wafers mean that different chips will interpret the same analog currents or voltages differently. These difficulties are avoided by transmitting digital amplitude signals.
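Why digital amplitude signals avoid the interpretation problem can be illustrated with a toy model. The following Python sketch assumes, hypothetically, that chip-to-chip variation appears as a linear gain and offset applied by the receiver to every level it sees. An analog value passed through this model is shifted, whereas a digital code driven bit by bit to the rails is recovered exactly, because the receiver only decides which side of a threshold each bit falls on. The digital code survives as long as the distorted "1" level stays above the threshold and the distorted "0" level stays below it.

```python
from typing import List


def receive(level: float, gain: float, offset: float) -> float:
    """How a receiving chip perceives a driven level, distorted by its own
    (hypothetical) gain and offset mismatch."""
    return gain * level + offset


def to_bits(code: int, width: int) -> List[int]:
    """Split a code into bits, most significant first."""
    return [(code >> i) & 1 for i in reversed(range(width))]


def from_bits(bits: List[int]) -> int:
    """Reassemble a code from bits, most significant first."""
    code = 0
    for b in bits:
        code = (code << 1) | b
    return code


def transmit_analog(value: float, gain: float, offset: float) -> float:
    """The value itself is the signal, so any mismatch shifts the result."""
    return receive(value, gain, offset)


def transmit_digital(code: int, width: int, gain: float, offset: float,
                     threshold: float = 0.5) -> int:
    """Each bit is driven to a rail (0.0 or 1.0); the receiver only decides
    which side of the threshold the distorted level falls on."""
    levels = [receive(float(b), gain, offset) for b in to_bits(code, width)]
    return from_bits([1 if v > threshold else 0 for v in levels])


print(transmit_analog(0.6, gain=1.1, offset=-0.05))      # about 0.61, not 0.6
print(transmit_digital(153, 8, gain=1.1, offset=-0.05))  # 153
```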
Both synchronous and asynchronous techniques have been used to time-multiplex digital amplitude data. Digital signal transmission can be very fast because the settling time of an analog amplifier is avoided. Furthermore, digital signals are noise resistant and independent of variations in fabrication parameters. Synchronous transmission of multiple bits of information has the drawback that switching many elements simultaneously injects noise onto the power supply.
Asynchronous serial digital communication methods, in which the duration of a digital pulse encodes several bits of information, have also been used. In this approach, the duration of the pulse is inversely proportional to the analog value of the output. Rather than using a global clocking mechanism to allocate specific time slots to particular nodes, the identity of the sending neuron is determined by its position in the pulse stream. The node position is computed from the number of transitions in the stream itself, so the pulse stream provides its own clock. The pulse-stream technique uses time to encode analog state, rather than to communicate explicitly temporal information.
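The self-clocking property of the pulse-stream technique can be sketched as follows. This Python illustration is hypothetical and rests on stated assumptions: the line is represented as a list of transition times, each node emits exactly one pulse per frame in a fixed order, consecutive pulses are separated by a fixed gap, and the sender and receiver start counting from a common reset. The scale factor `K`, the gap `GAP`, and the function names are invented for the example. Each pulse has a width of `K / value`, so larger values give shorter pulses, and the receiver identifies the sender solely by counting transitions (two per pulse).

```python
from typing import List, Sequence, Tuple

K = 1.0    # hypothetical scale factor: pulse width = K / value
GAP = 0.1  # hypothetical fixed low interval separating successive pulses


def encode(values: Sequence[float], start: float = 0.0) -> List[float]:
    """Sender: return the times of successive transitions on the line
    (rising, falling, rising, ...). Node i's value sets the width of pulse i;
    neither a clock nor a node address is transmitted."""
    edges: List[float] = []
    t = start
    for v in values:
        if v <= 0:
            raise ValueError("values must be positive; width is K / value")
        edges.append(t)   # rising edge: pulse begins
        t += K / v        # larger value -> shorter pulse
        edges.append(t)   # falling edge: pulse ends
        t += GAP
    return edges


def decode(edges: Sequence[float], num_nodes: int) -> List[Tuple[int, float]]:
    """Receiver: the number of transitions seen so far identifies the sender,
    and the pulse width recovers its value, so the stream is its own clock."""
    samples = []
    for n in range(0, len(edges) - 1, 2):
        node = (n // 2) % num_nodes   # two transitions per pulse
        width = edges[n + 1] - edges[n]
        samples.append((node, K / width))
    return samples


values = [2.0, 4.0, 0.5]
for node, value in decode(encode(values), num_nodes=3):
    print(node, round(value, 6))
```

Because position is inferred by counting, a single missed or spurious transition would misattribute every later pulse in the frame; the sketch does not attempt to model recovery from such errors.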