Today's most advanced computers are used to model physical systems, such as the folding of a protein or the circulation of the global climate, but they are also physical systems themselves. The demands of high-performance computing have driven the frontiers of device physics from vacuum tubes to semiconductor heterostructures. Between these simulated and physical realities lie many layers of abstraction: materials are assembled into devices, devices into circuits, circuits into boards, boards into cases, cases into racks, and racks into systems, and, in corresponding layers of software, applications are implemented in algorithms, written in a high-level language, compiled into microcode, scheduled by an operating system, and then executed by processors.
Most computer science programming models hide the underlying physical reality of computation, and the corresponding layers of software serve to insulate programs and programmers from knowledge of the physical construction of the computer. This division of labor is now being challenged by the growing complexity of computing systems. While device performance has been improving exponentially for decades and has a firm future roadmap [Paul S. Peercy, “The Drive to Miniaturization”, Nature (406), pp. 1023-26 (2000)], this has not been true for software. Rather, cost overruns, shipping delays, and bugs have been recurring features of development efforts ranging from taping out chips to delivering operating systems. Along with programmer productivity, system scaling obstacles include interconnect bottlenecks and prohibitive power requirements.
As information technologies scale down in device size and up in system complexity, their computational and physical descriptions converge as the number of information-bearing degrees of freedom becomes comparable to the number of physical ones. It is already possible to store data in atomic nuclei and to use electron bonds as logical gates [N. Gershenfeld and I. Chuang, “Bulk Spin Resonance Quantum Computation”, Science (275), pp. 350-356 (1997)]. In such a computer, the information-bearing degrees of freedom are the same as the physical ones, and it is no longer feasible to account for them independently. The universe executes in linear time, independent of its size. A scalable computer architecture must similarly reflect the scaling of its contents. An explicit description of the spatial distribution, propagation, and interaction of information in a computer program offers portability across device technologies (which must satisfy the same physical laws), scalability across machine sizes (because physical dynamics are inherently parallel), and simplification of fabrication (since causality implies locality).
The performance of a computer is limited by the bandwidth and latency of the connection between where data is stored and where it is processed. Early computers were far more limited by the speed and availability of processing and memory than by the performance of the connections between them. Von Neumann or Harvard-style computer architectures, where for each cycle data is transmitted to and manipulated in a central processing unit, are well suited for computers built from slow and expensive processing elements (e.g., vacuum tubes) and comparatively fast and cheap communication (wires). However, faster modern building blocks (smaller transistors, improved logic families, and other emerging technologies) have outpaced the rate at which data can be fetched from memory. The operating speeds of many modern computers are beyond even the relativistic limits for data to be retrieved from an arbitrary location in a single cycle. In modern computers, it can take hundreds or even thousands of cycles to fetch a piece of data. A wide variety of techniques have been developed to anticipate what data will be needed and load it ahead of time (pipelining, caching, instruction reordering, branch prediction, speculative execution, etc.), but the availability and behavior of these features can vary widely from processor to processor, as can their effectiveness with different program behaviors. Although the Von Neumann abstraction is a familiar model of computation, writing software that takes advantage of the aggressive performance possible with modern (and future) technologies will require fundamentally different models of computation, as well as computer architectures that can efficiently run them.
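The relativistic limit on single-cycle data retrieval can be checked with a short calculation. The following Python sketch estimates the largest distance from which data could be requested and returned within one clock cycle, assuming signals travel at the vacuum speed of light. The function name and the `signal_speed_fraction` parameter are illustrative assumptions, not part of any cited work.

```python
# Speed of light in vacuum in metres per second (exact by SI definition).
C = 299_792_458.0


def max_round_trip_radius_m(clock_hz: float, signal_speed_fraction: float = 1.0) -> float:
    """Farthest distance a request can travel and return from in one cycle.

    signal_speed_fraction is a hypothetical knob for interconnects that
    propagate signals more slowly than light in vacuum (1.0 = vacuum).
    """
    cycle_time_s = 1.0 / clock_hz
    one_way_budget_s = cycle_time_s / 2.0  # half the cycle out, half back
    return signal_speed_fraction * C * one_way_budget_s


for ghz in (1.0, 3.0, 5.0):
    radius_cm = 100.0 * max_round_trip_radius_m(ghz * 1e9)
    print(f"{ghz:.0f} GHz: {radius_cm:.1f} cm")
```

Under these assumptions, the radius at a 3 GHz clock is about 5 cm. Real interconnects are slower than light in vacuum, so the region reachable in a single cycle is smaller still.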
Information in physics is an extensive quantity. Like mass, it scales with the system size. For a computer to do the same, it must be uniform, unlike the inhomogeneous combinations of processors, memory, storage, and communications that are the norm today. For this reason, cellular architectures have long been attractive as a model for computation [J. von Neumann, “Theory of Self-Reproducing Automata”, edited by A. W. Burks, Univ. of Illinois Press (Urbana, 1966)], and more recently as a means of implementing it [M. Sipper, “The Emergence of Cellular Computing”, Computer (32), pp. 18-26 (1999)]. Cellular automata were originally discrete models in which space, time, and states are all discretized, with update rules carefully designed for studying complex phenomena [Neil Gershenfeld (1999), “The Nature of Mathematical Modeling”, Cambridge, UK: Cambridge University Press]. Cellular automata were found to be quite successful in modeling physical interactions governed by differential equations in a continuum limit, such as lattice gases for hydrodynamics [U.S. Pat. No. 6,760,032; U. Frisch, B. Hasslacher, and Y. Pomeau, “Lattice-Gas Automata for the Navier-Stokes Equation”, Phys. Rev. Lett. (56), pp. 1505-1508 (1986)] and spin dynamics [E. Domany and W. Kinzel, “Equivalence of Cellular Automata to Ising Models and Directed Percolation”, Phys. Rev. Lett. (53), pp. 311-314 (1984)]. Because of this great potential of computing as a physical system, cellular automata present a practical architecture for computation [N. Margolus, “Physics-Like Models of Computation”, Physica D (10), pp. 81-95 (1984)].
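To make the notion of discretized space, time, and states concrete, the following Python sketch steps a one-dimensional, two-state cellular automaton with a nearest-neighbor rule table. It is only an illustration: the elementary rule used here (rule 90 in the standard numbering, where each cell becomes the XOR of its two neighbors) and the periodic boundary are assumptions chosen for brevity, not one of the lattice-gas or spin models cited above.

```python
from typing import List


def step(cells: List[int], rule: int) -> List[int]:
    """Apply one synchronous update of an elementary cellular automaton.

    Each cell looks at (left, centre, right), forms a 3-bit index, and reads
    its next state from the corresponding bit of the 8-bit rule number.
    The lattice wraps around (periodic boundary, an assumption of this sketch).
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left, centre, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (centre << 1) | right
        nxt.append((rule >> index) & 1)
    return nxt


# Start from a single live cell and print a few generations of rule 90.
row = [0] * 31
row[15] = 1
for _ in range(16):
    print("".join("#" if c else "." for c in row))
    row = step(row, 90)
```

Every cell applies the same local rule at every time step, so the cost of one update per cell does not depend on the size of the lattice.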
Relevant research in the 1970s demonstrated that universal Boolean logic could be implemented in cellular automata with one-bit states and just three local rules [R. E. Banks, “Information Processing and Transmission in Cellular Automata”, Ph.D. thesis, MIT (1971)]. The Banks Boolean cellular automaton has only three rules, acting in 2D on one-bit states with the four nearest neighbors on a rectangular grid. The simplicity of the primitive functional unit, however, led to complexity in the implementation of wires and gates. In such a system, the logic functions are distributed, requiring many cells to realize them. The generality of a cellular automaton's rule table allows many other behaviors to be modeled, such as hydrodynamics or graphics. Many more variants of cellular automata models/applications [see, e.g., U.S. Pat. No. 6,910,057] and hardware implementations [see, e.g., U.S. Pat. No. 7,509,479; U.S. Pat. No. 5,243,238] have been proposed. All of these implementations are based on Boolean logic.
If the goal is just computation, then this can be implemented more compactly in “logic automata” in which every cell can contain a logic gate and store its state, locally providing the interactions needed for computational universality. Logic automata are a subset of cellular automata [N. Gershenfeld, The Nature of Mathematical Modeling, Cambridge University Press, 1999] and quantize space and time with distributed cells connected locally, each performing a basic logic operation. Logic automata are therefore scalable, universal for digital computation [R. E. Banks, Information Processing and Transmission in Cellular Automata, Ph.D. thesis, Massachusetts Institute of Technology, 1971], and reflect the nature of many complex physical and biological systems [D. A. Dalrymple, N. Gershenfeld, and K. Chen, “Asynchronous logic automata,” Proceedings of AUTOMATA 2008 (14th International Workshop on Cellular Automata), pp. 313-322, June 2008; L. O. Chua, “CA belongs to CNN,” invited talk at AUTOMATA 2008 (14th International Workshop on Cellular Automata), June 2008]. Logic automata form a family of computer architectures that expose a cartoon version of physics that is easy for a programmer to work with but maintains the underlying physical relationship between the size of logic elements, their computation rates, and signal travel speeds. This allows programmers to work with abstractions that have well-defined behavior for both correctness and performance, regardless of which underlying technology is used to fabricate the computer.
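The idea of a cell that holds both a gate and a stored bit can be sketched in a few lines of Python. The following is a minimal illustration under stated assumptions, not the architecture of any cited work: the `Cell` class, the gate set, the `HOLD` pseudo-gate used for input cells, the rule that a missing neighbor reads as 0, and the synchronous update are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Two-input gates a cell may hold. "HOLD" (handled in step) keeps its bit.
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
}

# (row, column) offsets to the four nearest neighbours.
DIRS = {"N": (-1, 0), "S": (1, 0), "W": (0, -1), "E": (0, 1)}


@dataclass
class Cell:
    gate: str                              # key into GATES, or "HOLD"
    inputs: Tuple[str, str] = ("N", "N")   # which two neighbours feed the gate
    state: int = 0                         # the one stored bit


Grid = Dict[Tuple[int, int], Cell]


def step(grid: Grid) -> None:
    """Synchronous update: every cell reads its neighbours' previous states."""
    new_states = {}
    for (r, c), cell in grid.items():
        if cell.gate == "HOLD":
            new_states[(r, c)] = cell.state
            continue
        bits = []
        for d in cell.inputs:
            dr, dc = DIRS[d]
            neighbour = grid.get((r + dr, c + dc))
            bits.append(neighbour.state if neighbour is not None else 0)
        new_states[(r, c)] = GATES[cell.gate](bits[0], bits[1])
    for pos, bit in new_states.items():
        grid[pos].state = bit


def build_demo(a: int, b: int) -> Grid:
    """Two input cells feed an XOR cell, whose output feeds a one-cell wire."""
    return {
        (0, 0): Cell("HOLD", state=a),
        (0, 1): Cell("XOR", inputs=("W", "E")),
        (0, 2): Cell("HOLD", state=b),
        (1, 1): Cell("OR", inputs=("N", "N")),  # OR of a bit with itself: a wire
    }


grid = build_demo(1, 0)
step(grid)
print(grid[(0, 1)].state, grid[(1, 1)].state)  # prints "1 0"
step(grid)
print(grid[(0, 1)].state, grid[(1, 1)].state)  # prints "1 1"
```

The XOR result appears after one step and reaches the wire cell only after a second step. In this toy model a signal moves one cell per time step, which is the kind of relationship between cell size, computation rate, and signal travel speed that the programmer can reason about directly.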
Analog logic circuits are a class of analog circuits for statistical signal processing, in which an associated inference problem is dynamically solved by locally propagating probabilities in a message-passing algorithm [U.S. Pat. No. 7,209,867; H.-A. Loeliger, F. Lustenberger, M. Helfenstein, and F. Tarkoy, “Probability propagation and decoding in analog VLSI”, IEEE Trans. Inform. Theory, 47:837-843, February 2001; Benjamin Vigoda, “Analog Logic: Continuous-Time Analog Circuits for Statistical Signal Processing”, PhD thesis, Massachusetts Institute of Technology, June 2003; Xu Sun, “Analogic for Code Estimation and Detection”, M. Sc thesis, Massachusetts Institute of Technology, September 2005]. From the mathematical optimization point of view, the inference problem is a special kind of mathematical optimization problem with constraints that include axioms of probability theory. Although the question of how to construct physical systems to solve the inference problem with various combinations of very low power, extremely high speed, low cost, and very limited physical resources is still an open research topic, message-passing algorithms [Hans-Andrea Loeliger, “An introduction to factor graphs”, IEEE Signal Processing Mag., January 2004; F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm”, IEEE Trans. Inform. Theory, 47:498-519, February 2001; Yedidia, J. S. and Freeman, W. T. and Weiss, Y., “Constructing free-energy approximations and generalized belief propagation algorithms”, IEEE Transactions on Information Theory, vol. 51(7), pp. 2282-2312, July 2005] approach this question by locally passing messages on a factor graph. The messages can be mapped into physical degrees of freedom like voltages and currents. The local constraints on the factor graph are the computation units implemented by a class of analog statistical signal processing circuits.
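As an illustration of what a local constraint on a factor graph computes, the following Python sketch gives the sum-product update for two common binary constraint nodes. Each message is represented by a single number, the probability that the bit is 1. This software model is an assumption made for clarity; the function names are hypothetical, and the cited circuits represent such messages as voltages and currents, not floating-point values.

```python
def soft_xor(p_x: float, p_y: float) -> float:
    """Outgoing P(z = 1) at a parity constraint z = x XOR y.

    z is 1 exactly when one of x, y is 1 and the other is 0, so the
    sum-product rule adds the probabilities of those two cases.
    """
    return p_x * (1.0 - p_y) + (1.0 - p_x) * p_y


def soft_equal(p_x: float, p_y: float) -> float:
    """Outgoing P(z = 1) at an equality constraint x = y = z.

    The two incoming messages are multiplied and renormalized.
    """
    agree_one = p_x * p_y
    agree_zero = (1.0 - p_x) * (1.0 - p_y)
    total = agree_one + agree_zero
    if total == 0.0:
        raise ValueError("incoming messages are certain and contradictory")
    return agree_one / total


print(f"{soft_xor(0.9, 0.8):.3f}")    # prints 0.260
print(f"{soft_equal(0.9, 0.8):.3f}")  # prints 0.973
```

With hard inputs (probabilities of exactly 0 or 1), `soft_xor` reduces to the ordinary Boolean XOR. Both updates use only products and sums of the incoming messages.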
Digital computation avoids and corrects errors by sacrificing continuous degrees of freedom. Analog logic circuits recover this freedom by relaxing the digital states, with each device doing computation in the analog domain, and only quantizing at the output [B. Vigoda, Analog Logic: Continuous-Time Analog Circuits for Statistical Signal Processing, Ph.D. thesis, Massachusetts Institute of Technology, June 2003]. The analog representations come from either describing digital (binary) random variables with their probability distributions in a digital signal processing problem, or from relaxing binary constraints of an integer programming problem. The preserved information from this analog computation scheme for digital problems gives rise to robust, high-speed, low-power, and cost-effective hardware. Circuit realization examples include decoders [H.-A. Loeliger, F. Lustenberger, M. Helfenstein, and F. Tarkoy, “Probability propagation and decoding in analog VLSI,” Information Theory, IEEE Transactions on, vol. 47, no. 2, pp. 837-843, February 2001] and the Noise-Locked Loop (NLL) for direct-sequence spread-spectrum acquisition and tracking, which promise order-of-magnitude improvement over digital realizations [B. Vigoda, J. Dauwels, M. Frey, N. Gershenfeld, T. Koch, H.-A. Loeliger, and P. Merkli, “Synchronization of Pseudorandom Signals by Forward-Only Message Passing With Application to Electronic Circuits,” Information Theory, IEEE Transactions on, vol. 52, no. 8, pp. 3843-3852, August 2006].
In a Noise-Locked Loop (NLL) for synchronization to a Linear Feedback Shift Register (LFSR) [U.S. Pat. No. 5,612,973; U.S. Pat. No. 5,729,388; Benjamin Vigoda, “Analog Logic: Continuous-Time Analog Circuits for Statistical Signal Processing”, PhD thesis, Massachusetts Institute of Technology, June 2003; Xu Sun, “Analogic for Code Estimation and Detection”, M. Sc thesis, Massachusetts Institute of Technology, September 2005], a particular decoding synchronization problem is first formulated as a statistical inference problem, and a local message-passing algorithm is then applied to solve it. With a proper representation of the statistical binary variables, the implementation of the message-passing algorithm can be reduced to a series of multiplications and summations. Thus, a hardware realization can be built with a Gilbert multiplier, exploiting the well-known translinear principle. This work shows that an NLL implemented in analog logic can perform direct-sequence spread-spectrum acquisition and tracking, and it promises an orders-of-magnitude improvement over digital implementations. These analog logic circuits are, however, custom and special-purpose, and no reconfigurable analog logic has yet been reported.
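The structure of such a loop can be sketched in discrete time as follows. This is a simplified software illustration under stated assumptions, not the continuous-time circuit of the cited work: the four-stage LFSR recurrence x[t] = x[t-1] XOR x[t-4], the seed, the antipodal signaling with Gaussian noise, the clamping constants, and all function names are hypothetical choices made for the example. At each step, the receiver predicts the next bit from its own delayed soft estimates through a soft XOR, and then combines that prediction with the channel observation by a normalized product.

```python
import math
import random


def lfsr_bits(n, seed=(1, 0, 0, 0)):
    """Generate n bits of the recurrence x[t] = x[t-1] XOR x[t-4]."""
    x = list(seed)
    while len(x) < n:
        x.append(x[-1] ^ x[-4])
    return x[:n]


def soft_xor(p, q):
    """P(x XOR y = 1) from P(x = 1) = p and P(y = 1) = q."""
    return p * (1.0 - q) + (1.0 - p) * q


def soft_equal(p, q, eps=1e-9):
    """Normalized product of two messages about the same bit."""
    # Clamp so two saturated, contradictory messages cannot produce 0/0.
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * q / (p * q + (1.0 - p) * (1.0 - q))


def channel_message(y, sigma):
    """P(bit = 1 | y) when bit 0 is sent as +1 and bit 1 as -1 in Gaussian noise."""
    z = max(min(2.0 * y / sigma ** 2, 50.0), -50.0)  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(z))


def noise_locked_loop(received, sigma):
    """Forward-only message passing: one soft estimate per received sample."""
    est = [0.5] * 4  # no prior knowledge of the transmitter's LFSR state
    for y in received:
        predicted = soft_xor(est[-1], est[-4])  # mirrors the LFSR feedback taps
        est.append(soft_equal(predicted, channel_message(y, sigma)))
    return est[4:]


rng = random.Random(0)
sigma = 0.8
bits = lfsr_bits(200)
received = [(1 - 2 * b) + rng.gauss(0.0, sigma) for b in bits]
estimates = noise_locked_loop(received, sigma)
errors = sum(int(p > 0.5) != b for p, b in zip(estimates[-50:], bits[-50:]))
print("bit errors in the last 50 samples:", errors)
```

Each update uses only multiplications, additions, and one normalization, which is why this kind of computation maps naturally onto multiplier circuits. The noise level and sequence length above are arbitrary, and the sketch makes no claim about the acquisition performance reported for the analog implementation.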