In recent years, several device solutions have been proposed for fabricating nano-scale programmable resistive elements, generally categorized under the term “memristor.” Of special interest are those that are amenable to integration with state-of-the-art CMOS technology, such as memristors based on Ag-Si filaments. Such devices can be integrated into metallic crossbars to obtain high-density resistive crossbar networks (RCN; also referred to as “resistive crossbar memory,” RCM).
FIG. 1 is a perspective view of an exemplary resistive crossbar network 100. It includes row electrodes 110 and column electrodes 120, e.g., metal bars fabricated on an integrated circuit. Memristors 130 (e.g., Ag-Si) are arranged at the intersections of row electrodes 110 and column electrodes 120. Any number of row electrodes 110, column electrodes 120, or memristors 130 can be used. For each row i and column j, the memristor 130 in row i, column j has conductance gij and interconnects the ith row electrode 110 and the jth column electrode 120. Multi-level write techniques known in the art for memristors can be used to store information in the memristors 130. In an example, the memristors 130 are written with 3% accuracy, which is roughly equivalent to 5 bits (32 distinguishable levels).
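The equivalence between 3% write accuracy and 5 bits follows from counting distinguishable levels: about 1/0.03 ≈ 33 levels fit in the conductance range, and 2^5 = 32 is the largest power of two that does not exceed that count. The Python sketch below illustrates this arithmetic and snaps a target conductance to the nearest of 32 evenly spaced levels. The conductance window G_MIN/G_MAX, the helper names, and the uniform level spacing are hypothetical assumptions for illustration, not properties stated for memristors 130.

```python
import math

# Hypothetical conductance window for one memristor (assumed, not from the source).
G_MIN = 1e-6  # siemens, assumed lowest programmable conductance
G_MAX = 1e-4  # siemens, assumed highest programmable conductance


def bits_for_accuracy(rel_accuracy: float) -> int:
    """Bits supported by a write accuracy given as a fraction of the range.

    About 1 / rel_accuracy levels can be told apart, so the usable bit
    count is the largest b with 2**b <= 1 / rel_accuracy.
    """
    return math.floor(math.log2(1.0 / rel_accuracy))


def quantize_conductance(g: float, g_min: float, g_max: float, bits: int) -> float:
    """Snap a target conductance to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = (g_max - g_min) / (levels - 1)
    k = round((g - g_min) / step)
    k = min(max(k, 0), levels - 1)  # clamp to the programmable window
    return g_min + k * step


if __name__ == "__main__":
    b = bits_for_accuracy(0.03)
    print(b)  # 5, since 32 <= 1 / 0.03 < 64
    print(quantize_conductance(5.2e-5, G_MIN, G_MAX, b))
```

Uniform spacing is simply the easiest assumption to state; an actual multi-level write scheme may space its levels differently.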
The substantially continuous range of resistance values obtainable in memristors 130 can facilitate the design of multi-level, non-volatile memory. RCN technology also permits combining memory with computation. RCNs can be used for a large number of non-Boolean computing applications that involve pattern matching. Note that this class of non-Boolean pattern-matching computations, a prospective application of RCN technology, is inherently approximate and has relaxed precision constraints. Such applications employ memory-intensive computing that can involve correlating multidimensional input data with a large number of stored patterns or templates in order to find the best match. Using conventional digital processing techniques for such tasks incurs prohibitively high energy and real-estate costs because of the number of computations involved. RCNs can be used for this class of associative computation. Because a nano-scale memory array is used directly for associative computing, RCNs can provide a very high degree of parallelism and can reduce or eliminate the overhead of reading out the memory.
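As a rough illustration of the associative computation an RCN performs, the Python sketch below models an idealized crossbar in which input voltages are applied to the row electrodes and each column electrode collects a current equal to the correlation (dot product) between the input and the template stored in that column's conductances gij. The function names, the normalized units, and the idealizations (every column held at 0 V, no wire resistance, no sneak paths, linear devices) are assumptions made for this sketch.

```python
from typing import List, Sequence


def column_currents(v: Sequence[float], g: Sequence[Sequence[float]]) -> List[float]:
    """Idealized crossbar read: I_j = sum_i v[i] * g[i][j].

    Assumes every column electrode is held at 0 V (virtual ground) and ignores
    wire resistance and device non-linearity, so each memristor contributes
    v[i] * g[i][j] by Ohm's law and each column sums those currents by
    Kirchhoff's current law.
    """
    n_rows, n_cols = len(g), len(g[0])
    if len(v) != n_rows:
        raise ValueError("need one input voltage per row electrode")
    return [sum(v[i] * g[i][j] for i in range(n_rows)) for j in range(n_cols)]


def best_match(v: Sequence[float], g: Sequence[Sequence[float]]) -> int:
    """Index of the stored template (column) with the largest correlation current."""
    currents = column_currents(v, g)
    return max(range(len(currents)), key=currents.__getitem__)


if __name__ == "__main__":
    # Three hypothetical 3-element templates stored column-wise, normalized units.
    G = [[1.0, 0.0, 1.0],
         [0.0, 1.0, 1.0],
         [1.0, 1.0, 0.0]]
    print(column_currents([1.0, 0.0, 1.0], G))  # [2.0, 1.0, 1.0]
    print(best_match([1.0, 0.0, 1.0], G))       # 0
```

Because a raw column current also grows with the total conductance of that column, templates of unequal magnitude would in practice need some normalization before the largest current can be read as the best match; the sketch leaves that out.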
Associative computing with RCNs is largely analog in nature, as it involves evaluating the degree of correlation between inputs and the stored data. As a result, many prior schemes for associative hardware using RCNs perform processing using, e.g., analog CMOS circuits or analog operational amplifiers (for current-mode processing). However, using multiple analog blocks for large-scale RCNs may lead to high static power consumption, which can eclipse the potential energy benefits of RCNs for non-Boolean computing. Moreover, with technology scaling, the impact of process variations on analog circuits becomes increasingly prominent, resulting in lower resolution for signal amplification and processing. This limits the scalability of analog approaches. Hence, conventional analog circuits may fail to exploit RCN technology for energy-efficient, non-Boolean computing.
A prior scheme for finding data in an RCN that is correlated with a test input involves a digital or mixed-signal CMOS “winner-take-all” (WTA) circuit. The RCN provides correlation values between the stored vectors and the input data, and the WTA identifies the maximum (or minimum) among these correlation values. WTAs are used in some pattern-matching applications to find the maximum (minimum) among the outputs of a distance-evaluation matrix.
FIG. 2 shows a prior mixed-signal CMOS winner-take-all (WTA) circuit fed by an RCN 210 to find the maximum of N inputs, each with m-bit precision. Memristors are labeled gmn. Input stage 220 (details shown in the inset) buffers currents from the column lines of RCN 210 using regulated current mirrors. This provides a low input impedance and a nearly constant DC bias to the RCN 210. Known WTA circuits include the current-conveyor WTA (CC-WTA) and the binary-tree WTA (BT-WTA), the latter being more suitable for a large number of inputs. A BT-WTA employs a binary tree of 2-input comparison stages, each of which copies the larger of its two input currents and propagates it to its output. BT-WTA tree 230 is shown.
Tree 230 includes approximately N pairwise comparators 240 (“WTA-2”; details shown in the inset). Schemes that use digital rather than current comparisons require each comparator 240 to be an m-bit comparator. Comparators 240 are arranged in a binary tree structure. Each comparator 240 determines the winner (larger or smaller) of its two inputs and passes the larger (smaller) value to the next stage of nodes. As the number of inputs to the WTA increases, the numbers of stages and nodes in the binary WTA tree 230 increase, leading to larger delay and area: a binary tree over N inputs has N-1 comparators 240 arranged in ⌈log2 N⌉ stages. Therefore, the area required grows roughly in proportion to N, and the delay grows with every stage that must be added.
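The binary-tree structure can be modeled behaviorally to make the node and stage counts concrete. The Python sketch below (a software model with a hypothetical function name, not a circuit description of tree 230) pairs the surviving inputs at each stage, forwards the larger value of each pair, passes an unpaired input straight through, and counts comparisons and stages. For N inputs it performs N-1 comparisons over ⌈log2 N⌉ stages; ties are resolved in favor of the lower index, which is an arbitrary choice.

```python
from typing import List, Sequence, Tuple


def binary_tree_wta(values: Sequence[float]) -> Tuple[int, int, int]:
    """Behavioral binary-tree WTA.

    Returns (winner_index, comparator_count, stage_count). Each stage pairs
    neighbouring survivors, keeps the larger of each pair, and passes an
    unpaired survivor through unchanged.
    """
    if not values:
        raise ValueError("need at least one input")
    # Each survivor carries (value, original_index) so the winner can be reported.
    level: List[Tuple[float, int]] = [(v, i) for i, v in enumerate(values)]
    comparators = 0
    stages = 0
    while len(level) > 1:
        nxt: List[Tuple[float, int]] = []
        for k in range(0, len(level) - 1, 2):
            a, b = level[k], level[k + 1]
            nxt.append(a if a[0] >= b[0] else b)  # one 2-input comparison (WTA-2)
            comparators += 1
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd survivor skips this stage
        level = nxt
        stages += 1
    return level[0][1], comparators, stages


if __name__ == "__main__":
    print(binary_tree_wta([3, 9, 2, 7, 5]))  # (1, 4, 3)
```

For example, five inputs need four comparisons in three stages, and 1024 inputs need 1023 comparisons in ten stages. In a digital implementation each of those N-1 comparisons would be an m-bit comparator, which is where the area cost accumulates.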
In general, the use of such analog WTA circuits leads to large static power consumption. In fact, the power consumption of an analog WTA unit can be several times larger than that of the RCN itself. Moreover, the performance of such current-mirror-based circuits is limited by random mismatches in the constituent transistors and by other non-idealities, e.g., channel-length modulation, that introduce mismatch between different current paths. To maintain a sufficiently high resolution, larger transistor dimensions (both length and width), and hence a larger cell area, are needed. This is evident from some recent designs that used a scaled technology but long channel lengths. Larger devices increase parasitic capacitances and thus lower the operating frequency for a given static power. Higher frequency and resolution can be achieved at the cost of increased input currents and thus larger power consumption. Special techniques to enhance the precision of current mirrors have been proposed in the literature, but they introduce significant overhead in power consumption and area. Voltage-mode processing can also be employed in an RCN; however, it can incur additional overhead due to current-to-voltage conversion and subsequent amplification, which can introduce larger mismatch, non-linearity, and power consumption. Digital processing can also be used by placing analog-to-digital converters (ADCs) in the input stage 220, but a full tree 230 of m-bit digital comparators 240 is then required, at considerable cost in area. Accordingly, conventional mixed-signal CMOS design techniques may not be able to leverage emerging nano-scale resistive memory technology for memory-based computing.
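The way random current-mirror mismatch limits WTA resolution can be illustrated with a simple Monte Carlo model. The Python sketch below assumes that each input current is scaled by an independent gain error (1 + eps), with eps drawn from a zero-mean Gaussian of relative standard deviation sigma, and reports how often the perturbed winner differs from the ideal one. The model, the function names, and the example currents are hypothetical. Transistor mismatch is commonly modeled (Pelgrom's model) as shrinking roughly as 1/sqrt(W·L), which is why larger devices improve resolution at the cost of area and parasitic capacitance.

```python
import random
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def wta_error_rate(currents: Sequence[float], sigma: float,
                   trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of trials in which mismatch changes the WTA winner.

    Assumed model: every input path multiplies its current by (1 + eps),
    eps ~ Normal(0, sigma), drawn independently per path and per trial.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    ideal = argmax(currents)
    errors = 0
    for _ in range(trials):
        perturbed = [c * (1.0 + rng.gauss(0.0, sigma)) for c in currents]
        if argmax(perturbed) != ideal:
            errors += 1
    return errors / trials


if __name__ == "__main__":
    inputs = [1.00, 0.98, 0.95]  # hypothetical normalized column currents
    for s in (0.001, 0.01, 0.05):
        print(s, wta_error_rate(inputs, s))
```

With these normalized inputs, a relative mismatch of a few percent picks the wrong winner in a large fraction of trials, whereas a mismatch well below the 2% gap between the top two inputs almost never does.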
There is, therefore, a need for an improved WTA circuit and improved ways of comparing and storing values.