Field of the Invention
The invention relates to the field of microelectronic devices, and in particular to a processor that integrates memristor-based computing and memory and to a method for using the processor.
Description of the Related Art
Traditional computers adopt the von Neumann architecture, in which the storage and computing units are separated from each other: storage is handled by memories, and computation is handled by the arithmetic units of central processing units (processors). As semiconductor technology has improved, the performance of both processors and memories has improved greatly. According to Moore's Law, the number of transistors in a microprocessor doubles about every 18 months. The annual growth rate of processor performance once exceeded 50%, whereas the average annual growth rate of memory performance is only about 7%, which means that memory performance doubles only about every ten years. Taking Intel processors as an example, from 1980 to 2006 the clock rate of processors increased by about 3500 times, but the access time of DRAM decreased by only about 6 times. Because the growth rates of storage technology and processor technology are imbalanced, the gap between processor performance and memory performance keeps widening. As a result, processors spend a long time waiting for stored data, which is known as the Memory Wall problem. This problem has become a bottleneck that prevents further improvement of the overall performance of computer systems.
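The figures quoted above can be cross-checked with a short calculation. The following is a minimal sketch that assumes constant compound growth over the whole period; the function names annual_rate and doubling_time are illustrative and do not come from the source.

```python
import math


def annual_rate(total_factor, years):
    """Constant yearly growth rate that compounds to total_factor over years."""
    return total_factor ** (1.0 / years) - 1.0


def doubling_time(rate):
    """Number of years needed to double at a constant yearly growth rate."""
    return math.log(2.0) / math.log(1.0 + rate)


# Figures quoted in the text: 1980 to 2006, clock rate x3500, DRAM access time /6.
years = 2006 - 1980
cpu_rate = annual_rate(3500, years)
dram_rate = annual_rate(6, years)

if __name__ == "__main__":
    print(f"processor clock rate: {cpu_rate:.1%} per year")
    print(f"DRAM access time:     {dram_rate:.1%} per year")
    print(f"doubling time at 7% per year: {doubling_time(0.07):.1f} years")
```

Under this assumption, the quoted factors correspond to roughly 37% per year for the processor clock rate and roughly 7% per year for DRAM access time, and a 7% yearly rate doubles in a little over ten years, which agrees with the growth figures stated above.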
The performance gap between processors and memories cannot be closed in a short time. Reducing the impact of memory access on processor performance is one of the main challenges in current processor architecture design. As the integration level of transistors on a single chip becomes higher and higher, problems such as power consumption, wire transmission delay and leakage current grow increasingly severe, and it has already become very difficult to improve processor performance only by raising the clock frequency. Moreover, the return on investment of instruction-level parallelism based on traditional superscalar and speculative execution techniques becomes lower and lower. Therefore, exploiting higher-level thread-level parallelism and task-level parallelism has become the inevitable trend in the continued improvement of processor performance, and advanced architectures represented by multi-core processors have become the main trend of processor development. Multi-core design is the dominant idea in the current high-performance computing field and has been used in many fields such as servers, laptops, game platforms and high-performance multimedia applications. The on-chip multi-core architecture, which integrates multiple microprocessor cores into one chip, and the multi-core multi-threaded architecture, which additionally adopts multi-threading technology, both effectively utilize on-chip transistor resources and provide users with multi-threaded execution capability and high-productivity computing. The multi-core architecture is an effective way to further improve processor performance while complying with Moore's Law and utilizing the limited chip area. How to implement and further optimize multi-core designs has recently become a key research topic in academia and industry.
The memory of a traditional single-processor chip only needs to provide data for one processor, whereas the memory of a multi-core chip needs to provide data for multiple processor cores. At present, the number of cores in a multi-core processor increases in line with Moore's Law, but the memory bandwidth of the processor is limited by the number of chip pins and hardly increases at all. Moreover, mutual access interference between the threads of a multi-core processor further increases the delay of access requests. These changes worsen the existing Memory Wall problem. If the memory bandwidth remains the same while the problem size grows over time, the program execution time also increases exponentially over time. Therefore, in the foreseeable future, the storage system will remain the largest problem for computer system designers.
One way to address this problem is to change the computer hardware. Memristors are the next generation of nonvolatile memories. A memristor can switch reversibly between a high-resistance state and a low-resistance state under electric pulses, and these two states can be used to represent and store "0" and "1": high resistance represents "0" and low resistance represents "1". This is different from traditional voltage-level logic, which uses high and low voltages to represent "0" and "1" and therefore cannot retain circuit states after a power failure. To guarantee nonvolatile storage, a storage state must be adopted, and the state adopted here is the resistance state. When circuits are designed so that the resistance property of memristors participates in the logic computation and the resistance states of memristors store the computation results, information is retained even when the power is cut off. Therefore, the step in which the traditional architecture outputs the computation results to memories is omitted, and the integration of computing and memory is realized.
In 2010, HP Labs published an article in the journal Nature proposing that nonvolatile state-based logic operations may in the future replace the existing voltage-level logic operations. The article uses two memristors and one resistor to realize the material implication (IMP) logic, (NOT p) OR q. All logic states are stored in the memristors in the form of resistance in a nonvolatile way. The integration of storage and computation was thereby realized in memristors for the first time.
The implication operation uses a resistor RG (RON<<RG<<ROFF) that is connected to two memristors P and Q in parallel. The initial values p and q are stored in the memristors P and Q. The voltages VCOND and VSET are applied to P and Q respectively. Because the VCOND applied to P is less than the threshold voltage, the state of P is not changed. When P is in the high-resistance state (logic 0), since ROFF>>RG, the voltage across RG is almost equal to 0. Therefore, the voltage across Q is VQ≈VSET. In this case, no matter what state Q was in before, Q ends up in the low-resistance state (logic 1). When P is in the low-resistance state (logic 1), since RG>>RON, the voltage across RG is almost equal to VCOND. Therefore, the voltage across Q is VQ≈VSET−VCOND, which is less than the threshold voltage and does not change the state of the memristor, so Q keeps its original state. In other words, q'←pIMPq, as shown in FIGS. 1A and 1B.
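The behavior described above can be illustrated with a small numerical model. The following is a minimal sketch, not the circuit of the cited article: it assumes that P and Q share a common node that is tied to ground through RG, treats each memristor as an ideal two-state resistor, and uses made-up resistance, voltage and threshold values. It solves the node voltage from all three branches instead of relying on the approximations in the text. All names and values in it are hypothetical.

```python
# Illustrative device and drive values (assumptions, not taken from the source).
R_ON = 1e3     # low-resistance state, logic 1 (ohms)
R_OFF = 1e6    # high-resistance state, logic 0 (ohms)
R_G = 3e4      # load resistor, chosen so that R_ON << R_G << R_OFF
V_COND = 0.6   # voltage applied to P, below the switching threshold
V_SET = 1.0    # voltage applied to Q, above the switching threshold
V_TH = 0.8     # hypothetical threshold for switching to low resistance


def resistance(bit):
    """Map a logic value to the resistance of an idealized two-state memristor."""
    return R_ON if bit else R_OFF


def imp_step(p, q):
    """Return the new state of Q after one IMP pulse, i.e. q' = (NOT p) OR q."""
    r_p, r_q = resistance(p), resistance(q)
    # Kirchhoff's current law at the common node shared by P, Q and R_G.
    v_node = (V_COND / r_p + V_SET / r_q) / (1 / r_p + 1 / r_q + 1 / R_G)
    v_p = V_COND - v_node  # voltage across P
    v_q = V_SET - v_node   # voltage across Q
    if abs(v_p) >= V_TH:
        raise ValueError("P would switch; the chosen values violate the model")
    # Q switches to low resistance only if its voltage exceeds the threshold.
    return 1 if v_q > V_TH else q


if __name__ == "__main__":
    print("p q | q'")
    for p in (0, 1):
        for q in (0, 1):
            print(f"{p} {q} | {imp_step(p, q)}")
```

With these assumed values, Q is switched to logic 1 only when both P and Q start in the high-resistance state, and in every other case Q keeps its previous state, so the final state of Q equals (NOT p) OR q for all four input combinations.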
There are two main technical approaches to solving the Memory Wall problem. The first approach is to improve memory performance fundamentally, but it is unlikely that effective techniques for doing so will become available in the short term. The second approach is to rely on the rapid development of microelectronic technology, change the computer architecture and optimize the computer hardware to solve the Memory Wall problem.