Since von Neumann and others developed the stored-program electronic computer more than 60 years ago, the fundamental principle of memory access has not changed. While the processing speeds of computers have increased significantly over the years for a whole range of high performance computing (HPC) applications, these gains were achieved either through device technology or through methods that avoid memory accesses, such as the use of caches. Memory access time, however, still limits performance.
Currently, computer systems use many processors 11 and many large-scale main memories 331, as shown in FIG. 1. The computer system shown in FIG. 1 includes a processor 11, a cache memory (321a, 321b) and a main memory 331. The processor 11 includes a control unit 111 having a clock generator 113 that generates a clock signal, an arithmetic logic unit (ALU) 112 that executes arithmetic and logic operations synchronized with the clock signal, an instruction register file (RF) 322a connected to the control unit 111 and a data register file (RF) 322b connected to the ALU 112. The cache memory (321a, 321b) has an instruction cache memory 321a and a data cache memory 321b. A portion of the main memory 331 and the instruction cache memory 321a are electrically connected by wires and/or buses, which limit the memory access time and constitute the von Neumann bottleneck 351. The remaining portion of the main memory 331 and the data cache memory 321b are electrically connected to enable a similar memory access 351. Furthermore, wires and/or buses, which implement memory access 352, electrically connect the data cache memory 321b and the instruction cache memory 321a with the instruction register file 322a and the data register file 322b.
Even though HPC systems operate at high speed and low energy consumption, their speed is limited by the memory accessing bottlenecks 351, 352. The bottlenecks 351, 352 are ascribable to the wiring between the processors 11 and the main memory 331, because wire length and the stray capacitance between wires add delay to each memory access. Additionally, stray capacitance increases power consumption in proportion to the clock frequency of the processor 11.
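The proportionality between stray capacitance, clock frequency and power can be illustrated with the standard first-order model of switching power, P = a·C·V²·f. The following is a minimal sketch only: the function name, the capacitance, the voltage and the activity factor are illustrative assumptions and are not taken from this description.

```python
def dynamic_power_watts(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """First-order switching power of a wire: P = activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz

# Hypothetical values: 1 pF of stray wire capacitance driven at 1.0 V.
p_1ghz = dynamic_power_watts(1e-12, 1.0, 1e9)
p_2ghz = dynamic_power_watts(1e-12, 1.0, 2e9)

# Doubling the clock frequency doubles the power spent charging the wire.
print(p_1ghz, p_2ghz)
```

Under this model, halving the wire capacitance saves as much power as halving the clock frequency, which is why shortening or removing wires is attractive.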
Some HPC processors use vector arithmetic pipelines. For HPC applications that can be expressed in vector notation, these vector processors offer better memory bandwidth than more conventional HPC processors. The vector instructions are made from loops in a source program, and each vector instruction is executed in an arithmetic pipeline in a vector processor or in corresponding units in a parallel processor. Either of these processing methods gives the same results.
However, in spite of the improved memory bandwidth, a system based on vector processors still has the limiting memory bottleneck 351, 352 between all the units. Even in a single system with a wide memory and large bandwidth, the same bottleneck 351, 352 appears, and in systems employing many of the same units, as in a parallel processor, the bottleneck 351, 352 is unavoidable.
There are two essential memory access problems in conventional computer systems. The first problem is the wiring between memory chips and caches, including the case where these two units are on a single chip, and the wiring inside the memory systems themselves. The wiring between chips causes dynamic power consumption due to capacitance, as well as signal delay along the wires. The same power consumption problem extends to the internal wires of a memory chip, namely the access lines and the remaining read/write lines. Thus, in both the inter-chip and the intra-chip wiring of memory chips, energy is wasted because of the capacitance of these wires.
The second problem is the memory bottleneck 351, 352 between the processor chip, the cache and the memory chips. Since the ALU can access any part of the cache or memory, the access path 351, 352 consists of relatively long global wires. However, the number of wires available for these paths is limited. Such a bottleneck is often attributed to hardware such as buses. Therefore, when a high-speed CPU is used with a large-capacity memory, the most common bottleneck occurs between the two.
There are two approaches that can be used to address the bottleneck problems and create improved memory access. The first is to match the memory clock cycle to the CPU's clock cycle. The second is to reduce the time delay caused by longer wires both inside memory and outside memory.
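The benefit of the first approach can be sketched with a deliberately simplified stall model in which the processor waits until each memory access completes. The function name and the cycle times below are hypothetical, and the model ignores caches, pipelining and wire delay.

```python
import math

def stall_cycles_per_access(memory_cycle_ns, cpu_cycle_ns):
    """CPU cycles lost per access while waiting for a slower memory.

    Simplified assumption: one access occupies a whole number of CPU
    cycles, and the first of those cycles does useful work.
    """
    cycles_needed = math.ceil(memory_cycle_ns / cpu_cycle_ns)
    return max(0, cycles_needed - 1)

# Hypothetical cycle times in nanoseconds.
mismatched = stall_cycles_per_access(memory_cycle_ns=10.0, cpu_cycle_ns=0.5)
matched = stall_cycles_per_access(memory_cycle_ns=0.5, cpu_cycle_ns=0.5)
print(mismatched, matched)
```

With these assumed numbers the mismatched case loses 19 CPU cycles per access, while the matched case loses none.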
By solving these two issues, a fast, direct coupling between the memory and the CPU becomes possible without the memory bottleneck. As shown in FIG. 53, the processor and its periphery consume 70 percent of the total energy because of these problems, divided into 42 percent for instruction supply and 28 percent for data supply. Thus, the wiring problems cause not only power consumption but also signal delay. By eliminating the bottlenecks 351, 352 through removal of the intra-chip and inter-chip wiring, the problems of power consumption, time delay and memory bottlenecks 351, 352 would be solved.