Computers are machines that process data according to instructions. Today, most are configured to distribute their work across several CPUs, providing multiprocessing capabilities. Multiprocessor and multi-core systems are now available in personal and laptop computers and are no longer restricted to supercomputers, mainframe computers or servers. Yet the largest computers still benefit from unique architectures that differ significantly from those of ordinary computers. For instance, they often feature thousands of processors, high-speed interconnects, and specialized hardware.
Whether in a multiprocessor context or not, a challenge for computer systems is to improve overall performance while reducing aggregate power consumption. Moreover, most CPUs today tend to spend time waiting for memory, I/O, graphics, etc., so that improving CPU instruction execution performance alone is no longer the main possible axis of development.
For instance, a paper by Brown, J. A. and Tullsen, D. M. (The shared-thread multiprocessor. In Proceedings of the 22nd Annual International Conference on Supercomputing (Island of Kos, Greece, Jun. 7-12, 2008). ICS '08. ACM, New York, N.Y., 73-82. DOI=http://doi.acm.org/10.1145/1375527.1375541) describes results for a shared-thread multiprocessor (STMP) architecture. The STMP combines features of a multithreaded processor and a chip multiprocessor. Specifically, it enables distinct cores on a chip multiprocessor to share thread state. This shared thread state allows the system to schedule threads from a shared pool onto individual cores, so that threads can move rapidly between cores. The paper demonstrates and evaluates the benefits of this architecture.
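As a purely illustrative aid, the idea of scheduling threads from a shared pool onto individual cores can be sketched in software. The following is a toy model under stated assumptions, not the mechanism of the cited paper: the function name `schedule_shared_pool`, the unit-of-work abstraction and the round-based timing are all hypothetical, and hardware details such as how thread state is stored or transferred are ignored.

```python
from collections import deque


def schedule_shared_pool(thread_work, num_cores):
    """Toy model: in each cycle, every core takes the next thread from one shared pool.

    thread_work maps a thread id to its remaining units of work (hypothetical abstraction).
    Returns a list of (cycle, core, thread_id) tuples describing where each unit ran.
    """
    pool = deque(thread_work.items())  # shared pool of (thread_id, remaining_units)
    trace = []
    cycle = 0
    while pool:
        running = []
        for core in range(num_cores):
            if not pool:
                break  # fewer runnable threads than cores in this cycle
            thread_id, remaining = pool.popleft()
            trace.append((cycle, core, thread_id))
            running.append((thread_id, remaining - 1))
        # Unfinished threads go back to the shared pool, so a thread
        # may resume on a different core in a later cycle.
        for thread_id, remaining in running:
            if remaining > 0:
                pool.append((thread_id, remaining))
        cycle += 1
    return trace


if __name__ == "__main__":
    for cycle, core, thread_id in schedule_shared_pool({"A": 2, "B": 1, "C": 2}, 2):
        print(f"cycle {cycle}: core {core} runs thread {thread_id}")
```

In this sketch, thread A runs on core 0 in the first cycle and on core 1 in the second, which mirrors, at a very coarse level, the movement of threads between cores that a shared pool permits.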
Other approaches focus on:
Multiple processors integrated into the structure of a memory array, see e.g. Duncan G. Elliott, W. Martin Snelgrove, and Michael Stumm. Computational RAM: A Memory-SIMD Hybrid and its Application to DSP. In Custom Integrated Circuits Conference, pages 30.6.1-30.6.4, Boston, Mass., May 1992;
Multiple processors and memory macros integrated onto a chip (PIM), see e.g. Maya Gokhale, Bill Holmes, and Ken Iobst. Processing in Memory: the Terasys Massively Parallel PIM Array. Computer, 28(3):23-31, April 1995;
Multiple processors and memory macros integrated onto a chip (Execube), see e.g. Peter M. Kogge. EXECUBE—A New Architecture for Scalable MPPs. In 1994 International Conference on Parallel Processing, pages 177-184, August 1994; and
IRAM, see e.g. David Patterson, Thomas Anderson, Neal Cardwell, Richard Fromm, Kimberly Keeton, Christoforos Kozyrakis, Randi Thomas, and Katherine Yelick. Intelligent RAM (IRAM): Chips that Remember and Compute. Presented at the 1997 IEEE International Solid-State Circuits Conference (ISSCC), 6-8 Feb. 1997, San Francisco, Calif.