An Instruction Set Architecture (ISA) is the representation of an underlying computer architecture that a programmer uses to realize application goals. The ISA exposes the full set of a computer's capabilities to the programmer and, through a collection of instructions accessible to the programmer, embodies the way the computer is intended to be used. A user (and/or a machine) may use ISA instructions to create computer programs through a programming language such as, but not limited to, assembly language.
Despite the addition of performance-improving features such as super-scalar processing and caching, the programmer's mental model of a processor has changed little in 40 years. Consequently, the way computers are programmed has not changed significantly either. Unfortunately, the stalling of frequency scaling in the early twenty-first century, and the resulting shift from faster gates to more parallel gates, has not yielded a proportional increase in performance. This is at least in part because the sequential programming style that is useful for exploiting Instruction-Level Parallelism (ILP) in super-scalar processors is, at best, ineffective in an explicitly parallel context and, at worst, detrimental.
As parallel shared-memory computers become larger, incorporating tens of independent super-scalar cores, latency in the interconnection network is becoming the dominant factor limiting performance. Computer architects have attempted to mitigate the effects of latency through existing approaches such as:
a) Adding instructions to prefetch data or to exert more explicit control over the behavior of the memory subsystem.
b) Increasing the number of caches and their proximity to the execution core.
c) Increasing the number of threads, thereby hiding the effects of latency.
These existing approaches reflect an industry-wide acceptance that electrical interconnect latency is an intractable problem. They are deficient both in efficiency and in energy utilization. A further deficiency is that processing hardware is not synchronized over long distances. The resulting lack of synchronization makes highly efficient parallel programming of many-core architectures difficult and highly dependent upon architectural parameters.