Electronic devices and systems have made a significant contribution towards the advancement of modern society and are utilized in a number of applications to achieve advantageous results. Numerous electronic technologies such as digital computers, calculators, audio devices, video equipment, and telephone systems have facilitated increased productivity and reduced costs in analyzing and communicating data, ideas, and trends in most areas of business, science, education, and entertainment. These advantageous results are often realized through information processing, and the speed at which the information is processed is often critical. Advances in parallel processing system configurations offer the potential to significantly increase that speed. However, such advances typically require special, complex programs to be written for the new systems.
Traditional information processing applications usually contain specialized information processing instructions. The processing information (e.g., data and instructions) is usually stored in a memory. The configuration of the memory and the manner in which the information is accessed are critical for proper implementation of application programs. One type of conventional memory configuration is shared memory. In a shared memory configuration, multiple processors share access to a single pool of memory. This type of configuration is relatively simple and allows programs to be written in a traditional sequential format. A single shared memory controller ensures data consistency between sub-processes running on different processors. Conventional parallel compilers can decompose a sequential program to run portions of the program in parallel. For example, a do loop working on an array can be partitioned over multiple processors by a conventional compiler. However, shared memory applications typically do not scale well to large multiprocessor systems.
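The loop partitioning described above can be illustrated with a minimal sketch. It is not the output of any particular compiler; the names `partition`, `scale_block`, and `parallel_scale` are hypothetical, and Python threads are assumed here only as a stand-in for processors sharing one pool of memory (in CPython they show the decomposition but do not by themselves yield a speedup for this kind of work).

```python
from concurrent.futures import ThreadPoolExecutor


def partition(n, workers):
    """Split the index range [0, n) into `workers` contiguous blocks."""
    base, extra = divmod(n, workers)
    bounds = []
    start = 0
    for w in range(workers):
        # The first `extra` blocks take one additional iteration each.
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def scale_block(array, start, stop, factor):
    """Run the loop body over one block of the shared array."""
    # Each worker writes only to its own block, so no locking is needed.
    for i in range(start, stop):
        array[i] = array[i] * factor


def parallel_scale(array, factor, workers=4):
    """Apply `array[i] *= factor` with the iterations divided among workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(scale_block, array, start, stop, factor)
            for start, stop in partition(len(array), workers)
        ]
        for future in futures:
            future.result()  # wait for the block and surface any exception
    return array
```

The sequential loop body is unchanged; only the iteration range is divided. This is the property that lets such a decomposition be automated when all workers can address the same array.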
Advances in processor fabrication technology typically support faster processors with greater processing bandwidth at lower prices. These faster, inexpensive processors enable significant scaling of multiple processor approaches in computer system architectures. For example, clustered computer systems incorporate multiple processing nodes in which each node typically has a processor and an associated memory. The multiple processors and memories in the system offer potentially significant speed increases through distribution and parallelization of sub-process operations.
Numerous applications naturally lend themselves to division into sub-processes suitable for multi-processing architectures such as clusters. For example, applications that search for patterns in large data sets can usually be broken down into multiple searches across subsets of the data. Each sub-problem can be solved independently and the results easily combined. However, many important applications, such as modeling of fluid flows and other complex scientific applications, are not readily decomposed. These applications often evolve over decades of use and represent major investments. Changes in programming languages, tools, libraries, and models, as well as hardware platform changes, usually involve a significant cost as program code is ported and/or rewritten. Traditional attempts at achieving parallelism for these applications often involve large scale and/or distributed systems. However, conventional large scale and/or distributed system approaches often encounter significant coordination and synchronization problems.
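The readily decomposed case mentioned above, a pattern search split across subsets of the data, can be sketched as follows. The function names and the choice of substring matching over whole records are assumptions made for illustration; matching within whole records avoids the complication of a pattern that straddles a boundary between subsets.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matches(records, pattern):
    """Solve one sub-problem: count the records that contain the pattern."""
    return sum(1 for record in records if pattern in record)


def split(records, parts):
    """Divide the records into at most `parts` contiguous subsets."""
    size = max(1, -(-len(records) // parts))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def parallel_count(records, pattern, parts=4):
    """Search each subset independently, then combine the partial counts."""
    chunks = split(records, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partial = list(pool.map(count_matches, chunks, [pattern] * len(chunks)))
    return sum(partial)
```

No sub-search depends on another, and combining the results is a single addition. Applications such as fluid flow modeling lack this independence.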
Message passing has emerged as a primary programming model used to attempt coordination of application processing in large scale systems. Message passing attempts to provide high performance on large scale systems, but usually requires significant and careful application restructuring. However, inconsistent performance and high, non-incremental programming and maintenance costs have limited the advancement and spread of Message Passing Interface (MPI) applications. On the other end of the spectrum, shared memory parallelism is often exploited using OpenMP (e.g., C or FORTRAN with extensions to support loop and task parallelism). However, OpenMP is limited by the scalability of the symmetric multiprocessing (SMP) node.
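The restructuring that message passing imposes can be seen even in a small sketch. The code below does not use the MPI API; it assumes in-process queues and threads as a stand-in for explicit messages between nodes, and the names `worker` and `message_passing_sum` are hypothetical. A shared memory version of the same computation would simply be `sum(values)` over one array.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive a slice of the data as a message and send back a partial sum."""
    data = inbox.get()             # explicit receive; no shared array is read
    outbox.put((rank, sum(data)))  # explicit send of the partial result


def message_passing_sum(values, workers=3):
    """Sum `values` by scattering slices to workers and gathering results."""
    inboxes = [queue.Queue() for _ in range(workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(workers)
    ]
    for thread in threads:
        thread.start()
    # Scatter: each worker is sent its own copy of a strided slice.
    for rank in range(workers):
        inboxes[rank].put(values[rank::workers])
    # Gather: collect exactly one partial sum per worker.
    partial = [results.get() for _ in range(workers)]
    for thread in threads:
        thread.join()
    return sum(value for _, value in partial)
```

The data distribution, the send and receive steps, and the final combination are all written by hand, which is the kind of restructuring cost the paragraph above attributes to message passing.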