1. Motivation
Since the dawn of the computer age in the 1950s, we have seen remarkable progress in the design of computer hardware systems and computer software systems. This progress has led us to where we are today: an unimaginably great leap from the days of our ancestors who lived in the tropical jungles of Africa a hundred thousand years ago. We now enjoy and suffer unprecedented use of computers in every aspect of human life, unprecedented communication among peoples via the internet, and equally unprecedented opportunities for invasions of privacy, data theft and cyber warfare.
There is now a growing gap between our ability to produce complex, secure computing hardware systems at ever decreasing cost, and our ability to produce software systems that exploit available hardware configurations at ever increasing cost. The cost of producing and maintaining software systems far exceeds the cost of hardware systems: at more than 12 trillion dollars per year around the world, the cost of software systems is more than five times the cost of hardware systems. In spite of this enormous cost, software systems are intrinsically error prone, vulnerable to attacks by intruders, and unable to guarantee data privacy and system security. This has given rise to a vast industry catering to the protection of data and systems from potential intruders.
Our computing technology has reached an impassable computing bottleneck. We will have several occasions to refer back to this computing bottleneck later in this chapter.
As software systems keep getting progressively more complex, they also keep getting progressively more error prone and more vulnerable to attacks by hackers. The complexities of software system design, implementation and maintenance, and our inability to deploy viable technological solutions to the problems of unreliability, insecurity and ever increasing production costs, have led us to this bottleneck. Much of the advance in software engineering technology has been driven by corresponding advances in programming technology, not by advances in system design independent of programming. Presently, there does not seem to be any way out of this vicious cycle of programming and system implementation, and the computing bottleneck it pushes us into.
Almost all software systems we use today fall into the category of time-sliced concurrent software systems. By “time-sliced concurrent” we mean software systems in which the same CPU performs multiple computations in multiple distinct non-overlapping time slices, controlled and coordinated by a monitor, with no two computations ever being performed by any CPU at the same time. As the number of time slices per second increases, these computations become intrinsically unreliable and unpredictable.
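The time-sliced mode of operation just described can be caricatured in a few lines of Python. This is purely an illustration; the names `monitor` and `task` and the step labels are ours, not part of any system described in this application:

```python
def task(name, steps):
    """A computation that yields control back to the monitor after each step."""
    for i in range(steps):
        yield f"{name}:step{i}"          # one time slice of work

def monitor(tasks):
    """Round-robin monitor: at most one computation runs at any instant."""
    schedule = []
    while tasks:
        t = tasks.pop(0)
        try:
            schedule.append(next(t))     # give t one time slice
            tasks.append(t)              # requeue t for its next slice
        except StopIteration:
            pass                         # task finished; drop it
    return schedule

trace = monitor([task("A", 2), task("B", 2)])
# Slices strictly interleave on the single CPU; no two computations
# ever overlap in time:
# ['A:step0', 'B:step0', 'A:step1', 'B:step1']
```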
With the multicore technology we have today, and new technologies available for very fast parallel inter-process communications [U.S. Pat. No. 7,210,145 B2], there is hope that parallel software systems may provide viable solutions that cater efficiently to ever increasing software complexity, lack of reliability and security, and increasing costs. By “parallel software” we mean software systems in which multiple distinct computations in multiple ComPuting Units (CPUs) occur simultaneously, and the CPUs automatically coordinate their activities through communication with each other, with no need for monitors.
However, there is little evidence that this hope could be realized using current technologies of parallel program design and development, because current parallel processing technologies have serious limitations: parallel software systems that use them are much harder to implement, even harder to debug and validate, and harder still to update and maintain; they do not run efficiently or make optimal use of available hardware resources; and the gap between computation timings and communication timings is vast and growing ever wider, in spite of all the advances we have made in communications technology.
There have been several discussions of different paradigms for parallel programming, based on languages and libraries used [1], on different message passing interfaces [2]-[6], on data flow models [7][8], on different hardware architectures [8]-[10], on network models [10]-[12] and on communication models [π-calculus, 26,27]. All of these accept the inevitability of two fundamental incompatibilities and one serious limitation:
    (i) Incompatibility between communication and computation speeds: this is usually compensated for by increasing the grain size of computations, grain size being the average amount of time spent on computations between successive message passing events in parallel processing systems.
    (ii) Incompatibility between the speed of data access from memories and CPU speeds: this is usually compensated for by using cache memories [13], pipelined instruction execution [14], look-ahead instruction scheduling [15] and multiple instruction stream executions [16]. However, these make it impossible to predict instruction execution times. Also, when very high speed communication facilities are used, they cause cache incoherence, which makes different parallel execution units see different data values for the same data item at the same time.
    (iii) Difficulty of debugging, updating and maintaining parallel programs: errors in parallel programs may cause them to deadlock and grind to a halt, making them difficult to debug. Using parallel break points disrupts execution timings and may itself lead to deadlocks. It is not possible to dynamically update and re-validate current parallel programs.
Thus, parallel programs become very difficult and very costly to develop and maintain, requiring special expertise not commonly available. One seeks higher productivity [17][18] by increasing the peak flops/sec yields of parallel processors, without having to increase parallel program execution efficiencies. Commodity based massively parallel cluster computing will find its limits in realizable flops/sec efficiency (currently about 30%), realizable flops/unit-of-power efficiency and flops/unit-of-space efficiency. These efficiencies are likely to decrease dramatically at scaling factors of 10⁴ or 10⁶. It is claimed that Blue Gene (IBM) has been scaled up to 10⁵ processors. Published papers indicate 30% efficiency with 9000 processors. It is not clear whether applications using all 10⁵ processors have ever been written and tested.
With nano-scale computing and quantum computing [19], we may confront a need to perform massively parallel computations at scales of 10⁶ or more. Scalable parallel processing hardware networks appear in cellular automata [20][21], systolic machines [9], and asynchronous data-flow machines [7][8]. It is not, however, clear how these might be adapted to meet the anticipated requirements of scalable software.
Thus, both concurrent and parallel software system design and implementation have now reached their limits of scalability and practical viability, and are caught in a computing bottleneck. Further scaling of these technologies to implement yet more complex, more reliable and more secure systems is no longer possible.
To make parallel programming address the issues we face and solve the problems we encounter, we need totally new ways of designing, implementing, validating and maintaining parallel software systems. We need methods in which the computer itself is used to help design, implement, validate and maintain such software systems. We need methods to arbitrarily scale our parallel software systems to meet ever increasing complexities through increased parallelism. We also need self-monitoring capabilities that enable us to constantly monitor such software systems at run time, issue timely reports on errors, pending errors and critical behaviors, and take appropriate predefined actions automatically in a timely fashion. We also need a framework in which self-diagnosis and self-repair become possible. Hardware technology providing a practically unlimited supply of the hardware resources needed for such parallel software systems is here today in the form of multicore chips, and nano-technology based computing units are over the horizon. But the software technology to put the available hardware resources to good use is not here yet.
In the following, we use the acronym Ppde to refer to a viable Parallel program development and execution platform, and postulate the set of requirements that a Ppde should satisfy in order to help us free ourselves from the computing bottleneck we are in today.
2. Desirable Properties of a Viable Ppde
There are compelling reasons to be open to scalable and efficient computer assisted methods which simplify the programming, updating and maintenance of parallel software systems, formally verify them, and provide self-monitoring capabilities.
In the following, we use the phrase formal language to refer to a language which can be used to communicate with both computers and humans. Programming languages or languages of logic are examples of formal languages, while natural languages are not (at least, not yet!). We use the term ‘cell’ to refer to a parallel software execution unit. Each cell defines an independent parallel software process, run by a distinct ComPuting Unit (CPU) that is assigned to that cell. Cells have the capability to exchange messages among themselves, through communication pathways that interconnect them in a network of cells and pathways.
Cells are used in parallel software systems to process messages received from other cells, and to build and send messages to other cells. Each cell polls its input/output ports cyclically, and either receives and processes messages delivered to its input ports, possibly responding to them, or builds service request messages and sends them to other cells through its output ports. These are the only operations performed by cells. This is similar to Actor systems [39,40]. There are, however, several differences between the proposed Ppde and Actor systems.
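The polling cycle just described can be sketched in a few lines of Python. This is a greatly simplified illustration only: the class and method names are hypothetical, each cell here is modeled as an object rather than a process run by its own dedicated CPU, and the pathways are plain in-memory queues:

```python
from collections import deque

class Cell:
    """Toy model of a cell: input port, output pathways, a polling cycle."""
    def __init__(self, name):
        self.name = name
        self.in_port = deque()       # messages delivered by other cells
        self.out_pathways = {}       # output port -> destination cell

    def connect(self, port, other):
        self.out_pathways[port] = other

    def send(self, port, msg):
        """Build a service-request message and place it on a pathway."""
        self.out_pathways[port].in_port.append((self.name, msg))

    def poll(self):
        """One polling cycle: service pending input messages and build
        responses (a real cell would send these back through a port)."""
        replies = []
        while self.in_port:
            sender, msg = self.in_port.popleft()
            replies.append((sender, f"ack:{msg}"))
        return replies

a, b = Cell("a"), Cell("b")
a.connect("p0", b)
a.send("p0", "request")
# b's next polling cycle finds and services the message:
# b.poll() -> [("a", "ack:request")]
```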
There are two kinds of cells: (i) a compound cell, which may be decomposed into a sub-network of cells and pathways that is encapsulated within the compound cell; and (ii) a simple cell, which cannot be further decomposed. We use the term ‘cell’ to refer to both compound and simple cells. A simple cell is abstract if the computer programs used by the cell have not all been implemented (defined) yet. A compound cell is abstract if either its decomposition has not been specified yet, or the specified decomposition contains abstract cells. Defining the computer programs used by an abstract simple cell is called cell-refinement. Similarly, specifying the decomposition of an abstract compound cell is called network-refinement. We now state the desirable properties of a viable Ppde as follows:
    (i) Ppde should provide a formal language to specify the requirements of any parallel software system for any parallel application Ap, which the design of Ap should satisfy. We use Ap:requirements( ) to refer to the set of assertions in the formal language that specify the requirements.
    (ii) We use Ap:design( ) to refer to the specification of an abstract design for Ap, consisting of cells and communication pathways together with abstract specifications of interactions among the cells in the network of cells and pathways. Ppde should provide a formal language to specify Ap:design( ). The formal language used to specify Ap:design( ) will be different from the formal language used in (i) above. The specification of Ap:design( ) should be free of characteristics that pertain to the programming languages used to implement it. Ap:design( ) will not itself be an executable program, but should be such that Ppde could use the Ap:design( ) specifications to automatically derive and prove the properties of Ap:design( ) declared in Ap:requirements( ).
(iii) Ppde should provide interactive mechanisms, consisting of hardware, software and humans, which may be used to validate that Ap:design( ) satisfies Ap:requirements( ). We refer to this as formal verification, since verifications are done without having to execute (run) any of the specifications in Ap:design( ).
    There are several languages available today that can be used to perform the tasks in items (i) through (iii) above: FSP (Finite State Process) languages [32,33], TL (Temporal Logic) languages [34,35], CTL (Causal Temporal Logic) languages [31,33,34,35,38], PN (Petri Net) languages [36], and NND (Neural Net Design) languages [37]. It is not, however, possible to express an abstract design and interactively verify its properties using these languages for realistic practical parallel software systems. Also, these languages do not satisfy the requirements in items (iv) through (vii) below.
    (iv) Ppde should provide interactive mechanisms to reduce Ap:design( ) to its implementation in a programming language through successive, step by step, progressive refinements of the abstract cells, pathways and interaction specifications in Ap:design( ), preserving at each stage of refinement all properties validated in previous refinements. We use Ap:implementation( ) to refer to the validated implementation of Ap:design( ) at any stage of refinement.
    (v) At each stage of refinement, Ppde should provide facilities to modify Ap:design( ) and Ap:requirements( ) if necessary, update Ap:requirements( ) by adding new requirements, and formally validate all refinements performed up to that stage through formal verification of the statements in Ap:requirements( ), without having to execute any of the computer programs defined in the refinements up to that stage. Ap:implementation( ) is fully refined only when Ap:design( ) does not contain any more abstract cells.
(vi) Ppde should provide facilities to formally validate the fully refined Ap:implementation( ) through formal verification of all statements in the updated Ap:requirements( ), and provide criteria to ascertain that this verification does indeed fully validate the correctness of Ap:implementation( ). The implementation of Ap is complete and correct only when it satisfies all statements in the said criteria.
    (vii) Ppde should provide facilities to incrementally update and modify Ap:implementation( ) at any stage of refinement, in order to correct errors and discrepancies encountered between Ap:implementation( ) and Ap:requirements( ), without having to start all over again with a new Ap:design( ) and a new Ap:implementation( ) every time.
    (viii) Ppde should provide mechanisms to define the data and system security features of Ap, which Ap:implementation( ) should satisfy. The mechanisms should be such that Ppde could automatically incorporate all specified data and system security features into Ap:implementation( ). Implementers should not be required to implement programs that enforce the specified data and system security requirements, either during or after Ap:implementation( ).
    Most system design and development methodologies do not consider issues of protection, privacy and security as an integral part of system design. They are usually added on after a system is designed or implemented, often enforced by the operating system in which the system runs, and implementers are held responsible for implementing the enforcement system. We require here that security and protection be an intrinsic built-in part of the Ppde execution mechanism, requiring only the definition of the security and protection parameters needed for Ap at design time, but not their enforcement mechanisms.
(ix) Ppde should automatically provide a Self Monitoring System (SMS) to monitor the performance of a completed Ap:implementation( ) at run time, throughout its life time, without interfering with its timings and execution efficiency, running in parallel with Ap:implementation( ), in order to identify run time errors, pending errors and a priori defined critical behaviors, and either promptly report them or automatically take appropriate a priori defined remedial actions in a timely manner.
    (x) SMS should have the infrastructure to eventually incorporate self-diagnosis, self-repair and learning abilities into Ap:implementation( ). Implementers should not be required to specify any component of SMS. Ppde should automatically derive the SMS specifications from the given implementation and security specifications, and install SMS in the implemented system.
    (xi) Finally, the number of cells and pathways used in Ap:implementation( ) should be arbitrarily scalable, with only minor modifications to the implementation and without having to revalidate the scaled up version.
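The abstractness test implied by the cell definitions given earlier (a simple cell is abstract until its programs are defined; a compound cell is abstract until its decomposition is specified and contains no abstract cells; an implementation is fully refined when no abstract cells remain) can be sketched as a small recursive check. All names here are ours, for illustration only:

```python
class SimpleCell:
    def __init__(self, program=None):
        self.program = program            # None => not yet cell-refined

    def is_abstract(self):
        return self.program is None

class CompoundCell:
    def __init__(self, subnetwork=None):
        self.subnetwork = subnetwork      # None => decomposition unspecified

    def is_abstract(self):
        if self.subnetwork is None:
            return True                   # not yet network-refined
        # abstract if any cell in the decomposition is still abstract
        return any(c.is_abstract() for c in self.subnetwork)

def fully_refined(design):
    """The implementation is fully refined when no cell is abstract."""
    return not any(c.is_abstract() for c in design)

design = [CompoundCell([SimpleCell("f()"), SimpleCell()])]
# The inner SimpleCell() has no program yet, so the design is not
# fully refined until that cell is refined as well.
```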
If we get a Ppde platform that satisfies all of the above requirements, and it can be easily used to systematically design, develop, update, validate and run self-monitoring parallel software systems, then we can surely escape from the computing bottleneck we are in today. We will then be ready to take the next great leap from the benefits of the computer revolution we enjoy and suffer today into the era of the next great computer revolution, a leap analogous to that from the era of the steam engine to the era of fuel cells.
TICC™ (Technology for Integrated Computation and Communication) and the TICC™-Ppde (TICC™ based Parallel Program Development and Execution) platform, which are the subjects of this patent application, together satisfy all of the above requirements, as described in Chapter 2. The term TICC™-paradigm refers to the abstractions, methods, rules and conventions used in TICC™ and TICC™-Ppde to design and build parallel programming applications, validate them and run them with self-monitoring. Hereafter we use the terms Ap:design( ), Ap:implementation( ) and Ap:requirements( ) to refer to the TICC™-Ppde designs, implementations and requirements of parallel software system applications Ap built using the TICC™-paradigm. We briefly outline the principal features of the TICC™-paradigm, pertinent to this patent application, in the next subsection. Details are given in Chapter 2.
3. Features of TICC™ and TICC™-Ppde
(i) Proof of concept prototypes of TICC™ and TICC™-Ppde have both been implemented and tested for parallel program development and execution, to demonstrate that it is possible to build platforms that satisfy the requirements listed in the previous subsection. Implementation and testing of the proof of concept prototypes of TICC™ and TICC™-Ppde were supported by NSF grants DMI-0232073 and DMI-0349414 during the years 2003 through 2005. The prototypes are implemented in C++ and run on an HP ProLiant 760 Shared Memory Multiprocessor (SMM) in a Linux OS environment.
TICC™-Ppde provides an API (Application Programming Interface) and a Graphical User Interface (GUI), called TICC™-GUI, for design, development and documentation. The GUI is used to develop and display the network of cells and pathways in Ap:design( ), update the network, display properties of components in the network, and activate components in the network when requested to do so. The network is called the TICC™-network. TICC™-GUI was designed and implemented by Mr. Rajesh Khumanthem, Mr. Kenson O'Donald and Mr. Manpreet Chahal, as per specifications provided by this inventor. TICC™ and TICC™-Ppde were both designed and implemented by this inventor within a period of two man years. Specifications for TICC™ and TICC™-Ppde were also designed and developed by this inventor.
(ii) The TICC™-Ppde prototype uses the Operating System only for memory management, secondary memory access, input/output and internet access: it does not use the Operating System for interrupt control, scheduling, coordination, synchronization, monitoring, process and pthread (parallel thread) activations, or communications. Application programmers do not have to write programs for scheduling, coordination, synchronization, process activations or monitoring in the parallel software systems they implement. Once the abstract system design is completed and specified, TICC™-Ppde automatically becomes self-scheduling, self-coordinating, self-synchronizing, self-activating, self-monitoring and self-communicating.
Each cell executes its own communication protocols, in parallel with other cells, to exchange messages asynchronously with other cells, with guaranteed message delivery at latencies of only 350 to 500 nanoseconds. The number of parallel simultaneous message exchanges occurring at any given time is limited only by the number of available cells. TICC™-Ppde does not use sequential buffers to implement asynchronous communications. TICC™-Ppde provides validated compiled codes for all protocols likely to be used in any parallel programming system. The prototypes run completely autonomously, using the operating system only for dynamic memory management and input/output. All of these features have been tested in the prototype TICC™-Ppde. Eventually, all operating system functions may be installed in TICC™-Ppde itself, thereby providing an integrated environment for the design, development, validation and running of self-monitoring parallel software systems.
(iii) TICC™-Ppde automatically derives and builds a formal model of the computations specified in Ap:implementation( ) at every stage of its refinement, including the Ap:design( ) stage. Models are expressed in terms of ALLowed Event Occurrence Patterns (ALLEOPs). The intended computations of the parallel software systems specified by Ap:design( ) and Ap:implementation( ) cause a set of events to occur at run time, which in turn cause other events to occur as computations and communications proceed. ALLEOPs identify the classes of events that may occur in the intended computations, specify causal relations between classes of events, and describe the patterns of causal chains of event classes that may occur at run time. ALLEOPs thus specify an event class partial ordering causal model of Ap:implementation( ). This event class ALLEOP model is referred to here as Ap:ALLEOPs( ).
TICC™-Ppde uses Ap:ALLEOPs( ) for two purposes: (i) to prove properties of Ap:implementation( ), such as correctness, mutual exclusion, progress, freedom from deadlocks/livelocks, and other properties that may be specific to given applications, at different stages of refinement of Ap:implementation( ) including the completed implementation, and (ii) to interactively derive the self-monitoring system, Ap:SMS( ), for given Ap:implementation( ) using Ap:ALLEOPs( ), and automatically install it as a part of mechanisms used to execute Ap:implementation( ), the installed Ap:SMS( ) having all the features described in items (ix) and (x) in the previous subsection.
Properties to be proven are stated as assertions in a Causal Temporal Logic (CTL) language, as explained in Chapter 2. The set of all such CTL-assertions constitutes Ap:requirements( ). Designers and implementers have the responsibility to specify Ap:requirements( ) and update them at each stage of refinement. TICC™-Ppde automatically updates Ap:ALLEOPs( ) at each stage of refinement of Ap:implementation( ), and automatically derives Ap:traces( ) and Ap:ECT-networks( ) (Event Characterization Table networks) from Ap:ALLEOPs( ). Ap:traces( ) and Ap:ECT-networks( ) are used to interactively validate the assertions in Ap:requirements( ), as illustrated in Chapter 2, Section 7.
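The idea of an event class partial ordering causal model can be illustrated with a toy sketch: view the model as a directed graph over event classes, with an edge for the causal relation "may cause", and answer queries about causal chains by graph reachability. The event names and the query function below are ours; they illustrate only the general idea, not the actual ALLEOP formalism:

```python
def may_precede(alleop, e1, e2):
    """True if some causal chain of event classes leads from e1 to e2."""
    seen, frontier = set(), [e1]
    while frontier:
        e = frontier.pop()
        if e == e2:
            return True
        if e not in seen:
            seen.add(e)
            frontier.extend(alleop.get(e, []))   # follow "may cause" edges
    return False

# Hypothetical event classes for one request/reply message exchange:
alleop = {
    "sendRequest":    ["deliverRequest"],
    "deliverRequest": ["processRequest"],
    "processRequest": ["sendReply"],
    "sendReply":      ["deliverReply"],
}
# may_precede(alleop, "sendRequest", "deliverReply") -> True
# may_precede(alleop, "deliverReply", "sendRequest") -> False
```

Because the causal relation only partially orders event classes, some pairs are ordered in neither direction, which is exactly what distinguishes a partial ordering model from a total (interleaved) one.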
(iv) TICC™-Ppde automatically incorporates into its execution mechanisms the facilities needed to enforce the data and system security specifications provided by designers and implementers: the specifications set values for pre-defined attributes of data and system components in TICC™-Ppde, as explained in Section 8 of Chapter 2. The attributes together constitute a universal system, which can be used to specify any kind of data and system security feature, specific to any application implemented in TICC™-Ppde.
(v) Finally, TICC™-Ppde defines a single criterion, which may be used to ascertain that the set of all CTL-assertions in Ap:requirements( ), when validated, would indeed establish correctness of Ap:implementation( ).
(vi) Comments: All of these and many additional features of TICC™-Ppde are described and illustrated in Chapter 2. Static verification techniques used by TICC™-Ppde have not been implemented and tested yet, but this inventor has defined the denotational semantics of Parallel Programming Languages (PPL) used in TICC™-Ppde, called TICC™-PPL, and defined a proof theory to validate proof methods used in TICC™-Ppde. These will be published in appropriate journals in the near future and do not constitute patentable materials. Methods used to interactively build refinements of Ap:implementation( ) and proofs of assertions in Ap:requirements( ) at any stage of refinement, are informally defined and illustrated with examples in Sections 3 and 7 of Chapter 2. It is not hard to implement the static verification techniques described and illustrated in Chapter 2.
Proofs of mutual exclusion, freedom from deadlocks/livelocks, and synchronization/coordination characteristics, given in Chapter 2 are properties of implementations, derived from implementations, and not just properties of intended system designs stated in an abstract non-executable language. These are the first computer assisted interactive formal proofs of their kind that pertain to actual executable programs.
We present in Chapter 2 the organizational principles that explain why the TICC™-paradigm has the right structure and operational characteristics to address all of the requirements listed in the previous section and provide all the features described above. As explained in Chapter 2, even without a pressing need to scale by factors of 10⁶ or more, the new paradigm has several immediate benefits to offer. The paradigm is ideally suited to building validated parallel software systems using multicore chips, validated real time systems, and secure systems with guaranteed security.
(vii) Nature of Inventions: The inventions here pertain to a collection of abstractions which facilitate the following: (i) specify abstract designs; (ii) provide methods to automatically derive ALLEOPs models from specifications of abstract designs and their refinements; (iii) provide guidance to perform successive and progressive refinements of designs and implementations, preserving at each stage of refinement properties validated in earlier stages; (iv) provide methods used to validate implementations at each stage of refinement using ALLEOPs; (v) modify TICC™ communication pathway structures and protocols in order to provide above mentioned characteristics; (vi) methods used to derive and install SMS for given design specifications; (vii) facilitate practically unlimited simultaneous parallel high-speed guaranteed asynchronous communications over TICC™-Ppde communication pathways, without having to use sequential buffers; (viii) enable program organization using virtualMemories; (ix) provide special facilities to ComPuting Units (CPUs), called TICC™-CPUs, to execute TICC™-Ppde programs without need to use an Operating System at any level of program execution; (x) enable precise prediction within given timing bounds, and control of execution times of program segments and protocols; (xi) enable automatic implementation of specified data and system security conditions; and (xii) leads to the specification of special hardware facilities in TICC™-CPUs, needed to execute TICC™-Ppde programs efficiently, validate implementations, enforce security and incorporate SMS.
An important invention that makes all of the above possible is the Causal Communication Primitive (CCP), a basic programming primitive [U.S. Pat. No. 7,210,145 B2] which hardware and software subsystems can use to dynamically communicate with each other by exchanging signals, in order to coordinate and synchronize their activities in a manner similar to how asynchronous hardware systems communicate and coordinate theirs. CCP is implemented as a basic machine instruction and is used to define communication protocols that enable guaranteed very high-speed parallel communications in both shared memory and distributed memory systems. As a machine instruction, a CCP will take only about 5 nanoseconds (estimated) to execute in a 2 gigahertz CPU, and only about 8 CCP executions are required to deliver a message from one cell to another. Thus, with hardware assistance, communication latency may be drastically reduced to tens to a few hundreds of nanoseconds (estimated), or less. TICC™-Ppde uses a variant of the communications organization proposed in TICC™ [U.S. Pat. No. 7,210,145 B2]. The variation is quite small, but its consequences are profound.
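A back-of-the-envelope check of the figures quoted above, using only the numbers given in the text (about 8 CCP executions per message delivery, and an estimated 5 nanoseconds per CCP when CCP is a machine instruction on a 2 gigahertz CPU):

```python
# Estimates taken directly from the text; both figures are stated there
# as estimates, not measurements.
CCPS_PER_DELIVERY = 8      # CCP executions to deliver one message
NS_PER_CCP_HW = 5          # ns per CCP as a hardware machine instruction

hw_latency_ns = CCPS_PER_DELIVERY * NS_PER_CCP_HW
# ~40 ns per message delivery with hardware assistance, consistent with
# the "tens ... of nanoseconds" claim, and roughly an order of magnitude
# below the 350 to 500 ns software-protocol latencies quoted earlier.
```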
Whereas TICC™ communications were possible only in shared-memory software systems, TICC™-Ppde communications are possible in both shared memory and distributed memory parallel software systems. Distributed memory communications use CCPs over a local area TICCNET™. Also, TICC™ allowed only a limited number of parallel simultaneous communications, and message delivery latencies in TICC™ communications are not predictable. TICC™-Ppde allows unlimited, guaranteed, almost instantaneous and simultaneous parallel communications, with message delivery times predictable in nanoseconds, the number of such parallel communications that may occur at any given time being limited only by the number of available cells in a parallel software system.
(viii) No need to develop new technologies: As described in the concluding remarks of Chapter 2, the most important characteristic of the inventions claimed here is that they do not require any new technologies in order to be built and deployed. The inventions require only a new way of organizing computations and computing systems. A fully operational commercial version of the TICC™-Ppde platform can be built, validated and deployed within a period of 3 to 5 years, using only currently available technologies. The NSF-supported proof of concept prototypes of TICC™ and TICC™-Ppde validate this claim. We outline in the concluding remarks of Chapter 2 the short-term tasks that should be completed in order to deploy TICC™-Ppde. The TICC™-Ppde platform may then be used to build complex, guaranteed high-speed, high-efficiency, validated, secure parallel software systems in every area of human endeavor, some of which are outlined in the concluding remarks as long-term tasks.
(ix) Remarks: Both TICC™ and TICC™-Ppde are unique and the first of their kind. There are no other integrated platforms of this kind in the published literature or in the patent literature with capabilities similar to TICC™ and TICC™-Ppde, except reference [22] below, which pertains to TICC™, and the patent, U.S. Pat. No. 7,210,145 B2, for TICC™ issued to this inventor, for which an international patent is still pending (patent application number PCT/US2006/015305, published in the PCT Gazette on Nov. 1, 2007, publication number WO 2007/123527). TICC™-Ppde uses a modified version of TICC™.
4. References for Chapter 1
    1. M. Ehtesham Hayder, et al, “Three Parallel Programming Paradigms: Comparisons on an Archetypal PDE Computation”, Center for Research on High Performance Software, Rice University, Houston, TX; Parallel Processing Research Group, University of Greenwich, London, UK; Computer Science Department, Old Dominion University and ICASE, Norfolk, VA.
    2. William Gropp, et al, “Using MPI: Portable Parallel Programming with the Message-Passing Interface, second edition”, The MIT Press, 1999, ISBN 0-262-57134-X. Also see http://www-unix.mcs.anl.gov/mpi
    3. G. E. Karniadakis and R. M. Kirby II, “Parallel Scientific Computing in C++ and MPI: A Seamless Approach to Parallel Algorithms and Their Implementation”, Cambridge University Press, 2003.
    4. A. Geist, et al, “PVM: Parallel Virtual Machine, A Users' Guide and Tutorial for Networked Parallel Computing”, MIT Press, 1994.
    5. SHMEM: http://www.csar.cfs.ac.uk/user_information/tools/comms_shmem.shtml
    6. OpenMP: http://www.llnl.gov/computing/tutorials/openMP/
    7. P. C. Treleaven, D. R. Brownbridge, and R. P. Hopkins, “Data-Driven and Demand-Driven Computer Architecture”, ACM Computing Surveys, Vol. 14, No. 1, pp 5-143, March 1982.
    8. W. D. Hillis and L. W. Tucker, “The CM-5 Connection Machine: A Scalable Supercomputer”, Communications of the ACM, Vol. 36, No. 11, pp 31-40, 1993.
    9. H. T. Kung, “Why Systolic Architectures?”, Computer, Vol. 15, pp 37-45, January 1982.
    10. Gregory F. Pfister, “In Search of Clusters: The Coming Battle in Lowly Parallel Computing”, Prentice Hall PTR, Upper Saddle River, NJ, 1995, ISBN 0-13-437625-0.
    11. Ian Foster and Carl Kesselman, “The Grid: Blueprint for a New Computing Infrastructure”, Morgan Kaufmann Publishers, Inc., San Francisco, CA, 1999, ISBN 1-55860-475-8.
    12. Jarkko Kari, “Theory of cellular automata: A survey”, Theoretical Computer Science, Vol. 334, pp 3-33, 2005.
    13. Donald L. Sollers, “Cache memory based instruction execution”, U.S. patent, filed Mar. 11, 1997, issued May 23, 2000, U.S. Pat. No. 963,389.
    14. Gary T. Corcoran and Robert C. Fairfield, “Apparatus for controlling instruction execution in a pipelined processor”, U.S. patent, filed May 18, 1993, issued Jul. 5, 1994, U.S. Pat. No. 4,873,630.
    15. Barbara Bluestein Simmons and Vivek Sarkar, “System, method, and program product for instruction scheduling in the presence of hardware look-ahead accomplished by the rescheduling of idle slots”, U.S. patent, filed Jun. 18, 1996, issued Mar. 23, 1999, U.S. Pat. No. 555,719.
    16. Gary Tyson and Matthew Farrens, “Code Scheduling for Multiple Instruction Stream Architectures”, Computer Science Department, University of California, Davis, CA.
    17. HECRTF, “Report of the High-End Computing Revitalization Task Force”, May 10, 2004, http://cray.com/downloads/HECRTF-FINAL—051004.pdf
    18. D. J. Kuck, “High Performance Computing: Challenges for Future Systems”, Oxford University Press, New York, NY, 1996.
    19. Sandeep K. Shukla and R. Iris Bahar (Eds.), “Nano, Quantum and Molecular Computing: Implications to High Level Design and Validation”, Kluwer Academic Publishers, Boston, MA, 2004.
    20. Evolving Cellular Automata, Research at the Santa Fe Institute, http://www.santafe.edu/projects/evca/evca1/papers.htm#EvCA
    21. Vipin Kumar, et al, “Introduction to Parallel Computing”, The Benjamin/Cummings Publishing Company, Inc., 1994, Chapter 10, pp 377-406, ISBN 0-8053-3170-0.
    22. Chitoor V. Srinivasan, “Technology for Integrated Computation and Communication”, references section of http://www.edss-ticc.com. This corrects an error in the paper presented at the PDPTA '03 conference, Las Vegas, Jun. 26, 2003, pp 1910-1916.
    23. L. V. Kale and S. Krishnan, “Charm++: Parallel Programming with Message-Driven Objects”, in “Parallel Programming Using C++” (Eds. Gregory V. Wilson and Paul Lu), pp 175-213, MIT Press, 1996.
    24. L. V. Kale, Milind Bhandarkar, Narain Jagathesan, Sanjeev Krishnan and Joshua Yelon, “Converse: An Interoperable Framework for Parallel Programming”, Proceedings of the 10th International Parallel Processing Symposium, pp 212-217, April 1996.
    25. C. A. R. Hoare, “Communicating Sequential Processes”, CACM, Vol. 21, No. 8, pp 666-677, August 1978.
    26. R. Milner, J. Parrow and D. Walker, “A Calculus of Mobile Processes, Parts I and II”, Journal of Information and Computation, Vol. 100, pp 1-40 and pp 41-77, 1992.
    27. Robin Milner, “Calculi for Interaction”, Cambridge University Technical Report, 1995.
    28. Robin Milner, “Communication and Concurrency”, Prentice Hall, 1989.
    29. Ole Hogh Jensen and Robin Milner, “Bigraphs and Mobile Processes Revisited”, University of Cambridge Computer Laboratory, Technical Report UCAM-CL-TR-580, 2004, http://www.cl.cam.ac.uk
    30. Richard M. Karp and Vijaya Ramachandran, “A Survey of Parallel Algorithms for Shared-Memory Machines”, Technical Report UCB-CSD-88-408, Computer Science Division, University of California, Berkeley, March 1988. Also in Handbook of Theoretical Computer Science, North-Holland, Amsterdam, 1989.
    31. Jeff Magee and Jeff Kramer, “Concurrency: State Models & Java Programming”, John Wiley & Sons, Ltd., 2006, ISBN-13 978-0-470-09335-9.
    32. Robin Milner, “A Calculus of Communicating Systems”, Springer-Verlag, New York, Heidelberg, Berlin, 1980, ISBN 0-387-10235-3.
    33. E. M. Clarke, “Automatic Verification of Sequential Circuit Designs”, CHDL, 1993 IFIP Conference Proceedings, p. 165.
    34. Clarke, E. M., Emerson, E. A., and Sistla, A.
P., “Automatic Verification of Finite State Concurrent Systems Using Temporal Logic Specifications”, ACM Transactions on Programming Languages and Systems, 8, 2 (April), 626-643 (1986).    35. Clark, E. M. and Wing, J. M., et al, “Formal Methods, State of the Art and Future Directions”, ACM Computing Surveys, 28, 4, 607-625 (1996)    36. C. A. Petri and W. Reisig, (2008) Scholarpedia 3(4):6477, http://www.scholarpedia.org/article/Petri_net    37. Rinku Dewri, (2003), “Evolutionary Neural Networks: Design Methodologies,” http://ai-depot.com/articles/evolutionary-neural-networks-design-methodolgies/    38. Clark, E. M, Yuan Lu, Grumberg, 0, Veith, H., Jha, S., “Counter example guided abstraction refinement for symbolic model checking,” Journal of the ACM, Vol. 50, No. 5, September 2003, pp. 752-794.    39. Carl Hewitt, (1976) “Viewing Control Structures as Patterns of Passing Messages”, A.I. Memo 410, M.I.T, Artificial Intelligence Laboratory, 545 Technology Square, 02139.    40. Gul Agha, (1986) “ACTORS: A Model of Concurrent Computation in Distributed Systems”, The MIT Press Series in Artificial Intelligence, Dec. 17, 1986.