The invention is a novel and unique computer architecture that recognizes hardware and software components as individual entities that are dynamically erected to create computer systems and sub-systems that meet functional and performance requirements as defined by executing processes. This invention replaces traditional computer designs with a unique, asynchronous decentralized process that does things existing computer architectures cannot.
The architecture is termed “Meta Mentor” (also referred to herein as “MM”) and is a fault-tolerant, distributed processing architecture. Meta Mentor processors create systems and sub-systems by means of fault-tolerant mentor switches that route signals to and from hardware and software entities. The systems and sub-systems created are distinct sub-architectures and unique configurations that may be operated separately or concurrently as defined by the executing processes.
According to this invention, any unexpected or unplanned change in hardware or software operation, intentional or random, is defined as a fault. Intentional faults are considered to be a category of faults that includes planned subversion of a system or component. These types of faults are mitigated by two approaches: first, rule- and role-based definitions of individual components and processes, and second, algorithms that analyze component and process characteristics. According to this invention, fault mitigation is accomplished by dynamically reconfiguring a functioning system through process assignment and system virtualization.
The MM architecture operates beyond the conventional single domain of reference: MM systems operate not only in the data and instruction domain of conventional systems, but also in the meta mentor domain, which is independent and distinct from the data and instruction domain, as further explained below. It should be noted that throughout this document, the data and instruction domain is referred to alternatively as “data and instruction domain” or “DID”.
An MM system can be dynamically reconfigured to achieve quorum-based computing, Euler Square paradigms, and asynchronous fault tolerance that requires available components (as opposed to redundant components). It is capable of running different operating systems simultaneously and is not bound to binary-compatible CPUs (although the applications must be binary compatible with the CPU). The system can also monitor the integrity of application software, prevent unauthorized access by personnel or programs, and record malware activity.
The unique design of the MM architecture recovers from faults by saving system-state information in a classical checkpoint action. The system can be checkpointed during an interrupt, system-context switch, or forced timeout. In the event of hardware failure, the system or application is automatically restarted at the last checkpoint, and functioning processes are reassigned to available hardware. Errant applications, whether random or intentional, are “blacklisted”, and all associated data, storage usage, and state information can be saved for forensic analysis. After the errant application is bypassed and operator notification is sent, the system continues to function as if the application had never been started. In addition to withstanding hardware or software malfunction and preventing computer virus infections, the exclusive architecture also functions seamlessly with existing operating systems and software.
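The checkpoint-and-reassign behavior described above can be illustrated with a minimal sketch. It is not the MM implementation; the names `CheckpointedTask` and `run_with_recovery` are hypothetical, a raised `RuntimeError` stands in for a hardware failure, and taking a checkpoint after every completed step is an assumption made only to keep the example short.

```python
import copy


class CheckpointedTask:
    """Hypothetical task whose state is a plain dict that can be snapshotted."""

    def __init__(self, state):
        self.state = state
        self._checkpoint = copy.deepcopy(state)

    def checkpoint(self):
        # Save a deep copy so later mutations cannot alter the snapshot.
        self._checkpoint = copy.deepcopy(self.state)

    def restore(self):
        # Roll the working state back to the most recent snapshot.
        self.state = copy.deepcopy(self._checkpoint)


def run_with_recovery(task, steps, processors):
    """Run steps in order; on a processor fault, roll back and reassign.

    `processors` is a list of callables `proc(step, state)`. A raised
    RuntimeError is an assumed stand-in for a hardware failure.
    """
    available = list(processors)
    for step in steps:
        while True:
            if not available:
                raise RuntimeError("no functioning processor left")
            try:
                available[0](step, task.state)
                task.checkpoint()   # checkpoint after each completed step
                break
            except RuntimeError:
                task.restore()      # restart from the last checkpoint
                available.pop(0)    # stop using the failed processor
    return task.state
```

In this sketch a processor that corrupts the state and then fails leaves no trace: the state is rolled back and the same step is retried on the next available processor.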
To explain the Meta Mentor architecture, it is useful to review conventional computer architectures, which can generally be divided into two categories: von Neumann (VN) architecture and non-von Neumann architecture. The vast majority of production computer systems incorporate the VN architecture. Referring to FIG. 1, illustrated is a conventional VN architecture consisting of four components: (a) a processing unit that has an arithmetic logic unit (ALU) and control registers 2; (b) a control unit that is comprised of an instruction register and program counter 3 (the ALU and the control unit are collectively called a central processing unit and commonly referred to as a “CPU” 4); (c) a common memory 5 to store both data and instructions (the common memory illustrated as six “Mem” units); and (d) an input 6a and output 6b mechanism. According to some in the art, the phrase von Neumann architecture has evolved to mean a stored-program computer where instructions and data are stored in the common memory, and instructions and data cannot be accessed simultaneously due to the physical limitations of accessing two pieces of data through one data bus. This limitation is called the von Neumann bottleneck and is commonly referenced as the limitation of the architecture.
FIG. 2 shows the VN architecture represented in a Venn diagram. The CPU 4 has an intersection with the System 5. The System in this diagram consists of the memory and the input/output devices. At startup, some event interrupts the CPU and defines an address for a process to be started, which defines the address space of the individual components. In this sense, it is circular logic. The CPU is directly affixed to the system, and a centralized timing signal coordinates all component functions within the architecture. The address space of the CPU defines a domain, which is the entire space over which the CPU can have direct control. The intersection of the system and CPU represents the circuitry connecting the two entities.
FIG. 3 is a Venn diagram showing the VN architecture in a virtualized configuration. This configuration is seen in several applications today, such as Zones, Xen, VMware, and others. Virtualization within VN architecture is a software aberration, and when the software stops functioning, the system reverts to the configuration of FIG. 2. It is also a single domain system, with the interconnecting circuitry defined by the intersection of the CPU and System.
Referring to FIG. 4, illustrated is another type of computer architecture, the Harvard (H) architecture. The H architecture is differentiated from the VN architecture by separation of the instruction and data information. This means the two types of information are physically separated, and often have a different format. The input/output to a Harvard CPU 4 is basically the same as utilized in the VN architecture. A hallmark of the H architecture is that instruction fetches and data accesses do not contend for a single physical path to memory. Implicit in this design are two different and separate types of memory, data memory 7 and instruction memory 8, and two data paths 9a and 9b, one for each type of memory. The H architecture is considered faster than the VN architecture because instruction and data can be acquired roughly twice as fast.
FIG. 5 shows the H architecture in a Venn diagram. The intersection between the Instruction System 10 and CPU 4 can be seen, which represents the circuitry connecting the two. Likewise, the intersection between the Data System 11 and CPU denotes the connecting circuitry. As seen in the VN architecture, a centralized timing signal (or signals) sent from the CPU coordinates all components within the system. Also, like the VN architecture, the H architecture address space defines a single domain, the data and instruction domain (DID) more fully explained below.
The Harvard architecture can be designed such that data, or data memory, cannot modify instruction memory if the memory is read only. However, if anyone wanted to modify the instructions, the new instructions would be loaded from the data memory. An application, for example, would transfer its execution code from data memory to instruction memory based upon the transfer instructions of the CPU's ALU. This is important when considering the effect of malware that might modify the instruction code, causing the CPU of a system to perform unintended tasks.
The H architecture is often characterized as being “memory bound” because the instruction-set execution speed in the ALU is far greater than the memory access time. Thus the speed at which the ALU can process data is limited by the access time to the instruction memory, and the execution speed is bound by the memory access time. This problem is often solved by providing the ALU with a small amount of fast memory, called a cache. In practice there are three levels of cache, normally referred to as L1, L2, and L3.
Another architecture combines both H and VN architectures to create a functioning CPU. Internal to the CPU, common VN information is divided into data and instruction H categories. This is usually accomplished in a temporary storage area called a cache. The H architecture handles internal CPU data/instructions and the VN architecture handles external CPU data/instruction transfers. Instruction information can be divided based on an algorithm. A simple algorithm example is to assign address range 0-500 as instruction and 501-1000 as data. This rule is determined by the electronic design, and can never be changed after manufacture. A control unit is connected to address space 0-500 and the ALU to address space 501-1000. This defines an H architecture where the address space of information is divided into either logic or data information as shown in FIG. 4. This address space between the control unit and the ALU is a one-to-one relationship in a VN or H architecture, as shown in FIGS. 1 and 4.
Also shown in FIGS. 1 and 4 are the external input 6a and output 6b modules. These modules receive or process information by electronic signals external to the CPU. This relationship is determined at electronic design time and cannot be changed after manufacture. The input/output, information or data areas, and control are determined by a fixed electronic pathway. How information is classified as instruction or data is determined by software algorithms that function in the control unit and ALU. The control unit requesting information from the data memory can change the information to instruction information.
How the algorithm is executed can be accomplished by the electronics or the algorithm. In the algorithm case, what is data and what is instruction information is decided by the algorithm, which is circular logic. The electronics defines the hardware address space by how the pins are connected. Together they define a domain, a DID. The address space is all information, data or instruction, that the algorithm or electronics can control within the address space. Furthermore, when information is exchanged between components, they adhere to periodic signals common to all devices that define the state of the communication. This is usually provided by the CPU to external devices within its domain, and this signal is timed with internal devices. This defines a synchronous system, where a common periodic signal, such as a clock, governs component states during information transfer.
Functionally, a CPU has other components. A temporary information storage area, called a cache, is typically where data and instruction information is categorized. An example is shown in FIG. 7. The L2 Cache 12 receives information from the Common L3 cache 13 and associated elements. The L3 cache is VN architecture and the L2 cache is H architecture. These internal memory components create high speed memory transfers within a predefined network of electronic paths located within the CPU.
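The simple address-range rule given above (0-500 as instruction, 501-1000 as data) can be written out as a short sketch. The function name `classify` and the treatment of out-of-range addresses as an error are illustrative assumptions; in the hardware described, the rule is fixed by the electronic design rather than by software.

```python
# Fixed partition taken from the example in the text.
INSTRUCTION_RANGE = range(0, 501)     # addresses 0-500
DATA_RANGE = range(501, 1001)         # addresses 501-1000


def classify(address):
    """Return which category an address belongs to under the fixed rule."""
    if address in INSTRUCTION_RANGE:
        return "instruction"
    if address in DATA_RANGE:
        return "data"
    # Assumption: addresses outside both ranges are simply rejected.
    raise ValueError(f"address {address} is outside the address space")
```

Because the two ranges are constants, nothing at run time can move the boundary, which mirrors the statement that the rule cannot be changed after manufacture.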
Referring again to FIG. 7, externally the CPU uses von Neumann's architecture. Using this architecture the CPU emanates, receives, and processes (information) signals with devices that are external to the CPU. These devices are usually slower than the internal devices. The flexibility and practical considerations make VN the architecture of choice external to the CPU. External devices electrically transfer information to instruct the CPU how to function. The CPU operates on external memory information, both data and instruction information, and operates as a von Neumann machine. In practice data is transferred between the VN and H architectures in bulk, through a process called direct memory access (DMA). The DMA process bypasses the CPU and transfers information faster than the normal CPU read/write process.
Another computer architecture is called tagged token or dataflow. The VN and H architectures can be categorized as control flow architectures and feature a CPU, memory, and the transfers between them. Dataflow systems theoretically do not have a program counter and execute processes based on the availability of the input data to an instruction set. Dataflow is billed as an artificial intelligence engine, a data-driven parallel computer professed to have a simple architecture that acts as a kind of neuron. This system has several registers or processes that execute based upon signals that the data has arrived. Dataflow architecture can be evaluated and contrasted in terms of control flow architecture.
Referring to FIG. 6A, illustrated is a comparison of a control flow and a dataflow program. The Equations define the problems to be solved. The left-hand side shows a conventional control flow architecture and the right-hand side the dataflow architecture. The solid arrows and lines point to the locations and flow of the data. In a memory program, each process stores the resultant in a common memory location that is accessed by the next process in a session of processes. The dashed arrow lines show the instruction progression. The Tagged Token model places data into the process memory directly. When a process is filled with the required input tokens, the process is fired and its output is pointed to a defined process's input or placed in memory outside the Tagged Token processing element.
In practice, the Dataflow architecture has several processing elements which can communicate with each other. One such processing element is described in FIG. 6B and consists of a matching unit 21, fetching unit 22, functional unit 23, and associated memory, 24a and 24b. It can be noted that in simple dataflow machines the matching and fetching units are combined into a single processing element called an enabling unit that has a single memory. However, when input tokens become numerous or grow in size, the enabling unit is configured with two memory types, each attached to its processing element.
The matching unit stores tokens (data) in its memory and checks the availability of the destination node, shown as the Fetching unit. This requires information on the destination address and the process tag. When all of the tokens for a process instance are assembled, they are sent to the fetching unit, which combines them with a node description (this is analogous to instruction memory). The node description is stored in the memory for nodes, and the fetching unit forwards this information on to the functional unit, where the process is executed.
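The firing rule described above, in which a process executes only once all of its input tokens have arrived, can be sketched as follows. This is a simplified software model, not a description of the hardware in FIG. 6B: the class names `Node` and `DataflowMachine`, the use of integer ports numbered from 0, and the `results` dictionary are all assumptions made for illustration.

```python
class Node:
    """Node description: the operation, its input count, and its destinations."""

    def __init__(self, name, func, arity, destinations=()):
        self.name = name
        self.func = func
        self.arity = arity                    # number of input ports (0..arity-1)
        self.destinations = list(destinations)  # (node name, port) pairs


class DataflowMachine:
    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}  # plays the role of node memory
        self.waiting = {}   # matching memory: (node, tag) -> {port: value}
        self.results = {}   # outputs of nodes that have no destination

    def send(self, node_name, port, tag, value):
        """Deliver one token; fire the node once every port holds a token."""
        node = self.nodes[node_name]
        slot = self.waiting.setdefault((node_name, tag), {})
        slot[port] = value
        if len(slot) < node.arity:
            return                            # still waiting for partner tokens
        del self.waiting[(node_name, tag)]
        args = [slot[p] for p in range(node.arity)]
        out = node.func(*args)                # functional unit executes the node
        if not node.destinations:
            self.results[(node_name, tag)] = out
        for dest_name, dest_port in node.destinations:
            self.send(dest_name, dest_port, tag, out)
```

For example, a graph with an `add` node and a `sub` node both feeding a `mul` node evaluates (a + b) * (c - d) no matter in which order the four input tokens are sent, since no program counter dictates the sequence.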
Inside the CPU
The VN and H architectures can be seen in CPU designs. For example, FIG. 7 shows a conventional microprocessor architecture that is representative of a typical CPU. As depicted in FIG. 7, an L2 cache 12 is synchronously connected to a Unicore module, consisting of an L3 cache 13, memory controller 14, Quick path Enter-Connect (Inter-Connect) 15, and a Quadruple Associative Instruction Cache 16. It can be seen that the instruction and data memory use the L2 cache. The L2 cache feeds one data path 17 that is connected to the instruction cache that ultimately feeds the Decoded instruction queue 18; the other path feeds the Data Cache 19. The L2 cache is a von Neumann memory with Harvard categories and is connected with other von Neumann style memories via the L3 cache in the Unicore module. Although it can be argued there is some virtualization of the various caches, these are passive devices where timing signals and address space are sent to and from a predetermined device, and information is always located in a predefined device whose timing signals and data path cannot be altered. This defines a single domain, a DID, with a single purpose dedicated to one device. In this system having a single domain, data and instructions can be interchanged, creating computational corruption hazards.
The Translation Lookaside Buffer (TLB) 20, shown in FIG. 7 as “L2-TLB”, references the physical (primary) von Neumann memory. A TLB has a fixed number of slots that contain page table entries, which map virtual addresses to physical addresses. This space is segmented in pages of a prefixed size, one page for each table entry. The TLB has two modes, physical address and virtual address, depending on where the TLB is located. When the TLB is located between the CPU cache and primary memory storage, the cache is physically addressed. In physical addressing, the TLB performs a lookup on every memory operation and the information located in the primary memory is copied to the cache.
When the TLB is located between the CPU and cache, the CPU TLB is only referenced when a cache miss occurs. FIG. 7 shows the L2-TLB connected to a CPU consisting of the Quadruple Associative Instruction Cache 16 and Octruple Associative Data Cache 19. The L2-TLB is also connected to the L3-Cache and Memory controller which is a type of hybrid physical/virtual location.
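A TLB with a fixed number of slots mapping virtual pages to physical frames can be modeled in a few lines. This is a software sketch only: the 4096-byte page size, the least-recently-used replacement policy, and the names `TLB` and `translate` are assumptions not taken from FIG. 7.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes


class TLB:
    def __init__(self, slots, page_table):
        self.slots = slots                # fixed number of entries
        self.page_table = page_table      # virtual page number -> physical frame
        self.entries = OrderedDict()      # cached translations, oldest first
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address):
        """Map a virtual address to a physical address via the TLB."""
        vpn, offset = divmod(virtual_address, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)         # mark as most recently used
        else:
            self.misses += 1
            frame = self.page_table[vpn]          # walk the page table on a miss
            if len(self.entries) >= self.slots:
                self.entries.popitem(last=False)  # evict least recently used
            self.entries[vpn] = frame
        return self.entries[vpn] * PAGE_SIZE + offset
```

The offset within a page passes through unchanged; only the page number is translated, which is why one table entry suffices for each page.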
A Harvard architecture CPU takes a different approach. In a pure Harvard architecture, as described above, instruction and data memory are physically separated and can be seen in the two data paths or data busses external to the Harvard architecture CPU; one leading to the instruction memory and the other is connected to the data memory, which leads to two distinct cache types, one for the instruction memory and the other for data memory.
Although there are several commercial examples of a pure Harvard architecture, in practice most Harvard architectures use modified designs and are sometimes referred to as ‘modified Harvard architectures’. In this modification a common memory device is used, and the separation between data and instruction memory is made by memory address space and not by physical separation. This allows for the Harvard architecture's improved performance with a type of common memory. However, current CPUs typically separate the data and instruction memory within the CPU cache. This can be seen in FIG. 7, where the information signals travel from the L2 cache instruction memory address space to the Quadruple Associative instruction cache. Also shown in FIG. 7 is the L2 cache data memory address space providing information to the Quadruple associative data cache via a permanent connection labeled “256”. Once the information is separated in the data cache and the instruction cache, the remaining processes act like Harvard architecture. Also shown in FIG. 7 is the permanent connection between the L2 and L3 caches, which provides a pathway from the CPU's internal components to the CPU external components. It is notable that the information pathways to and from the L2 cache are dedicated and timed to devices located within the CPU. This constitutes a single process domain, a DID.
Multiple Core Processors (Homogeneous)
Referring to FIGS. 8A and 8B, two or more Central Processing Units (CPUs) can be packaged in a single chip, and two examples are shown. In this configuration each processing unit is independent of the others but shares internal resources and a common external interface. In packages that have multiple internal CPUs, each CPU is referred to as a core. Cores share resources as described above. A package can contain many cores. FIGS. 8A and 8B show two types of dual core (CPU) processors. FIG. 8A is a dual core processor with respective L1 caches 25 and a shared L2 cache 26. Each core is directly connected to the L2 cache and each I/O Control manages the instruction and data memory in the shared cache. As illustrated in FIG. 8B, each core has an individual L2 cache 27 that is not shared. In each case, the external interface is configured by the memory controllers 28 or bus interface 29 and is defined by the core address space.
Referring to FIG. 9, shown is a sixty-four core CPU. FIG. 9 is representative of a multiple core CPU, as there are no theoretical limits to the number of CPUs, although there are practical limits that involve manufacturing the package. The devices on the left side of FIG. 9 depict how this configuration transfers information to and from input and output apparatuses external to the CPU. Also shown on top and bottom are four memory controllers 30 that reference the physical (primary) von Neumann memory. The cores 31 transfer information among themselves through a 2-D mesh network. FIG. 9 has five paths in its network node, each path dedicated to transferring one type of information. The information is segregated to increase transfer speed within the internal network. Attached to each core is a mesh switch 32 (FIG. 9, detail A) whose purpose is to transfer information to and from each core and to separate the information into the five data paths. Internal to each CPU core are the L1 and L2 caches, as shown in FIG. 9, detail B. In this particular representation the L1 cache has Harvard architecture and the L2 cache von Neumann architecture. The L1 Instruction and Data TLB determine their respective memory space within the L2 Cache. A unique feature of this device is its ability to share L2 cache between cores.
Multi-Core Processors (Heterogeneous)
Some other configurations of multiple cores in a single CPU package employ, in addition to a CPU, a type of processing unit called a graphics processing unit (GPU). GPUs are devices designed to do calculations related to 3D computer graphics. Because 3D graphic computations involve complex mathematical operations, GPUs are designed to perform mathematical operations with high speed and efficiency. GPUs can also be employed to perform non-graphical calculations. When a GPU design is used for this purpose, the device is referred to as a GPGPU (general purpose GPU). The model for GPGPU computing is to use a CPU and GPU together in a hierarchical, heterogeneous computing model. The sequential part of the application runs on the CPU and the computationally-intensive part is accelerated by the GPU. In this heterogeneous model the CPU and GPGPU each operate in conjunction with their memory in a von Neumann, Harvard, or modified Harvard architecture.
In practice, the application (code) is started on the host CPU, which is tasked to distribute the compute-intensive portions of the application to a GPU. The rest of the application remains on the host CPU. Mapping a compute-intensive function to a GPU involves a process called parallelizing. Manufacturers of GPGPU devices provide special compute commands that facilitate parallelization processes that move data and instruction information to and from a GPU. The individual who develops the application is tasked with launching tens of thousands of instruction snippets, called threads, simultaneously to the GPUs using these compute commands. The GPU hardware manages the threads and does thread scheduling.
Referring to FIG. 10, a conventional GPGPU and GPU architecture is illustrated. One function of this design is to display pixel values on a video display from an input algorithm. The host 33 sends an algorithm called a geometric primitive to the GPGPU input assembler 34, which is usually a CPU that schedules a series of GPUs to calculate a pixel value on a computer screen. The thread processor assigns various thread issue(s) to calculate the pixel value. The thread processor function can become quite complicated as the input is transformed from primitive to pixel, using various shading and rendering hardware and software, but the end result is that the GPU deposits pixel value information into a frame buffer (FB) memory 35. The contents of the FB memory are displayed on the computer screen as an image for the user.
Referring again to FIG. 10, a process from the host is transformed into pixel values deposited to the FB memory. A process sent from a host is routed to the input assembler GPGPU. The input GPGPU, in conjunction with the thread scheduler, segregates the host process by function and assigns the segregated sub-processes to the thread issue hardware and the texture processing cluster (TPC) for processing. This action, in effect, breaks the host process into segments that are processed separately by the individual TPCs. The TPC output results in ordered pixel value blocks associated with the segregated sub-process that are routed to the associated frame buffer memory by the thread processor. In this fashion the pixel value blocks are aligned with a video display in a human readable format. The raster operations processor (ROP) puts the finishing touches on the frame buffer pixel values. From a memory perspective, instruction memory in this device is determined when the device is designed and resides internal to the Thread Issue, TPC, and ROP hardware. Data is passed from the Host and deposited into FB memory. This configuration is a modified Harvard architecture where data and instruction memory are separated.
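The division of labor described above, with sequential work kept on the host and the compute-intensive part handed out as many small threads, can be suggested with a host-side sketch. Python threads are only a stand-in here and do not model GPU hardware or any vendor's compute commands; `kernel`, `run_application`, and the worker count are hypothetical names and values chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def kernel(x):
    """Per-element computation: the part that would be offloaded."""
    return x * x + 1.0


def run_application(values, workers=4):
    # Sequential part, kept on the host: prepare the input data.
    prepared = [float(v) for v in values]
    # Parallel part: one logical thread of work per element. The pool,
    # not the application, decides when each piece actually runs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(kernel, prepared))
    # Sequential part, back on the host: combine the results.
    return sum(results)
```

The application only states what each thread computes; scheduling is left to the pool, loosely analogous to the statement that the GPU hardware manages the threads.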
The type of heterogeneous system shown in FIG. 10 can be applied to solve general computing problems. The basic process for CPU/GPGPU application computing is the same as the graphics card, with the addition of a bi-directional bus to the CPU and memory.
FIG. 11 illustrates a TPC module and the basic process flow of a GPGPU. Two types of processing units are shown, sp 36 and tp 37. The stream processing units (sp) and special function unit (SPU) 38 process the math. The texture processing units (tp) process texturing. All sp units perform the same functions for a single cycle, acting as a single processor processing multiple values. The same can be said for the tp units. The interconnection network 39 routes data information requests from the host and input assembler via the thread processor. The TPC process engines generate their address space, and the interconnect network performs a one-to-one mapping between TPC address space and individual ROP and shared memory. Together the ROP and shared memory (L2) are referred to as shared parallel memory. When used as a generalized computer, the frame buffer 40 is referred to as the global memory and communicates by transferring sequential grids via permanently connected shared memory. The memory configuration remains the modified Harvard architecture, where data from the host is handled by the input assembler and passed on to the TPC for further processing in the same manner described above. When the GPU device is used to solve general computing problems, data information is passed to the host for storage or further processing.
Grid Computing
Grid computing is a term referring to the combination of computer resources from multiple administrative domains, each domain being conventionally a DID, to reach a common goal. The grid can be thought of as a distributed system with non-interactive workloads that involve a large number of independent system DIDs. It is a sharing of computing resources in a related manner, similar to a gas or electric grid sharing resources to deliver a service to a customer.
For example, an electric consumer turns on a light and consumes electricity from sources that seem to appear as needed, and pays only for what is used. The light bulb does not require the use of the total capacity of the system, and the grid resources are shared by others who are attached to the same grid. In the case of grid computing, it is useful to divide the resources into the “front end” and the “back end”. The front end is analogous to the consumer whose light is using the grid resources, and the back end is analogous to the dams, power plants, transmission lines, and control systems supplying the resource to the light. In computer terms, the back end of grid computing refers to the various computing systems that include computing/computational resources, storage resources, and control systems that route the information from resource to resource. The front end is the resource user, such as a PC or Mac, requesting a back end service to accomplish a task, algorithm, or other request.
Each device on a computing grid is represented by an individual computing resource domain, conventionally a DID. There are many of these resource domains, and when a front end user executes an application using grid computing, computer resources from multiple system DIDs combine to reach a common goal and form a single process domain. This combination may or may not happen simultaneously. The process domain convergence is accomplished by the front end user requesting grid resources and the back end fulfilling the request. The resource sharing is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem solving and resource-brokering strategies emerging among grid users. A set of individuals and/or institutions defined by such sharing rules forms what is called a virtual organization using a single process domain, a DID.
Cloud Computing
Referring to FIG. 12, illustrated are several variations of grid computers that are often referred to as cloud computers. These variations include, but are not limited to, cloud, volunteer, network, and utility computing. There are private and public clouds, hybrid clouds, community clouds, and more. One problem with grid computing is security. Concerns include loss of control over sensitive data, such as passwords, email, or file information, and the lack of security for remotely stored kernels owned by anonymous operators that are beyond the administrative control of the front end user.
Cluster Computing
Cluster computing is defined as linking two or more computers, usually via local area networks, that work together as a unit, sharing computing resources such as common disk space, power, and networking. It is similar to grid computing, but with cluster computing all resources within the cluster are dedicated, closely coupled, and locally connected. There are two basic memory models in cluster computing, the parallel and distributed models shown in FIGS. 13 and 14.
Referring to FIG. 13, the parallel memory model has memory permanently attached to a processor. Information needed to perform a common task is passed through processor communication. This type of cluster features two or more CPUs 41 sharing a common memory 42. It is also called a Common Memory Architecture or Shared Memory Architecture. This arrangement is similar to the multi-core CPU or GPGPU described above. This arrangement involves connecting the buses of several processors together such that either all memory is available to all processors on the shared bus or only inter-processor communications share a common bus. Shared memory systems usually operate with a single operating system, either with a master processor partitioning the shared memory and several slaves operating in their respective partitions, or with all processors running separate copies of the operating system with a common arbitrated memory and process table.
A distributed cluster memory configuration is illustrated in FIG. 14. Several computer entities 43, each running its own operating system, are networked together. Parallelism is accomplished by parent nodes passing child processes to another networked entity, where the completed process passes the result back to the parent entity.
Both distributed and parallel cluster systems share common resources that are physically connected. These shared resources extend beyond the memory models in FIGS. 13 and 14 to shared peripherals such as disk space, networking resources, power, and more. The resources are networked together to form the cluster. The physical implementation is accomplished using one or more types of network topologies and protocols. Topologies include point-to-point, bus, star, ring, mesh, tree, hybrid, and daisy chain. Some of the variations on each type listed are centralized, decentralized, local, wide, logical, and physical. When applied to cluster technology, each system utilizes a network configuration to send signals between systems and to common resources.
One such network is called a switched fabric. FIG. 15 illustrates a switched fabric network in a mesh topology and configured as a synchronous fault tolerant system. For example, Host′ 44 requests services from a Resource2 and the transaction is carried through either switch 45 or switch 46, depending on how Host′ requests the resource. Should one switch fail, all traffic is routed through the remaining switch. While both switches are functioning properly the system is fault tolerant; however, after one switch fails, the system is not fault tolerant until the failed switch is repaired.
FIG. 16 illustrates another type of network called a Clos network. A Clos network is a multi-stage circuit switching network that has seen application in cluster networking. This network features middle stage crossbar switches 49 that are able to dynamically route signals between hosts 47 and resources 48. For example, Host′ requests services from a Resource2 and the transaction is carried through either Switch1 or Switch2, depending on how Host′ requests the resource. Should one crossbar switch fail, all traffic is routed through the remaining crossbar switch. When all crossbar switches are functioning properly the system is fault tolerant; however, after one switch fails, the system is not fault tolerant until the failed switch is repaired.
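The failover behavior of the two-switch fabric can be captured in a small model. This is a sketch under stated assumptions, not the network of FIG. 15: the class name `Fabric`, the string switch names, and the rule that a request falls back to the first working switch when its preferred switch is down are all hypothetical.

```python
class Fabric:
    """Toy model of a fabric whose hosts reach resources through switches."""

    def __init__(self, switches):
        # Every switch starts in working order.
        self.up = {name: True for name in switches}

    def fail(self, name):
        self.up[name] = False

    def repair(self, name):
        self.up[name] = True

    def route(self, host, resource, preferred=None):
        """Return the (host, switch, resource) path used for a request."""
        working = [s for s, ok in self.up.items() if ok]
        if not working:
            raise ConnectionError("no working switch")
        # Use the requested switch when it is up, else any remaining one.
        switch = preferred if preferred in working else working[0]
        return (host, switch, resource)

    def is_fault_tolerant(self):
        # Another failure is survivable only while a spare switch exists.
        return sum(self.up.values()) >= 2
```

The model makes the limitation in the text explicit: traffic keeps flowing after the first failure, but `is_fault_tolerant()` stays false until the failed switch is repaired.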
In addition to the conventional architectures described above, there are more, too numerous to list. They all have one thing in common; all processes and sub-processes are initiated by a processor unit that operates within one DID defined by the processor's frame of reference whose extent is the address space of the processor. In some cases, a parent process can pass a child process to another DID for sub-processing. The child process's DID is another independent domain whose extent is defined by the address space of the processor unit running the child process. Even in shared memory computer architectures, the frame of reference is defined by the processor's address space that operates only on data and instructions within that address space and this constitutes one domain, the data and instruction domain (DID).
The MM architecture has the ability to dynamically define and re-define the data and instruction information address space. This eliminates the distinction between the von Neumann and Harvard architectures because both of these architectures define their address space in terms of the processing unit. The MM architecture logically defines the address space, making the physical location of the memory irrelevant and beyond the scope of a central processing unit. Furthermore, in the von Neumann and Harvard architectures, the processing unit, or a process emanating from the processing unit, schedules when and how its processes are run. The MM architecture uses one or more MM processors to schedule processes on slave processor units (SPU) operating in the DID. All requests for processes are handled and scheduled by the MM algorithms. Furthermore, in all architectures information is exchanged between components along predetermined paths that adhere to periodic signals common to all devices and define the state of the communication. In the MM architecture these paths are determined by the MM processor, and the periodic signals are common to MM devices, rendering the information exchange among devices asynchronous.