The present invention relates to the field of distributed and parallel computer software programming, and in particular to a software system including distributed agents which exhibits enhanced process mobility and communication and facilitates the construction of network-centric applications suited for both homogeneous and heterogeneous network environments.
In distributed computer systems, emphasis has traditionally been placed on issues concerning the partitioning and transmission of data among a collection of distinct computers or "machines." Typically, these systems allow code to be distributed and accessed in one of two ways. In client-server systems, each machine holds the code controlling the resources found on that machine. In other systems, the same code image is found on all machines. In either case, some form of message-passing is used to invoke operations on remote sites.
Traditionally, process mobility (i.e. moving executing processes from one machine to another) has not been an issue of significant importance. In client-server based systems, process mobility is essentially irrelevant; tasks are heavyweight (i.e. contain a large amount of state) and control resources resident on a particular machine. In systems where all machines share the same code image, process mobility may be used to help performance by improving locality and load-balancing. However, tasks typically execute heavyweight procedures, making task migration infeasible. Moreover, an efficient task migration policy that has simple, well-understood semantics has not been achieved to date.
Process mobility has recently become increasingly important to the implementation of distributed computer systems. Enhanced process mobility allows computations to reconfigure themselves dynamically, taking advantage of improved data locality and reducing the number of non-local communication events they initiate. Several distributed system models have been developed to provide a measure of process mobility.
Imperative "glue" systems have been developed which generally operate as seamless extensions to an existing imperative programming language, adding distribution and communication support to that language. Unfortunately, computation in imperative languages involves frequent modification of shared global data, which is exactly what a distributed program needs to avoid. Two basic approaches have been developed to deal with this problem: distributed shared memory ("DSM") and remote procedure call ("RPC").
With DSM, the distributed nature of the computation is largely invisible to the programmer, but implementation complexity is greater than in a system that uses message-passing explicitly. All data is conceptually associated with a global address. Thus, the machine where a thread executes no longer influences the behavior of the program: dereferencing a global address may involve a remote communication to the machine "owning" the contents of that address. While DSM provides a mechanism to implement parallel dialects of imperative languages in a distributed environment, programmers have little control in specifying how coherence and consistency are realized. In particular, issues of process mobility become largely irrelevant, since the distribution of data and tasks is handled implicitly by the implementation rather than managed explicitly by the program. While DSM simplifies programming, it is likely to be more effective when combined with mechanisms to explicitly control distribution and communication.
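To make the DSM addressing model concrete, the following minimal Java sketch (Java 16 or later, for the record type) models a global address as an owning node plus an offset. The class and method names (`DsmSketch`, `GlobalAddress`, `store`, `load`) are hypothetical, and a real DSM would exchange messages and maintain coherence where this sketch merely counts remote fetches. It illustrates the point made above: the value loaded is the same on every node, and only the cost of the dereference depends on where the thread runs.

```java
/**
 * Hypothetical sketch of DSM-style global addressing: each address names an
 * owning node and an offset. A dereference is served locally when the current
 * node owns the address and is otherwise counted as a remote fetch.
 */
class DsmSketch {

    // A global address: the node that owns the word, and its offset on that node.
    record GlobalAddress(int owner, int offset) {}

    private final long[][] memory;   // memory[node][offset], one array per node
    private int remoteFetches = 0;   // dereferences that would need communication

    DsmSketch(int nodes, int wordsPerNode) {
        memory = new long[nodes][wordsPerNode];
    }

    void store(GlobalAddress addr, long value) {
        memory[addr.owner()][addr.offset()] = value;
    }

    // Dereference from the point of view of a thread running on currentNode.
    // The result is the same wherever the thread runs; only the cost differs.
    long load(int currentNode, GlobalAddress addr) {
        if (addr.owner() != currentNode) {
            remoteFetches++; // a real DSM would send a request to the owner here
        }
        return memory[addr.owner()][addr.offset()];
    }

    int remoteFetches() {
        return remoteFetches;
    }

    public static void main(String[] args) {
        DsmSketch dsm = new DsmSketch(2, 4);
        GlobalAddress x = new GlobalAddress(1, 0);
        dsm.store(x, 99);
        // Same value from node 1 (owner) and node 0 (remote); one remote fetch.
        System.out.println(dsm.load(1, x) + " " + dsm.load(0, x)
                + " remote=" + dsm.remoteFetches());
    }
}
```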
RPC provides a way of breaking a program into discrete parts, each of which runs in its own address space. Unlike DSM, RPC makes communication explicit in the program, so programmers have complete control over its costs. However, the semantics of RPC are substantially different from those of an ordinary procedure call. In particular, when a procedure P makes an RPC call to a procedure Q, the arguments to Q are marshaled and shipped to the machine where the computation is to be performed. Stubs generated for the procedures and linked to the application program are responsible for handling representation conversion and messaging. Arguments passed to a remote procedure are passed by copying. Thus, side effects to shared structures can no longer be used for communication between caller and callee. As a result, imperative programs must be substantially modified to run in a distributed environment using RPC. Consequently, programming a distributed agent system using RPC semantics is significantly more complex and subtle than sequential programming on a serial machine.
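The copy semantics described above can be illustrated with a minimal, hypothetical Java sketch that simulates an RPC within a single JVM: the "remote" call marshals its argument with standard Java serialization (`ObjectOutputStream` and `ObjectInputStream`) and invokes the procedure on the unmarshaled copy, much as a stub would. The procedure `q` and the helper names are illustrative only. The side effect made by `q` is visible to the caller after a local call but is lost after the simulated remote call.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: simulates RPC-style call-by-copy in a single JVM.
 * The "remote" call marshals its argument with Java serialization and
 * invokes the procedure on the unmarshaled copy.
 */
class RpcCopyDemo {

    // Procedure Q: appends an element to the list it receives (a side effect).
    static void q(List<String> shared) {
        shared.add("added-by-Q");
    }

    // Marshal and unmarshal an argument, producing an independent deep copy.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T marshalRoundTrip(T arg) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(arg);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("marshalling failed", e);
        }
    }

    // Local call: Q operates on the caller's list, so the side effect is visible.
    static int localCallSize() {
        ArrayList<String> data = new ArrayList<>(List.of("a"));
        q(data);
        return data.size();
    }

    // Simulated RPC: Q operates on a copy, so the caller's list is unchanged.
    static int remoteCallSize() {
        ArrayList<String> data = new ArrayList<>(List.of("a"));
        q(marshalRoundTrip(data));
        return data.size();
    }

    public static void main(String[] args) {
        System.out.println("local call size:  " + localCallSize());
        System.out.println("remote call size: " + remoteCallSize());
    }
}
```

Running `main` prints a list size of 2 for the local call and 1 for the simulated remote call, which is exactly the loss of side-effect communication between caller and callee noted above.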
Process mobility, the ability to migrate a thread of control (or task) along with its associated state, is especially difficult to achieve in these systems. The imperative nature of these languages means that a large percentage of the data found in programs must be global. Without RPC, communication among processes must take place via side effects rather than via allocation and copying. Thus, the advantages of having mobile processes are greatly diminished. Conceptually, processes are highly mobile in these languages because they carry no state; but because they must frequently reference global (shared) data, process migration is useful only if the data they access moves along with the processes that require it. Given that global data is likely to be shared among several processes, the implicit coupling of data and code in imperative languages greatly weakens the utility of process mobility in these languages.
Recently, a shift to a new computational paradigm has occurred. Instead of regarding the locus of an executing program as a single address space physically resident on a single processor, or as a collection of independent programs distributed among a set of processors, the advent of concurrent, network-centric, object-based languages, such as Java, has offered a compelling alternative. See J. Gosling et al., The Java Language Specification, Sun Microsystems, Inc. (1995), which is expressly incorporated herein. By allowing concurrent threads of control to execute on top of a portable, distributed virtual machine, a network-aware language like Java presents a view of computation in which a single program can be seamlessly distributed among a collection of heterogeneous processors. Unlike distributed systems that require the same code to be resident on all machines prior to execution, code-mobile languages like Java allow new code to be transmitted and linked to an already executing process. This feature allows dynamic upload functionality in ways not possible in traditional distributed systems.
Java incorporates computational units known as "objects." An object includes a collection of data called instance variables and a set of operations called methods, which operate on the instance variables. Object state (i.e. the instance variables) is accessed and manipulated from outside an object through publicly visible methods. Because this object-oriented paradigm provides a natural form of encapsulation, it is generally well-suited to a distributed environment. Objects provide regulated access to shared resources and services. In contrast to distributed glue languages, distributed extensions of Java permit objects as well as base types to be communicated. Moreover, certain implementations, such as Java/RMI, also permit code to be dynamically linked into an address space on a remote site.
Since a primary goal of Java is to support code migration in a distributed environment, the language provides a socket mechanism through which processes on different machines in a distributed network may communicate. (Code migration is conceptually distinct from process mobility, since code migration makes no assumption about the data to be operated on by the instructions in the code being migrated.) Sockets, however, are a low-level network communication abstraction. Applications using sockets must layer an application-level protocol on top of this network layer. The application-level protocol is responsible for encoding and decoding messages, performing type-checking and verification, and the like. This arrangement has been found to be error-prone and cumbersome. Moreover, Java only supports migration of whole programs; threads of control cannot be transmitted among distinct machines. RPC provides one way of abstracting the low-level details needed to use sockets. RPC is a poor fit, however, for an object-oriented system. In Java, for example, communication takes place among objects, not procedures per se. Java/RMI, described for example in A. Wollrath et al., "Java-Centric Distributed Computing," IEEE Micro, Vol. 2, No. 72, pp. 44-53 (May 1997), is a variant of RPC tailored for the object semantics defined by Java's sequential core. Instead of using procedure call as the basis for separating local and remote computation, Java/RMI uses objects. A remote computation is initiated by invoking a method on a remote object. Clients access remote objects through surrogate objects found on their own machines. These surrogates are generated automatically by the compiler and compile to code that handles the marshalling of arguments and the like. Like any other Java object, remote objects are first-class, and may be passed as arguments to, or returned as results from, a method call.
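As a rough illustration of the application-level protocol that a socket-based program must supply itself, the following hypothetical Java sketch encodes values with a one-byte type tag, decodes them, and checks the received type against the type the receiver expects. The wire format and the `TaggedProtocol` name are assumptions made for illustration, and an in-memory byte array stands in for a socket's streams so that the example runs without a network.

```java
import java.io.*;

/**
 * Hypothetical sketch of an application-level protocol layered over a raw
 * byte stream such as a socket. The wire format (a one-byte type tag
 * followed by the payload) is an illustrative assumption.
 */
class TaggedProtocol {
    static final byte TAG_INT = 1;
    static final byte TAG_STRING = 2;

    // Encode an Integer or String with a leading type tag.
    static void encode(DataOutputStream out, Object value) throws IOException {
        if (value instanceof Integer) {
            out.writeByte(TAG_INT);
            out.writeInt((Integer) value);
        } else if (value instanceof String) {
            out.writeByte(TAG_STRING);
            out.writeUTF((String) value);
        } else {
            throw new IllegalArgumentException("unsupported type: " + value.getClass());
        }
    }

    // Decode one message, checking its tag against the type the caller expects.
    static <T> T decode(DataInputStream in, Class<T> expected) throws IOException {
        byte tag = in.readByte();
        Object value;
        switch (tag) {
            case TAG_INT: value = in.readInt(); break;
            case TAG_STRING: value = in.readUTF(); break;
            default: throw new IOException("unknown tag: " + tag);
        }
        if (!expected.isInstance(value)) {
            throw new IOException("type mismatch: expected " + expected.getSimpleName()
                    + " but received " + value.getClass().getSimpleName());
        }
        return expected.cast(value);
    }

    public static void main(String[] args) throws IOException {
        // A byte array stands in for the socket's output and input streams.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);
        encode(out, 42);
        encode(out, "hello");
        out.flush();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(decode(in, Integer.class) + " " + decode(in, String.class));
    }
}
```

Every detail here (tags, framing, type checks, error handling) is the application's responsibility, which is the burden the text describes and which RPC and Java/RMI aim to hide.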
Java/RMI supports a number of features not available in distributed extensions of imperative languages or in distributed glue languages. Most important among them is the ability to transfer behavior to and from clients and servers. Consider a remote interface I that defines some abstraction. A server may implement this interface, providing a specific behavior. When a client first requests this object, it gets the code defining the implementation. In other words, as long as clients and servers agree on a policy, the particular mechanism used to implement this policy can be altered dynamically. Clients can likewise send behavior to servers in packaged form, which the server can then execute directly. Again, if the procedure to be executed is not already found on the server, it is fetched from the client. Remote interfaces thus provide a powerful device for dynamically shipping executable content with state among a distributed collection of machines. Java/RMI allows data as well as code to be communicated among machines in a Java ensemble. Such extensions permit Java programmers to view a computation not merely as a single monolithic unit moving from machine to machine (such as in the form of applets), but as a distributed entity, partitioned among a collection of machines. By using an architecture-independent virtual machine, information from one process can be sent to another without deep knowledge of the machines on which each process is executing or of the underlying network infrastructure connecting these pieces together.
Java/RMI can be difficult to use, however. Remote objects are implicitly associated with global handles or UIDs, and thus are never copied across nodes. However, any argument to a remote method call that is not itself a remote object is copied, in much the same way as in RPC. As a result, remote calls have different semantics from local calls even though they appear syntactically identical. Because Java is highly imperative, distributed programs must be carefully crafted to avoid unexpected behavior due to unwanted copying of shared data.
In addition, neither Java nor Java/RMI permits an object to span multiple heterogeneous machines simultaneously. Each object resides on exactly one machine at any given time. As a result, true concurrency on multiple machines within the encapsulation of a single object is impossible. Moreover, as in a typical RPC system, communication among tasks using RMI takes place through copying. Thus, the semantics of a Java/RMI program may be quite different from those of a syntactically similar Java program.
Besides RPC and Java, numerous proposals have been made for agent languages which allow computation and data to freely migrate within a network. Conceptually, an agent is an encapsulation of a computation (i.e. a task) and related data that is mobile (i.e. can freely move about within a distributed network of machines).
For example, Aglets, described in D. B. Lange et al., "Programming Mobile Agents in Java - With The Java Aglet API," IBM (1998), is a Java mobile agent system that uses the Java/RMI interface and a security manager to achieve a portable and secure agent system. However, Aglets do not permit an agent's state to span multiple heterogeneous machines simultaneously, and migrating an agent requires the entire agent to move from one machine to another.
Telescript and Odyssey are two other mobile agent languages. See J. White, "Mobile Agents White Paper," General Magic (1998); "Introduction to the Odyssey API," General Magic (1998). While both Telescript and Odyssey agents can migrate during execution, the state of such agents can reside on only a single machine at any given moment. Thus, Telescript and Odyssey agents do not allow distributed state: when an agent moves, its entire state must move along with it. This limitation significantly reduces functionality and efficiency. In the case of Odyssey, only the state found in the heap can move; the state of the stack, the program counter, and the registers is lost. Telescript imposes similar restrictions.
Other systems that support mobile computation are Agent Tcl (see D. Kotz et al., "AGENT TCL: Targeting the Needs of Mobile Computers," IEEE Internet Computing, Vol. 1, No. 4, pp. 58-67 (1997)) and ARA (see H. Peine et al., "The Architecture of the ARA Platform for Mobile Agents," Proceedings of the First International Workshop on Mobile Agents (K. Rothermel et al., eds.), pp. 50-61 (1997)), whose base language was originally Tcl but which have recently been extended with Java support. Like Odyssey and Aglets, these systems prohibit an agent from having distributed state and provide no infrastructure by which an agent can transparently access data resident on another machine.
Obliq (see L. Cardelli, "A Language with Distributed Scope," Proceedings of the 22nd ACM Symposium on Principles of Programming Languages, pp. 286-298 (1995)) and Kali (see H. Cejtin et al., "Higher-Order Distributed Objects," ACM Transactions on Programming Languages and Systems, Vol. 17, No. 5, pp. 704-739 (1995)) are two other programming languages that permit code and data to migrate within a heterogeneous network. Obliq's sequential semantics is based on a delegation-based object system, whereas Kali is built on top of Scheme (see W. Clinger et al., eds., "Revised Report on the Algorithmic Language Scheme," ACM Lisp Pointers, Vol. 4, No. 3, pp. 1-55 (July 1991)), a higher-order, lexically scoped dialect of Lisp. Neither of these two systems has an explicit notion of agents, however. While Obliq supports transparent references, it does so by severely restricting the conditions under which objects may migrate. Moreover, Obliq does not provide a notion of a distributed address space such as an agent. Kali requires all operations on remote references to be performed explicitly. Like Obliq, Kali does not support an object-based encapsulation model. These limitations make Obliq and Kali ill-suited for large-scale distributed systems with mobile applications.
Accordingly, there remains a need for a distributed computing system which is easy to program and which: (1) provides an object-based encapsulation model, such as an agent, which allows the processes and state of the agent to be distributed over multiple potentially heterogeneous machines; (2) enables transparent access of data resident on another machine; and (3) allows easy and efficient process migration, in whole or in part, among distinct machines.
Generally speaking, in accordance with the invention, a distributed software system for use with a plurality of computer machines connected as a network is provided. The system may comprise a plurality of bases, each base providing a local address space and computer resources on one of a plurality of computer machines. At least one agent comprising a protection domain is provided, wherein the protection domain of the at least one agent resides on at least one of the plurality of bases. A plurality of objects is contained within the protection domain of the at least one agent, a first object residing on a first base of the plurality of bases and a second object residing on a second base of the plurality of bases. The first object on the first base may access the second object on the second base without knowledge of the physical address of the second object on the second base. Finally, at least one runtime system is connected to the first base and the second base. The runtime system facilitates migration of agents and objects from at least the first base to at least the second base.
In another embodiment, the system may generally comprise at least one agent comprising a protection domain, wherein the protection domain of the at least one agent resides on at least two of the plurality of computer machines. A plurality of objects is contained within the protection domain of the at least one agent, a first object residing on a first of the at least two computer machines and a second object residing on a second of the at least two computer machines. The objects are selectively movable among the at least two computer machines by a programmer of the system. The first object on the first computer machine may access the second object on the second computer machine in a location-transparent or network-transparent manner; that is, without knowledge of the physical address of the second object on the second computer machine and regardless of the selective movement of either the first object or the second object among the first and second computer machines. The agent is mobile and may migrate, in whole or in part, to any other machine in the network. Moreover, the machines in the network may be either homogeneous or heterogeneous.
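One way to picture the location-transparent access described above is a per-agent directory that maps symbolic names to the base currently holding each object. The following single-JVM Java sketch is hypothetical: bases are simulated as in-memory maps, and the names `Agent`, `Base`, `create`, `move`, and `lookup` are illustrative rather than the interface of the described system. The caller keeps using the same symbolic name before and after the object moves.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical single-JVM sketch of location-transparent access inside one
 * agent. Bases are simulated as in-memory stores; a per-agent directory maps
 * symbolic names to the base currently holding each object.
 */
class TransparentAgentDemo {

    // A simulated base: a local address space holding objects by name.
    static final class Base {
        final String machine;
        final Map<String, Object> heap = new HashMap<>();
        Base(String machine) { this.machine = machine; }
    }

    // The agent's directory: symbolic name -> base currently holding the object.
    static final class Agent {
        private final Map<String, Base> directory = new HashMap<>();

        void create(String name, Object value, Base base) {
            base.heap.put(name, value);
            directory.put(name, base);
        }

        // Move an object between bases and update the directory entry.
        void move(String name, Base target) {
            Base source = directory.get(name);
            target.heap.put(name, source.heap.remove(name));
            directory.put(name, target);
        }

        // Callers use only the symbolic name; the agent resolves the location.
        Object lookup(String name) {
            return directory.get(name).heap.get(name);
        }

        String locationOf(String name) {
            return directory.get(name).machine;
        }
    }

    public static void main(String[] args) {
        Base a = new Base("machine-A");
        Base b = new Base("machine-B");
        Agent agent = new Agent();
        agent.create("config", "v1", a);
        System.out.println(agent.lookup("config") + " on " + agent.locationOf("config"));
        agent.move("config", b);
        System.out.println(agent.lookup("config") + " on " + agent.locationOf("config"));
    }
}
```

A real runtime would also have to keep such a directory consistent across machines and handle concurrent moves; the sketch shows only the name-to-location indirection.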
The invention further includes a method for implementing a network-centric computer software programming system for a network comprising a plurality of computer machines. The method includes defining a plurality of object-oriented classes including an object class, an agent class, a base class and a task class; defining an object migrate method in the object class that migrates a selected object instance to a location specified with the base class; defining a task migrate method in the task class that migrates a selected task represented in a task instance to a location specified with the base class; defining an agent migrate method in the agent class that migrates a selected agent process to a location specified with the base class, including migration of all object instances and task instances within the agent; instantiating a first agent process according to the agent class, the first agent process including a plurality of task instances and object instances and distributed among the plurality of computer machines; and performing the object migrate method, the task migrate method and the agent migrate method within the first agent process. Thus, the invention provides for partial or total migration of agents which are distributed among various machines of the network.
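The four classes and three migrate methods recited above can be outlined in a short, hypothetical Java skeleton (Java 16 or later, for the record type). The names are assumptions (`AgentObject` stands in for the object class to avoid a clash with `java.lang.Object`), and migration is modeled only as an update of each instance's location; an actual runtime would also transfer state, code, and any executing task context. The example shows both partial migration of a single object and total migration of an agent whose contents are initially spread over two machines.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical skeleton of the base, object, task, and agent classes.
 * Migration only records the new base; no state is actually transferred.
 */
class MigrationSketch {

    // Base class: identifies a local address space on one machine.
    record Base(String machine) {}

    // Shared behavior: each migratable instance records the base it resides on.
    abstract static class Migratable {
        private Base location;
        Migratable(Base initial) { this.location = initial; }
        Base location() { return location; }
        void migrate(Base target) { this.location = target; }
    }

    // Object class: object instances may be migrated individually.
    static final class AgentObject extends Migratable {
        AgentObject(Base initial) { super(initial); }
    }

    // Task class: a task instance represents one thread of control.
    static final class Task extends Migratable {
        Task(Base initial) { super(initial); }
    }

    // Agent class: its objects and tasks may be distributed over several bases.
    static final class Agent extends Migratable {
        final List<AgentObject> objects = new ArrayList<>();
        final List<Task> tasks = new ArrayList<>();

        Agent(Base home) { super(home); }

        // Agent migration also migrates every object and task the agent contains.
        @Override
        void migrate(Base target) {
            super.migrate(target);
            objects.forEach(o -> o.migrate(target));
            tasks.forEach(t -> t.migrate(target));
        }
    }

    public static void main(String[] args) {
        Base a = new Base("A"), b = new Base("B"), c = new Base("C");
        Agent agent = new Agent(a);
        AgentObject first = new AgentObject(a);
        AgentObject second = new AgentObject(b); // agent state spans two machines
        agent.objects.add(first);
        agent.objects.add(second);
        agent.tasks.add(new Task(a));

        first.migrate(b); // partial migration: one object moves
        System.out.println("after object migrate: " + first.location().machine());

        agent.migrate(c); // total migration: all objects and tasks move to C
        System.out.println("after agent migrate: " + second.location().machine()
                + ", task on " + agent.tasks.get(0).location().machine());
    }
}
```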
Each distributed agent of the present invention may accordingly be distributed among one, several, or many of the machines of the network. This enables greater concurrency of operation while maintaining a protected, encapsulated software structure that shields the tasks and data within the agent (which may themselves be distributed among the machines of the network) from interference by other tasks and data operating in the network, in particular by those residing on the same machines as the agent's own tasks and data. Migration of such agents, even during process execution, is straightforward and maintains consistency across the network. Specifically, other agents may continue to access a particular agent after it has migrated without receiving any prior notification.
Accordingly, a principal object of the present invention is to provide a distributed agent system wherein an agent may have its tasks and state distributed among multiple potentially heterogeneous physical machines within a network.
Another object of the present invention is to provide a distributed agent system which is network-transparent, wherein references to objects within an agent, including objects residing on distinct physical machines, do not require knowledge of the physical location or address of the object and may instead be made using symbolic references.
Yet another object of the present invention is to provide a distributed agent system in which references to objects within an agent are resolved by the system transparent to the programmer and to the agent.
A further object of the present invention is to provide a distributed agent system which provides selectable, location-independent method execution.
A still further object of the present invention is to provide a distributed agent system which allows easy and efficient runtime process migration, in whole or in part, among distinct machines.
A still further object of the present invention is to provide a distributed agent system which is easy to program.
Other objects of the present invention will become more readily apparent in light of the following description in conjunction with the accompanying drawings.