Fueled by the growing importance of the Internet, interest in distributed computing environments (two or more computers connected by a communications medium) has increased in recent years. Programmers who want to take advantage of distributed computing environments either modify existing application programs to run in them or design new applications specifically for placement in them.
A distributed application is an application containing interconnected application units (“units”) that are placed on more than one computer in a distributed computing environment. By placing units on more than one computer in a distributed computing environment, a distributed application can exploit the capabilities of the distributed computing environment to share information and resources, and to increase application reliability and system extensibility. Further, a distributed application can efficiently utilize the varying resources of the computers in a distributed computing environment.
Various types of modular software, including software designed in an object-oriented framework, can conceivably be distributed throughout a distributed computing environment. Object-oriented programming models, such as the Microsoft Component Object Model (“COM”), define a standard structure of software objects that can be interconnected and collectively assembled into an application (which, being assembled from component objects, is herein referred to as a “component application”). The objects are hosted in an execution environment created by system services, such as the object execution environments provided by COM. These system services are exposed to component application objects in the form of application programming interfaces (“APIs”), system-provided objects and system-defined object interfaces. Distributed object systems such as Microsoft Corporation's Distributed Component Object Model (DCOM) and the Object Management Group's Common Object Request Broker Architecture (CORBA) provide system services that support execution of distributed applications.
In accordance with object-oriented programming principles, the component application is a collection of object classes, each of which models a real-world or abstract item by combining data that represent the item's properties with functions that represent the item's functionality. More specifically, an object is an instance of a programmer-defined type referred to as a class, which exhibits the characteristics of data encapsulation, polymorphism and inheritance. Data encapsulation refers to the combining of data (also referred to as properties of an object) with methods that operate on the data (also referred to as member functions of an object) into a unitary software component (i.e., the object), such that the object hides its internal composition, structure and operation and exposes its functionality to client programs only through one or more interfaces. An interface of the object is a group of semantically related member functions of the object. In other words, client programs do not access the object's data directly, but instead call functions on the object's interfaces to operate on the data. Polymorphism refers to the ability to view (i.e., interact with) two similar objects through a common interface, thereby eliminating the need to differentiate between the two objects. Inheritance refers to the derivation of different classes of objects from a base class, where the derived classes inherit the properties and characteristics of the base class.
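As a rough illustration of these ideas in a general-purpose language, the following Python sketch uses an abstract base class to stand in for an interface. The names (`IArea`, `Shape`, `Rectangle`, `Circle`) are hypothetical, and Python's `abc` module is only an analogy: COM defines interfaces as a binary-level standard, which this sketch does not model, and Python enforces data hiding only by convention.

```python
import math
from abc import ABC, abstractmethod


class IArea(ABC):
    """Interface: a group of semantically related member functions."""

    @abstractmethod
    def area(self) -> float:
        ...


class Shape(IArea):
    """Base class. Clients reach its data through methods, not directly."""

    def __init__(self, name: str) -> None:
        self._name = name  # leading underscore: private by Python convention only

    def describe(self) -> str:
        return f"{self._name} with area {self.area():.2f}"


class Rectangle(Shape):
    """Derived class: inherits describe() and supplies its own area()."""

    def __init__(self, width: float, height: float) -> None:
        super().__init__("rectangle")
        self._width = width
        self._height = height

    def area(self) -> float:
        return self._width * self._height


class Circle(Shape):
    """Another derived class sharing the same interface."""

    def __init__(self, radius: float) -> None:
        super().__init__("circle")
        self._radius = radius

    def area(self) -> float:
        return math.pi * self._radius ** 2


def total_area(objects: list) -> float:
    # Polymorphism: every object is used only through the IArea interface,
    # so the client never needs to know which concrete class it holds.
    return sum(obj.area() for obj in objects)


shapes = [Rectangle(2.0, 3.0), Circle(1.0)]
for shape in shapes:
    print(shape.describe())  # "rectangle with area 6.00", then "circle with area 3.14"
```

Because `Shape` inherits the abstract `area()` without implementing it, Python refuses to instantiate `Shape` or `IArea` directly; only the derived classes that complete the interface can be created.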
An application containing easily identifiable and separable units is more easily distributed throughout a distributed system. One way to identify separable units is to describe them with structural metadata. Metadata is data that describes other data; in this context, structural metadata is data describing the structure of application units. Further, application units are desirably location-transparent for in-process, cross-process, and cross-computer communications. In other words, it is desirable for communications between application units to abstract away the locations of the units. This abstraction makes the distribution of application units flexible.
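The following Python sketch illustrates both ideas under stated assumptions. A unit publishes hypothetical structural metadata (`METADATA`, a list of its method names), and a generic `Proxy` built from that metadata marshals each call into bytes and hands it to a `Stub` that invokes the real unit. The names `Counter`, `Proxy`, and `Stub` are illustrative; `pickle` stands in for the marshaling that a system such as DCOM would perform, and the "channel" here is a direct function call rather than a real process or network boundary.

```python
import pickle
from typing import Any


class Counter:
    """An application unit; METADATA describes its public methods."""

    METADATA = {"unit": "Counter", "methods": ["add", "value"]}

    def __init__(self) -> None:
        self._total = 0

    def add(self, amount: int) -> None:
        self._total += amount

    def value(self) -> int:
        return self._total


class Stub:
    """Receiving side: unmarshals a request and invokes the real unit."""

    def __init__(self, unit: Any) -> None:
        self._unit = unit

    def handle(self, request: bytes) -> bytes:
        method, args = pickle.loads(request)
        result = getattr(self._unit, method)(*args)
        return pickle.dumps(result)


class Proxy:
    """Calling side: exposes the methods named in the metadata and
    marshals every call to bytes before passing it to the stub."""

    def __init__(self, metadata: dict, stub: Stub) -> None:
        self._stub = stub
        for name in metadata["methods"]:
            setattr(self, name, self._make_method(name))

    def _make_method(self, name: str):
        def call(*args: Any) -> Any:
            reply = self._stub.handle(pickle.dumps((name, args)))
            return pickle.loads(reply)
        return call


def client(counter: Any) -> int:
    # The client code is identical whether `counter` is local or behind a proxy.
    counter.add(2)
    counter.add(3)
    return counter.value()


local = Counter()
remote = Proxy(Counter.METADATA, Stub(Counter()))
print(client(local), client(remote))  # prints "5 5"
```

Since the proxy is generated from the metadata rather than written by hand, the same client code could be pointed at a unit in the same process, another process, or another computer by swapping only the channel.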
The partitioning and distribution of applications are problematic and complicated by many factors.
To partition an application for distribution, a programmer typically determines a plan for distributing units of the application based on past experience, intuition, or data gathered from a prototype application. The application's design is then tailored to the selected distribution plan. Even if the programmer selects a distribution plan that is optimal for a particular computer network, that plan might later be rendered obsolete by changes in network topology. Moreover, assumptions used in choosing the distribution plan might later prove to be incorrect, resulting in an application poorly matched to its intended environment.
Generally, to distribute an application, one can work externally or internally relative to the application. External distribution mechanisms work without any modification of the application and include network file systems and remote windowing systems in a distributed computing environment. Although external distribution mechanisms are easy to use and flexible, they often engender burdensome transfers of data between nodes of the distributed computing environment, and for this reason are far from optimal. Internal distribution mechanisms typically modify the application to be distributed in various ways. Internal distribution mechanisms allow optimized application-specific distribution, but frequently entail an inordinate amount of extra programmer effort to find an improved distribution and modify the application. Further, internal systems frequently provide ad hoc, one-time results that are tied to the performance of a particular network at a particular time.
An automatic distributed partitioning system (ADPS) works internally relative to an application to partition application units, and works automatically or semi-automatically to save programmer effort in designing distributed applications.
In the 1970s, researchers postulated that the best way to create a distributed application was to use a compiler in a run-time environment to partition the application, and to provide each of plural distributed machines with exactly the same code base used on a single machine to execute the distributed application. These early ADPSs analyzed the structure of procedures and parameters in an application's source code and generated metadata describing the structure of the application. Using this metadata, they profiled the application and generated a communication model for it. The Interconnected Processor System (ICOPS) is an example of an ADPS designed in the 1970s. The Configurable Applications for Graphics Employing Satellites (CAGES) system also supports creation of distributed applications, but does not support automatic application profiling at all. A more recent example of an ADPS is the Intelligent Dynamic Application Partitioning (IDAP) System. IDAP generates from application source code an instrumented version of the application for execution in profiling scenarios, then generates from the same source code another version of the application for distributed execution. ICOPS and IDAP suffer from numerous drawbacks relating to the universality, efficiency, and automation of these systems.
For example, ICOPS and IDAP both require access to application source code, which they compile to generate metadata or an instrumented application for profiling. Neither ICOPS nor IDAP can profile an application without its source code, which limits the applicability of these systems.
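Profiling of the kind described above amounts to recording which units communicate and how much data they exchange. As an illustration only, the Python sketch below builds such a communication model for one pair of units: a hypothetical `ProfilingProxy` wraps an existing unit, intercepts each method call at run time, and records in a hypothetical `CommunicationProfile` the number of messages and bytes flowing in each direction. Message size is approximated by the length of the `pickle` serialization of each call and reply, which is an assumption of the sketch; this is not how ICOPS or IDAP instrumented applications.

```python
import pickle
from collections import defaultdict
from typing import Any


class CommunicationProfile:
    """Message count and byte total for each (sender, receiver) pair."""

    def __init__(self) -> None:
        self.messages: dict = defaultdict(int)
        self.bytes: dict = defaultdict(int)

    def record(self, sender: str, receiver: str, payload: Any) -> None:
        # Assumption: message size ~= size of the pickled payload.
        self.messages[(sender, receiver)] += 1
        self.bytes[(sender, receiver)] += len(pickle.dumps(payload))


class ProfilingProxy:
    """Wraps an existing unit and records every call and its reply."""

    def __init__(self, unit: Any, name: str, caller: str,
                 profile: CommunicationProfile) -> None:
        self._unit = unit
        self._name = name
        self._caller = caller
        self._profile = profile

    def __getattr__(self, attr: str) -> Any:
        # Called only for attributes not found on the proxy itself.
        target = getattr(self._unit, attr)
        if not callable(target):
            return target

        def traced(*args: Any, **kwargs: Any) -> Any:
            self._profile.record(self._caller, self._name, (attr, args, kwargs))
            result = target(*args, **kwargs)
            self._profile.record(self._name, self._caller, result)
            return result

        return traced


class Store:
    """A hypothetical unit being profiled; its class is left unmodified."""

    def __init__(self) -> None:
        self._rows: list = []

    def insert(self, row: str) -> None:
        self._rows.append(row)

    def count(self) -> int:
        return len(self._rows)


profile = CommunicationProfile()
store = ProfilingProxy(Store(), name="store", caller="ui", profile=profile)
store.insert("alice")
store.insert("bob")
store.count()
print(dict(profile.messages))  # {('ui', 'store'): 3, ('store', 'ui'): 3}
```

The resulting counts and byte totals describe only the application's behavior, not any particular network.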
An application profile is a model of an application. The application profile can include the units of an application and/or the costs of communication between units of the application according to expected usage patterns. Communication costs can be represented through several abstractions. For instance, communication costs can be represented as the time to transmit data from one machine to another or as the amount of data transmitted. The former is network-dependent and changes whenever the network interconnection changes. The latter fails to consider the realities of network latencies and bandwidths, i.e., it fails to consider network characteristics. None of ICOPS, CAGES, or IDAP produces a network-independent profile of an application that is combined with measurements of network characteristics and analyzed to partition the application for the network. Nor does any of them allow re-profiling of an application or a network to adjust for changes in the application or the network, or to partition the application on different networks.
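The difference between the two abstractions can be made concrete with a small calculation. The Python sketch below assumes a simple per-message cost model, time = latency + size / bandwidth, which is an illustrative assumption rather than a model taken from any of the systems named above. A network-independent profile (made-up message sizes between three hypothetical units `ui`, `logic`, and `db`) is combined with two hypothetical sets of network measurements to pick the cheaper of two candidate placements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Network:
    """Measured characteristics of one network (hypothetical values)."""

    latency_s: float      # fixed cost per message, in seconds
    bytes_per_s: float    # bandwidth, in bytes per second

    def message_time(self, size_bytes: int) -> float:
        # Assumed cost model: latency plus transfer time.
        return self.latency_s + size_bytes / self.bytes_per_s


# Network-independent profile: sizes of the messages exchanged between
# each pair of units during a profiling run (made-up numbers).
PROFILE = {
    ("ui", "logic"): [100, 100, 100],  # three small messages
    ("logic", "db"): [10_000],         # one large message
}

PLACEMENTS = {
    "split ui|logic": {"ui": "client", "logic": "server", "db": "server"},
    "split logic|db": {"ui": "client", "logic": "client", "db": "server"},
}


def communication_time(profile: dict, placement: dict, net: Network) -> float:
    """Estimated time for all messages whose endpoints are on different machines."""
    total = 0.0
    for (src, dst), sizes in profile.items():
        if placement[src] != placement[dst]:
            total += sum(net.message_time(size) for size in sizes)
    return total


def best_placement(profile: dict, net: Network) -> str:
    return min(PLACEMENTS,
               key=lambda name: communication_time(profile, PLACEMENTS[name], net))


lan = Network(latency_s=0.0005, bytes_per_s=12.5e6)  # roughly 100 Mbit/s
modem = Network(latency_s=0.1, bytes_per_s=7_000)     # roughly 56 kbit/s
print(best_placement(PROFILE, lan))    # split logic|db
print(best_placement(PROFILE, modem))  # split ui|logic
```

With these numbers, the fast network favors cutting the single large message between `logic` and `db`, because per-message latency dominates, while the slow link favors cutting the three small messages between `ui` and `logic`, because transfer time dominates. A bytes-only model (300 bytes versus 10,000 bytes) would pick the same placement on both networks, and the same profile can be re-combined with new network measurements without re-profiling the application.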