1. Field of the Invention
This invention relates to data management and more particularly relates to an apparatus, system and method for selecting optimal replica sources in a grid computing environment.
2. Description of the Related Art
Recent increases in networking speed, capacity, and usage have facilitated harnessing geographically dispersed computing resources to solve computationally complex problems heretofore unsolvable with local computing resources. The ability to harness heterogeneous inter-networked computing resources into a single powerful tool has facilitated the development of a new computing paradigm often referred to as ‘Grid Computing.’ Grid computing enables the virtualization of distributed computing and data resources such as processing power, network bandwidth, and storage capacity to create a single processing image that provides users and applications seamless access to vast IT capabilities.
For example, FIG. 1 is a schematic block diagram depicting one example of a typical grid computing environment 100. The depicted grid computing environment 100 includes a number of sites 110, with computing nodes such as workstations 120 and servers 130, interconnected with a local network 140. Each computing node may comprise one or more separate file systems running on various system platforms. In the depicted arrangement, each site 110 is connected to a network 160 via one or more inter-site links 150. The network 160 may comprise a Local Area Network (LAN), Wide Area Network (WAN), the Internet, or the like.
Each workstation 120 and server 130 within each site 110 may operate as a computing node within the grid. Typically, computing resources that are unused by local users and processes may be offered for use by one or more grid computing tasks. To increase the performance of data access for such tasks, it is often desirable to create local read-only copies (replicas) of data files that may be conveniently accessed during execution. Local replicas of data files may reduce network response time, improve data locality, and/or increase robustness, scalability, and performance of grid-oriented applications.
The process of creating and distributing replicas of data files to multiple distributed systems creates management issues for users and system administrators. For example, many users throughout a grid may choose to copy data files to a large number of computing nodes throughout the grid. Users may lose track of which files have been replicated and to which locations. Searching throughout the grid to update or delete such files is a tedious, uncoordinated, and typically error-prone process.
FIG. 2 is a block diagram depicting one example of a replication infrastructure 200 that facilitates distributing and tracking replicated files throughout a grid. The depicted replication infrastructure 200 includes local files 210, a file transfer service 220, and a replica location service 230 that uses one or more local replica catalogs 240 and replica location indexes 250. The local files 210 as used herein refer to files local to the application or user, but not necessarily local to a specific file system. One well-known and commonly used example of the depicted replication infrastructure 200 is provided by the Globus Toolkit™ created in conjunction with the Open Grid Service Architecture (OGSA) and European DataGrid project.
The file transfer service 220 facilitates the transfer of data files to selected locations on the data grid. Examples of the protocols used in the file transfer service 220 include a local file transfer, File Transfer Protocol (FTP), Hypertext Transfer Protocol (HTTP), and GridFTP. Often the file transfer service 220 transfers the data files between disparate file systems. The transferred files, also referred to as replicas, are typically copied to specific data stores that contain the local files 210 in order to increase data locality and improve performance.
The local replica catalog 240 maps logical file names to physical file names. Generally, a logical file name is a unique logical identifier for desired data content and the physical file name is a unique Uniform Resource Locator (URL) that specifies the data's location on a storage system. The use of logical file names facilitates system-independent and grid-independent programming and execution.
The local replica catalog 240 typically contains mappings for data file replicas that are locally accessible on one or more data stores associated with a site 110 or similar geographical unit. The local replica catalog 240 may also store user-specified attributes associated with a file. The replica location index 250 indicates which local replica catalogs 240 contain mappings for specific logical file names.
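By way of illustration only, the two-level lookup described above may be sketched as follows. The sketch assumes a simple in-memory data model; the site names, logical file names, URLs, and the functions build_index and resolve are hypothetical and do not represent the interface of the Globus Toolkit or of any actual replica location service.

```python
from typing import Dict, List

# Hypothetical local replica catalogs 240, one per site 110.
# Each catalog maps a logical file name to its physical file names (URLs).
LocalReplicaCatalog = Dict[str, List[str]]

catalogs: Dict[str, LocalReplicaCatalog] = {
    "site-a": {
        "lfn://climate/run42": ["gsiftp://a.example.org/data/run42.dat"],
    },
    "site-b": {
        "lfn://climate/run42": ["ftp://b.example.org/replicas/run42.dat"],
        "lfn://climate/run43": ["http://b.example.org/replicas/run43.dat"],
    },
}


def build_index(cats: Dict[str, LocalReplicaCatalog]) -> Dict[str, List[str]]:
    """Build a replica location index 250: logical name -> catalogs holding it."""
    index: Dict[str, List[str]] = {}
    for site, catalog in cats.items():
        for logical_name in catalog:
            index.setdefault(logical_name, []).append(site)
    return index


def resolve(logical_name: str,
            index: Dict[str, List[str]],
            cats: Dict[str, LocalReplicaCatalog]) -> List[str]:
    """Return every physical file name known for a logical file name."""
    urls: List[str] = []
    for site in index.get(logical_name, []):
        urls.extend(cats[site][logical_name])
    return urls


index = build_index(catalogs)
replica_urls = resolve("lfn://climate/run42", index, catalogs)
```

In this sketch the index stores only the identity of the catalogs that hold a mapping, while the physical file names remain in the catalogs themselves, which mirrors the division of responsibility described above.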
The replica location service 230 manages the replica location indexes 250 and the local replica catalogs 240, and facilitates access to the information contained therein via an Application Programming Interface (API). Additionally, the replica location service 230 correlates one or more physical locations to a given logical file name. Multiple replica location indexes 250 can be linked via the replica location service 230 so that logical file names that are not found within one replica location index 250 may be found in a linked replica location index 250.
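The linking of replica location indexes 250 may be pictured, purely as an assumption-laden sketch, as a search that falls through to linked indexes when the first index has no entry. The class ReplicaLocationIndex, the function find_catalogs, and the depth-first search order are hypothetical; the source does not specify how linked indexes are traversed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ReplicaLocationIndex:
    """Hypothetical index 250: logical file name -> names of catalogs 240."""
    name: str
    entries: Dict[str, List[str]] = field(default_factory=dict)
    links: List["ReplicaLocationIndex"] = field(default_factory=list)


def find_catalogs(index: ReplicaLocationIndex,
                  logical_name: str,
                  visited: Optional[Set[str]] = None) -> List[str]:
    """Search this index, then its linked indexes, for a logical file name."""
    if visited is None:
        visited = set()
    if index.name in visited:
        return []  # already searched; guards against cyclic links
    visited.add(index.name)
    if logical_name in index.entries:
        return list(index.entries[logical_name])
    for linked in index.links:
        found = find_catalogs(linked, logical_name, visited)
        if found:
            return found
    return []


# Two indexes linked to each other (assumed topology, for illustration).
index_a = ReplicaLocationIndex("index-a", {"lfn://x": ["catalog-a"]})
index_b = ReplicaLocationIndex("index-b", {"lfn://y": ["catalog-b"]})
index_a.links.append(index_b)
index_b.links.append(index_a)

catalogs_for_y = find_catalogs(index_a, "lfn://y")
```

The visited set is a design choice of this sketch: because links may be mutual, a search without it could loop indefinitely on a name that no index contains.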
The replica location service 230 facilitates managing and tracking local replicas. However, the functionality provided by the replica location service 230 is fairly primitive. For example, the replica location service 230 typically manages index and catalog entries one file at a time, and may not guarantee consistency between data replicas or the uniqueness of filenames. Additionally, the location services provided by the replica location service 230 are not integrated with file-oriented services such as the file transfer service 220 and file-oriented system calls.
One of the major drawbacks of the systems 100, 200 described above is that the replica location service 230 does not locate an optimal replica source, that is, a replica source identified by one or more preferred attributes. One example of a preferred attribute is membership of a replica source in a list of preferred replica sources. Another example is identification of a preferred replica source based upon performance of the network 160 between the replica source and a replica destination. One replica source may be preferred over another because it may allow for more efficient and reliable copying of the replica, without undue taxation of a particular device's processing resources. The replica location service 230 does not choose any replica source preferentially over another; instead, it merely provides a mapping of physical locations to the logical file name of the desired replica. An application requesting the mapping typically then chooses the first listed physical location.
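The difference between taking the first listed entry and selecting by preferred attributes may be illustrated with the following minimal sketch. The host names, the latency figures, and the ranking rule (preferred-list membership first, then lowest measured latency) are assumptions made for illustration and are not presented as the selection method of any particular system.

```python
from typing import Dict, List, Optional, Set
from urllib.parse import urlparse


def first_listed(urls: List[str]) -> Optional[str]:
    """Baseline behavior: take whatever replica source is listed first."""
    return urls[0] if urls else None


def select_source(urls: List[str],
                  preferred_hosts: Set[str],
                  latency_ms: Dict[str, float]) -> Optional[str]:
    """Pick a source by preferred-list membership, then by lowest latency."""
    def rank(url: str):
        host = urlparse(url).hostname
        not_preferred = host not in preferred_hosts  # False sorts first
        # Hosts with no measurement are treated as the slowest.
        return (not_preferred, latency_ms.get(host, float("inf")))
    return min(urls, key=rank) if urls else None


# Hypothetical replica sources and measurements.
urls = [
    "gsiftp://a.example.org/data/run42.dat",
    "ftp://b.example.org/replicas/run42.dat",
    "http://c.example.org/replicas/run42.dat",
]
preferred = {"b.example.org", "c.example.org"}
latency = {"a.example.org": 5.0, "b.example.org": 80.0, "c.example.org": 20.0}

baseline = first_listed(urls)
chosen = select_source(urls, preferred, latency)
```

With these assumed inputs the baseline returns the source on a.example.org simply because it is listed first, whereas the ranked selection returns the source on c.example.org, the preferred host with the lowest assumed latency.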
From the foregoing discussion, it should be apparent that a need exists for an apparatus, system, and method for selecting optimal replica sources in a grid computing environment. Beneficially, such an apparatus, system, and method would allow for fast, reliable selection of the most efficient and convenient sources for replication of the data set. Selection of an optimal replica source would save time in copying the replica and reduce heavy resource taxation of overused sources within the grid.