A conventional arrangement known as the ‘eScience solution framework’ is based on a novel distributed computing model called the ‘Data-Centric Distributed Computing’ model. The Data-Centric Distributed Computing model brings about a paradigm shift in distributed computing by moving analytical software programs to the data, as opposed to the traditional approach of moving data to the software programs. This model allows end-users to dynamically launch software programs as services to data, thus avoiding the transfer of large amounts of data across the network, of personal information protected by statutory privacy requirements across organizational boundaries, or of confidential information relating to intellectual property across institutional boundaries. Because the Data-Centric computing model enables application software to be launched dynamically as a service from a software library, it obviates the need to install and configure application software on a system. The model also allows the data to be kept at the source of collection or generation.
Typically, a Data-Centric computing environment is very dynamic, with a large number of heterogeneous systems. These systems may be distributed across geographical locations and institutional boundaries. A typical Data-Centric computing environment would comprise the following key components, in addition to one or more portal servers and end-user workstations or appliances: a data server, a software library and a metadata repository. A data server is where the actual data reside, and it would typically hold data from multiple data sources; it could be a database server, a flat file repository, an image data store, etc. The data processing software runs on this system. The software library includes a repository of software service components as well as a software library manager.
The software library manager would typically have access to a service catalog that lists the available services. Based on the service consumer's request, the corresponding application software is dynamically launched as a service to the data server where the data of interest reside. The metadata repository, for its part, stores the metadata that describe the data servers as well as the data sources within them.
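By way of illustration only, the interaction among the three key components may be sketched as follows. This is a minimal in-process sketch under stated assumptions, not the framework's actual implementation: the class names `DataServer`, `MetadataRepository` and `SoftwareLibrary`, their methods, and the `mean` service are all hypothetical, and a real environment would launch the service across a network rather than through a local function call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names in this sketch are hypothetical and chosen for illustration.
Service = Callable[[List[float]], float]


@dataclass
class DataServer:
    """Holds the actual data; services are executed here, next to the data."""
    name: str
    sources: Dict[str, List[float]] = field(default_factory=dict)

    def run(self, service: Service, source: str) -> float:
        # Only the (small) result leaves the data server, not the data.
        return service(self.sources[source])


@dataclass
class MetadataRepository:
    """Records which data server hosts each data source."""
    locations: Dict[str, DataServer] = field(default_factory=dict)

    def register(self, server: DataServer) -> None:
        for source in server.sources:
            self.locations[source] = server

    def locate(self, source: str) -> DataServer:
        return self.locations[source]


@dataclass
class SoftwareLibrary:
    """Service catalog plus a manager that launches services to the data."""
    catalog: Dict[str, Service] = field(default_factory=dict)

    def launch(self, service_name: str, source: str,
               metadata: MetadataRepository) -> float:
        service = self.catalog[service_name]   # look up in the service catalog
        server = metadata.locate(source)       # find where the data reside
        return server.run(service, source)     # move the program to the data


server = DataServer("site-a", {"readings": [1.0, 2.0, 3.0]})
metadata = MetadataRepository()
metadata.register(server)
library = SoftwareLibrary({"mean": lambda xs: sum(xs) / len(xs)})
result = library.launch("mean", "readings", metadata)
```

In this sketch the service consumer names only a service and a data source; the data themselves never leave the data server, which is the property the model is intended to provide.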
A growing and compelling need has been recognized in connection with the effective management of metadata and events in the Data-Centric computing environment, where a large number of events and interactions are anticipated in any of the participating systems just noted. Accordingly, some common scenarios are described below, with a focus on the events that may be generated in the Data-Centric computing environment.
In one scenario, an additional data server may be added to the environment and registered with the metadata repository so that location services can be provided for data entities in the Data-Centric computing environment. Further, when the data server is updated with new data entities, the metadata repository is updated with the types of data entities managed by the data server. The types of data could include raw data, observational data, experimental data, pre-processed data, filtered data based on a certain set of filtering rules, post-processed data, curated data, derived data, clustered data, association data, correlation data, modeling data, simulation data, etc. This list illustrates how many related data entities can be derived from one set of data entities.
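The registration scenario just described may be sketched as follows. The sketch is illustrative only and rests on assumptions: the class `MetadataRegistry`, its method names, and the server identifier `site-a` are hypothetical, and the entity types are plain strings taken from the list above.

```python
from typing import Dict, List, Set


class MetadataRegistry:
    """Hypothetical metadata repository tracking entity types per data server."""

    def __init__(self) -> None:
        self._entity_types: Dict[str, Set[str]] = {}

    def register_server(self, server_id: str) -> None:
        # Event: a data server has been added to the environment.
        self._entity_types.setdefault(server_id, set())

    def update_entity_types(self, server_id: str, types: Set[str]) -> None:
        # Event: the data server has been updated with new data entities.
        if server_id not in self._entity_types:
            raise KeyError(f"unregistered data server: {server_id}")
        self._entity_types[server_id] |= types

    def servers_with(self, entity_type: str) -> List[str]:
        # Location service: which servers manage entities of this type?
        return sorted(
            server_id
            for server_id, types in self._entity_types.items()
            if entity_type in types
        )


registry = MetadataRegistry()
registry.register_server("site-a")
registry.update_entity_types("site-a", {"raw data", "curated data"})
located = registry.servers_with("curated data")
```

Each call in the usage lines corresponds to one event that the metadata repository would have to handle, which is why the number of events grows with both the number of data servers and the number of derived entity types.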
In other scenarios, when a data entity changes, the metadata will be updated to reflect the change. When connectivity between the data entities (in the data server) and the metadata server is lost, the latter is not updated with the changes made to the data entities. These changes then need to be propagated to the metadata server when connectivity is next regained. Further, the status of the services that were running during the disconnected period also needs to be updated when connectivity is next regained.
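One possible way of handling the disconnected scenario, offered purely as an assumption and not as a method described in the source, is to queue the events on the data server side and replay them in order once connectivity returns. The names `MetadataServer`, `DataServerAgent`, `report`, `disconnect` and `reconnect` are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Tuple

# (kind, key, value), where kind is "entity" or "service".
Event = Tuple[str, str, str]


class MetadataServer:
    """Hypothetical metadata server holding entity metadata and service status."""

    def __init__(self) -> None:
        self.entities: Dict[str, str] = {}
        self.service_status: Dict[str, str] = {}

    def apply(self, kind: str, key: str, value: str) -> None:
        target = self.entities if kind == "entity" else self.service_status
        target[key] = value


class DataServerAgent:
    """Hypothetical agent on the data server that buffers events while offline."""

    def __init__(self, metadata: MetadataServer) -> None:
        self.metadata = metadata
        self.connected = True
        self.pending: Deque[Event] = deque()

    def disconnect(self) -> None:
        self.connected = False

    def report(self, kind: str, key: str, value: str) -> None:
        if self.connected:
            self.metadata.apply(kind, key, value)
        else:
            # Connectivity is lost: keep the event until it can be delivered.
            self.pending.append((kind, key, value))

    def reconnect(self) -> None:
        self.connected = True
        # Replay buffered events in their original order.
        while self.pending:
            self.metadata.apply(*self.pending.popleft())
```

Replaying in first-in, first-out order means that when the same data entity changed more than once during the outage, the metadata server ends up with the most recent value.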
While Data-Centric computing environments provide numerous advantages as detailed hereinabove, it is also evident that in a Data-Centric computing environment for multidisciplinary and cross-institutional collaboration, there can be a very large number of data sources (on the order of hundreds or thousands) that act as touch-points, each of which would emit a very large number of events (on the order of millions or billions). This type of complex environment thus presents challenges that, to a large extent, have not hitherto been addressed or overcome. Accordingly, a need has been recognized in connection with providing a system, in the context of a Data-Centric computing environment, that would be responsible for handling such multitudes of events effectively.