1. Field of the Invention
The present invention relates to a distributed data processing system and a computer-readable medium storing a computer program for distributed processing. More particularly, the present invention relates to a distributed data processing system in which a plurality of processors perform a data analysis in a distributed manner, and also to a computer-readable medium storing a computer program to realize such a distributed data processing system.
2. Description of the Related Art
In the fields of science and technology, researchers routinely use computers to analyze large amounts of experimental or observational data and to apply appropriate calibration processes to that data. Most research institutions have their own computer centers, in which many processors are interconnected by high-performance network facilities to form a distributed computing environment. Researchers in such institutions make preparatory arrangements when executing a particular analytical process. That is, they define a procedure of analysis and enter a list of source data files, result data files, and processing engines (i.e., computer programs for the analysis). In conventional systems, those data files and program files must be specified by their file names and full path names, according to the file system being used.
However, conventional file systems sometimes require the users to designate necessary resource files in different ways from computer to computer, because actual system configurations of computers may not always be the same. This means that the portability of resource files is not guaranteed in such conventional distributed computing environments. Suppose, for instance, that one researcher has performed a data analysis on one computer, with an analytical procedure script written for that computer, and he/she now attempts to run a similar data analysis on another computer. The problem is that it may not be possible for him/her to use the same analytical procedure script in the new computer. If this is the case, then he/she must rewrite the script (e.g., change the designation of source data files and other files) so that it will be suitable for a different file system environment.
Consider another problem situation where some processing engines are missing in a computer being used and it is unable to continue the analysis. Still another possible situation is that the computer's magnetic disk unit cannot provide enough space to store all data files required. In such cases, it is necessary to transfer the present analytical procedure script and related resource files to another computer that is available in the distributed system. However, it is extremely difficult to seamlessly continue the analysis on different computing platforms, because of the lack of data portability.
Furthermore, in conventional distributed environments, management of resource files is left to individual researchers' discretion, meaning that files can be transferred or copied freely within a computer or among different computers. In other words, uniqueness of each file is not always maintained in the system. This results in multiple instances of data or program files that have been unnecessarily replicated and accumulated in the same machine, wasting valuable computer resources.
Again, researchers must handle various resource files, including source image data files, intermediate data files, result data files, and analytical procedure scripts. They often use their individual work spaces or temporary storage areas in a computer to store or manage such resource files. This situation, however, could cause a problem when they attempt to manage such dispersed files. Besides wasting computers' storage resources, the presence of duplicated files could also cause serious confusion when a user tries to delete unnecessary intermediate data or other files. Therefore, it has been desired to develop a safe and easy method to manage data and program files for analyses.
Taking the above into consideration, an object of the present invention is to provide a distributed data processing system having a capability to manage distributed data and program files in a unified fashion.
To accomplish the above object, according to the present invention, there is provided a distributed data processing system for analyzing data with a plurality of computers in a distributed environment. This system comprises a management database storage unit and a process execution unit. The management database storage unit stores a resource management database that associates identifiers of resource files used in analytical processes with actual storage locations of those resource files. Here, the identifiers should be unique in the distributed environment, so that all resource files will be uniquely identified thereby. The process execution unit is responsive to a process execution request for a specific analytical process, where necessary resources are specified with their identifiers. When such a process execution request is received, it executes the requested analytical process by using the resources whose locations are retrieved from the resource management database in the management database storage unit.
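The relationship between the two units described above can be illustrated with a minimal sketch. It is not the claimed implementation: the class names, method names, identifier strings, and host paths below are all hypothetical, the database is held in memory rather than in a storage unit, and the execution step only resolves locations into a plan instead of launching an actual processing engine.

```python
class ResourceManagementDatabase:
    """Hypothetical in-memory stand-in for the resource management database.

    It associates identifiers, unique across the distributed environment,
    with the actual storage locations of resource files.
    """

    def __init__(self):
        self._locations = {}

    def register(self, resource_id, location):
        # Enforce uniqueness: one identifier maps to exactly one resource file.
        if resource_id in self._locations:
            raise ValueError(f"identifier already registered: {resource_id}")
        self._locations[resource_id] = location

    def locate(self, resource_id):
        # Raises KeyError if the identifier is unknown.
        return self._locations[resource_id]


class ProcessExecutionUnit:
    """Hypothetical execution unit that accepts requests naming resources by identifier."""

    def __init__(self, database):
        self._database = database

    def execute(self, engine_id, source_ids, result_id):
        # Resolve every identifier in the request to its storage location.
        # A real system would then run the engine; this sketch returns the plan.
        return {
            "engine": self._database.locate(engine_id),
            "sources": [self._database.locate(s) for s in source_ids],
            "result": self._database.locate(result_id),
        }


# Example identifiers and locations are made up for illustration.
db = ResourceManagementDatabase()
db.register("engine:calibrate", "hostA:/opt/engines/calibrate")
db.register("data:run-001", "hostB:/data/raw/run001.dat")
db.register("result:run-001", "hostC:/results/run001.out")

unit = ProcessExecutionUnit(db)
plan = unit.execute("engine:calibrate", ["data:run-001"], "result:run-001")
print(plan["sources"])
```

Because the request names only identifiers, the same request could be issued from any computer in the environment; only the database entries would need to change if a file were moved.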
The above and other objects, features and advantages of the present invention will become apparent from the following description when taken in conjunction with the accompanying drawings which illustrate a preferred embodiment of the present invention by way of example.