Multi-tenant distributed computing is finding increasingly wide application. For example, systems such as MapReduce have been applied to numerous cases of large-scale data analysis. Many such distributed computing systems rely on a distributed file system (DFS) to provide scalable data storage. In operation, a job server divides data analysis jobs submitted by one or more users into a plurality of map and reduce tasks. These tasks are issued to different task servers for execution. The execution of a job usually involves read/write operations on the data stored in the DFS.
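By way of illustration only, the division of a job into map and reduce tasks described above may be sketched as follows. This is a minimal single-process word-count sketch and not the implementation of any particular system; all function names (`split_into_map_tasks`, `run_map_task`, `shuffle`, `run_reduce_task`, `run_job`) are hypothetical, and the job server, task servers, and DFS are replaced by ordinary in-memory Python calls.

```python
from collections import defaultdict


def split_into_map_tasks(records, num_tasks):
    """Role of the job server: divide the input into one strided slice per map task."""
    return [records[i::num_tasks] for i in range(num_tasks)]


def run_map_task(chunk):
    """Role of a task server running a map task: emit (word, 1) for every word."""
    return [(word, 1) for line in chunk for word in line.split()]


def shuffle(pairs, num_reducers):
    """Group intermediate pairs by key and assign each key to one reduce task."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[hash(key) % num_reducers][key].append(value)
    return partitions


def run_reduce_task(partition):
    """Role of a task server running a reduce task: sum the counts for each key."""
    return {key: sum(values) for key, values in partition.items()}


def run_job(records, num_map_tasks=2, num_reduce_tasks=2):
    """Run all tasks sequentially; a real system would dispatch them to task servers."""
    map_tasks = split_into_map_tasks(records, num_map_tasks)
    intermediate = [pair for chunk in map_tasks for pair in run_map_task(chunk)]
    result = {}
    for partition in shuffle(intermediate, num_reduce_tasks):
        result.update(run_reduce_task(partition))
    return result


if __name__ == "__main__":
    print(run_job(["a b a", "b c"]))
```

In an actual deployment, the input records and the final result would be read from and written to the DFS, which is where the read/write operations mentioned above arise.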
In the known prior art related to multi-tenant distributed computing, the security of tenant data is a significant challenge. For example, in a common multi-tenant distributed system, different tenants usually share the same DFS. Therefore, data or files belonging to different users are stored on the same DFS. Moreover, all tenants use the same metadata server of the DFS as the interface for accessing data. In order to guarantee security and isolation of user data, the DFS may set different access rights for different users. However, a malicious user might steal the password of another tenant, or even of the administrator, or use other means to illegally obtain access rights to another tenant's data or to all data stored on the DFS, thereby compromising the data security of other tenants. Other limitations may also arise when multiple tenants share a single DFS. For example, two tenants cannot use the same access path and name for their data or files, because a file name of one tenant might conflict with a file name of another tenant in the same file system namespace.
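The naming limitation described above may be illustrated with a toy model. The following is only a sketch under stated assumptions: `SharedNamespace` and `try_create` are hypothetical names, the tenant identifiers and path are invented for the example, and a plain dictionary stands in for the metadata server of a shared DFS.

```python
class SharedNamespace:
    """Toy stand-in for a single DFS namespace shared by all tenants."""

    def __init__(self):
        # Maps a full path to (owning tenant, file contents).
        self._files = {}

    def create(self, tenant, path, data):
        # Every tenant sees the same paths, so an existing path blocks all others.
        if path in self._files:
            owner, _ = self._files[path]
            raise FileExistsError(f"{path} already exists (owned by {owner})")
        self._files[path] = (tenant, data)

    def owner_of(self, path):
        return self._files[path][0]


def try_create(namespace, tenant, path, data):
    """Return True if the file was created, False on a name conflict."""
    try:
        namespace.create(tenant, path, data)
        return True
    except FileExistsError:
        return False


if __name__ == "__main__":
    ns = SharedNamespace()
    print(try_create(ns, "tenant_a", "/data/input.csv", b"a"))
    print(try_create(ns, "tenant_b", "/data/input.csv", b"b"))
```

In this sketch the second call fails because both tenants address the same path in one namespace, which is the conflict referred to above.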
In order to overcome the above-mentioned problems, it has been proposed in the prior art to divide users into individual clusters. However, when a job being processed (e.g., data analysis) involves cross-cluster data access, this practice seriously degrades the performance of the system, because the data has to be moved across clusters. In particular, such data movement may adversely affect multi-tenant cooperation, sharing of data between different tenants, and similar operations. As is well known in the art, a basic principle in a distributed computing environment is to reduce such data movement to the extent possible in order to guarantee performance.
In view of the above, the distributed computing architectures of the prior art still have issues and defects to be addressed in aspects such as isolation and protection of user data and coordination between different users.