The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for geo-fencing aware compute infrastructure.
In today's global system of interconnected computer networks, Information Technology (IT) has become heavily dependent on data and on ensuring each entity's compliance, auditing, and security in terms of data placement, maintenance, separation in compute analysis, and the like, with respect to the entity's origin or location as well as any laws that govern the infrastructure at that location within a geographically distributed object storage namespace.
Traditionally, object storage is used for backup, archival, data mining, search, analytics, and the like. This unstructured or raw data resides in the data storage for a much longer period when compared with traditional compute infrastructures. FIG. 1 depicts an example of a traditional object storage architecture. Traditional object storage architecture 100 comprises two geographically diverse infrastructures 102 and 112 that are accessible by client devices 120 and 122 via load balancer 124. Each of infrastructures 102 and 112 further comprises two node groups. The first node groups 104 and 114 comprise proxy nodes 104a-104n and 114a-114n that are used for distributed load handling/request handling from client devices 120 and 122 into the storage namespace. The second node groups 106 and 116, i.e. the storage namespace, comprise storage nodes 106a-106n and 116a-116n that are responsible for writing to the disks or storage subsystems and, in this illustrative architecture, serve purely as a storage unit repository. However, in order to analyze or extract any meaningful information from raw data retrieved from the storage nodes 106a-106n and 116a-116n in second node groups 106 and 116, the data must be sent back to client devices 120 and 122 or to an additional client 126 or compute node 128 for analysis.
With the evolution of embedded compute infrastructures with built-in object storage architecture, computation utilizing the data stored in these compute infrastructures is offloaded to storage units instead of using a traditional client device for computation purposes. FIG. 2 depicts an example of an embedded compute engine in an object storage architecture. As with the architecture shown in FIG. 1, embedded compute engine object storage architecture 200 of FIG. 2 comprises two geographically diverse infrastructures 202 and 212 that are accessible by client devices 220 and 222 via load balancer 224. Each of infrastructures 202 and 212 further comprises two node groups. The first node groups 204 and 214 comprise proxy nodes 204a-204n and 214a-214n that are used for distributed load handling/request handling from client devices 220 and 222 into the storage namespace. The second node groups 206 and 216, i.e. the storage namespace, comprise storage nodes 206a-206n and 216a-216n that are responsible for writing to the disks or storage subsystems.
However, in addition to the common infrastructure, embedded compute engine object storage architecture 200 also comprises software engines 208 and 218 as shown within second node groups 206 and 216, respectively. In an alternative embodiment, software engines 208 and 218 may reside within first node groups 204 and 214. Utilizing software engines 208 and 218, any computation or analysis required by client device 220 or 222 may be implemented by software engine 208 or 218. However, a user of client device 220 or 222 has to frame a computation algorithm to perform the computation or analysis and has to deploy or pass the computation algorithm to software engine 208 or 218 at the time of the original request. Software engine 208 or 218 then sends the results of the computation back to the requesting user of client device 220 or 222. Therefore, embedded compute engine object storage architecture 200 differs from the traditional object storage architecture 100 of FIG. 1 in that embedded compute engine object storage architecture 200 does not require any additional client or compute node to perform computation or analysis of the data. That is, second node groups 206 and 216 act as compute nodes and return any results back to the user.
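The difference between the two request flows described above may be illustrated with a minimal sketch. The class `StorageNode`, its methods `put`, `get`, and `execute`, and the function `mean` are hypothetical names introduced only for illustration; they are assumptions and are not part of any architecture shown in FIG. 1 or FIG. 2.

```python
from typing import Callable, Dict, List


class StorageNode:
    """Hypothetical stand-in for a storage node; objects are held in memory."""

    def __init__(self) -> None:
        self._objects: Dict[str, List[int]] = {}

    def put(self, name: str, data: List[int]) -> None:
        self._objects[name] = list(data)

    def get(self, name: str) -> List[int]:
        # Traditional flow: the raw object leaves the storage namespace
        # and is analyzed by a client or a separate compute node.
        return list(self._objects[name])

    def execute(self, name: str, algorithm: Callable[[List[int]], float]) -> float:
        # Embedded flow: the algorithm passed with the request runs next to
        # the data, and only the result is returned to the requester.
        return algorithm(self._objects[name])


def mean(values: List[int]) -> float:
    """Example computation algorithm framed by the user."""
    return sum(values) / len(values)


node = StorageNode()
node.put("readings", [2, 4, 6, 8])

# Traditional object storage: fetch the raw data, then compute on the client.
client_side_result = mean(node.get("readings"))

# Embedded compute engine: pass the algorithm along with the request.
storage_side_result = node.execute("readings", mean)
```

Both flows yield the same result; they differ only in where the computation runs and in whether the raw data or the algorithm crosses the network.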
However, with the embedded compute engine object storage architecture of FIG. 2, a scenario may exist wherein a storage namespace is shared by two countries having different laws, policies, or the like, and where the data residing in the storage namespace is governed according to the laws, policies, and/or the like of the country where the storage namespace resides. While the two countries may have a treaty indicating a sharing of the data residing in the storage namespace for a combined project, such a treaty would prohibit any computational algorithms from being executed by a software engine at the same geographical location associated with the storage namespace. That is, only the data may be utilized by both countries, and any country accessing the storage namespace from outside of the namespace's geographic location may not perform any analysis or computation utilizing computational algorithms using the software engine at the same geographical location associated with the storage namespace.
In this scenario, there exists a problem in terms of having separation in compute infrastructure, i.e. the storage nodes themselves act as the compute infrastructure, as there exists no way to govern a compute algorithm owned by a country so that the compute algorithm is limited to that particular country and is not deployed onto the shared compute infrastructure embedded in the object storage. Therefore, such offloading of computation to storage units results in a security concern, where computation analysis algorithms may not be secured in order to adhere to the respective policies and/or laws of the countries in which the geographically diverse infrastructures hosting the software engines reside. Instead, the computation analysis algorithms are shared among the various countries utilizing the same storage unit/namespace, which prevents compute resource separation between different countries sharing the same storage unit/namespace.
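The lack of governance described above may be illustrated with a minimal sketch of a shared software engine that accepts any algorithm regardless of the country that owns it. The class `SharedSoftwareEngine`, the country labels, and the algorithm name are hypothetical and are assumptions made only for illustration.

```python
from typing import Callable, Dict, List

Algorithm = Callable[[List[float]], float]


class SharedSoftwareEngine:
    """Hypothetical engine embedded in a storage namespace at one location."""

    def __init__(self, location: str) -> None:
        self.location = location
        self.deployed: Dict[str, str] = {}  # algorithm name -> owning country
        self._algorithms: Dict[str, Algorithm] = {}

    def deploy(self, name: str, owner_country: str, algorithm: Algorithm) -> bool:
        # The owner's country is recorded but never compared with the
        # engine's location, so every deployment is accepted.
        self._algorithms[name] = algorithm
        self.deployed[name] = owner_country
        return True

    def run(self, name: str, data: List[float]) -> float:
        return self._algorithms[name](data)


# The engine resides in country A, but the algorithm is owned by country B.
engine = SharedSoftwareEngine(location="country_a")
accepted = engine.deploy("proprietary_model", "country_b",
                         lambda xs: max(xs) - min(xs))
result = engine.run("proprietary_model", [3.0, 9.0, 4.0])

# Algorithms now resident on infrastructure outside their owner's country.
foreign_algorithms = [name for name, owner in engine.deployed.items()
                      if owner != engine.location]
```

In this sketch the algorithm owned by country B ends up resident on, and executed by, the engine located in country A, which is the absence of compute resource separation described above.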