The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for object storage workflow optimization leveraging underlying hardware, operating system, and virtualization value adds.
Traditionally, object storage is used for backup, archival, data mining, searching, analytics, and the like. FIG. 1 depicts an example of a traditional object storage architecture. Traditional object storage architecture 100 comprises two diverse infrastructures 102 and 112 that are accessible by client devices 120 and 122 via load balancer 124. Each of infrastructures 102 and 112 further comprises two node groups. The first node groups 104 and 114 comprise proxy nodes 104a-104n and 114a-114n that are used for distributed load handling/request handling from client devices 120 and 122 into the storage namespace. The second node groups 106 and 116, i.e., the storage namespace, comprise storage nodes 106a-106n and 116a-116n that are responsible for writing to the disks or storage subsystems and, in this illustrative architecture, serve purely as a storage unit repository. However, in order to analyze or extract any meaningful information from raw data retrieved from the storage nodes 106a-106n and 116a-116n in second node groups 106 and 116, the data must be sent back to client devices 120 and 122 or to an additional client 126 or compute node 128 for analysis.
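For illustration only, the request path of FIG. 1 may be sketched in Python as follows. This is a minimal in-memory model, not an implementation of any particular object store: the class names `StorageNode`, `ProxyNode`, and `LoadBalancer`, the byte-sum placement rule, the round-robin selection, and the word-count analysis are all assumptions introduced for this sketch. It is intended to show only that the storage nodes return raw data and that the analysis runs on the client side.

```python
from itertools import cycle


class StorageNode:
    """Hypothetical storage node: stores and returns bytes, nothing more."""

    def __init__(self):
        self._objects = {}

    def put(self, name, data):
        self._objects[name] = data

    def get(self, name):
        return self._objects[name]


class ProxyNode:
    """Hypothetical proxy node: routes each request into the storage namespace."""

    def __init__(self, storage_nodes):
        self._storage_nodes = storage_nodes

    def _locate(self, name):
        # Deterministic placement (an assumption): byte sum modulo node count.
        index = sum(name.encode("utf-8")) % len(self._storage_nodes)
        return self._storage_nodes[index]

    def put(self, name, data):
        self._locate(name).put(name, data)

    def get(self, name):
        return self._locate(name).get(name)


class LoadBalancer:
    """Hypothetical load balancer: hands out proxy nodes in round-robin order."""

    def __init__(self, proxies):
        self._proxies = cycle(proxies)

    def next_proxy(self):
        return next(self._proxies)


def client_side_word_count(balancer, name):
    # The whole raw object travels back to the client before any analysis.
    raw = balancer.next_proxy().get(name)
    return len(raw.decode("utf-8").split())


# Two proxies share one storage namespace, as in a single infrastructure.
storage = [StorageNode(), StorageNode()]
balancer = LoadBalancer([ProxyNode(storage), ProxyNode(storage)])
balancer.next_proxy().put("log.txt", b"alpha beta gamma")
words = client_side_word_count(balancer, "log.txt")
```

In this sketch the cost of the analysis includes transferring the full object to the client, which corresponds to the limitation of the traditional architecture noted above.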
With the evolution of embedded compute infrastructures with built-in object storage architecture, computation utilizing the data stored in these compute infrastructures is offloaded to the storage units rather than being performed on a traditional client device. FIG. 2 depicts an example of an embedded compute engine in an object storage (Storlet) architecture. As with the architecture shown in FIG. 1, storlet architecture 200 of FIG. 2 comprises two diverse infrastructures 202 and 212 that are accessible by client devices 220 and 222 via load balancer 224. Each of infrastructures 202 and 212 further comprises two node groups. The first node groups 204 and 214 comprise proxy nodes 204a-204n and 214a-214n that are used for distributed load handling/request handling from client devices 220 and 222 into the storage namespace. The second node groups 206 and 216, i.e., the storage namespace, comprise storage nodes 206a-206n and 216a-216n that are responsible for writing to the disks or storage subsystems.
However, in addition to the common infrastructure, storlet architecture 200 also comprises software engines 208 and 218, shown within second node groups 206 and 216, respectively. In an alternative embodiment, software engines 208 and 218 may reside within first node groups 204 and 214. Utilizing software engines 208 and 218, any computation or analysis required by client device 220 or 222 may be implemented by software engine 208 or 218. However, a user of client devices 220 and 222 has to frame a computational algorithm to perform the computation or analysis and has to deploy or pass the computational algorithm to software engine 208 or 218 at the time of the original request. Software engine 208 or 218 then sends the results of the computation back to the requesting user of client device 220 or 222. Therefore, storlet architecture 200 differs from the traditional object storage architecture 100 of FIG. 1 in that storlet architecture 200 does not require any additional client or compute node to perform computation or analysis of the data. That is, second node groups 206 and 216 act as compute nodes and return any results back to the user.
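For illustration only, the storlet request flow described above may be sketched in Python as follows. This is a minimal in-memory model under stated assumptions: the class name `StorletStorageNode`, its `run` method, and the `word_count` algorithm are hypothetical and do not correspond to the interface of any actual storlet engine. The sketch shows only that the user supplies the computational algorithm with the request, that the algorithm executes where the data resides, and that only the result is returned.

```python
class StorletStorageNode:
    """Hypothetical storage node with an embedded software engine."""

    def __init__(self):
        self._objects = {}

    def put(self, name, data):
        self._objects[name] = data

    def get(self, name):
        return self._objects[name]

    def run(self, name, algorithm):
        # Stand-in for the software engine: apply the user-supplied
        # algorithm next to the stored data and return only the result.
        return algorithm(self._objects[name])


def word_count(raw):
    # Computational algorithm framed by the user and passed with the request.
    return len(raw.decode("utf-8").split())


node = StorletStorageNode()
node.put("log.txt", b"alpha beta gamma delta")

# The raw object never leaves the node; the client receives only the count.
result = node.run("log.txt", word_count)
```

Passing the algorithm as a callable at request time mirrors the requirement that the user deploy or pass the computational algorithm with the original request; an actual engine would additionally need to isolate and constrain the user-supplied code, which this sketch does not attempt.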