The large increase in the amount of data generated by digital systems calls for more sophisticated approaches to storing, processing, and analyzing data. In this context, scale-out network-attached storage (NAS) file systems have proved popular as a technology for managing this “Big Data.” However, conventional NAS file systems used for data storage are still passive, i.e., they do not drive decisions at the application level. At the same time, applications built around pipelines that ingest data from varied sources, process the data according to business rules, and then store the processed results for further use are increasingly common.
To deal with such data-intensive scenarios, some applications running in NAS environments use a “watch folder” mechanism, wherein the client application polls folders at regular time intervals for new or modified files, triggers transformation services on those files, and stores the results in output folders. To implement the “watch folder” mechanism, developers need to write code that polls for content changes in folders exported via network file-sharing protocols, such as Network File System (NFS) and Server Message Block (SMB). These applications typically remain in a busy-wait state until a file arrives or is modified and they need to perform computations on it. Under this configuration, each client application is a file system client that unnecessarily consumes network resources, and possibly other resources as well, while “doing nothing.” As the number of clients increases, these wasted resources accumulate and can degrade overall system performance.
Alternatively, some data-intensive applications may run on top of file systems that provide mechanisms allowing application developers to intercept file system I/O requests and transparently carry out low-level operations on files, e.g., data compression, before forwarding the requests to the storage driver. However, such mechanisms can only execute within the context of the file system itself. Moreover, they have no knowledge of application-level business rules, let alone the capacity to run high-performance computing (HPC) tasks.
The above-described background relating to file systems is merely intended to provide a contextual overview of some current issues, and is not intended to be exhaustive. Other contextual information may become further apparent upon review of the following detailed description.