Currently available object stores include Amazon S3, Red Hat Ceph, OpenStack Swift, EMC's Atmos, and EMC's ViPR Object Data Service. Such systems may provide scale-out mechanisms for users to store and retrieve object data and metadata. These systems typically provide REST-based application programming interfaces (APIs) to insert, update, delete, and retrieve objects. For example, in the field of medical science, an object store may be used to store Digital Imaging and Communications in Medicine (DICOM) information, which contains metadata as well as the actual image data. Such metadata may include a patient's ID, gender, age, etc.
In general, these systems do not provide query capabilities, making it difficult to build applications that query and retrieve objects therefrom. The current approach to this use case involves an application retrieving a superset of the desired objects and discarding those objects that do not meet the search criteria. This approach is resource intensive and inefficient for the object system's servers, the network, and the application, because every object in the superset must be read and transferred even when only a few objects match.
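The retrieve-and-discard approach described above can be illustrated with a minimal sketch. The in-memory `InMemoryObjectStore` class, the `find_objects` helper, the `objects_transferred` counter, and the metadata field names are all hypothetical and stand in for a real object store that offers only list and get operations; none of them corresponds to an actual product API.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    key: str
    metadata: dict
    data: bytes


@dataclass
class InMemoryObjectStore:
    """Hypothetical stand-in for an object store with no query capability."""

    _objects: dict = field(default_factory=dict)
    objects_transferred: int = 0  # counts every object sent to the client

    def put(self, key, metadata, data):
        self._objects[key] = StoredObject(key, metadata, data)

    def list_keys(self):
        return sorted(self._objects)

    def get(self, key):
        # Each retrieval moves the whole object (metadata and data).
        self.objects_transferred += 1
        return self._objects[key]


def find_objects(store, predicate):
    """Client-side search: fetch the superset, then discard non-matches."""
    matches = []
    for key in store.list_keys():
        obj = store.get(key)
        if predicate(obj.metadata):
            matches.append(obj)
    return matches


store = InMemoryObjectStore()
store.put("img-001", {"patient_id": "p1", "gender": "F", "age": 42}, b"image-1")
store.put("img-002", {"patient_id": "p2", "gender": "M", "age": 35}, b"image-2")
store.put("img-003", {"patient_id": "p3", "gender": "F", "age": 67}, b"image-3")

# Only one object matches, yet every stored object is retrieved.
results = find_objects(store, lambda meta: meta["age"] > 60)
print([obj.key for obj in results], store.objects_transferred)
```

In this sketch the number of objects transferred grows with the size of the store rather than with the number of matches, which is the inefficiency noted above.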
Distributed processing systems allow for large-scale distributed processing. For example, Hadoop and Apache Spark are open-source distributed computing frameworks that enable distributed processing using commodity computing devices. It is known to expose object data to distributed processing systems.