Embodiments generally relate to cloud computing, and more specifically to processing data stored on multiple nodes.
Remote storage is now available as a “cloud service” to consumers, businesses, and organizations. Cloud storage allows a user to upload files for storage and is generally accessed via an Application Programming Interface (API), although access methods vary by provider. Some providers support a web service API. A web service API may be based on a Representational State Transfer (REST) architectural style in which objects (files) are accessed using Hypertext Transfer Protocol (HTTP) as a transport. For example, a user may access files stored in the cloud via a Uniform Resource Locator (URL) using a web browser. Cloud storage may also be accessed using a file-based protocol. Examples of file-based protocols include Network File System (NFS), Common Internet File System (CIFS), and File Transfer Protocol (FTP).
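By way of illustration only, the REST-style access described above may be sketched as follows. The sketch assumes a hypothetical endpoint (`storage.example.com`) and hypothetical helper names (`object_url`, `build_request`); it does not reflect the API of any particular provider. It shows how each stored object may be named by a URL and how a storage operation may map onto an HTTP method applied to that URL. The requests are constructed but not sent.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

# Hypothetical endpoint used only for illustration; not a real provider's API.
BASE_URL = "https://storage.example.com"


def object_url(bucket: str, key: str) -> str:
    """Build the URL that names one stored object (file)."""
    # Percent-encode the bucket and key; "/" inside the key is kept as a path separator.
    return f"{BASE_URL}/{quote(bucket, safe='')}/{quote(key)}"


def build_request(method: str, bucket: str, key: str,
                  body: Optional[bytes] = None) -> Request:
    """Map a storage operation onto an HTTP method applied to the object's URL."""
    return Request(object_url(bucket, key), data=body, method=method)


# PUT stores an object at its URL; GET retrieves the object from the same URL.
upload = build_request("PUT", "reports", "2024/q1 summary.csv", b"a,b\n1,2\n")
download = build_request("GET", "reports", "2024/q1 summary.csv")
```

In this sketch the same URL identifies the object for both operations, and only the HTTP method differs, which is the property that allows a web browser to retrieve the object directly from its URL.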
An application running on the local computer of a user may access a file from a cloud storage service. The application may download a file, process the data locally, and then store results or other new data with the cloud storage service. In addition to storage, processing is now available as a “cloud service,” and an application running in the cloud may access data stored in the cloud.
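The download, process, and store pattern described above may be sketched as follows. The class `InMemoryCloudStore` and the function `count_words` are hypothetical names introduced only for illustration; the class is an in-memory stand-in for a cloud storage service and performs no network access.

```python
class InMemoryCloudStore:
    """Hypothetical stand-in for a cloud storage service (no network access)."""

    def __init__(self) -> None:
        self._objects: dict = {}

    def upload(self, name: str, data: bytes) -> None:
        self._objects[name] = data

    def download(self, name: str) -> bytes:
        return self._objects[name]


def count_words(store: InMemoryCloudStore, source_name: str, result_name: str) -> int:
    """Download a file, process it locally, and store the result back."""
    text = store.download(source_name).decode("utf-8")           # 1. download
    word_count = len(text.split())                               # 2. process locally
    store.upload(result_name, str(word_count).encode("utf-8"))   # 3. store the result
    return word_count


store = InMemoryCloudStore()
store.upload("notes.txt", b"data stored on multiple nodes")
count_words(store, "notes.txt", "notes.count")
```

In this arrangement the entire file travels to the application before any processing occurs, and the result travels back afterward, so the data crosses the network in both directions.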
Apache Hadoop provides for storing and processing data on multiple nodes. The Hadoop Distributed File System (HDFS) is a distributed file system included in the Hadoop architecture. Hadoop generally requires that the logical storage layout of data be explicitly mapped out and understood by HDFS before an application begins running.