In recent years, data storage systems have been migrating into the cloud. For example, large service provider networks such as Amazon Web Services™ offer a variety of cloud-based storage services such as object storage services, block-based storage services, file systems, and relational database services. The service provider may provide access to these storage services via public access interfaces or protocols. The data storage service may be hosted by the service provider's network, which includes large numbers of interconnected computing systems. These computing systems may comprise a cluster of storage service nodes that are responsible for executing software operating as the storage service engine for the data storage service. The storage service nodes may be physical machines or instances of virtual machines operating on virtualization hosts. Typically, the storage service nodes and the storage service engine, which may include a query planner, translator, and compiler, are statically integrated into the data storage service. Clients of the data storage service are not generally permitted to interact directly with or customize the storage service engine or the underlying storage backend.
Client applications may perform data analysis on data stored in a cloud-based data storage service. A client application may access the data storage service in one of several ways. For example, the client application may use a query language such as structured query language (SQL) to obtain data from the service. As another example, the client application may access the data using an application programming interface (API) provided by the storage service. However, these types of interfaces present a number of drawbacks. First, users of the data storage service must learn to use these service interfaces. Second, the service interface may not allow the client application to take advantage of all capabilities of the underlying storage service engine. This latter problem forces the client application to copy large datasets over the network to perform more sophisticated analysis on the client site. Moreover, the algorithms implemented by the storage service engine, for example the query planner, translator, compiler, etc., represent an overgeneralization in terms of how data access requests should be handled. Client applications, by contrast, have specific knowledge of the types of data access requests that will be issued and the types of data the requests will access, and thus may be able to take advantage of special optimizations in how the requests are processed. However, client applications are not always able to dictate such optimizations through the service interface. Third, in a pure client-server model, the client must maintain sufficient physical resources on the client site to perform the client-specific data analysis tasks. These and other problems prevent the client application from taking full advantage of the cloud computing environment, and generally result in client systems that are inflexible in design and inefficient for many types of data tasks.
While embodiments are described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that embodiments are not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to.
It will also be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first contact could be termed a second contact, and, similarly, a second contact could be termed a first contact, without departing from the scope of the present invention. The first contact and the second contact are both contacts, but they are not the same contact.