Numerous business applications have been migrated to “cloud” environments in recent years. Data centers housing significant numbers of interconnected computing systems for cloud-based computing have become commonplace. These include private data centers that are operated by and on behalf of a single organization, and public data centers that are operated by entities as businesses to provide computing resources to customers. In addition to core computing resources, operators of some public data centers implement a variety of advanced network-accessible services, including, for example, distributed database services, object storage services and the like. Such storage-related services typically support very high levels of scalability, data durability and availability. By using the resources of public provider networks, clients can scale their applications up and down as needed, often at much lower costs than would have been required if the computing infrastructure had to be set up on client-owned premises. Using virtualization techniques, provider network operators may often use a given hardware server on behalf of many different clients, while maintaining high service quality levels for each of the clients. Sharing resources via such virtualization-based multi-tenancy may enable the provider network operators to increase hardware utilization levels, matching resource demand with supply more efficiently and keeping costs low.
As the costs of computing and data storage fall with the increased use of virtualization and cloud computing, new applications for data analysis are becoming more cost-effective. Many database services implemented at provider networks support very high volumes of updates, leading to data sets that may have to be distributed across tens or hundreds of physical storage devices, sometimes spread across multiple data centers. The database services may expose APIs (application programming interfaces) for reads and writes (e.g., creates/inserts, deletes, and updates of database records), which enable clients to easily change the contents of data objects such as tables and to view the current version of the contents. However, while the interfaces provided by the database services may enable clients to access the data objects, and thus to observe the cumulative impact of all the changes that have been performed, it may not be straightforward for clients to determine the sequence in which various changes were applied to the data. Information about the changes that are performed on tables and other data objects may be useful for a number of applications, such as offline data mining to identify trends, selective checkpointing of relevant subsets of data at remote sites, and so on. Furthermore, at high-volume data stores that are intended to handle hundreds of thousands (or even millions) of modifications per second, extracting information about the operations being performed without impacting incoming client requests may present a challenge.
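By way of illustration only, the following minimal sketch shows why the current contents of a data object may not reveal the sequence of changes applied to it. The class `LoggedTable`, its methods `put` and `delete`, and the layout of its change records are hypothetical names chosen for this example; they are not the interface of any particular database service. Two tables receive different sequences of writes, end in the same state, and may be told apart only by their ordered change records.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical change record: (sequence number, operation, key, new value).
ChangeRecord = Tuple[int, str, str, Optional[Any]]


@dataclass
class LoggedTable:
    """Hypothetical in-memory table that records every write in order."""
    items: Dict[str, Any] = field(default_factory=dict)
    log: List[ChangeRecord] = field(default_factory=list)

    def put(self, key: str, value: Any) -> None:
        # Classify the write before applying it to the current state.
        op = "update" if key in self.items else "insert"
        self.items[key] = value
        self.log.append((len(self.log) + 1, op, key, value))

    def delete(self, key: str) -> None:
        # Only deletions that actually change the table are recorded.
        if key in self.items:
            del self.items[key]
            self.log.append((len(self.log) + 1, "delete", key, None))


# Table a: insert k1, update k1, insert k2.
a = LoggedTable()
a.put("k1", 1)
a.put("k1", 2)
a.put("k2", 5)

# Table b: a different sequence of writes reaching the same contents.
b = LoggedTable()
b.put("k2", 5)
b.put("k1", 9)
b.delete("k1")
b.put("k1", 2)

same_state = a.items == b.items   # True: a read sees identical contents
same_history = a.log == b.log     # False: the sequences of changes differ
```

In this sketch the change records are appended synchronously within the write path, which is the simplest choice for illustration. At the modification rates discussed above, such an approach could itself affect incoming client requests, which is the challenge noted in the preceding paragraph.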
While embodiments are described herein by way of example with reference to several embodiments and illustrative drawings, those skilled in the art will recognize that embodiments are not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to.