As the costs of data storage have declined over the years, and as the interconnection capabilities of various elements of the computing infrastructure have improved, more and more data pertaining to a wide variety of applications can potentially be collected and analyzed. For example, mobile phones can generate data indicating their locations, the applications being used by the phone users, and so on, at least some of which can be collected and analyzed in order to present personalized information that may be helpful to the users. The analysis of data collected by surveillance cameras may be useful in preventing and/or solving crimes, and data collected from sensors embedded at various locations within airplane engines, automobiles or complex machinery may be used for various purposes such as preventive maintenance, improving efficiency and lowering costs.
The increase in volumes of streaming data has been accompanied by (and in some cases made possible by) the increasing use of commodity hardware. The advent of virtualization technologies for commodity hardware has provided benefits with respect to managing large-scale computing resources for many types of applications, allowing various computing resources to be efficiently and securely shared by multiple customers. For example, virtualization technologies may allow a single physical computing machine to be shared among multiple users by providing each user with one or more virtual machines hosted by the single physical computing machine, with each such virtual machine being a software simulation acting as a distinct logical computing system that provides users with the illusion that they are the sole operators and administrators of a given hardware computing resource, while also providing application isolation and security among the various virtual machines. Furthermore, some virtualization technologies are capable of providing virtual resources that span two or more physical resources, such as a single virtual machine with multiple virtual processors that spans multiple distinct physical computing systems. In addition to computing platforms, some large organizations also provide various types of storage services built using virtualization technologies, including services to handle streaming data. Using such storage services, large amounts of data can be stored with desired levels of durability, availability and performance.
Despite the availability of virtualized computing and/or storage resources at relatively low cost from various providers, however, the effort required to manage growing collections of streaming data records remains a challenging proposition for a variety of reasons. In some cases, the records of a given stream may be distributed into partitions based on values of selected attributes of the records, where the number of initial partitions and/or the attributes may be selected by the customer on whose behalf the stream is being set up at the stream management service. The data records belonging to each of the partitions may be collected, stored and/or made accessible at respective sets of service nodes (e.g., at distinct hardware hosts or servers) in an effort to balance the workload. However, as the workload changes over time, the initial number of partitions of a given stream may eventually prove to be sub-optimal. Depending on the kinds of programmatic interfaces supported by the stream management service, it may not always be straightforward for the customers of the service to re-partition the stream appropriately.
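By way of illustration only, the attribute-based distribution of records into partitions described above may be sketched as follows. The sketch assumes a hash-modulo mapping from attribute values to a partition index; the description above does not specify any particular mapping function, and the names `partition_for`, `device_id` and `reading`, as well as the choice of MD5, are hypothetical and chosen solely for the example.

```python
import hashlib
from collections import Counter
from typing import Sequence


def partition_for(record: dict, key_attrs: Sequence[str], num_partitions: int) -> int:
    """Map a record to a partition index using the values of selected attributes.

    Assumption: hash-modulo placement. A real stream management service
    may use a different scheme (e.g., key ranges).
    """
    key = "|".join(str(record[attr]) for attr in key_attrs)
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_partitions


# Hypothetical sensor records, partitioned on the "device_id" attribute.
records = [{"device_id": f"dev-{i}", "reading": i * 0.5} for i in range(1000)]

# Initial partition count chosen when the stream is set up.
initial = Counter(partition_for(r, ["device_id"], 4) for r in records)

# The same records after the partition count is changed to 8.
repartitioned = Counter(partition_for(r, ["device_id"], 8) for r in records)

# Number of records whose partition index differs between the two layouts.
moved = sum(
    partition_for(r, ["device_id"], 4) != partition_for(r, ["device_id"], 8)
    for r in records
)

print("initial:", dict(initial))
print("repartitioned:", dict(repartitioned))
print("records that changed partition:", moved)
```

Under this assumed mapping, changing the partition count alters the partition index of some existing keys, which is one reason that re-partitioning a stream whose workload has changed may not be straightforward.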
While embodiments are described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that embodiments are not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to.