As the costs of data storage have declined over the years, and the capabilities of computer networks have improved, more and more data pertaining to a wide variety of applications can potentially be collected and analyzed. In particular, the increase in volumes of streaming data has been accompanied by (and in some cases made possible by) the increasing use of commodity hardware. The advent of virtualization technologies for commodity hardware has provided benefits with respect to managing large-scale computing resources for many types of applications, allowing various computing resources to be efficiently and securely shared by multiple customers. However, despite the continued maturation of such technologies, the management and orchestration of the collection, storage and processing of large dynamically fluctuating streams of data remain a challenging proposition for a variety of reasons.
In one scenario, a data stream service may receive data that is semi-structured and unaccompanied by any schema data. For example, the data may include records with varying keys and varying values. When such data is randomly partitioned and/or stored into storage units without regard to its structure, the result is a data store that exhibits a high degree of data “entropy.” That is, the data store will contain large groups of data records of different structure and different values, stored in close proximity with each other. As may be understood, such a disorganized store of data will be difficult to use and manage, and is undesirable for a host of reasons. It is generally desirable to reduce the entropy of such data to allow downstream systems to make more efficient use of the data.
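The effect described above can be illustrated with a minimal sketch. It is not taken from the described embodiments. It assumes, purely for illustration, that a record's "structure" is its set of keys and that the "entropy" of a storage unit can be approximated by the Shannon entropy of the key-set signatures it holds. Differences in values are ignored. All function names and the sample records are hypothetical.

```python
import math
import random
from collections import Counter


def signature(record):
    """Structural signature of a record: its sorted tuple of keys (an assumption)."""
    return tuple(sorted(record))


def shannon_entropy(signatures):
    """Shannon entropy, in bits, of the distribution of signatures."""
    counts = Counter(signatures)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_partition_entropy(partitions):
    """Size-weighted average entropy over all non-empty partitions."""
    total = sum(len(p) for p in partitions)
    return sum(
        len(p) / total * shannon_entropy([signature(r) for r in p])
        for p in partitions
        if p
    )


def partition_randomly(records, n, seed=0):
    """Assign each record to one of n partitions without regard to structure."""
    rng = random.Random(seed)
    parts = [[] for _ in range(n)]
    for record in records:
        parts[rng.randrange(n)].append(record)
    return parts


def partition_by_structure(records, n):
    """Assign each distinct signature to a partition, round-robin."""
    assignment = {}
    parts = [[] for _ in range(n)]
    for record in records:
        sig = signature(record)
        if sig not in assignment:
            assignment[sig] = len(assignment) % n
        parts[assignment[sig]].append(record)
    return parts


# Hypothetical semi-structured records with three different key sets.
records = (
    [{"user": i, "url": "/a"} for i in range(50)]
    + [{"sensor": i, "temp": 20.0} for i in range(50)]
    + [{"order": i, "sku": "x", "qty": 1} for i in range(50)]
)

random_parts = partition_randomly(records, 3)
structured_parts = partition_by_structure(records, 3)

print("random:    ", mean_partition_entropy(random_parts))
print("structured:", mean_partition_entropy(structured_parts))
```

Under these assumptions, the structure-aware partitions each hold records of a single key set and so score zero, while the random partitions mix all three key sets and score higher.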
While embodiments are described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that embodiments are not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to.
It will also be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first contact could be termed a second contact, and, similarly, a second contact could be termed a first contact, without departing from the scope of the present invention. The first contact and the second contact are both contacts, but they are not the same contact.