The volume of online data has grown tremendously with the continued development of Internet sites and services. This growth has been accompanied by a gradual shift from relational databases to less structured, so-called NoSQL data stores, which may offer advantages over relational databases in scalability and access speed. Unlike relational databases, which have well-defined “schemas,” i.e., definitions of data objects and their relationships, NoSQL data stores (the name is commonly read as “non-SQL” or “not only SQL,” where SQL is Structured Query Language) require comparatively little structure and are often referred to as “schemaless.”
Schemaless data stores often store data in the form of key-value pairs and/or as structured data, such as JSON (JavaScript Object Notation) documents, XML (eXtensible Markup Language) documents, or other types of documents. Such documents, also referred to herein as “entities,” may be provided in the form of text strings or files that conform to a defined syntax. Products for supporting schemaless data stores include, for example, Azure DocumentDB from Microsoft Corporation and Google Cloud Platform from Google in Mountain View, Calif.
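By way of illustration only, the following Python sketch shows two JSON entities that share no schema yet can be parsed and held side by side. The field names and values are hypothetical and are not taken from any particular product.

```python
import json

# Two hypothetical entities supplied as text strings. Each conforms to JSON
# syntax, but they do not share a common set of fields.
raw_entities = [
    '{"id": "1", "name": "Ada", "email": "ada@example.com"}',
    '{"id": "2", "title": "Invoice", "lines": [{"sku": "A-1", "qty": 2}]}',
]


def parse_entities(texts):
    """Parse each JSON string into a Python dict without enforcing a schema."""
    return [json.loads(text) for text in texts]


entities = parse_entities(raw_entities)

# The set of fields differs from entity to entity.
field_sets = [sorted(entity.keys()) for entity in entities]
```

Only the JSON syntax is checked when the strings are parsed; nothing requires the two entities to declare the same fields.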
Many schemaless data stores store documents (e.g., JSON documents) in “collections,” i.e., groups of related documents, and store each collection in one or more “partitions,” i.e., physical storage drives or sets of drives. Each document in a collection may have a document ID (identifier) and, where multiple partitions are used, a partition key. Management software in the data store may require each document ID in a collection to be unique within each partition, such that the combination of document ID and partition key uniquely identifies each document in the collection across all partitions.
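The uniqueness rule described above can be sketched in Python as follows. This is a minimal in-memory model under stated assumptions, not the management software of any actual data store; the `Collection` class and its `insert` and `get` methods are hypothetical names introduced for illustration.

```python
class Collection:
    """Hypothetical in-memory model of a partitioned document collection."""

    def __init__(self):
        # Maps partition key -> {document ID -> document}.
        self._partitions = {}

    def insert(self, partition_key, doc_id, document):
        """Store a document, rejecting a duplicate ID within the same partition."""
        partition = self._partitions.setdefault(partition_key, {})
        if doc_id in partition:
            raise KeyError(
                f"document ID {doc_id!r} already exists in partition {partition_key!r}"
            )
        partition[doc_id] = document

    def get(self, partition_key, doc_id):
        """Look up a document by the (partition key, document ID) pair."""
        return self._partitions[partition_key][doc_id]


orders = Collection()
orders.insert("east", "42", {"item": "lamp"})
# The same document ID is permitted in a different partition.
orders.insert("west", "42", {"item": "desk"})
```

In this model a document ID by itself is ambiguous, since "42" exists in both partitions, whereas the pair of partition key and document ID always resolves to exactly one document.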