The present invention relates to databases, and more specifically to lightweight partitioning of a database for multiple clients.
The rise of web services has greatly increased data storage needs. In response, a number of companies have developed cloud database systems, which are databases running on clusters of multiple machines (also known as a cloud). These systems generally provide greater reliability, throughput, and processing power than traditional databases. Machines can be added to the cluster as needed, allowing the databases to grow as large desired. Examples of cloud databases include Yahoo's Sherpa, Google's BigTable, Amazon's SimpleDB, Microsoft's Azure, and Facebook's Cassandra, among others.
Typically, cloud database systems sacrifice flexibility and access control to achieve their performance goals. In traditional databases, each application would create its own table (or even multiple tables) in the database. This prevents one application from accidentally overwriting another's data. It also allows enforced separation with access control, where the ability to update or even read data can be restricted on an application by application basis. By contrast, systems such as Google's BigTable create a single database table across all machines in the cluster. Different applications using BigTable share one table for all their data. This requires coordination between applications to prevent collisions of database keys and other inadvertent or intentional data corruption.
Other cloud databases allow multiple tables, but in very inflexible ways. For instance, creating a new table on Yahoo's Sherpa database cluster is a very expensive operation. Space for the new table must be allocated in the cluster. Every machine must be updated with the new table information before the table can be accessed to prevent data corruption. This update may involve hundreds or thousands of machines, any of which may be unavailable for extended periods of time due to heavy workloads or system maintenance. In some cases, such as Facebook's Cassandra, adding a new table may require shutting down and restarting the entire cluster, which may not be feasible for clusters supporting always-on 24/7 internet services. Other systems trade one type of inflexibility for another. For example, new tables may be created cheaply and easily, but at the cost of fixed table layouts and column types which do not support arbitrary data storage for many different types of applications.