With the rapid development and widespread use of technology over the last few decades, a large volume of digital data is generated on a daily basis. Organizing and managing such amounts of data has promoted the development of database technologies. Relational database management systems (“RDBMS”), such as Oracle Database Management System, Microsoft SQL Database Management System and MySQL Database Management System, have thus been proposed and have gained broad acceptance for data management. Relational database management systems focus on writing operations (also referred to herein as writes), such as record update and deletion, as much as they focus on reading operations (also referred to herein as reads). These systems rely on the primary keys of each data table to generate indexes for database search.
However, relational database management systems, and other types of traditional database systems, no longer meet the demands of certain users, as the Internet and other technologies have produced and consumed dramatically more data in recent years. In particular, larger and larger amounts of data are generated as each day goes by. The amount of data is now measured in Gigabytes (GBs), Terabytes (TBs), Petabytes (PBs) and even Exabytes (EBs). Internet tweets, healthcare data, factory data, financial data and Internet router logs are examples of large-quantity data. In addition, trillions of rows of data are regularly produced by machines, such as sensors, cameras, etc. Oftentimes, these data are frequently read, but rarely written (that is, rarely updated or deleted). Alternatively, it can be said that such data is rarely mutated and has a low mutation rate. The ratio between the number of times that a piece of data is read and the number of times it is written can reach the level of, for example, one million to one or even higher.
Accordingly, a database system managing these types of large data volumes needs to optimize read operations over write operations. For example, the new database system needs to handle millions and even billions of read queries per second. Furthermore, such a new type of database system needs to provide acceptably low latency in searching an extremely large database, reading a piece of desired data, and providing it to computer software applications and users. Traditional database management systems cannot meet such a demand.
Various new databases have been proposed to handle progressively larger amounts of data. For example, Google Inc. has developed the Bigtable data storage system, built on Google File System and other Google technologies. Bigtable is not a relational database system. Instead, it is a sparse and distributed multi-dimensional sorted map that is indexed by a row key, a column key and a timestamp. In Bigtable, row keys in a table are arbitrary strings, and column keys are grouped into column families as shown in FIGS. 1 and 2. Amazon.com, Inc. has developed a similar database system, Amazon DynamoDB. DynamoDB is a NoSQL database that is also not a relational database. DynamoDB uses tables, items and attributes in its data modeling. Furthermore, DynamoDB requires unique primary keys, such as unique attributes, hash values, and hash and range values. DynamoDB also implements Local Secondary Index and Global Secondary Index. These efforts still fall short of the new challenges. For example, Bigtable and DynamoDB are not optimized for data with extremely high read-write ratios, are sparse and thus not storage efficient, still rely on conventional infrastructure, and cannot directly access storage devices for optimal performance.
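By way of illustration only, the sorted-map data model described above for Bigtable can be sketched in a few lines of Python. The sketch is a single-process, in-memory toy and not Bigtable's actual implementation or API: the class name SparseSortedMap, its method names, and the "family:qualifier" column key convention used in the example are assumptions introduced here, and the distributed aspects are not modeled.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical key layout: (row key, column key, timestamp).
# Column keys are written "family:qualifier" purely as an assumed convention.
Key = Tuple[str, str, int]


class SparseSortedMap:
    """Toy in-memory map indexed by row key, column key and timestamp."""

    def __init__(self) -> None:
        # Only cells that were actually written are stored (sparse).
        self._cells: Dict[Key, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        """Store one cell version under (row, column, timestamp)."""
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str) -> Optional[bytes]:
        """Return the value with the highest timestamp, or None if absent."""
        versions = [
            (ts, value)
            for (r, c, ts), value in self._cells.items()
            if r == row and c == column
        ]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]

    def scan(self, row_prefix: str) -> List[Tuple[Key, bytes]]:
        """Return cells whose row key starts with row_prefix, in key order."""
        return sorted(
            (key, value)
            for key, value in self._cells.items()
            if key[0].startswith(row_prefix)
        )

    def family(self, row: str, family: str) -> List[str]:
        """Return the distinct column keys of one column family in a row."""
        prefix = family + ":"
        return sorted(
            {c for (r, c, _ts) in self._cells if r == row and c.startswith(prefix)}
        )


table = SparseSortedMap()
table.put("com.example", "contents:html", 1, b"old page")
table.put("com.example", "contents:html", 2, b"new page")
table.put("com.example", "anchor:home", 1, b"Example")
latest = table.get("com.example", "contents:html")
```

In this sketch a cell that was never written occupies no storage, which is the sense in which the map is sparse, and a read of a (row, column) pair returns the most recent version by timestamp.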
Accordingly, there is a need for a new database management system that manages large amounts of data, is highly optimized for data with a low mutation rate, achieves minimum latency with a massively parallel architecture and mechanism, and reduces storage cost.