1. Field of Invention
The present invention relates to computer application technology, and more particularly to a data storage and query method which supports agile development and horizontal scaling.
2. Description of Related Arts
Along with the rapid development of the internet, especially the emergence of Web 2.0 businesses such as Online Social Networks and Online Social Media, the internet industry poses two challenges to the data management system.
Firstly, the data on the internet increases exponentially, which the relevant industry calls Big Data. The Big Data puts huge pressure upon the conventional vertically scaled data management system, and the data management system has become the bottleneck of a great number of internet service systems.
Secondly, internet business changes rapidly and has short product development lifecycles, especially on social sites, where the product development lifecycle is measured in days. As a basic component of internet products, the data management system is required to support agile development in order to shorten the product development lifecycle.
Thus, the key to constructing an internet service system lies in a data management system which simultaneously supports a horizontal scaling mode for the Big Data and an agile development mode for the short development lifecycle.
Two types of data management techniques currently support the horizontal scaling mode.
The first technique is the Key-Value (KV) Store, which abstracts the data into a two-element tuple (Key, Value). The Key is the unique identifier for the storage and query of the data, and the Value is the data content corresponding to the specific Key. The KV Store has the following three primitives. Boolean Put (Key, Value): store (Key, Value); on success, return True; on failure, return False. Boolean Del (Key): delete (Key, Value); on success, return True; on failure, or if no corresponding Key exists, return False. String Get (Key): obtain the Value corresponding to the Key; on failure, or if no corresponding Key exists, return NULL. Because no two (Key, Value) pairs depend on each other, the horizontal scaling of the KV Store can be accomplished via consistent hashing or a B+ tree.
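The three KV primitives and their partitioning by consistent hashing can be illustrated with a minimal sketch. It is only an illustration under stated assumptions, not the implementation of any particular KV Store: the class names KVNode and ConsistentHashKV, the node names, the use of MD5 as the ring hash and the number of virtual points per node are all hypothetical choices, and Python's None stands in for NULL.

```python
import bisect
import hashlib
from typing import Dict, List, Optional, Tuple


class KVNode:
    """One storage node holding independent (Key, Value) pairs (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> bool:
        # Put (Key, Value): an in-memory store always succeeds here.
        self._data[key] = value
        return True

    def delete(self, key: str) -> bool:
        # Del (Key): False when there is no corresponding Key.
        if key not in self._data:
            return False
        del self._data[key]
        return True

    def get(self, key: str) -> Optional[str]:
        # Get (Key): None plays the role of NULL.
        return self._data.get(key)


def _ring_hash(text: str) -> int:
    # Assumption: MD5 is used only to spread points evenly on the ring.
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class ConsistentHashKV:
    """Routes each Key to one node on a consistent-hash ring."""

    def __init__(self, node_names: List[str], points_per_node: int = 64) -> None:
        self._nodes = {name: KVNode() for name in node_names}
        # Each node owns several virtual points so load is spread evenly.
        self._ring: List[Tuple[int, str]] = sorted(
            (_ring_hash(f"{name}#{i}"), name)
            for name in node_names
            for i in range(points_per_node)
        )
        self._points = [point for point, _ in self._ring]

    def _node_for(self, key: str) -> KVNode:
        # First ring point clockwise from the Key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _ring_hash(key)) % len(self._ring)
        return self._nodes[self._ring[idx][1]]

    def put(self, key: str, value: str) -> bool:
        return self._node_for(key).put(key, value)

    def delete(self, key: str) -> bool:
        return self._node_for(key).delete(key)

    def get(self, key: str) -> Optional[str]:
        return self._node_for(key).get(key)
```

Because every primitive touches exactly one Key, each call is routed to a single node and no node needs to consult another, which is why such a store can be scaled by adding nodes to the ring.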
The second technique is the Key-Row Store, also known as Big Table, which abstracts the data into a nested (n+1)-element tuple (Key, (SubKey 1, Value 1), (SubKey 2, Value 2), . . . , (SubKey n, Value n)). The Key is the unique identifier for the storage and query of a data row; each data row comprises multiple data items; the retrieval of a particular Value is accomplished via its SubKey. The Key-Row Store has the following five basic primitives. Boolean Put (Key, SubKey, Value): add (SubKey, Value) into the data row corresponding to the Key; on success, return True; on failure, return False. Boolean Del (Key): delete the data row corresponding to the Key; on success, return True; on failure, or if no corresponding Key exists, return False. Boolean Del (Key, SubKey): delete (SubKey, Value) from the data row corresponding to the Key; on success, return True; on failure, or if no corresponding Key or SubKey exists, return False. String Get (Key): obtain the data row corresponding to the Key; on failure, or if no corresponding Key exists, return NULL. String Get (Key, SubKey): obtain the Value corresponding to the SubKey in the data row which corresponds to the Key; on failure, or if no corresponding Key or SubKey exists, return NULL. As with the (Key, Value) pairs of the KV Store, no two data rows depend on each other, and thus the horizontal scaling of the Key-Row Store can also be accomplished via consistent hashing or a B+ tree.
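The five Key-Row primitives can likewise be illustrated with a minimal single-node sketch. The class name KeyRowStore and the method names are hypothetical; as assumptions of this sketch, Get (Key) returns the row as a dictionary rather than a serialized string, None stands in for NULL, and a row emptied by Del (Key, SubKey) is kept as an empty row.

```python
from typing import Dict, Optional


class KeyRowStore:
    """In-memory sketch of a Key-Row Store (hypothetical names throughout)."""

    def __init__(self) -> None:
        # Key -> data row, where a row maps SubKey -> Value.
        self._rows: Dict[str, Dict[str, str]] = {}

    def put(self, key: str, subkey: str, value: str) -> bool:
        # Put (Key, SubKey, Value): create the row on first use.
        self._rows.setdefault(key, {})[subkey] = value
        return True

    def delete_row(self, key: str) -> bool:
        # Del (Key): False when there is no corresponding Key.
        if key not in self._rows:
            return False
        del self._rows[key]
        return True

    def delete_cell(self, key: str, subkey: str) -> bool:
        # Del (Key, SubKey): False when the Key or the SubKey is absent.
        row = self._rows.get(key)
        if row is None or subkey not in row:
            return False
        del row[subkey]
        return True

    def get_row(self, key: str) -> Optional[Dict[str, str]]:
        # Get (Key): return a copy so callers cannot mutate stored data.
        row = self._rows.get(key)
        return dict(row) if row is not None else None

    def get_cell(self, key: str, subkey: str) -> Optional[str]:
        # Get (Key, SubKey): None when the Key or the SubKey is absent.
        return self._rows.get(key, {}).get(subkey)
```

Every primitive addresses a single row through its Key, so whole rows could be distributed across nodes in the same way as (Key, Value) pairs.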
Although the above two techniques support the horizontal scaling mode and thus address the first challenge which the internet industry poses to the data management system, both provide only simple primitives. Applications with complex logic cannot be constructed rapidly from such simple primitives, so neither technique supports the agile development mode, and both fail to address the second challenge.
SQL (Structured Query Language) has become one of the primary techniques which allow the data management system to support agile development, owing to its uniform standard, semantic richness and simple structure. However, compatibility with the SQL semantics results in mutual dependency within the data, which excludes the possibility of horizontal scaling. As a result, the conventional relational databases compatible with the SQL semantics can depend only on vertical scaling and are unable to handle the challenge posed by the Big Data of the internet.