This invention generally relates to computer network storage systems, and more particularly, to Prefix Hash Trees (“PHT”) used in conjunction with an underlying Distributed Hash Table (“DHT”) storage system for network applications and distributed databases.
During the past few years, a revolution in scalable storage has been occurring. As web-based consumer services, from e-commerce to social networking, become more prevalent, there is an increasing demand for scalable storage systems that favor availability over consistency in the sense of the Consistency, Availability and Partition Tolerance (“CAP”) Theorem. In particular, traditional database management systems (“DBMS”), which typically favor the so-called Atomicity, Consistency, Isolation and Durability (“ACID”) properties embodied in strong guarantees of transactional integrity, are found to be lacking in terms of dynamic scaling. To address these concerns, there has been a movement toward so-called No Structured Query Language (“NoSQL”) systems, typically associated with DHTs.
Although DHTs can provide significant improvement in terms of dynamic scaling, they are more limited in the kinds of search operations they can support natively. An important operation in the context of the within invention is range search, that is, the ability to retrieve the set of records in which a particular field falls within a pre-determined range, for example a search that finds all employees with a salary less than $70K.
Prefix Hash Trees (“PHTs”) support search operations, including one-dimensional range queries, over a DHT. PHTs further support heap queries, proximity queries, and limited multi-dimensional search operations. PHTs are trie-based data structures in which each node has either 0 or 2 children. A key is stored at the leaf node whose label is a prefix of the key. Each leaf node stores at most a threshold number of keys, and each internal node contains, within its subtree, a number of keys equal to at least the threshold number plus one. Each leaf node includes a pointer to the leaf nodes to its immediate left and right. Ramabhadran et al., Prefix Hash Tree: An Indexing Data Structure over Distributed Hash Tables, University of California, San Diego, 2004.
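The structural properties described above can be illustrated with a minimal sketch. It is not taken from the cited reference: an in-memory Python dict stands in for the DHT, and the names `B`, `D`, `dht`, `insert` and `range_query` are hypothetical. The left and right leaf pointers are omitted, and the range query instead descends from the node labeled with the longest common prefix of the two bounds.

```python
# Hypothetical sketch: a plain dict stands in for the DHT (node label -> node).
B = 2   # assumed leaf capacity (the "threshold number" of keys)
D = 4   # assumed key length in bits

dht = {"": {"leaf": True, "keys": []}}   # the root has the empty label


def to_bits(k):
    """Encode an integer key as a D-bit binary string."""
    return format(k, "0{}b".format(D))


def lookup_leaf(bits):
    """Return the label of the leaf whose label is a prefix of bits."""
    for i in range(len(bits) + 1):
        node = dht.get(bits[:i])          # one DHT lookup per prefix length
        if node is not None and node["leaf"]:
            return bits[:i]
    raise KeyError(bits)


def insert(k):
    bits = to_bits(k)
    label = lookup_leaf(bits)
    if bits in dht[label]["keys"]:
        return
    dht[label]["keys"].append(bits)
    # Split while a leaf holds more than B keys: it becomes an internal
    # node with exactly two children, and its keys move to the children.
    while len(dht[label]["keys"]) > B and len(label) < D:
        keys = dht[label]["keys"]
        dht[label] = {"leaf": False, "keys": []}
        children = (label + "0", label + "1")
        for child in children:
            dht[child] = {"leaf": True,
                          "keys": [x for x in keys if x.startswith(child)]}
        label = max(children, key=lambda c: len(dht[c]["keys"]))


def covers(label):
    """Smallest and largest integer key that can sit under a label."""
    return int(label.ljust(D, "0"), 2), int(label.ljust(D, "1"), 2)


def range_query(lo, hi):
    lo_b, hi_b = to_bits(lo), to_bits(hi)
    n = 0
    while n < D and lo_b[n] == hi_b[n]:
        n += 1
    start = lo_b[:n]                      # longest common prefix of the bounds
    if start not in dht:                  # trie is shallower than the prefix
        start = lookup_leaf(lo_b)
    found, stack = [], [start]
    while stack:
        label = stack.pop()
        low, high = covers(label)
        if high < lo or low > hi:
            continue                      # subtree lies outside the range
        node = dht[label]
        if node["leaf"]:
            found.extend(int(x, 2) for x in node["keys"]
                         if lo <= int(x, 2) <= hi)
        else:
            stack.extend((label + "1", label + "0"))
    return sorted(found)


for key in (1, 3, 6, 7, 9, 12):
    insert(key)
print(range_query(3, 9))   # [3, 6, 7, 9]
```

In this sketch every read and write of `dht` corresponds to a single get or put against the DHT. The sketch assumes a single writer, so the write conflicts discussed below do not arise in it.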
Using a PHT allows for efficient range search where the underlying storage system is a DHT. However, the original PHT research left open a number of practical details relating to managing the write conflicts that are unavoidable in a setting in which availability is favored over consistency. In particular, the omitted details become apparent in environments in which writers of data records may be acting independently, yet in conflict, on physically separate components of the underlying DHT. Fundamental defects in the original PHT research, such as unresolved write conflicts, render it infeasible in practice without modification.
Consistency in the traditional DBMS/ACID sense is not a requirement of DHTs; however, “eventual consistency” is a requirement. Consistency in the traditional DBMS/ACID sense refers to the fact that when user A inserts record R into a DBMS, user B will have to wait a short time for user A to complete the insert transaction before being able to access and see record R within the DBMS; that is, the affected portion of the DBMS will be made unavailable to user B for a short period of time. Eventual consistency generally means that, given a sufficiently long period of time over which no inserts or updates are sent by a user, it can be expected that during that time period all earlier inserts and updates will propagate through the system and all the replicas will become consistent. For example, when user A inserts record R into a system, user B may not see record R for some time subsequent to the insert, but user B will not be precluded from accessing the affected portion of the system. After some period of update or insert inactivity, record R will have been replicated across the DHT system, and going forward all readers will see record R.
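The user A and user B example above can be sketched as a small simulation. This is a hedged illustration only: the class `TinyStore` and its methods `put`, `get` and `propagate` are hypothetical names, replicas are modeled as in-memory dicts, and propagation is triggered explicitly rather than by a real replication protocol.

```python
from collections import deque


class TinyStore:
    """Hypothetical store: a write lands on one replica and spreads later."""

    def __init__(self, n=3):
        self.replicas = [dict() for _ in range(n)]
        self.pending = deque()   # (replica index, key, value) not yet applied

    def put(self, key, value, via=0):
        # The write is acknowledged after reaching a single replica.
        self.replicas[via][key] = value
        for i in range(len(self.replicas)):
            if i != via:
                self.pending.append((i, key, value))

    def get(self, key, via=0):
        # Reads never block; they may return stale data (None here).
        return self.replicas[via].get(key)

    def propagate(self):
        # Stands in for a quiet period during which updates reach all replicas.
        while self.pending:
            i, key, value = self.pending.popleft()
            self.replicas[i][key] = value


store = TinyStore()
store.put("R", "record", via=0)   # user A inserts record R at replica 0
print(store.get("R", via=1))      # user B reads replica 1: None
store.propagate()                 # a period with no further inserts or updates
print(store.get("R", via=1))      # user B now sees: record
```

The sketch deliberately ignores the case in which two users write conflicting values for the same key at different replicas before propagation completes.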
What is needed is a method for resolving the various types of conflicts that arise in a system based on the DHT/PHT algorithms as a consequence of the eventual consistency property of the underlying DHT.