Distributed database systems are well known. The contemporary Internet is an example of a large-scale distributed database providing for both data storage and data retrieval. Intra-company database systems have been in use for many years, for example between regional offices of multinational companies.
In a published international PCT patent application no. PCT/US02/04349 (WO 02/065329), there is described a peer-to-peer storage system including a storage coordinator that centrally manages distributed resources in accordance with system policies administered through a central administrative console.
In a known contemporary large-scale distributed database system, several nodes are arranged to communicate mutually to form a dynamic cluster of nodes operable to handle database operations collectively. In such a cluster, each node is often implemented in practice as an end-user personal computer executing one or more database software programs. Moreover, the nodes are conveniently arranged to communicate over contemporary end-user Internet connections. Furthermore, there can often be more than a million nodes in such a cluster. Each node of the known database is susceptible to having stored therein some data records. The data records stored in the nodes of the network collectively form the data of the database system.
In operation, each node of the database system is susceptible to issuing one or more search queries by communicating with other nodes, for example other nodes of the cluster. Nodes of the database system co-operate together in response to the one or more queries to locate collectively data records matching the one or more queries.
The inventors are aware of proprietary network architectures, proposed by third parties, which are implemented in the form of distributed databases. Such proprietary architectures are known in association with trade mark names such as “Freenet”, “Microsoft Peer-to-Peer Stack”, “FastTrack” and “Kademlia”.
The inventors have appreciated that there are several primary performance characteristics which are beneficially addressed when designing a large-scale distributed database system comprising a plurality of participating nodes.
A first performance characteristic is broad query functionality. Ideally, each data record in a distributed database system is denoted by one or more key=value pairs. The database would thereby be operable to process queries such as “find records where type=book and author contains ‘john’ and title begins with ‘the adventures of’ and price < 50”.
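By way of illustration only, the example query above may be sketched in Python as a conjunction of predicates evaluated against records held as key=value pairs. The helper names (`equals`, `contains`, `begins_with`, `less_than`, `search`), the sample records and the case-insensitive text matching are assumptions made for this sketch; they are not taken from any of the systems referred to herein, and the sketch evaluates records held locally on a single node only.

```python
from typing import Callable, Dict, List, Union

Value = Union[str, int, float]
Record = Dict[str, Value]            # one data record: key=value pairs
Predicate = Callable[[Record], bool]


def equals(key: str, expected: Value) -> Predicate:
    """key=expected (exact match)."""
    return lambda record: record.get(key) == expected


def contains(key: str, fragment: str) -> Predicate:
    """key contains fragment (case-insensitive: an assumption of this sketch)."""
    return lambda record: fragment.lower() in str(record.get(key, "")).lower()


def begins_with(key: str, prefix: str) -> Predicate:
    """key begins with prefix (case-insensitive: an assumption of this sketch)."""
    return lambda record: str(record.get(key, "")).lower().startswith(prefix.lower())


def less_than(key: str, bound: float) -> Predicate:
    """key < bound; records lacking a numeric value for key do not match."""
    def predicate(record: Record) -> bool:
        value = record.get(key)
        return isinstance(value, (int, float)) and value < bound
    return predicate


def matches(record: Record, predicates: List[Predicate]) -> bool:
    """A record matches when every predicate of the query holds."""
    return all(p(record) for p in predicates)


def search(records: List[Record], predicates: List[Predicate]) -> List[Record]:
    """Return the locally stored records that match the query."""
    return [r for r in records if matches(r, predicates)]


# The example query from the text, expressed as a list of predicates.
QUERY = [
    equals("type", "book"),
    contains("author", "john"),
    begins_with("title", "the adventures of"),
    less_than("price", 50),
]

# Hypothetical sample records, for illustration only.
RECORDS: List[Record] = [
    {"type": "book", "author": "John Smith", "title": "The Adventures of a Node", "price": 30},
    {"type": "book", "author": "John Smith", "title": "The Adventures of a Node", "price": 75},
    {"type": "film", "author": "Johnny Doe", "title": "The Adventures of a Peer", "price": 10},
    {"type": "book", "author": "Mary Jones", "title": "The Adventures of a Key", "price": 20},
]

if __name__ == "__main__":
    for hit in search(RECORDS, QUERY):
        print(hit)
```

In this sketch only the first sample record satisfies all four conditions; the others fail on price, type and author respectively.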
A second performance characteristic is short search time. Ideally, so that on-line users experience an effectively immediate response to their one or more queries, the database is arranged to deliver search responses to the on-line users in less than one second, for example in 0.5 seconds.
A third performance characteristic is a low communication bandwidth demand on the participating nodes of the database. Typically, contemporary end-user Internet connections have asymmetrical bandwidth, for example an outbound bandwidth in the order of 64 kbit/sec, which is much more limited than the corresponding inbound bandwidth in the order of at least 512 kbit/sec.
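To give a sense of scale, the following minimal sketch computes how many bytes a node could transfer in a 0.5 second response window at the example bandwidths quoted above. It assumes 1 kbit = 1000 bits and ignores protocol overhead and latency; the function name `bytes_in_window` is hypothetical and introduced for this sketch only.

```python
OUTBOUND_KBIT_PER_S = 64    # example outbound bandwidth from the text
INBOUND_KBIT_PER_S = 512    # example inbound bandwidth from the text
WINDOW_S = 0.5              # example response time from the text


def bytes_in_window(rate_kbit_per_s: float, window_s: float) -> float:
    """Bytes transferable in window_s seconds, assuming 1 kbit = 1000 bits."""
    return rate_kbit_per_s * 1000 / 8 * window_s


if __name__ == "__main__":
    print(bytes_in_window(OUTBOUND_KBIT_PER_S, WINDOW_S))   # 4000.0 bytes outbound
    print(bytes_in_window(INBOUND_KBIT_PER_S, WINDOW_S))    # 32000.0 bytes inbound
    print(INBOUND_KBIT_PER_S / OUTBOUND_KBIT_PER_S)         # 8.0 (inbound/outbound ratio)
```

Under these assumptions a node can send only about 4000 bytes within the response window, which indicates why the outbound direction is the tighter constraint.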
A fourth performance characteristic is fault-tolerance. There should arise little or no data loss or performance degradation in the database as a whole in a situation where a few of its nodes fail in operation.
The inventors have therefore devised an alternative distributed database system distinguished from the proprietary databases described in the foregoing, the alternative database system being designed taking the aforesaid four performance characteristics into consideration.