Distributed database systems can be used to store and access large-scale data in networked infrastructures such as large clusters, distributed computing systems, intranets, the Internet, and other information retrieval systems. Distributed database systems include storage and processing devices that are typically managed and controlled by a central database management system. The central database management system may reside on multiple computers at the same physical location, or it may be dispersed over a network of interconnected computers.
A distributed database system controlled by a centralized database management system has several limitations. Because a central master controls management functions, a failure of the master causes temporary unavailability, even when the master is fault-tolerant. Network partitions also often make at least part of the cluster unavailable. A central master can also limit scalability. Finally, algorithms used to make the master fault-tolerant, such as Paxos, often take significant time to recover from failures, and the system is partly or fully unavailable during that time.
In large-scale distributed systems, devices frequently fail or lose network connectivity because of anomalies such as network disconnections and power failures. Maintaining continuous availability despite these frequent failures is essential for providing consistently low latency.
Another problem in distributed database systems is the difficulty of supporting high write rates. Even a task as simple as counting the hits on a website served by many web servers is currently considered difficult. Log file analysis is often not performed in real time because doing so is too expensive. Statistics such as the number of unique clients that have accessed a website are very expensive to generate.
Distributed databases face many problems today, particularly when they are used to answer search queries. Search engines provide a powerful tool for locating documents in a large collection, such as the documents on the World Wide Web (WWW) or the documents stored on the computers of an intranet. The documents are located in response to a search query submitted by a user, and a search query may consist of one or more search terms. What is needed are techniques for extracting relevant information from databases more efficiently and more intelligently. The ability to query a search engine more intelligently than by typing in a few search terms would be a significant advance over today's search engines. The display of query results could also be improved.