Because of the increasingly interconnected nature of computing devices throughout the world, the data gathered and generated by those computing devices has grown at an exponential rate. The time needed to process such increasing amounts of data using traditional methodologies will therefore increase exponentially as well. For businesses, educational and governmental institutions, and others who provide or consume services derived from billions of individual data points, managing such a large amount of data efficiently becomes crucial. Thus, as the amount of data being gathered and generated increases, the infrastructure for storing, managing, and operating on such data needs to expand as well.
Traditionally, large quantities of data were efficiently handled using fault-tolerant storage systems and parallel-processing algorithms. Fault-tolerant storage systems enabled large quantities of data to be stored across hundreds or even thousands of inexpensive storage media, despite the risk that at least one of these storage media would fail, rendering the data stored on it inaccessible. Parallel-processing algorithms enabled large quantities of data to be efficiently gathered and processed by simply dividing the necessary labor across inexpensive processing equipment, such as the multi-core microprocessors present in modern computing hardware.
However, while fault-tolerant storage systems can be implemented in a generic fashion, such that a single fault-tolerant storage algorithm can be used to store any type of information, parallel-processing algorithms are, by their nature, specific to the particular problem that they seek to solve or the particular task that they seek to accomplish. Thus, a search engine can use the same fault-tolerant storage mechanisms as a weather prediction engine, but, obviously, they would each rely on vastly different parallel-processing algorithms.
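By way of illustration only, the distinction drawn above can be sketched in a few lines of Python. The sketch assumes a toy in-memory data set already divided into chunks; the names `run_in_parallel`, `count_words`, `merge_counts`, and `max_reading` are hypothetical and do not come from any particular system. The helper that distributes the chunks is generic, while the per-chunk work and the step that combines partial results must be written separately for each task.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(work, combine, chunks, workers=4):
    """Generic part: apply `work` to every chunk concurrently, then combine."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(work, chunks))
    return combine(partials)


# Task-specific logic for a hypothetical search-style workload.
def count_words(chunk):
    """Count word occurrences in one chunk of text."""
    return Counter(chunk.split())


def merge_counts(partials):
    """Sum the per-chunk word counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Task-specific logic for a hypothetical weather-style workload.
def max_reading(chunk):
    """Find the highest temperature reading in one chunk."""
    return max(chunk)


word_totals = run_in_parallel(count_words, merge_counts, ["a b a", "b c", "a"])
hottest = run_in_parallel(max_reading, max, [[12.5, 14.0], [9.5, 21.0], [18.0]])
```

A thread pool is used here only to keep the sketch short; a process pool or a cluster of machines could take its place without changing the point that `count_words` and `max_reading` share no logic even though they share the same distribution helper.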