This specification relates to fault prevention in a computer cluster.
A framework, e.g., Apache Hadoop, can be deployed to manage distributed storage and distributed processing of large data sets on clusters of many computers, i.e., nodes, which may be physical or virtual. Oftentimes, the computers are built from commodity hardware. The framework can include multiple components to be run on different nodes in the cluster. Each component can be responsible for a different task. For example, a first component, e.g., Hadoop Distributed File System (HDFS), can implement a file system, and a second component, e.g., Hive, can implement a database access layer. The components work together to distribute processing of a workload of files amongst nodes in the cluster.
A cluster of computers running the framework is highly scalable. Additional nodes can be added to the cluster to increase throughput. Each cluster can also be highly resistant to failure because data can be copied to multiple nodes in the cluster in case one or more nodes fail.